I have two BOINC clients on the same computer

Pag
Message 4954 - Posted: 13 Apr 2017, 20:24:05 UTC

"C:\Program Files\BOINC\boinc.exe" --allow_multiple_clients -dir c:\programdata\boinc_2 -gui_rpc_port 9999

But WUProp does not recognize this second client.

Conan
Message 4955 - Posted: 13 Apr 2017, 23:15:27 UTC

See the "Multi Clients" thread; it may be of some help.

Conan

Sergey Kovalchuk
Message 4956 - Posted: 14 Apr 2017, 5:39:48 UTC - in response to Message 4954.
Last modified: 14 Apr 2017, 5:41:20 UTC

"C:\Program Files\BOINC\boinc.exe" --allow_multiple_clients -dir c:\programdata\boinc_2 -gui_rpc_port 9999

c:\programdata\boinc_2\projects\wuprop.boinc-af.org\app_config.xml:

<app_config>
  <app_version>
    <app_name>data_collect_v4</app_name>
    <plan_class>nci</plan_class>
    <avg_ncpus>0.01</avg_ncpus>
    <cmdline>-p 9999</cmdline>
  </app_version>
</app_config>
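
For anyone setting this up for more than one extra instance, here is a minimal Python sketch that generates the same app_config.xml text for a given GUI RPC port. The function name build_app_config and its default arguments are made up for illustration; the element names and values are copied from the file above, and the sketch has not been checked against the WUProp application itself.

```python
import xml.etree.ElementTree as ET


def build_app_config(gui_rpc_port: int,
                     app_name: str = "data_collect_v4",
                     plan_class: str = "nci",
                     avg_ncpus: float = 0.01) -> str:
    """Return app_config.xml text that passes "-p <port>" to the app.

    Hypothetical helper: the defaults mirror the file posted above.
    """
    if not 1 <= gui_rpc_port <= 65535:
        raise ValueError("gui_rpc_port must be between 1 and 65535")

    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    # The port must match the -gui_rpc_port value the client was started with.
    ET.SubElement(version, "cmdline").text = f"-p {gui_rpc_port}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config(9999))
```

The output would be saved as app_config.xml in the WUProp project folder of the second data directory, as in the path above.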

Pag
Message 4957 - Posted: 14 Apr 2017, 14:26:51 UTC

Yes! I added app_config.xml in
c:\programdata\boinc_2\projects\wuprop.boinc-af.org\app_config.xm

It works well!

marmot
Message 5860 - Posted: 30 Dec 2017, 0:32:43 UTC

I'm trying this again but it's not working this time.

Errors:
"12/29/2017 6:10:40 PM | WUProp@Home | Your app_config.xml file refers to an unknown application 'data_collect_v4'. Known applications: None"


And:
"12/29/2017 6:10:56 PM | WUProp@Home | Not accepting requests from this host"

STE\/E
Message 5861 - Posted: 30 Dec 2017, 10:22:25 UTC - in response to Message 5860.  

> And:
> "12/29/2017 6:10:56 PM | WUProp@Home | Not accepting requests from this host"

The project administrator is blocking the second BOINC client so that you don't get too many hours ...

marmot
Message 5862 - Posted: 30 Dec 2017, 13:31:03 UTC - in response to Message 5861.  
Last modified: 30 Dec 2017, 13:36:09 UTC

The client is split because that particular project doesn't make save points for 12 to 18 hours, and I lose 1/2 to 3/4 of a day of work when I shut down BOINC to deal with the other projects' technical difficulties or to change my app_config files.


Guess I need to ask [AF>WildWildWest]Sebastien for help?

marmot
Message 5863 - Posted: 30 Dec 2017, 13:31:06 UTC - in response to Message 5861.  
Last modified: 30 Dec 2017, 13:31:33 UTC

(accidental double post)

mmonnin
Message 5864 - Posted: 30 Dec 2017, 13:39:00 UTC

Which project is this?

If you want to run this project that "doesn't make save points for 12 to 18 hours" when there is work available and to keep the CPUs doing something else the rest of the day then you can do this with resource share. Set the 2nd project to 0% resource share and it will only download work when there is nothing else to do.

marmot
Message 5865 - Posted: 30 Dec 2017, 14:38:44 UTC - in response to Message 5864.  
Last modified: 30 Dec 2017, 14:42:35 UTC

> Which project is this?
>
> If you want to run this project that "doesn't make save points for 12 to 18 hours" when there is work available and to keep the CPUs doing something else the rest of the day then you can do this with resource share. Set the 2nd project to 0% resource share and it will only download work when there is nothing else to do.

A save point is where the WU saves its progress while performing the calculations. The CPUs are used 99.99% of the time. If you interrupt the WU before it makes a save point, then 4, 8, or up to 18 hours of CPU cycles on 16 or 32 cores are just thrown into the void.

The split into two clients is very necessary to avoid lost work.

mmonnin
Message 5867 - Posted: 30 Dec 2017, 18:45:07 UTC - in response to Message 5865.  

>> Which project is this?
>>
>> If you want to run this project that "doesn't make save points for 12 to 18 hours" when there is work available and to keep the CPUs doing something else the rest of the day then you can do this with resource share. Set the 2nd project to 0% resource share and it will only download work when there is nothing else to do.
>
> A save point is where the WU saves its progress while performing the calculations. The CPUs are used 99.99% of the time. If you interrupt the WU before it makes a save point, then 4, 8, or up to 18 hours of CPU cycles on 16 or 32 cores are just thrown into the void.
>
> The split into two clients is very necessary to avoid lost work.


Oh checkpoints, I read it wrong. Not sure a 2nd client is needed for that.

marmot
Message 5868 - Posted: 30 Dec 2017, 19:44:47 UTC - in response to Message 5867.  
Last modified: 30 Dec 2017, 19:49:15 UTC

>>> Which project is this?
>>>
>>> If you want to run this project that "doesn't make save points for 12 to 18 hours" when there is work available and to keep the CPUs doing something else the rest of the day then you can do this with resource share. Set the 2nd project to 0% resource share and it will only download work when there is nothing else to do.
>>
>> A save point is where the WU saves its progress while performing the calculations. The CPUs are used 99.99% of the time. If you interrupt the WU before it makes a save point, then 4, 8, or up to 18 hours of CPU cycles on 16 or 32 cores are just thrown into the void.
>>
>> The split into two clients is very necessary to avoid lost work.
>
> Oh checkpoints, I read it wrong. Not sure a 2nd client is needed for that.

I'm sure.
Unless you know of a way to shut down boinc.exe that forces all WUs to create checkpoints even though the project doesn't support them.

I tried running the project in a VM, and its RAC is 1/4 of what it gets running natively, so a separate client is the best solution. The project remains in RAM and keeps calculating while maintenance is performed on the other projects, and 96 to 192 CPU hours are not wasted.

HAL9000
Message 5869 - Posted: 30 Dec 2017, 22:50:57 UTC - in response to Message 5862.  

> The client is split because that particular project doesn't make save points for 12 to 18 hours, and I lose 1/2 to 3/4 of a day of work when I shut down BOINC to deal with the other projects' technical difficulties or to change my app_config files.
>
> Guess I need to ask [AF>WildWildWest]Sebastien for help?

Instead of restarting BOINC, you can have it reread the config files to pick up any changes.

Do you intend to put the project with long checkpoints in its own instance of BOINC and then have your other projects on the host in another instance?
That is easily doable. You just have to make sure you have properly limited the number of processors on each instance so the total number of running tasks does not exceed the number of actual processors.
A host with 16 processors cannot run more than 16 CPU tasks at once across any number of instances. Otherwise a WUProp server rule will ignore one, or all, of the clients running on that host.
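
The arithmetic above can be sketched in Python as a quick sanity check before splitting a host. This is only an illustration: the function name cpu_percent_limits and the instance names are hypothetical, it assumes each instance is capped with a percent-of-processors setting, and the WUProp server rule is taken as described in this post rather than verified.

```python
from typing import Dict


def cpu_percent_limits(total_cpus: int,
                       cpus_per_instance: Dict[str, int]) -> Dict[str, float]:
    """Turn a per-instance CPU budget into percent-of-processors limits.

    Raises ValueError if the instances together would run more CPU tasks
    than the host has processors.
    """
    if total_cpus < 1:
        raise ValueError("total_cpus must be at least 1")
    if any(n < 0 for n in cpus_per_instance.values()):
        raise ValueError("CPU budgets cannot be negative")

    requested = sum(cpus_per_instance.values())
    if requested > total_cpus:
        raise ValueError(
            f"instances request {requested} CPUs but the host has {total_cpus}"
        )
    # Each instance sees every processor, so its cap is a share of the total.
    return {name: 100.0 * n / total_cpus
            for name, n in cpus_per_instance.items()}


if __name__ == "__main__":
    # Hypothetical split of a 16-processor host between two data directories.
    print(cpu_percent_limits(16, {"boinc": 12, "boinc_2": 4}))
```

With the 16-processor example, a 12/4 split works out to 75% for one instance and 25% for the other, while a 12/8 split is rejected because it adds up to 20.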

marmot
Message 5870 - Posted: 30 Dec 2017, 23:24:50 UTC - in response to Message 5869.  
Last modified: 30 Dec 2017, 23:48:48 UTC

> the number of processors on each instance so the total number of running tasks does not exceed the number of actual processors.

Is WUProp sophisticated enough to count the number of cores used by the multicore VirtualBox VMs?

> Instead of restarting BOINC, you can have it reread the config files to pick up any changes.

That's nice to know (I'm forgetting things; I used to know that), but it doesn't clear up the "Environment needs to be cleaned up" and "Postponed" error messages of the VirtualBox WUs, where I have to suspend all work and kill the vboxservice to clean up the errors.

Known defect in LHC WUs that they won't address.

Keeping the client separate is still the best solution for this reason.

mmonnin
Message 5871 - Posted: 31 Dec 2017, 0:11:32 UTC

I still don't get why you're shutting down a client.

marmot
Message 5872 - Posted: 31 Dec 2017, 0:18:21 UTC - in response to Message 5871.  
Last modified: 31 Dec 2017, 0:27:58 UTC

> I still don't get why you're shutting down a client.

> but it doesn't clear up the "Environment needs to be cleaned up" and "Postponed" error messages of the VirtualBox WUs, where I have to suspend all work and kill the vboxservice to clean up the errors.
>
> Known defect in LHC WUs that they won't address.

I have to shut down each client and restart VirtualBox at least every 3 days.

Message discussion here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4526

marmot
Message 5873 - Posted: 31 Dec 2017, 17:56:11 UTC - in response to Message 5871.  

Encountered another situation today.

I'm moving from 2-core to 1-core WUs on the LHC VMs.

The core count is virtual in Boincmgr until the actual VM instance is created and started.

I changed the core count and RAM size in the app_config, told Boincmgr to read the config file, and even restarted Boincmgr without stopping the boinc.exe client, but Boincmgr continued to treat the unstarted LHC WUs as 2-core instances.

Only after shutting down the boinc.exe client and restarting it did Boincmgr properly register the core count of the VM WUs.

HAL9000
Message 5874 - Posted: 1 Jan 2018, 8:37:47 UTC - in response to Message 5873.  
Last modified: 1 Jan 2018, 8:38:00 UTC

> Encountered another situation today.
>
> I'm moving from 2-core to 1-core WUs on the LHC VMs.
>
> The core count is virtual in Boincmgr until the actual VM instance is created and started.
>
> I changed the core count and RAM size in the app_config, told Boincmgr to read the config file, and even restarted Boincmgr without stopping the boinc.exe client, but Boincmgr continued to treat the unstarted LHC WUs as 2-core instances.
>
> Only after shutting down the boinc.exe client and restarting it did Boincmgr properly register the core count of the VM WUs.

There is a known issue where BOINC does not update the displayed resources used by tasks after values are changed in app_config until it restarts, even though it is already using the new settings.
For example, changing the settings
from
<gpu_versions>
  <gpu_usage>1.0</gpu_usage>
  <cpu_usage>1.0</cpu_usage>
</gpu_versions>
to
<gpu_versions>
  <gpu_usage>0.5</gpu_usage>
  <cpu_usage>0.5</cpu_usage>
</gpu_versions>
in order to run 2 tasks per GPU will continue to display "Running (1 CPU + 1 GPU)" with 2 tasks running, instead of displaying "Running (0.5 CPU + 0.5 GPU)".

That could be the same issue you were seeing, unless there is a separate issue where app_config settings are not correctly applied to VMs after being read.
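
To make the numbers in that example concrete, here is a small Python sketch of the arithmetic. It assumes the client keeps starting tasks on a GPU as long as their gpu_usage values add up to no more than 1, which is how the setting is used in the example above; the function names are hypothetical, and values above 1 (one task spanning several GPUs) are deliberately not handled.

```python
import math


def tasks_per_gpu(gpu_usage: float) -> int:
    """Number of tasks that fit on one GPU for a given gpu_usage value."""
    if not 0 < gpu_usage <= 1:
        raise ValueError("this sketch only handles 0 < gpu_usage <= 1")
    # The small epsilon guards against floating-point results like 1.999999.
    return math.floor(1.0 / gpu_usage + 1e-9)


def cpus_reserved_per_gpu(gpu_usage: float, cpu_usage: float) -> float:
    """CPU share reserved by all the tasks sharing one GPU."""
    return tasks_per_gpu(gpu_usage) * cpu_usage


if __name__ == "__main__":
    for gpu, cpu in [(1.0, 1.0), (0.5, 0.5)]:
        print(f"gpu_usage={gpu}, cpu_usage={cpu}: "
              f"{tasks_per_gpu(gpu)} task(s) per GPU, "
              f"{cpus_reserved_per_gpu(gpu, cpu)} CPU(s) reserved")
```

With gpu_usage and cpu_usage both at 0.5, two tasks share the GPU and together still reserve one full CPU, so the per-task status line is the only thing that should change.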

marmot
Message 5875 - Posted: 2 Jan 2018, 16:22:08 UTC - in response to Message 5874.  
Last modified: 2 Jan 2018, 16:22:50 UTC

A couple of the new WUs that showed up in VirtualBox Manager were actually 1 core instead of 2, so it was a reporting error in BoincMgr, as you suggested. Although I can't be certain it's the same reporting bug, it appears to be.

That is too annoying, and not acceptable behavior, while I try to home in on the optimal settings for the machines.

Keeping the optimized work units separate from a client of work units currently under evaluation is yet another reason to keep the data directories split.

©2024 Sébastien