I have two BOINC clients on the same computer

Pag
Joined: 21 Mar 17
Posts: 3
Credit: 426,449
RAC: 0
Total hours: 559,545
Message 4954 - Posted: 13 Apr 2017, 20:24:05 UTC

"C:\Program Files\BOINC\boinc.exe" --allow_multiple_clients -dir c:\programdata\boinc_2 -gui_rpc_port 9999

But WUProp does not recognize this second client.

Conan
Joined: 28 Mar 10
Posts: 565
Credit: 1,173,713
RAC: 186
Total hours: 3,241,113
Message 4955 - Posted: 13 Apr 2017, 23:15:27 UTC

See the thread "Multi Clients"; it may be of some help.

Conan
____________

Sergey Kovalchuk
Joined: 27 Oct 15
Posts: 201
Credit: 1,648,308
RAC: 4
Total hours: 2,703,047
Message 4956 - Posted: 14 Apr 2017, 5:39:48 UTC - in response to Message 4954.
Last modified: 14 Apr 2017, 5:41:20 UTC

"C:\Program Files\BOINC\boinc.exe" --allow_multiple_clients -dir c:\programdata\boinc_2 -gui_rpc_port 9999

c:\programdata\boinc_2\projects\wuprop.boinc-af.org\app_config.xml:

<app_config>
    <app_version>
        <app_name>data_collect_v4</app_name>
        <plan_class>nci</plan_class>
        <avg_ncpus>0.01</avg_ncpus>
        <cmdline>-p 9999</cmdline>
    </app_version>
</app_config>
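
The two snippets above are tied together by the port number: the second client is started with GUI RPC port 9999, and the same value is handed to the WUProp application through `-p 9999`. A minimal Python sketch that generates a matching app_config.xml for a given port follows. The helper name and the idea of generating the file from a script are illustrative assumptions, not something BOINC or WUProp provides; the element names and values are copied from the post.

```python
import xml.etree.ElementTree as ET


def build_wuprop_app_config(gui_rpc_port: int) -> str:
    """Return app_config.xml text that passes `-p <port>` to the WUProp app.

    Element names and values are taken from the post above; only the port
    number varies. Hypothetical helper, not part of BOINC or WUProp.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = "data_collect_v4"
    ET.SubElement(version, "plan_class").text = "nci"
    ET.SubElement(version, "avg_ncpus").text = "0.01"
    ET.SubElement(version, "cmdline").text = f"-p {gui_rpc_port}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The port should match the -gui_rpc_port value the second client uses.
    print(build_wuprop_app_config(9999))
```

The output would be saved as app_config.xml in the WUProp project folder of the second client's data directory, as in the path shown above.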

Pag
Joined: 21 Mar 17
Posts: 3
Credit: 426,449
RAC: 0
Total hours: 559,545
Message 4957 - Posted: 14 Apr 2017, 14:26:51 UTC

Yes! I added app_config.xml in
c:\programdata\boinc_2\projects\wuprop.boinc-af.org\app_config.xm

It works well!

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5860 - Posted: 30 Dec 2017, 0:32:43 UTC

I'm trying this again but it's not working this time.

Errors:
"12/29/2017 6:10:40 PM | WUProp@Home | Your app_config.xml file refers to an unknown application 'data_collect_v4'. Known applications: None"


And:
"12/29/2017 6:10:56 PM | WUProp@Home | Not accepting requests from this host"

STE\/E
Joined: 28 Mar 10
Posts: 642
Credit: 3,866,133
RAC: 462
Total hours: 20,093,787
Message 5861 - Posted: 30 Dec 2017, 10:22:25 UTC - in response to Message 5860.

And:
"12/29/2017 6:10:56 PM | WUProp@Home | Not accepting requests from this host"


The project administrator is blocking the second BOINC client so you don't get too many hours ...

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5862 - Posted: 30 Dec 2017, 13:31:03 UTC - in response to Message 5861.
Last modified: 30 Dec 2017, 13:36:09 UTC

The client is split because that particular project doesn't make save points for 12 to 18 hours, and I lose 1/2 to 3/4 of a day of work whenever I shut down BOINC to deal with the other projects' technical difficulties or to change my app_config files.


Guess I need to ask [AF>WildWildWest]Sebastien for help?

mmonnin
Joined: 22 Aug 16
Posts: 417
Credit: 1,927,699
RAC: 707
Total hours: 9,129,371
Message 5864 - Posted: 30 Dec 2017, 13:39:00 UTC

Which project is this?

If you want to run this project that "doesn't make save points for 12 to 18 hours" whenever it has work available, and keep the CPUs busy with something else the rest of the day, you can do that with resource share. Set the 2nd project to 0% resource share and it will only download work when there is nothing else to do.

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5865 - Posted: 30 Dec 2017, 14:38:44 UTC - in response to Message 5864.
Last modified: 30 Dec 2017, 14:42:35 UTC

A save point is where the WU saves its progress while performing the calculations. The CPUs are used 99.99% of the time. If you interrupt the WU before it makes a save point, then 4, 8 or up to 18 hours of CPU cycles on 16 or 32 cores are just thrown into the void.

The split into two clients is very necessary to avoid lost work.

mmonnin
Joined: 22 Aug 16
Posts: 417
Credit: 1,927,699
RAC: 707
Total hours: 9,129,371
Message 5867 - Posted: 30 Dec 2017, 18:45:07 UTC - in response to Message 5865.

Oh, checkpoints. I read it wrong. Not sure a 2nd client is needed for that.

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5868 - Posted: 30 Dec 2017, 19:44:47 UTC - in response to Message 5867.
Last modified: 30 Dec 2017, 19:49:15 UTC

I'm sure.
Unless you know of a way to shut down boinc.exe that forces all WUs to create checkpoints even though the project doesn't support them.

I tried running the project in a VM and its RAC is 1/4 of what it gets running natively, so a separate client is the best solution. The project remains in RAM and keeps calculating while maintenance is performed on the other projects, and 96 to 192 CPU hours are not wasted.

HAL9000
Joined: 14 Dec 11
Posts: 33
Credit: 6,479,291
RAC: 316
Total hours: 9,528,401
Message 5869 - Posted: 30 Dec 2017, 22:50:57 UTC - in response to Message 5862.

Instead of restarting BOINC you can have it reread the config files to pick up any changes.
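
Besides the Manager menu, the reread can be requested from the command line with boinccmd. The Python sketch below only builds the argument list. It assumes boinccmd is on the PATH, that the second client listens on port 9999 as earlier in this thread, and that `--read_cc_config` also causes app_config.xml to be reread on your client version (worth confirming in the event log). The function name is hypothetical.

```python
from typing import List, Optional


def reread_config_command(port: int, password: Optional[str] = None) -> List[str]:
    """Build a boinccmd call asking the client on `port` to reread its config files."""
    cmd = ["boinccmd", "--host", f"localhost:{port}"]
    if password is not None:
        # GUI RPC password, normally stored in gui_rpc_auth.cfg in the data directory.
        cmd += ["--passwd", password]
    cmd.append("--read_cc_config")
    return cmd


if __name__ == "__main__":
    # Second client from this thread; pass the list to subprocess.run() to execute it.
    print(" ".join(reread_config_command(9999)))
```

Addressing each client by its own port is what lets one instance be reconfigured while the other keeps crunching.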

Do you intend to put the project with long checkpoints in its own instance of BOINC and then have your other projects on the host in another instance?
That is easily doable. You just have to make sure you have properly limited the number of processors on each instance so the total number of running tasks does not exceed the number of actual processors.
A host with 16 processors cannot run more than 16 CPU tasks at once across any number of instances. Otherwise a WUProp server rule will ignore one, or all, of the clients running on that host.
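
The rule described above amounts to a simple budget check. A sketch, assuming each instance is capped at a fixed number of CPUs (for example through the `<ncpus>` option in that instance's cc_config.xml). The function name and the example numbers are illustrative, and the exact check the WUProp server performs is not documented in this thread.

```python
from typing import List


def instances_fit_host(cpu_limits: List[int], host_processors: int) -> bool:
    """True if the combined CPU caps of all BOINC instances stay within the host."""
    return sum(cpu_limits) <= host_processors


if __name__ == "__main__":
    # 16-processor host: 12 CPUs for the long-checkpoint project, 4 for the rest.
    print(instances_fit_host([12, 4], 16))  # True
    # 12 + 8 = 20 exceeds 16, the situation the rule is said to reject.
    print(instances_fit_host([12, 8], 16))  # False
```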

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5870 - Posted: 30 Dec 2017, 23:24:50 UTC - in response to Message 5869.
Last modified: 30 Dec 2017, 23:48:48 UTC

the number of processors on each instance so the total number of running tasks does not exceed the number of actual processors.


Is WUProp sophisticated enough to count the number of cores used by the multicore VirtualBox VMs?

Instead of restarting BOINC you can have it reread the config files to pickup any changes.


That's nice to know (I'm forgetting things; I used to know that), but it doesn't clear up the "Environment needs to be cleaned up" and "Postponed" error messages of the VirtualBox WUs, where I have to suspend all work and kill the vboxservice to clean up the errors.

Known defect in LHC WUs that they won't address.

Keeping the client separate is still the best solution for this reason.

mmonnin
Joined: 22 Aug 16
Posts: 417
Credit: 1,927,699
RAC: 707
Total hours: 9,129,371
Message 5871 - Posted: 31 Dec 2017, 0:11:32 UTC

I still don't get why you're shutting down a client.

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5872 - Posted: 31 Dec 2017, 0:18:21 UTC - in response to Message 5871.
Last modified: 31 Dec 2017, 0:27:58 UTC

I still don't get why you're shutting down a client.


but it doesn't clear up the "Environment needs to be cleaned up" and "Postponed" error messages of the VirtualBox WUs, where I have to suspend all work and kill the vboxservice to clean up the errors.

Known defect in LHC WUs that they won't address.


I have to shut down each client and restart VirtualBox at least every 3 days.

Message discussion here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4526

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5873 - Posted: 31 Dec 2017, 17:56:11 UTC - in response to Message 5871.

Encountered another situation today.

Moving from 2-core to 1-core WUs on LHC VMs.

Core count is virtual in the Boincmgr until the actual VM instance is created and started.

Changed the core count and RAM size in the app_config, told Boincmgr to read the config file and even restarted Boincmgr without stopping the boinc.exe client, but Boincmgr continued to treat the unstarted LHC WUs as 2-core instances.

Only after shutting down the boinc.exe client and restarting did Boincmgr properly register the core count of the VM WUs.

HAL9000
Joined: 14 Dec 11
Posts: 33
Credit: 6,479,291
RAC: 316
Total hours: 9,528,401
Message 5874 - Posted: 1 Jan 2018, 8:37:47 UTC - in response to Message 5873.
Last modified: 1 Jan 2018, 8:38:00 UTC

There is a known issue with BOINC not updating the display of resources used by tasks when values are changed in app_config until after it restarts, even though it is using the new settings.
For example, changing the settings from
<gpu_versions>
    <gpu_usage>1.0</gpu_usage>
    <cpu_usage>1.0</cpu_usage>
</gpu_versions>
to
<gpu_versions>
    <gpu_usage>0.5</gpu_usage>
    <cpu_usage>0.5</cpu_usage>
</gpu_versions>
in order to run 2 tasks per GPU will continue to display "Running (1 CPU + 1 GPU)" with 2 tasks running, instead of displaying "Running (0.5 CPU + 0.5 GPU)".

That could be the same issue you were seeing. Unless there is a separate issue where app_config settings are not correctly applied to VMs after being read.
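
The arithmetic behind the example: with gpu_usage at 0.5 each task claims half a GPU, so two fit on one card. A small sketch of that calculation, using a hypothetical helper name and assuming whole tasks only and a gpu_usage between 0 and 1:

```python
import math


def tasks_per_gpu(gpu_usage: float) -> int:
    """Number of tasks that fit on one GPU when each claims `gpu_usage` of it."""
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage must be in the range (0, 1]")
    # Small epsilon guards against float rounding just below a whole number.
    return math.floor(1.0 / gpu_usage + 1e-9)


if __name__ == "__main__":
    for usage in (1.0, 0.5, 0.25):
        print(usage, "->", tasks_per_gpu(usage), "task(s) per GPU")
```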

marmot
Joined: 13 Dec 15
Posts: 174
Credit: 2,240,208
RAC: 549
Total hours: 5,162,706
Message 5875 - Posted: 2 Jan 2018, 16:22:08 UTC - in response to Message 5874.
Last modified: 2 Jan 2018, 16:22:50 UTC

A couple of the new WUs that showed up in the VirtualBox manager were actually 1 core instead of 2, so it was a reporting error in BoincMgr, as you suggested. I can't be certain it's the same reporting bug, but it appears to be.

Too annoying, and not acceptable behavior, while I try to home in on the optimal settings on the machines.

Keeping the optimized work units separate from a client of work units currently under evaluation is yet another reason to keep the data directories split.



Copyright © 2024 Sebastien