I have two BOINC clients on the same computer
Author | Message |
---|---|
Joined: 21 Mar 17 Posts: 3 Credit: 426,449 RAC: 0 |
I start the second client with: `"C:\Program Files\BOINC\boinc.exe" --allow_multiple_clients -dir c:\programdata\boinc_2 -gui_rpc_port 9999` But WUProp does not recognize this second client. |
Joined: 28 Mar 10 Posts: 588 Credit: 1,220,097 RAC: 237 |
|
Joined: 27 Oct 15 Posts: 201 Credit: 1,649,160 RAC: 0 |
`"C:\Program Files\BOINC\boinc.exe" --allow_multiple_clients -dir c:\programdata\boinc_2 -gui_rpc_port 9999` c:\programdata\boinc_2\projects\wuprop.boinc-af.org\app_config.xml: `<app_config> <app_version> <app_name>data_collect_v4</app_name> <plan_class>nci</plan_class> <avg_ncpus>0.01</avg_ncpus> <cmdline>-p 9999</cmdline> </app_version> </app_config>` |
Joined: 21 Mar 17 Posts: 3 Credit: 426,449 RAC: 0 |
Yes! I added app_config.xml at c:\programdata\boinc_2\projects\wuprop.boinc-af.org\app_config.xml. It works well! |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
I'm trying this again but it's not working this time. Errors: "12/29/2017 6:10:40 PM \| WUProp@Home \| Your app_config.xml file refers to an unknown application 'data_collect_v4'. Known applications: None" And: "12/29/2017 6:10:56 PM \| WUProp@Home \| Not accepting requests from this host" |
Joined: 28 Mar 10 Posts: 671 Credit: 3,987,227 RAC: 691 |
And: The project administrator is blocking the second BOINC client so you don't get too many hours... |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
The client is split because that particular project doesn't make save points for 12 to 18 hours and I lose 1/2 to 3/4 a day of work shutting down BOINC to deal with the other projects' technical difficulties or to change my app_config files. Guess I need to ask [AF>WildWildWest]Sebastien for help? |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
(accidental double post) |
Joined: 22 Aug 16 Posts: 447 Credit: 2,089,557 RAC: 712 |
Which project is this? If you want to run this project that "doesn't make save points for 12 to 18 hours" when there is work available and to keep the CPUs doing something else the rest of the day then you can do this with resource share. Set the 2nd project to 0% resource share and it will only download work when there is nothing else to do. |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
Which project is this? A save point is where the WU saves its progress while performing the calculations. The CPUs are used 99.99% of the time. If you interrupt the WU before it makes a save point, then 4, 8 or up to 18 hours of 16 or 32 cores of CPU cycles are just thrown into the void. The split into two clients is very necessary to avoid lost work. |
Joined: 22 Aug 16 Posts: 447 Credit: 2,089,557 RAC: 712 |
Which project is this? Oh checkpoints, I read it wrong. Not sure a 2nd client is needed for that. |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
Which project is this? I'm sure. Unless you know of a way to shut down boinc.exe that forces all WUs to create checkpoints even though the project doesn't support them. I tried running the project in a VM and its RAC is 1/4 that of raw, so a separate client is the best solution. The project remains in RAM and calculating while maintenance is performed on the other projects, and 96 to 192 CPU hours are not wasted. |
Joined: 14 Dec 11 Posts: 39 Credit: 6,533,741 RAC: 277 |
The client is split because that particular project doesn't make save points for 12 to 18 hours and I lose 1/2 to 3/4 a day of work shutting down BOINC to deal with the other projects' technical difficulties or to change my app_config files. Instead of restarting BOINC, you can have it reread the config files to pick up any changes. Do you intend to put the project with long checkpoints in its own instance of BOINC and have your other projects on the host in another instance? That is easily doable. You just have to make sure you have properly limited the number of processors on each instance so the total number of running tasks does not exceed the number of actual processors. A host with 16 processors cannot run more than 16 CPU tasks at once across any number of instances. Otherwise a WUProp server rule will ignore one, or all, of the clients running on that host. |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
the number of processors on each instance so the total number of running tasks does not exceed the number of actual processors. Is WUProp sophisticated enough to count the number of cores used by the multicore VirtualBox VMs? Instead of restarting BOINC you can have it reread the config files to pick up any changes. That's nice to know (I'm forgetting things; I used to know that), but it doesn't clear up the "Environment needs to be cleaned up" and "Postponed" error messages of the VirtualBox WUs, where I have to suspend all work and kill the vboxservice to clean up the errors. Known defect in LHC WUs that they won't address. Keeping the client separate is still the best solution for this reason. |
Joined: 22 Aug 16 Posts: 447 Credit: 2,089,557 RAC: 712 |
I still don't get why you're shutting down a client. |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
I still don't get why you're shutting down a client. but it doesn't clear up the "Environment needs to be cleaned up" and "Postponed" error messages of the VirtualBox WUs, where I have to suspend all work and kill the vboxservice to clean up the errors. I have to shut down each client and restart VirtualBox at least every 3 days. Message discussion here: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4526 |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
Encountered another situation today. Moving from 2-core to 1-core WUs on LHC VMs. The core count is virtual in Boincmgr until the actual VM instance is created and started. I changed the core count and RAM size in app_config, told Boincmgr to read the config file, and even restarted Boincmgr without stopping the boinc.exe client, but Boincmgr continued to treat the unstarted LHC WUs as 2-core instances. Only after shutting down the boinc.exe client and restarting did Boincmgr properly register the core count of the VM WUs. |
Joined: 14 Dec 11 Posts: 39 Credit: 6,533,741 RAC: 277 |
Encountered another situation today. There is a known issue where BOINC does not update the displayed resource usage of tasks after values are changed in app_config until the client restarts, even though it is already using the new settings. For example, changing the settings from `<gpu_versions> <gpu_usage>1.0</gpu_usage> <cpu_usage>1.0</cpu_usage> </gpu_versions>` to `<gpu_versions> <gpu_usage>0.5</gpu_usage> <cpu_usage>0.5</cpu_usage> </gpu_versions>` in order to run 2 tasks per GPU will continue to display "Running (1 CPU + 1 GPU)" with 2 tasks running, instead of "Running (0.5 CPU + 0.5 GPU)". That could be the same issue you were seeing, unless there is a separate issue where app_config settings are not correctly applied to VMs after being read. |
Joined: 13 Dec 15 Posts: 174 Credit: 2,268,868 RAC: 251 |
A couple of the new WUs that showed up in VirtualBox Manager were actually 1 core instead of 2, so it was a reporting error in BoincMgr as you suggest. Although I can't be certain it's the same reporting bug, it appears to be. Too annoying, and not acceptable behavior, while I try to home in on the optimal settings on the machines. Keeping the optimized work units separate from a client of work units currently under evaluation is yet another reason to keep the data directories split. |
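
The processor-budget advice in this thread (the combined limit across all instances should not exceed the host's actual processors) can be sketched in Python. This is only an illustration under stated assumptions: the 16-processor host, the 12/4 split, and the first data directory path are hypothetical, and capping each client with `<ncpus>` in its own cc_config.xml is one possible way to enforce the split, not something the thread confirms WUProp checks.

```python
import os
import xml.etree.ElementTree as ET


def check_cpu_budget(instance_ncpus, host_cpus=None):
    """Return True if the per-instance CPU limits fit within the host's processors."""
    if host_cpus is None:
        # Fall back to the logical CPU count reported by the OS.
        host_cpus = os.cpu_count() or 1
    return sum(instance_ncpus.values()) <= host_cpus


def cc_config_xml(ncpus):
    """Render a minimal cc_config.xml body that caps one client at `ncpus` CPUs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


# Hypothetical split of a 16-processor host between two data directories.
plan = {
    r"c:\programdata\boinc": 12,    # assumed location of the first client
    r"c:\programdata\boinc_2": 4,   # second client, as in the thread
}

print(check_cpu_budget(plan, host_cpus=16))
print(cc_config_xml(plan[r"c:\programdata\boinc_2"]))
```

Each generated fragment would go into the cc_config.xml of the matching data directory; the budget check is plain arithmetic and says nothing about how many cores a multicore VM task actually consumes.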
©2024 Sébastien