Posts by HAL9000

1) Message boards : Number crunching : Not accepting requests from this host? (Message 5880)
Posted 4 Jan 2018 by HAL9000
It looks like that happened around the same time some old unpurged tasks were cleaned up in the database.
There were reports from some users gaining credit for projects they had never created an account for. Your host becoming banned might be another oddity from the cleanup.
2) Message boards : Number crunching : I have two boincs clients on the same computer (Message 5874)
Posted 1 Jan 2018 by HAL9000
Encountered another situation today.

Moving from 2 core to 1 core WU on LHC VM's.

Core count is virtual in the Boincmgr until the actual VM instance is created and started.

Changed the core count and RAM size in the app_config, told Boincmgr to read the config file and even restarted Boincmgr without stopping the boinc.exe client, but Boincmgr continued to consider the unstarted LHC WU's to still be 2 core instances.

Only after shutting down the boinc.exe client and restarting did Boincmgr properly register the core count of the VM WU's.

There is a known issue where BOINC does not update the displayed resource usage of tasks after values are changed in app_config until the client restarts, even though it is already using the new settings.
For example, changing the settings
from
<gpu_versions>
    <gpu_usage>1.0</gpu_usage>
    <cpu_usage>1.0</cpu_usage>
</gpu_versions>
to
<gpu_versions>
    <gpu_usage>0.5</gpu_usage>
    <cpu_usage>0.5</cpu_usage>
</gpu_versions>
in order to run 2 tasks per GPU will continue to display Running (1 CPU + 1 GPU) with 2 tasks running, instead of displaying Running (0.5 CPU + 0.5 GPU).

That could be the same issue you were seeing, unless there is a separate issue where app_config settings are not correctly applied to VMs after being read.
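
To make the arithmetic behind the gpu_usage example concrete, here is a small sketch. It assumes the simplified model that the client keeps starting tasks on a GPU until their gpu_usage values add up to 1. The function names are made up for illustration and are not part of BOINC.

```python
import math

def tasks_per_gpu(gpu_usage: float) -> int:
    """Tasks that fit on one GPU when each task claims `gpu_usage` of it.

    Simplified model (an assumption): tasks are packed until the sum of
    their gpu_usage values reaches 1.0.
    """
    if gpu_usage <= 0:
        raise ValueError("gpu_usage must be positive")
    # The small epsilon guards against floating-point rounding in 1 / gpu_usage.
    return math.floor(1.0 / gpu_usage + 1e-9)

def cpu_reserved_per_gpu(gpu_usage: float, cpu_usage: float) -> float:
    """Total CPU share reserved by all tasks sharing one GPU."""
    return tasks_per_gpu(gpu_usage) * cpu_usage

print(tasks_per_gpu(1.0))              # one task per GPU
print(tasks_per_gpu(0.5))              # two tasks per GPU
print(cpu_reserved_per_gpu(0.5, 0.5))  # CPU share used by both tasks together
```

With gpu_usage and cpu_usage both at 0.5, two tasks share the GPU and together still reserve one full CPU, which is why the total load is the same even when the per-task display is stale.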
3) Message boards : Number crunching : I have two boincs clients on the same computer (Message 5869)
Posted 30 Dec 2017 by HAL9000
The client is split because that particular project doesn't make save points for 12 to 18 hours and I lose 1/2 to 3/4 a day of work shutting down BOINC to deal with the other projects' technical difficulties or to change my app_config files.


Guess I need to ask [AF>WildWildWest]Sebastien for help?

Instead of restarting BOINC you can have it reread the config files to pick up any changes.

Do you intend to put the project with long checkpoints in its own instance of BOINC and then have your other projects on the host in another instance?
That is easily doable. You just have to make sure you have properly limited the number of processors on each instance so the total number of running tasks does not exceed the number of actual processors.
A host with 16 processors cannot run more than 16 CPU tasks at once across any number of instances. Otherwise a WUprop server rule will ignore one, or all, of the clients running on that host.
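
A quick way to sanity check a multi-instance layout before starting it is to add up the processor limits. This is only a sketch of the rule as described in this post, not the server's actual check, and the function name and inputs are hypothetical.

```python
from typing import Sequence

def within_processor_limit(processors: int, cpu_limits: Sequence[int]) -> bool:
    """True if the per-instance CPU limits add up to no more than the
    host's processor count.

    `cpu_limits` holds the maximum number of CPU tasks each BOINC
    instance on the host is configured to run at once.
    """
    return sum(cpu_limits) <= processors

# 16-processor host split into a 4-CPU instance and a 12-CPU instance
print(within_processor_limit(16, [4, 12]))
# Same host, but the two limits add up to 20
print(within_processor_limit(16, [8, 12]))
```

The first layout stays within 16 running CPU tasks; the second could run 20 at once and would be at risk under the rule described above.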
4) Message boards : Number crunching : WUProp Have A New Project URL? 's' added to http. (Message 5851)
Posted 24 Dec 2017 by HAL9000
I added some new hosts December 13 with https://wuprop.boinc-af.org/ and I continue to not have that issue.
If something on the project end was not configured correctly it should affect everyone, but it doesn't appear to.

Sometimes BOINC has weird issues due to corruption in the client_state.xml. The only way to fix that is to remove client_state.xml and client_state_prev.xml from the system. It is preferable to detach the host from its active projects first.
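
If you want to script that step, a cautious sketch follows. It renames the two state files instead of deleting them, which is my substitution so that they can be restored if needed. It assumes the BOINC client is already stopped and that you pass in the BOINC data directory yourself, since its location varies by install. The function name is hypothetical.

```python
from pathlib import Path

STATE_FILES = ("client_state.xml", "client_state_prev.xml")

def set_aside_client_state(data_dir):
    """Rename the client state files in `data_dir` to *.bak.

    Returns the names of the backup files created. Files that do not
    exist are skipped. Run this only while the BOINC client is stopped.
    """
    moved = []
    for name in STATE_FILES:
        src = Path(data_dir) / name
        if src.exists():
            dst = src.with_name(src.name + ".bak")
            src.replace(dst)  # overwrites an older backup with the same name
            moved.append(dst.name)
    return moved
```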
5) Message boards : Number crunching : Active BU Apps in Project List (Message 5850)
Posted 24 Dec 2017 by HAL9000
When a host that has tasks from a project that is no longer active is fired up and gets a new WUprop task then that project/app will now be "active".
It seems to occur on a pretty regular basis.
6) Message boards : Number crunching : Unpurged tasks between 2016-07-24 - 2016-07-27 (Message 5817)
Posted 6 Dec 2017 by HAL9000
You can't server abort/purge a task on a computer that's off.

Maybe newer WU's are ignored by the server when they are too old?

The tasks I was originally referring to were stuck in limbo. Most having been validated but not purged from the database.
That can happen if the server crashes or some other weirdness occurs. Which I believe occurred around the time of the dates I had listed.
There is a BOINC mechanism that is supposed to check a task once it hits its original due date, then try to send it through the validation/purge process again if the minimum quorum is met.

It doesn't look like that worked for those tasks. However it looks like the admin recently, in the past few days, cleaned them up. Probably running some of the processes manually.
7) Message boards : Number crunching : [Ω] New / Intermittent applications - NOT! (Message 5813)
Posted 5 Dec 2017 by HAL9000
I would guess it is related to the stuck tasks from last year getting cleaned out.
8) Message boards : Number crunching : Unpurged tasks between 2016-07-24 - 2016-07-27 (Message 5791)
Posted 30 Nov 2017 by HAL9000
Looks like the server did not clean up those tasks from last year in its second chance validation and clean up cycle.

I guess they will require manual intervention to clean them from the database.
9) Message boards : News : Grid computing center (Message 5777)
Posted 18 Nov 2017 by HAL9000
That rule is not current either, thus the mention about unwritten policy changes.

I left some clients running while Goofyxgrid@home was down that normally only run that and WuProp. So they were left only running WuProp. After about a week, or maybe two, those clients were banned. I believe that is the only issue I've had under the current rule set, but that could have existed previously. It also seems reasonable to stop accepting work from a host that has only errored tasks for a week or more.
10) Message boards : News : Grid computing center (Message 5774)
Posted 17 Nov 2017 by HAL9000
It's an NCI app and Goofy is grouped with all other NCI projects as far as WUProp is concerned.

There must be a non-NCI project running in the 6 hr of a WUProp task to get any and all hours counted. Otherwise it's marked as invalid, ignored or something.

If there were 20 NCI apps/projects you could run them all.

You can run 56 thousand clients with Goofy on them as long as WUProp is not attached. Well you could try.

That should answer the question you actually answered. You never actually asked how many client instances because there are only 4 NCI Goofy apps right now and I think that is your real question. So yes, your question was answered by the previous poster. In that case the policy has changed several times, written and unwritten.

I believe your information is a bit out of date.
NCI only hosts are OK so long as they adhere to the current rule.
Here are a few examples:
My i3-390m is running 200 instances of BOINC. Each one of the 200 instances of BOINC is connected to Goofyxgrid@home and running all 4 apps. Out of those 200 instances of BOINC I have 4 of them connected to WuProp, as the system has 4 processors.
For my Pentium M system with 1 processor it has 150 BOINC instances connected to Goofyxgrid and 1 to WuProp.

I believe that if anyone is unable to figure it out they should run however many they like and see what happens.
11) Message boards : News : Grid computing center (Message 5759)
Posted 16 Nov 2017 by HAL9000
WuProp does not limit the number of Monkeys app instances you wish to run per host.
WuProp does limit the number of WuProp app instances you are permitted to run per host, which has been clearly defined previously in this thread.
12) Message boards : Number crunching : WUProp Have A New Project URL? 's' added to http. (Message 5710)
Posted 9 Nov 2017 by HAL9000
I haven't had any issues since I switched to https in March.
13) Message boards : Number crunching : Max recommended clients? (Message 5673)
Posted 16 Oct 2017 by HAL9000
At present WuProp allows as many WuProp-attached BOINC instances on a computer as there are processors in the system.
Your i7-2620M would be permitted to run WuProp on up to 4 of the 100 BOINC instances without the hosts being banned by the server.

Additionally, if you have not already done so, you will need to configure WuProp to communicate with each client. There is a post in multi clients on how to configure an app_info.xml or app_config.xml.
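
The limit described above boils down to a one-line calculation. This is a sketch of the rule as stated in this post, not WuProp's server code, and the function name is hypothetical.

```python
def wuprop_instances_allowed(processors: int, boinc_instances: int) -> int:
    """How many BOINC instances on one host may be attached to WuProp.

    Rule as described in the post: at most one WuProp-attached instance
    per processor, and never more than the instances that exist.
    """
    return min(processors, boinc_instances)

# i7-2620M example from the post: 4 processors, 100 BOINC instances
print(wuprop_instances_allowed(4, 100))
```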
14) Message boards : Number crunching : not accepting requests from this host (Message 5630)
Posted 25 Sep 2017 by HAL9000
On several pcs, e.g. ID: 136192, I get the message "Not accepting requests from this host" and I don't get new wus.
I stopped crunching XANSONS for about two weeks; now I wanted to continue and get the error message.


Yup...
http://wuprop.boinc-af.org/forum_thread.php?id=441

Adding most clients actually increased production quite a bit for a project I was working on. RAC kept climbing and mine got banned.



The thread you mentioned doesn't apply to my problem, because I don't run more clients than cores I have on any of my pcs.

Yeah, it appears that if none of the projects on a host are providing work and only WUprop is running, then after a while the host gets banned.

I guess hosts that only run projects with intermittent work are no longer welcome.
15) Message boards : Number crunching : Memory required by WUProp (Message 5483)
Posted 25 Jul 2017 by HAL9000
I didn't make any changes on the server.

The problem occurs only on linux hosts with NVIDIA GPU.

I think it is not a memory leak. The CUDA driver registers all the GPU(s) memory + host memory in a single virtual address space using the kernel's virtual memory system. It isn't actually memory consumption, per se, it is just a "trick" to map all the available address spaces into a linear virtual space for unified addressing.

I saw some CUDA threads in coredump, so it might be. However this does not explain why this memory consumption suddenly dropped to reasonable level. Also that machine where WUProp used 1GB is mysterious, it was either my Windows box or one of ARM devices without Nvidia GPU. I will monitor this memory consumption to see if it will go up again or stay low.

My Windows 7 host has stayed consistent at 33GB and I just checked the notebook I use for work, which I didn't check previously, and it is reporting 2TB in Windows 10.
Since my other machines with Radeon or Intel GPUs are not exhibiting this behavior, it does seem likely to be related to Nvidia GPUs or maybe some implementations of OpenCL.
16) Message boards : Number crunching : Memory required by WUProp (Message 5464)
Posted 19 Jul 2017 by HAL9000
That is interesting. I did a quick check of my systems using Process Explorer and found for most hosts the virtual memory size was ~50-60MB. However I did find on one of my hosts it was 33GB. I closed BOINC and relaunched it. The virtual size was still 33GB for the data collect app. I restarted the system, since it had been up for about 45 days, and the virtual memory size was still 33GB for the data collect app on the host.
It's not really a problem, just a little odd.
17) Message boards : News : Grid computing center (Message 5390)
Posted 13 Jul 2017 by HAL9000
I don't have any invalids but I have some clients with only GCC/WUProp. My GCC/Radioactive hours have varied quite a bit since the announcement though. Down, back up and now down again about 80 hours each direction. Might have to set up each client to run a single small CPU task every 6 hours if the admin wants to play games.

I added CPU projects to some of my hosts that previously were GCC only and their WUProp tasks are still being marked as invalid after 18 hours of running CPU work.
I guess they are being marked invalid by a method other than not having a CPU project?


The one I checked says:
Erreur assignation taille application

Google translates to:
Application size error

Maybe reset the project? Del the exe and download another copy? Or copy from another instance.

2.1mil GCC hours. That's crazy.

My hours have been cut in half and invalid but show no error. Poor choice admin. Maybe I'll just run 32 clients with 1 task each and end up getting more points. Looks like other NCI apps don't count.

Also you realize this is killing hours for other apps as well?

I figured I would try detach/attach for WUprop. Then see if those hosts running CPU apps would start having valid tasks. I may need to force the BOINC clients to generate a new CPID before reattaching. Not sure yet.

I've been a fan of the Infinite monkey theorem before the BOINC project for it existed. So far I have put a bit more than 2.1 million hours into the monkey apps. It is closer to about 8.5 million hours to date.

I currently have several dual Xeon E5-2670 systems and was hoping to have a few of them run the monkey CPU apps. Once the BOINC bug preventing NCI and non-NCI apps from the same project is fixed and the project is able to distribute the CPU apps alongside the NCI apps, I wonder if those apps will have their time counted?


I have an E5-2670 v1 2P system as well. I had been running 2 clients at a full 32 threads with some other NCI only clients. I just dropped the 2nd client down to about half the threads and started running the other clients with 1 thread. Drug Discovery came back, which has quick units to run on the other clients. WUProp tasks have begun validating and GCC hours are going back up.

At the moment I am limited to running 10 of my dual Xeon E5-2670 systems at once. At least until I have the unused electric dryer circuit moved from the laundry room to the computer loft. I have that scheduled to be done in Sept.
Then I'll probably only keep the systems running flat out when it is cool enough that the 5 kW of heat doesn't make my place too warm. I might have to figure out some creative means of getting rid of all that heat.
18) Message boards : News : Grid computing center (Message 5380)
Posted 12 Jul 2017 by HAL9000
I don't have any invalids but I have some clients with only GCC/WUProp. My GCC/Radioactive hours have varied quite a bit since the announcement though. Down, back up and now down again about 80 hours each direction. Might have to set up each client to run a single small CPU task every 6 hours if the admin wants to play games.

I added CPU projects to some of my hosts that previously were GCC only and their WUProp tasks are still being marked as invalid after 18 hours of running CPU work.
I guess they are being marked invalid by a method other than not having a CPU project?


The one I checked says:
Erreur assignation taille application

Google translates to:
Application size error

Maybe reset the project? Del the exe and download another copy? Or copy from another instance.

2.1mil GCC hours. That's crazy.

My hours have been cut in half and invalid but show no error. Poor choice admin. Maybe I'll just run 32 clients with 1 task each and end up getting more points. Looks like other NCI apps don't count.

Also you realize this is killing hours for other apps as well?

I figured I would try detach/attach for WUprop. Then see if those hosts running CPU apps would start having valid tasks. I may need to force the BOINC clients to generate a new CPID before reattaching. Not sure yet.

I've been a fan of the Infinite monkey theorem before the BOINC project for it existed. So far I have put a bit more than 2.1 million hours into the monkey apps. It is closer to about 8.5 million hours to date.

I currently have several dual Xeon E5-2670 systems and was hoping to have a few of them run the monkey CPU apps. Once the BOINC bug preventing NCI and non-NCI apps from the same project is fixed and the project is able to distribute the CPU apps alongside the NCI apps, I wonder if those apps will have their time counted?
19) Message boards : News : Grid computing center (Message 5374)
Posted 11 Jul 2017 by HAL9000
I don't have any invalids but I have some clients with only GCC/WUProp. My GCC/Radioactive hours have varied quite a bit since the announcement though. Down, back up and now down again about 80 hours each direction. Might have to set up each client to run a single small CPU task every 6 hours if the admin wants to play games.

I added CPU projects to some of my hosts that previously were GCC only and their WUProp tasks are still being marked as invalid after 18 hours of running CPU work.
I guess they are being marked invalid by a method other than not having a CPU project?
20) Message boards : Number crunching : Only half (50%) elapsed time registered (Message 5145)
Posted 25 May 2017 by HAL9000
Seems to be a PROBLEM WITH HOW WuProp IS CALCULATING ONLY ~33 to 50% of REPORTED CPU RUN-TIMES as shown below in some recently completed WuProp Tasks:


Problem with BOINC client & Monkeys NCI

(2 or 4 core) / ((2 or 4 real CPU tasks) + 4 NCI monkeys + 0.01 WUprop) = 0.33 or 0.50

Use separate BOINC instance for monkeys NCI & main instance for monkeys CPU + other tasks

It seems there are at least 3 workarounds to have the hours counted correctly until the Monkeys apps are set correctly.
1) Make an app_config for the monkeys project
2) Set the number of CPUs in BOINC higher
3) Run a separate instance of BOINC
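
The 0.33 and 0.50 figures in the quoted formula can be reproduced with a few lines. This is only my reading of that formula (cores divided by the total CPU share claimed by all running tasks), not WuProp's actual code, and the function and parameter names are hypothetical.

```python
def counted_fraction(cores, cpu_tasks, nci_tasks=4, wuprop_share=0.01):
    """Fraction of elapsed time counted, per the quoted formula:

    cores / (real CPU tasks + NCI monkeys tasks + WUprop's 0.01 share)
    """
    return cores / (cpu_tasks + nci_tasks + wuprop_share)

print(round(counted_fraction(2, 2), 2))  # 2-core host with 2 CPU tasks
print(round(counted_fraction(4, 4), 2))  # 4-core host with 4 CPU tasks
print(round(counted_fraction(8, 4), 2))  # workaround 2: CPU count raised to 8
```

On that reading, setting the CPU count high enough to cover the CPU tasks plus the four NCI tasks brings the ratio back to roughly 1, which matches workaround 2.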




Copyright © 2024 Sebastien