Memory required by WUProp

Message boards : Number crunching : Memory required by WUProp

Author Message
[B@P] Daniel
Joined: 2 Oct 16
Posts: 85
Credit: 847,739
RAC: 0
Total hours: 8,364,920
Message 5457 - Posted: 18 Jul 2017, 17:56:00 UTC

I use BoincTasks, and recently I added a column showing the virtual memory size of tasks. Today I noticed that on one host WUProp uses 90 GB of memory. That's a lot! On my other hosts it is usually below 100 MB. What is more interesting, this particular host runs only 4 CPU and 5 NCI tasks. Looks like a bug to me.

Screenshot is here: https://pasteboard.co/GBxrcJu.png
____________

mmonnin
Joined: 22 Aug 16
Posts: 417
Credit: 1,927,925
RAC: 706
Total hours: 9,130,576
Message 5458 - Posted: 18 Jul 2017, 18:49:29 UTC - in response to Message 5457.

Wow, that's a lot, especially for so few tasks running.

There was a user on my team who said he turned off WUProp and his MW times dropped a noticeable amount, and he gained 400k RAC across several computers (several 7990s). He was going to turn it back on after a day or two to see if MW times went back up.

That got me to check the CPU time on some of my hosts, and I noticed that one went up when I switched from Asteroids over to Rosetta and then to WCG:
http://wuprop.boinc-af.org/results.php?hostid=122461

The tasks with 200+ seconds of CPU time were run alongside Rosetta, the others alongside Asteroids and WCG. The same number of tasks was running in each case. I didn't check memory usage.

Does the WUProp memory/CPU usage vary depending on how long a CPU task has been running?

[B@P] Daniel
Joined: 2 Oct 16
Posts: 85
Credit: 847,739
RAC: 0
Total hours: 8,364,920
Message 5459 - Posted: 18 Jul 2017, 19:43:20 UTC
Last modified: 18 Jul 2017, 20:14:55 UTC

I have collected a core dump from the running process (a 90GB file). Now I am compressing it; xz should do a good job here. Seb, you will have a lot of data to analyse :)

Edit: 90GB compressed to 14MB, nice :). Seb, I sent you a PM with the download link and system details.

Edit2: This is the link to this WU: http://wuprop.boinc-af.org/result.php?resultid=72876366. It is still running now and should complete in about 2.5h.
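
A compression ratio like the one reported above (90GB down to 14MB, roughly 6400:1) is what you would expect if most of the dump consisted of pages that were never written to, i.e. all zeros. That is an assumption, not something confirmed in this thread. The sketch below compares the reported ratio with what Python's standard lzma module (which writes the same .xz format) achieves on an all-zero buffer. The function name zero_page_ratio and the 16 MiB buffer size are made up for illustration.

```python
import lzma


def zero_page_ratio(size_bytes=16 * 1024 * 1024):
    """Compress an all-zero buffer and return raw size / compressed size."""
    raw = bytes(size_bytes)        # bytes(n) is n zero bytes
    packed = lzma.compress(raw)    # .xz container, default preset
    return len(raw) / len(packed)


# Figures taken from the post above; decimal GB/MB units are an assumption.
reported_ratio = 90e9 / 14e6

if __name__ == "__main__":
    print(f"reported core dump ratio: about {reported_ratio:.0f}:1")
    print(f"all-zero buffer ratio:    about {zero_page_ratio():.0f}:1")
```

If the two numbers land in the same range, that suggests, but does not prove, that almost none of the 90GB was ever touched by the process.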
____________

[B@P] Daniel
Joined: 2 Oct 16
Posts: 85
Credit: 847,739
RAC: 0
Total hours: 8,364,920
Message 5460 - Posted: 19 Jul 2017, 18:36:51 UTC
Last modified: 19 Jul 2017, 18:37:53 UTC

Yesterday's WU finished successfully. Today I have another one with 90GB of memory allocated on this machine. Something fishy is going on there. I suspect a massive memory leak.
____________

Dr Who Fan
Joined: 29 Jul 11
Posts: 316
Credit: 1,154,691
RAC: 361
Total hours: 1,595,922
Message 5462 - Posted: 19 Jul 2017, 21:59:52 UTC - in response to Message 5460.

..."90GB of memory" does not sound right. I see you are running Linux on a 32-core PC. The Windows version of WUProp does not report back about memory usage on the task detail page.

I also use BoincTasks to monitor my multiple PCs and tasks. The largest WUProp task I currently have running is using approx. 77MB (megabytes) of virtual memory and approx. 7.7MB of real/physical memory on a 4-core Windows 8.1 PC.
____________

[B@P] Daniel
Joined: 2 Oct 16
Posts: 85
Credit: 847,739
RAC: 0
Total hours: 8,364,920
Message 5463 - Posted: 19 Jul 2017, 23:01:00 UTC - in response to Message 5462.

Please check again. I have one Windows machine, and both BoincTasks and BoincMgr report the virtual memory size for WUProp; in my case it is about 10MB.
____________

HAL9000
Joined: 14 Dec 11
Posts: 33
Credit: 6,479,419
RAC: 318
Total hours: 9,528,737
Message 5464 - Posted: 19 Jul 2017, 23:23:54 UTC

That is interesting. I did a quick check of my systems using Process Explorer and found that for most hosts the virtual memory size was ~50-60MB. However, on one of my hosts it was 33GB. I closed BOINC and relaunched it, and the virtual size was still 33GB for the data collect app. I then restarted the system, since it had been up for about 45 days, and the virtual memory size was still 33GB for the data collect app on that host.
It's not really a problem, just a little odd.

[B@P] Daniel
Joined: 2 Oct 16
Posts: 85
Credit: 847,739
RAC: 0
Total hours: 8,364,920
Message 5471 - Posted: 21 Jul 2017, 17:45:04 UTC

It is strange. Today memory usage dropped to 38MB. What is more interesting, this happened not only on this one host; it looks like it dropped on others too. I am sure about one other host, which was using 1GB: I do not see that number anymore. The other ones also look smaller to me. I checked the app list and do not see any new app version added recently, so most probably Seb changed something on the server side (probably the parameters passed to the app on start). I suspect he recently changed something to get more data and ban more cheaters. Seb, could you tell us more about what has been going on recently?
____________

[AF>WildWildWest] Sebastien
Dictator
Joined: 28 Mar 10
Posts: 2677
Credit: 513,719
RAC: 95
Total hours: 1,427,338
Message 5472 - Posted: 21 Jul 2017, 18:36:30 UTC - in response to Message 5471.

I didn't make any changes on the server.

The problem occurs only on Linux hosts with an NVIDIA GPU.

I don't think it is a memory leak. The CUDA driver registers all of the GPU memory plus the host memory in a single virtual address space using the kernel's virtual memory system. It isn't actually memory consumption per se; it is just a "trick" to map all the available address spaces into one linear virtual space for unified addressing.
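
The difference between reserved address space and memory actually in use can be seen without CUDA. The sketch below is a minimal illustration, assuming a Linux host with /proc: it reserves 256 MiB of anonymous address space with mmap, never touches it, and compares VmSize (virtual size) with VmRSS (resident memory) before and after. It does not reproduce what the CUDA driver does internally; the function names parse_status, read_self_status and demo are made up for this example.

```python
import mmap
import os


def parse_status(text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    wanted = {"VmSize", "VmRSS"}
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmSize:    123456 kB"
            result[key] = int(rest.split()[0])
    return result


def read_self_status():
    """Read the current process's own virtual and resident sizes (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_status(f.read())


def demo(reserve_bytes=256 * 1024 * 1024):
    """Reserve address space without touching it; return (before, after)."""
    before = read_self_status()
    # Anonymous private mapping: address space is reserved, no page is written.
    region = mmap.mmap(-1, reserve_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after = read_self_status()
    region.close()
    return before, after


if __name__ == "__main__":
    if os.path.exists("/proc/self/status"):
        before, after = demo()
        print("VmSize grew by", after["VmSize"] - before["VmSize"], "kB")
        print("VmRSS grew by ", after["VmRSS"] - before["VmRSS"], "kB")
    else:
        print("No /proc/self/status here; this sketch assumes Linux.")
```

On a typical Linux system VmSize should grow by about the reserved amount while VmRSS barely moves. A column labelled "virtual memory size" reports the first figure, which is why it can be far larger than the RAM a task really uses.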
____________

[B@P] Daniel
Joined: 2 Oct 16
Posts: 85
Credit: 847,739
RAC: 0
Total hours: 8,364,920
Message 5473 - Posted: 21 Jul 2017, 19:04:57 UTC - in response to Message 5472.

I saw some CUDA threads in the core dump, so that might be it. However, this does not explain why the memory consumption suddenly dropped to a reasonable level. The machine where WUProp used 1GB is also a mystery: it was either my Windows box or one of my ARM devices without an Nvidia GPU. I will monitor the memory consumption to see whether it goes up again or stays low.
____________

HAL9000
Joined: 14 Dec 11
Posts: 33
Credit: 6,479,419
RAC: 318
Total hours: 9,528,737
Message 5483 - Posted: 25 Jul 2017, 2:48:19 UTC - in response to Message 5473.
Last modified: 25 Jul 2017, 2:49:47 UTC

My Windows 7 host has stayed consistent at 33GB. I also just checked the notebook I use for work, which I hadn't checked previously, and it is reporting 2TB under Windows 10.
Since my other machines, which have Radeon or Intel GPUs, are not exhibiting this behavior, it does seem likely to be related to Nvidia GPUs or maybe to some implementations of OpenCL.



Copyright © 2024 Sebastien