Hours severely underestimated for short GPU tasks

Message boards : Number crunching : Hours severely underestimated for short GPU tasks
Sellyme
Joined: 21 Jul 13
Posts: 69
Credit: 691,597
RAC: 0
Message 7100 - Posted: 19 Apr 2020, 8:47:29 UTC
Last modified: 19 Apr 2020, 9:05:18 UTC

I'm running the project "Private GFN Server" (which, despite its name, is not private), and it has two extremely short applications: GFN-13 Prime Search and GFN-14 Prime Search. To keep my GPU fully utilised I run 4 of these at a time, with GFN-13s taking about 1 minute each (so effectively 15 seconds of wall-clock time per task) and GFN-14s taking about 2 minutes each (30 seconds per task).
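In case anyone wants to reproduce the setup: running several GPU tasks at once is the usual app_config.xml approach with gpu_usage set to 0.25. Below is a rough sketch that writes one out - the "genefer13" / "genefer14" app names are placeholders, and the real ones are whatever the project lists in client_state.xml.

    from pathlib import Path

    # Minimal app_config.xml for running 4 GFN tasks per GPU (0.25 GPUs each).
    # NOTE: "genefer13" / "genefer14" are placeholder app names; substitute the
    # names the project actually uses (see client_state.xml or the event log).
    APP_CONFIG = """\
    <app_config>
      <app>
        <name>genefer13</name>
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>
          <cpu_usage>0.01</cpu_usage>
        </gpu_versions>
      </app>
      <app>
        <name>genefer14</name>
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>
          <cpu_usage>0.01</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>
    """

    # Written to the current directory; copy it into the project's folder under
    # the BOINC data directory, then use "Read config files" in the BOINC Manager.
    Path("app_config.xml").write_text(APP_CONFIG)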

Looking at my reported daily hours, I'm seeing drastic underestimation of these applications - particularly GFN-13. If I look at my last day of work on that computer (image available here), you'll see there's ~16.6 hours of data there based on the NCI tasks. Collatz Sieve runs on the integrated GPU, so you can ignore that. The only tasks running on the dedicated GPU are Asteroids@home and GFN-13. Asteroids seems to be tracking correctly at 5.25 hours, which leaves around 11.35 hours of wall-clock time presumably occupied by GFN-13.

Now, in theory GFN-13 should be credited 4 hours for every hour of wall-clock time it runs, since I'm running 4 tasks at a time. There's likely a fair amount of overhead involved in starting and finishing tasks this short, but that's nowhere near enough of a difference to turn an expected ~45 hours into the 0.38 that was actually recorded.

I haven't paid as much attention to GFN-14, but it seems like it may have had similar issues, although certainly far less drastic ones.

I don't really know what's happening here. Maybe WUProp is erroneously tracking CPU time, maybe it just can't cope with such short work units and misses most of them (this would explain GFN-14 tracking better), or maybe it's something else entirely, but the GFN-13 data is definitely massively under-reporting.

These tasks do run at 0.01 CPUs + 0.25 GPUs, so maybe it's recording the time with the 0.01 CPU fraction applied as a "multithreaded" multiplier? 0.38 hours * 100 = 38 hours, which is almost exactly what I'd expect, with the remaining handful of hours easily explained by the start/stop overhead.
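For what it's worth, laying out the arithmetic (nothing below comes from WUProp itself - these are just the figures from this post):

    # Rough arithmetic behind the estimates above (illustrative only).
    wall_hours_total  = 16.6     # the day's wall-clock time, per the NCI tasks
    asteroids_hours   = 5.25     # tracked correctly on the dedicated GPU
    gfn13_wall_hours  = wall_hours_total - asteroids_hours     # ~11.35 h
    concurrent_tasks  = 4
    expected_task_hours = gfn13_wall_hours * concurrent_tasks  # ~45.4 task-hours

    reported_hours = 0.38
    cpu_fraction   = 0.01        # the "0.01 CPUs" in the task's resource usage
    print(expected_task_hours)            # ~45.4
    print(reported_hours / cpu_fraction)  # ~38, in the right ballpark of ~45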

Looking at the global stats, it certainly doesn't seem like the top users there are having that problem, though. I'm just confused at this point.
mmonnin
Joined: 22 Aug 16
Posts: 448
Credit: 2,093,023
RAC: 693
Message 7101 - Posted: 19 Apr 2020, 11:27:27 UTC

I seem to recall that WUProp only captures the list of running tasks once every minute, so very short tasks can end up credited with fewer hours than they actually ran. The same thing happens with short tasks at other projects. You can run more tasks at once than GPU utilisation alone calls for, just to extend each task's runtime.
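Here's a toy sketch of why once-a-minute sampling shortchanges sub-minute tasks. The bookkeeping rule is an assumption for illustration (a task has to be caught by at least two samples before any time is credited) - it is not WUProp's actual code:

    import math
    import random

    SAMPLE_INTERVAL = 60.0   # seconds - the once-a-minute polling recalled above

    def credited_hours(task_seconds, wall_hours, concurrent, seed=1):
        """Toy model of sampling-based accounting (an assumption for
        illustration, not WUProp's real algorithm): a task is only credited
        the time spanned between the first and last samples that catch it
        running, so anything seen fewer than twice counts for nothing."""
        rng = random.Random(seed)
        wall = wall_hours * 3600.0
        credited = 0.0
        for _slot in range(concurrent):                # tasks run back to back per slot
            phase = rng.uniform(0.0, SAMPLE_INTERVAL)  # sampler offset vs. task starts
            t = 0.0
            while t < wall:
                start, end = t, min(t + task_seconds, wall)
                # count the sample instants (phase + k*60s) inside [start, end)
                sightings = (math.ceil((end - phase) / SAMPLE_INTERVAL)
                             - math.ceil((start - phase) / SAMPLE_INTERVAL))
                credited += max(0, sightings - 1) * SAMPLE_INTERVAL
                t = end
        return credited / 3600.0

    # 15-second tasks, 4 at a time, over ~11.35 wall-clock hours: nothing is
    # ever credited, because no task lasts long enough to be seen twice.
    print(credited_hours(15, 11.35, 4))     # 0.0, versus ~45 actual task-hours
    # ~2-minute tasks, 8 at a time: a much larger share of the runtime counts,
    # and the share keeps rising as task length grows past the interval.
    print(credited_hours(120, 11.35, 8))    # roughly half of ~91 task-hours

Under a model like that, the fix is exactly the one suggested above: stretch each task past the sampling interval by running more of them at once.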
Sellyme
Joined: 21 Jul 13
Posts: 69
Credit: 691,597
RAC: 0
Message 7103 - Posted: 20 Apr 2020, 4:25:14 UTC

I doubled the number of running tasks from 4 to 8 (doubling their runtime to about 2 minutes), and it now seems to be tracking accurately. Thanks for the suggestion.
