Hours severely underestimated for short GPU tasks



Author Message
Sellyme
   
Joined: 21 Jul 13
Posts: 69
Credit: 691,597
RAC: 0
Total hours: 717,632
Message 7100 - Posted: 19 Apr 2020, 8:47:29 UTC
Last modified: 19 Apr 2020, 9:05:18 UTC

I'm running the project "Private GFN Server" (which, despite its name, is not private). It has two extremely short applications, GFN-13 Prime Search and GFN-14 Prime Search. To keep my GPU fully utilised, I run 4 of these at a time. Run that way, each GFN-13 task takes about 1 minute (an effective 15 seconds per task), and each GFN-14 task takes about 2 minutes (an effective 30 seconds per task).

Looking at my daily hours reported, I'm seeing drastic underestimation of these - particularly GFN-13. If you look at my last day of work on that computer (image available here), you'll see that there's ~16.6 hours of data there, based on the NCI tasks. Collatz Sieve is running on the integrated GPU, so you can ignore that. The only tasks there running on the dedicated GPU are Asteroids@Home and GFN-13. Asteroids seems to be tracking correctly @ 5.25 hours, so there's around 11.35 hours left over, all of which were presumably occupied by GFN-13.

Now, theoretically, GFN-13 should be counting 4 hours for every actual hour it runs, since I'm running 4 tasks at a time. There's likely a fair amount of overhead involved in starting and running them considering how short the task is, but this is definitely not enough of a difference to turn ~45 hours into 0.38.
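As a rough check of that arithmetic, the figures quoted in this post can be combined as follows. This is only a sketch: the variable names are illustrative, and it assumes every hour not attributed to Asteroids@Home went to GFN-13 with four tasks running at once.

```python
# Figures taken from the post; names are illustrative only.
total_hours = 16.6        # wall-clock hours of data, judged from the NCI tasks
asteroids_hours = 5.25    # hours reported for Asteroids@Home
concurrent_tasks = 4      # GFN-13 tasks running side by side
reported_gfn13 = 0.38     # hours WUProp actually reported for GFN-13

# Wall-clock time left over for GFN-13 on the dedicated GPU.
gfn13_wall_hours = total_hours - asteroids_hours

# Each wall-clock hour should count once per concurrent task.
expected_gfn13 = gfn13_wall_hours * concurrent_tasks

# Fraction of the expected hours that was actually reported.
reported_fraction = reported_gfn13 / expected_gfn13

print(f"expected ~{expected_gfn13:.1f} h, reported {reported_gfn13} h "
      f"({reported_fraction:.1%} of expected)")
```

On these numbers, under 1% of the expected hours shows up, which is far more than start/stop overhead could plausibly account for.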

I haven't paid as much attention to GFN-14, but it seems like it may have had similar issues, although certainly far less drastic ones.

I don't really know what's happening here. Maybe WUProp is erroneously tracking CPU time, maybe it just can't cope with such short work units and misses most of them (this would explain GFN-14 tracking better), or maybe it's something else entirely, but the GFN-13 data is definitely massively under-reporting.

These tasks do run at 0.01 CPUs + 0.25 GPUs, so maybe it's recording the CPU time at a 0.01 "multithreaded" multiplier? 0.38 hours * 100 would be almost exactly what I'd expect, with the remaining handful of hours being easily explained by the start/stop overhead.
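A quick sanity check of that guess, using the numbers from this post. The 0.01 multiplier is only the hypothesis floated above, not confirmed WUProp behaviour, and the variable names are illustrative.

```python
# Figures taken from the post; names are illustrative only.
reported_gfn13 = 0.38      # hours reported for GFN-13
cpu_fraction = 0.01        # CPU share each task is scheduled with
expected_gfn13 = 45.4      # (16.6 - 5.25) wall-clock hours * 4 concurrent tasks

# If reported hours were scaled by the CPU share, undo the scaling.
implied_hours = reported_gfn13 / cpu_fraction

# Hours that would still be unaccounted for under this hypothesis.
shortfall = expected_gfn13 - implied_hours

print(f"implied ~{implied_hours:.0f} h, shortfall ~{shortfall:.1f} h")
```

The implied figure lands within a few hours of the expected one, which is why the idea looks tempting, but a close match alone does not show that this is what is happening.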

Looking at the global stats it certainly doesn't seem like the top users there are having that problem, though. I'm just confused at this point.

mmonnin
     
Joined: 22 Aug 16
Posts: 298
Credit: 1,268,804
RAC: 814
Total hours: 5,363,725
Message 7101 - Posted: 19 Apr 2020, 11:27:27 UTC

I think I recall that WUProp captures the running tasks once every minute, so short tasks can end up short on hours. The same thing happens at other projects with short tasks. You can run more tasks at once than GPU utilisation strictly needs, which extends the length of each task.
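To picture why task length matters under periodic sampling, here is a minimal sketch. It assumes snapshots at a fixed 60-second interval, as recalled above, and only counts how many snapshots land inside a task's run window. How WUProp turns snapshots into hours is not stated in this thread, so the function name and the model are hypothetical.

```python
def snapshots_seen(start_s: int, duration_s: int, interval_s: int = 60) -> int:
    """Count snapshots taken at t = 0, interval, 2*interval, ... that fall
    inside the half-open window [start_s, start_s + duration_s).
    Whole seconds only; a hypothetical model, not WUProp's actual logic."""
    end_s = start_s + duration_s
    first = -(-start_s // interval_s)   # first snapshot index at or after start
    last = (end_s - 1) // interval_s    # last snapshot index strictly before end
    return max(0, last - first + 1)


# Sweep every possible start offset within one sampling interval.
offsets = range(60)

# Tasks a little shorter than the interval can slip between two snapshots.
missed_short = sum(1 for s in offsets if snapshots_seen(s, 55) == 0)

# Tasks twice the interval long are always caught at least twice.
min_seen_long = min(snapshots_seen(s, 120) for s in offsets)

print(f"55 s tasks never seen: {missed_short} of {len(offsets)} offsets")
print(f"120 s tasks: seen at least {min_seen_long} times")
```

In this model a task shorter than the interval is seen at most once and sometimes not at all, while a task two intervals long is always seen twice, so any tracker that needs more than one sighting per task would fare much better with longer tasks.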

Sellyme
   
Joined: 21 Jul 13
Posts: 69
Credit: 691,597
RAC: 0
Total hours: 717,632
Message 7103 - Posted: 20 Apr 2020, 4:25:14 UTC

I doubled the number of running tasks from 4 to 8 (doubling their runtime to about 2 minutes), and it now seems to be tracking accurately. Thanks for the suggestion.



