Xtrack beam simulation
Author | Message |
---|---|
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
My machine got 38 of 48 of these an hour ago with a deadline of about 7 days. I paused 37 and am letting 1 run to check its run time in seconds. Once my 100 hours are done the rest can be aborted so that, hopefully, they get sent back out to one of you all. I should arrange the best UTC time to do this. I'm a natural night owl, so it'll need to be between 20 UTC and 11 UTC the next day. Of course, they may only run 1 hour each... |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
I got 4 of these, and after less than 1 hour they were at 100%, but at almost 12 hours they are STILL running. I'm only running 2 at a time since I saw they were stuck at 100%, but those 2 are still going, with the time clicking by second by second. They take about 5MB for the virtual size and almost 11MB for the working size, and this is on a laptop with 64GB of RAM, so no issues with memory. The stderr.txt file says: 18:48:29 (3060): app started; CPU time 20.000000, flags: |
Send message Joined: 20 Jun 12 Posts: 143 Credit: 343,198 RAC: 64 |
the time clicking by second by second taking about 5MB for the virtual size and almost 11MB for the working size Same as for any other application from any other project: if they are not using any CPU time, try restarting them; if that doesn't help, abort them, as they will never finish if they are not doing anything. |
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
I got 4 of these and after less than 1 hour they are at 100% but at almost 12 hours they are STILL running, The BOINC client had no information on expected FLOPS, so it treated them as 2 minute jobs. What you are seeing is the client drawing a progress bar for a WU it has no information on. I probably only need 10 of these to get 100 hours. I will release the rest sometime after I get back from work, probably at 8 UTC. ----- My work progresses 1/5th slower than real time because the upgraded Broadwell CPUs aren't hyperthreading properly, and I need to overbook BOINC just to get the 2nd Xeon to start doing work. I think this BIOS was released when the highest thread count was 56 and doesn't handle 72 threads correctly. I need to sign up for a Huawei account and download the newest BIOS. (The other server works fine with Hyperthreading off, but I was experimenting to get Hyperthreading to work when these WUs came my way.) Wonder if the USA anti-Huawei sanctions will stop me? |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
the time clicking by second by second taking about 5MB for the virtual size and almost 11MB for the working size I tried suspending them for over 30 seconds, but they just kept going and are now up over 21 hours for the 2 that were over 8 hours earlier today. WUProp IS counting the hours, so I'm not aborting anything. I have run Climate Prediction tasks for months before, so running this or running something else makes no difference to me. |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
I would guess you could still get them from Canada or with a VPN from overseas as well. There are several free VPNs you can use if you aren't already, and several common browsers have them built in these days as well. |
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
the time clicking by second by second taking about 5MB for the virtual size and almost 11MB for the working size Are yours checkpointing? My properties page shows the same time since last checkpoint as the starting time. In the earliest versions of SiDock long, that meant that if they left RAM, the WU started all over. In later versions they would continue from their own proprietary checkpoint, but you lose credit, as BOINC only counts using the checkpoints (if I understood correctly). I'm going to release 30 in groups of 10 and let the other 8 complete (I could go with 4 with a 24 hour run time). What's the typical procedure for this so WUProp people get a shot at tough-to-get WUs? |
Send message Joined: 20 Jun 12 Posts: 143 Credit: 343,198 RAC: 64 |
the time clicking by second by second taking about 5MB for the virtual size and almost 11MB for the working size Unless WUProp hours are all that matters to you, you should look in the task properties to see whether the used CPU time is increasing each time you open them (or check that in Windows Task Manager). |
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
I see continuous CPU usage from these for the last 18 hours. I had to undo the hyperthreading because it got to a point where BOINC had stalled on any network transactions and WUProp wasn't even running. I discovered two issues with these that I need to report to the devs at the LHC-dev site. They refuse to suspend, then exit RAM, when boinc.exe sends the shutdown command, and they do not appear to have saved their work upon restart (but I'm unsure about this). They start at 0% and 0 minutes upon a client restart. They are not ready for general release. The machine is now stable and they are running again. At least they are hardened against the horrid 'computation error' after I had to terminate them manually with Process Hacker. These are development WUs; they should hit a variety of users' machines, so I'll abort 5 at a time over the next 2 days. If anyone cares about the timing then send me a PM so BOINC client pings me. |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
the time clicking by second by second taking about 5MB for the virtual size and almost 11MB for the working size The time IS increasing in the Task Properties, so I am letting them just run. The deadline isn't until the 9th, so they are okay for now. |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
I see continuous CPU usage from these for the last 18 hours. You are right, mine have not checkpointed since they started, so my suspending and then restarting them didn't work. Since the last checkpoint was the same time as the tasks started, I'm not going to try to force them to start over either. They expire on the 9th, so it's okay with me to let them just run. I went to the LHC-dev forums and the few people posting all seem to be having problems with them. |
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
The 5 I aborted didn't get resent. Just lost. Sorry, I better just run the rest. |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
The 5 I aborted didn't get resent. Just lost. I got to my 100 hours on the first 2 tasks, so I aborted them and am now running my second 2 tasks. Hopefully they are better, but probably not. |
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
Left my comments on how well these apps are working. Hopefully Lawrence will see them. https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=557&postid=7927#7927 These are going to replace the SixTrack app, AFAIK. At least they did not 'error while computing' after being abused heavily by 2 reboots and manual process terminations. Did you have anything to add, Mikey? |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
Left my comments on how well these apps are working. No. I aborted another one this morning after running it for another 24 hours with nothing but the clock ticking up. I am now crunching the last one I got in the first send-outs but have little hope it will do anything either. |
Send message Joined: 28 Mar 10 Posts: 672 Credit: 4,001,797 RAC: 699 |
Anything special you have to do to get these WUs ... ??? |
Send message Joined: 20 May 10 Posts: 556 Credit: 1,916,309 RAC: 774 |
Anything special you have to do to get these WUs ... ??? No, but I think they stopped sending them out due to the problems with them. No word from any admin types yet on their forums, though. |
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
Just got lucky to be working on the machine, with all projects suspended and work fetch still on. They're not expiring, and are still on the 'Fellowship' server. Guess I'll go for 5k hours, at some rate, then abort. |
Send message Joined: 13 Dec 15 Posts: 177 Credit: 2,283,568 RAC: 651 |
Anything special you have to do to get these WUs ... ??? Did you get some? Looks like the ones I aborted, and the ones that went past deadline, actually got resent. Make sure they hit a Windows OS, where they run forever. On a Linux box (these WUs are experimenting with a one-app-fits-many-OSes wrapper) they complete in 2-4 secs. |
Send message Joined: 28 Mar 10 Posts: 672 Credit: 4,001,797 RAC: 699 |
Anything special you have to do to get these WUs ... ??? No, I've never got any, so something special must have to be done ... ? |
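
The check suggested in the thread (look at a task's properties twice and see whether the used CPU time has moved while the clock kept running) can be written down as a small sketch. Everything here is an assumption for illustration: the `Reading` structure, the `classify` function and the 10% threshold are hypothetical and are not part of BOINC or WUProp, and the numbers are made up rather than taken from anyone's tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One reading copied by hand from a task's properties (hypothetical structure)."""
    elapsed_s: float  # wall-clock run time shown for the task, in seconds
    cpu_s: float      # CPU time shown for the task, in seconds


def classify(earlier: Reading, later: Reading, min_cpu_share: float = 0.10) -> str:
    """Return "working" if CPU time advanced by at least min_cpu_share of the
    wall-clock time between two readings, else "stalled".

    The 10% default is an arbitrary illustrative threshold, not a BOINC setting.
    """
    wall_delta = later.elapsed_s - earlier.elapsed_s
    if wall_delta <= 0:
        raise ValueError("the later reading must have a larger elapsed time")
    cpu_delta = later.cpu_s - earlier.cpu_s
    return "working" if cpu_delta / wall_delta >= min_cpu_share else "stalled"


if __name__ == "__main__":
    # Made-up numbers: the clock advanced an hour, CPU time did not move.
    print(classify(Reading(3600, 20), Reading(7200, 20)))      # stalled
    # Made-up numbers: CPU time kept pace with the clock.
    print(classify(Reading(3600, 3500), Reading(7200, 7050)))  # working
```

A "stalled" result matches the case where restarting or aborting was recommended above; a "working" result only says the task is burning CPU, not that it is checkpointing or will ever finish.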
©2024 Sébastien