Xtrack beam simulation

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9726 - Posted: 4 Mar 2023, 0:49:22 UTC
Last modified: 4 Mar 2023, 1:05:24 UTC

My machine got 38 of 48 of these an hour ago with a deadline of about 7 days.

Paused 37 and letting 1 run to check its run time.

Once my 100 hours are done the rest can be aborted to, hopefully, be sent back out to one of you all.

I should arrange the best UTC time to do this.
I'm a natural night owl, so it'll need to be between 20:00 UTC and 11:00 UTC the next day.

Of course, they may only run 1 hour each...

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9728 - Posted: 4 Mar 2023, 13:23:15 UTC - in response to Message 9726.  
Last modified: 4 Mar 2023, 13:23:38 UTC

I got 4 of these, and after less than 1 hour they are at 100%, but at almost 12 hours they are STILL running. I'm only running 2 at a time since I saw they were stuck at 100%, but those 2 are still going, with the time clicking by second by second. Each takes about 5MB for the virtual size and almost 11MB for the working size. This is on a laptop with 64GB of RAM, so no issues with memory. The stderr.txt file says:

18:48:29 (3060): app started; CPU time 20.000000, flags:

Link
Joined: 20 Jun 12
Posts: 142
Credit: 342,716
RAC: 66
Message 9730 - Posted: 4 Mar 2023, 15:34:08 UTC - in response to Message 9728.  
Last modified: 4 Mar 2023, 15:35:30 UTC

the time clicking by second by second taking about 5MB for the virtual size and almost 11MB for the working size

Same as for any other application from any other project: if they are not using any CPU time, try restarting them. If that doesn't help, abort them, as they will never finish if they are not doing anything.

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9731 - Posted: 4 Mar 2023, 20:46:21 UTC - in response to Message 9728.  
Last modified: 4 Mar 2023, 20:52:24 UTC

I got 4 of these and after less than 1 hour they are at 100% but at almost 12 hours they are STILL running


The BOINC client had no information on the expected FLOPS, so it treated them as 2-minute jobs.
So what you are seeing is the client drawing a progress bar for a WU it has no information on.
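
To make that concrete, here is a rough sketch of how a purely time-based progress bar behaves when the runtime estimate is far too short. This is a simplified model I made up for illustration, not the BOINC client's actual formula, and the function name `estimated_progress` is hypothetical.

```python
def estimated_progress(elapsed_seconds: float, estimated_runtime_seconds: float) -> float:
    """Return a naive time-based progress fraction in [0.0, 1.0].

    Assumption: progress is simply elapsed time divided by the runtime
    estimate. The real client logic is more involved.
    """
    if estimated_runtime_seconds <= 0:
        raise ValueError("estimated runtime must be positive")
    # Cap at 1.0: the bar cannot go past 100% even if the task keeps running.
    return min(elapsed_seconds / estimated_runtime_seconds, 1.0)


if __name__ == "__main__":
    two_minute_estimate = 120.0  # the "2-minute job" guess mentioned above
    for elapsed in (60.0, 120.0, 12 * 3600.0):
        fraction = estimated_progress(elapsed, two_minute_estimate)
        print(f"{elapsed:8.0f} s elapsed -> {fraction:.0%}")
```

With an estimate that short, the bar is pinned at 100% almost immediately and says nothing about how much work is really left.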

I probably only need 10 of these to get 100 hours.

Will release the rest sometime after I get back from work; probably at 8 UTC.

-----

My work progresses about 1/5th slower than real time because the upgraded Broadwell CPUs aren't hyperthreading properly, and I need to overbook BOINC just to get the 2nd Xeon to start doing work. I think this BIOS was released when the highest thread count was 56, and it doesn't handle 72 threads correctly. I need to sign up for a Huawei account and download the newest BIOS. (The other server works fine with Hyperthreading off, but I was experimenting to get Hyperthreading to work when these WUs came my way.)
Wonder if the USA anti-Huawei sanctions will stop me?

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9732 - Posted: 5 Mar 2023, 0:57:38 UTC - in response to Message 9730.  

I tried suspending them for over 30 seconds, but they just kept going and are now up over 21 hours for the 2 that were over 8 hours earlier today. WUProp IS counting the hours, so I'm not aborting anything. I have run Climate Prediction tasks for months before, so running this or something else makes no difference to me.

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9733 - Posted: 5 Mar 2023, 1:00:04 UTC - in response to Message 9731.  


I would guess you could still get them from Canada, or with a VPN from overseas. There are several free VPNs you can use if you aren't already, and several common browsers have them built in these days as well.

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9734 - Posted: 5 Mar 2023, 6:16:09 UTC - in response to Message 9732.  

Are yours checkpointing?

My properties page shows the time since the last checkpoint as equal to the time since the task started. In the earliest versions of SiDock long, that meant that if they left RAM, the WU started all over. In later versions they would continue from their own proprietary checkpoint, but you lose credit, as BOINC only counts using the checkpoints (if I understood correctly).
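
The check I'm describing boils down to a comparison of two numbers from the task properties page. A minimal sketch, assuming you copy the two values in by hand; the function name `has_checkpointed` and the tolerance are my own, not anything BOINC provides:

```python
def has_checkpointed(elapsed_seconds: float,
                     seconds_since_checkpoint: float,
                     tolerance_seconds: float = 5.0) -> bool:
    """Guess whether a task has written a checkpoint since it started.

    Assumption: if "time since last checkpoint" is (nearly) the same as
    the elapsed run time, no checkpoint was ever written.
    """
    return (elapsed_seconds - seconds_since_checkpoint) > tolerance_seconds


if __name__ == "__main__":
    # Hypothetical readings: one hour elapsed, last checkpoint one hour ago.
    print(has_checkpointed(3600.0, 3600.0))
    # Hypothetical readings: one hour elapsed, last checkpoint 5 minutes ago.
    print(has_checkpointed(3600.0, 300.0))
```

If the first case matches what your properties page shows, suspending a task that leaves RAM is likely to throw away everything done so far.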


I'm going to release 30 in groups of 10 and let the other 8 complete (I could go with 4 at a 24-hour run time).
What's the typical procedure for this, so WUProp people get a shot at tough-to-get WUs?

Link
Joined: 20 Jun 12
Posts: 142
Credit: 342,716
RAC: 66
Message 9735 - Posted: 5 Mar 2023, 10:46:49 UTC - in response to Message 9732.  
Last modified: 5 Mar 2023, 10:50:13 UTC

Unless WUProp hours are all that matters to you, you should check in the task properties whether the used CPU time increases each time you open them (or check that in Windows Task Manager).
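
As a rough sketch of that check: note the CPU time a few times, some minutes apart, and compare the readings. The helper name `cpu_time_is_advancing` and the one-second threshold are assumptions for illustration; the readings are typed in by hand from the task properties.

```python
from typing import Sequence


def cpu_time_is_advancing(samples: Sequence[float], min_increase: float = 1.0) -> bool:
    """Return True if CPU time grew by at least min_increase seconds.

    samples: CPU-time readings in seconds, oldest first, taken a few
    minutes apart (an assumption about how you collect them).
    """
    if len(samples) < 2:
        raise ValueError("need at least two CPU-time readings")
    return samples[-1] - samples[0] >= min_increase


if __name__ == "__main__":
    # Hypothetical task that is really computing.
    print(cpu_time_is_advancing([100.0, 160.0, 220.0]))
    # Hypothetical task where only the wall clock moves.
    print(cpu_time_is_advancing([20.0, 20.0, 20.0]))
```

A task whose readings never change is only running the wall clock and is a candidate for a restart or an abort.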

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9736 - Posted: 5 Mar 2023, 11:23:25 UTC

I see continuous CPU usage from these for the last 18 hours.

I had to undo the hyperthreading because it got to a point where BOINC had stalled on all network transactions and WUProp wasn't even running.

Discovered two issues with these that I need to report to the devs at the LHC-dev site.
They refuse to suspend, then exit RAM, when boinc.exe sends the shutdown command and they do not appear to have saved their work upon restart (but unsure about this).
They start at 0% and 0 minutes upon a client restart.
They are not ready for general release.

The machine is now stable and they are running again.
At least they are hardened against the horrid 'computation error' after I had to terminate them manually with Process Hacker.

These are development WUs; they should hit a variety of users' machines, so I'll abort 5 at a time over the next 2 days. If anyone cares about the timing, send me a PM so the BOINC client pings me.

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9737 - Posted: 5 Mar 2023, 12:39:09 UTC - in response to Message 9735.  

The time IS increasing in the Task Properties, so I am letting them just run. The deadline isn't until the 9th, so they are okay for now.

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9738 - Posted: 5 Mar 2023, 12:41:50 UTC - in response to Message 9736.  

You are right, mine have not checkpointed since they started, so my suspending and then restarting them didn't work. Since the last checkpoint was at the same time the tasks started, I'm not going to try to force them to start over either. They expire on the 9th, so it's okay with me to let them just run. I went to the LHC-dev forums, and the few people posting all seem to be having problems with them.

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9740 - Posted: 5 Mar 2023, 20:38:50 UTC

The 5 I aborted didn't get resent. Just lost.

Sorry, I better just run the rest.

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9746 - Posted: 6 Mar 2023, 12:24:57 UTC - in response to Message 9740.  

I got to my 100 hours on the first 2 tasks, so I aborted them and am now running my second 2 tasks. Hopefully they are better, but probably not.

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9755 - Posted: 7 Mar 2023, 11:24:55 UTC

Left my comments on how well these apps are working.
Hopefully Lawrence will see them.

https://lhcathomedev.cern.ch/lhcathome-dev/forum_thread.php?id=557&postid=7927#7927

These are going to replace the SixTrack app, AFAIK.

At least they did not 'error while computing' after being abused heavily from 2 reboots and manual process terminations.

Did you have anything to add, Mikey?

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9759 - Posted: 7 Mar 2023, 12:10:34 UTC - in response to Message 9755.  

No, I aborted another one this morning after running it for another 24 hours with nothing but the clock ticking up. I am now crunching the last one I got in the first send-out, but I have little hope it will do anything either.

STE\/E
Joined: 28 Mar 10
Posts: 672
Credit: 3,996,407
RAC: 711
Message 9762 - Posted: 8 Mar 2023, 2:53:42 UTC

Anything special you have to do to get these WUs ... ???

mikey
Joined: 20 May 10
Posts: 555
Credit: 1,910,265
RAC: 778
Message 9764 - Posted: 8 Mar 2023, 11:51:18 UTC - in response to Message 9762.  

No, but I think they stopped sending them out due to the problems with them. No word from any Admin types yet on their forums, though.

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9804 - Posted: 14 Mar 2023, 0:14:20 UTC - in response to Message 9762.  

Just got lucky to be working on the machine, with all projects suspended with work fetch still on.

They're not expiring, and are still on the 'Fellowship' server.
Guess I'll go for 5k hours, at some rate, then abort.

marmot
Joined: 13 Dec 15
Posts: 176
Credit: 2,278,192
RAC: 628
Message 9828 - Posted: 15 Mar 2023, 22:28:47 UTC - in response to Message 9762.  
Last modified: 15 Mar 2023, 22:28:59 UTC

Did you get some?

Looks like the ones I aborted, and the ones that went past deadline, actually got resent.

Make sure they hit a Windows OS, where they run forever. On a Linux box (these WUs are experimenting with a one-app-fits-many-OS wrapper) they complete in 2-4 secs.

STE\/E
Joined: 28 Mar 10
Posts: 672
Credit: 3,996,407
RAC: 711
Message 9829 - Posted: 15 Mar 2023, 22:38:29 UTC - in response to Message 9828.  

No, I've never got any, so something special must have to be done ... ?
©2024 Sébastien