Task running for over 26 hours after clock change

Thyme Lawn
Joined: 11 Apr 10
Posts: 54
Credit: 382,341
RAC: 0
Message 3483 - Posted: 4 Dec 2015, 14:45:53 UTC

I've just spotted that one of my computers has been running a WUProp task for over 26 hours, stuck at 91.135%.

Checking back through the BOINC event log and the NTP log file, it looks like I had a rogue NTP clock update yesterday, which caused the date to jump forward 3 months for less than 30 seconds. The WUProp task checkpointed once during that period but hasn't made another checkpoint since the time was corrected.

Restarting BOINC caused the elapsed time to wind back to its previous checkpoint value of 5:27:00, but the task still made no further checkpoints.

I dug a bit deeper and the problem appears to be the contents of the checkpoint file in the slot directory:

329
1457016105
1449134076

The second line converts to UTC time 14:41:45 on 3rd March 2016.
The last line converts to UTC time 09:14:36 on 3rd December 2015.

At the same time the checkpoint file on my other computer contained:

11
1449237710
1449237110

The second line converts to UTC time 14:01:50 on 4th December 2015.
The last line converts to UTC time 23:51:50 on 4th December 2015.

To test whether the checkpoint file was the problem, I stopped BOINC, changed the second line to the current time_t value and restarted BOINC. Checkpoints started being made again and the task was reported 33 minutes later, albeit with 24 hours of missing data (task 53970364).
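
For anyone who wants to script the same fix, here is a minimal Python sketch of the edit described above. It assumes the three-line layout shown in this post and assumes the second line is the time of the last checkpoint; that meaning is inferred from the test above, not taken from any WUProp documentation. The function name and the variable names are made up for illustration. As in the manual test, BOINC should be stopped before the file is touched.

```python
import time


def repair_checkpoint(text: str, now: int) -> str:
    """Return checkpoint text with a future-dated second line reset to `now`.

    Assumes three whitespace-separated integers, as in the files quoted
    in the post. The names below are guesses at what each value means.
    """
    values = text.split()
    if len(values) != 3:
        raise ValueError("expected exactly three values")
    first_value, last_checkpoint, third_value = (int(v) for v in values)
    # A timestamp later than the current time can only come from a clock jump.
    if last_checkpoint > now:
        last_checkpoint = now
    return f"{first_value}\n{last_checkpoint}\n{third_value}\n"


# Example with the stuck file's contents and the current time.
broken = "329\n1457016105\n1449134076\n"
print(repair_checkpoint(broken, now=int(time.time())))
```

A file whose second line is not in the future comes back unchanged, so running this on a healthy checkpoint file should be harmless.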
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

BilBg
Joined: 20 Jun 12
Posts: 63
Credit: 94,685
RAC: 0
Message 3485 - Posted: 5 Dec 2015, 3:59:15 UTC - in response to Message 3483.  
Last modified: 5 Dec 2015, 4:49:25 UTC

Great analysis!

Jumping forward or back in time can cause problems in many programs.
E.g. what happened with the (Windows XP) "Scheduled Tasks"?

I noticed Process Lasso reports that some process "Ran for 0 ms" if I change the clock back and then exit this program/process.
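
One plausible reason for this kind of result (an assumption on my part, not something confirmed for Process Lasso or BOINC) is that the elapsed time is computed by subtracting two wall-clock readings. A small Python sketch of the difference between that and a monotonic clock, using made-up clock readings:

```python
import time


def run_and_time(work) -> float:
    """Return how long work() took, using a clock that never goes backwards."""
    start = time.monotonic()
    work()
    return time.monotonic() - start


# Wall-clock subtraction with made-up readings: the process ran for
# 10 seconds, but the clock was set back 5 minutes (300 s) in between.
wall_start = 1_449_237_710
wall_end = wall_start + 10 - 300
print("wall-clock elapsed:", wall_end - wall_start)  # negative: -290

# time.monotonic() is not affected by changes to the system clock.
print("monotonic elapsed:", run_and_time(lambda: time.sleep(0.01)))
```

A program that clamps a negative wall-clock difference to zero would report something like "0 ms".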


This can also make BOINC itself "crazy"
(e.g. it may make the statistics_*.xml files 'bad', so they will need a manual edit).

I don't know if the following was fixed in some BOINC version, but it has existed for many years:
- Set computer clock back a few (5) minutes
- Watch the processes (tasks) started by BOINC (in Windows Task Manager, Process Explorer) - they will exit after 30 seconds but BOINC Manager will continue to show them "Running"
- After 5 minutes they will be restarted

I reported this in 2008 ("Computer Clock back stops BOINC"):
http://setiathome.berkeley.edu/forum_thread.php?id=45717


I use the very small (10 KB) program Neutron
http://keir.net/neutron.html

with the following contents in Neutron.ini:

[Options]
AutoSync=0
AutoExit=0
Server=25
Retry=1

[Servers]
0="cuckoo.nevada.edu"
1="ntp.nblug.org"
2="ntp0.cornell.edu"
3="timekeeper.isi.edu"
4="nist1.symmetricom.com"
5="clock.via.net"
6="nist1.aol-ca.truetime.com"
7="nist.expertsmi.com"
8="nist1-dc.WiTime.net"
9="nist1-sj.WiTime.net"
10="time-a.nist.gov"
11="time-a.timefreq.bldrdoc.gov"
12="time-b.nist.gov"
13="time-b.timefreq.bldrdoc.gov"
14="time-c.timefreq.bldrdoc.gov"
15="utcnist.colorado.edu"
16="time.ien.it"
17="time.nrc.ca"
18="time.chu.nrc.ca"
19="clock.psu.edu"
20="tick.greyware.com"
21="ntp1.as34288.net"
22="ntp0.as34288.net"
23="time.ufe.cz"
24="ntp0.fau.de"
25="time.fu-berlin.de"
26="time.windows.com"
27="time.nist.gov"


I mostly use time.fu-berlin.de, which is also set in Windows XP (instead of time.windows.com).
I have never had a problem with wrong time. (I have used that server for 2-3 years.)


Quote: "The last line converts to UTC time 23:51:50 on 4th December 2015"

That should be 13:51:50:
UNIX TimeStamp: 1449237110 = 04 Dec 2015 13:51:50 GMT
http://www.onlineconversion.com/unix_time.htm
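
The same conversion can be checked locally instead of with an online converter. A minimal Python sketch (the helper name to_utc is only for illustration) that converts the four timestamps quoted in this thread:

```python
from datetime import datetime, timezone


def to_utc(ts: int) -> str:
    """Format a Unix time_t value as a UTC date and time."""
    moment = datetime.fromtimestamp(ts, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Timestamp lines from the two checkpoint files in the first post.
for ts in (1457016105, 1449134076, 1449237710, 1449237110):
    print(ts, "->", to_utc(ts))
```

Passing tz=timezone.utc matters here: without it, fromtimestamp() returns local time, which is an easy way to end up a whole number of hours off.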
 



- ALF - "Find out what you don't do well ..... then don't do it!" :)
