Checkpointing
Message boards : Number crunching : Checkpointing
Author | Message |
---|---|
Data collect 1.25 (NCI) normally checkpoints every 5 minutes on my systems, but occasionally it runs for longer periods without checkpointing. The current task on my Q6600 XP system has just had the most extreme example of this behaviour that I've seen so far, running for 5 hours without performing a checkpoint:

20/05/2010 12:50:35 WUProp@Home [checkpoint_debug] result wu_1274213743_6997_0 checkpointed
20/05/2010 17:50:36 WUProp@Home [checkpoint_debug] result wu_1274213743_6997_0 checkpointed

The output file shows that a cycle was still being performed every 5 minutes, with the checkpoints corresponding to cycles 43 and 103:

12:50:35 (3584): cycle43
17:50:35 (3584): cycle103 | |
ID: 238 · Rating: 0 | |
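For anyone who wants to check their own logs for the same behaviour, here is a minimal sketch that scans [checkpoint_debug] lines and reports gaps longer than the expected interval. It uses the two lines quoted above as input. The function names, the 5-minute interval, and the 30-second slack are illustrative assumptions, as is the day/month/year date format (inferred from "20/05/2010").

```python
from datetime import datetime, timedelta

# The two checkpoint_debug lines quoted in the post above.
LOG = """\
20/05/2010 12:50:35 WUProp@Home [checkpoint_debug] result wu_1274213743_6997_0 checkpointed
20/05/2010 17:50:36 WUProp@Home [checkpoint_debug] result wu_1274213743_6997_0 checkpointed
"""

INTERVAL = timedelta(minutes=5)  # assumed normal checkpoint interval


def checkpoint_times(log_text):
    """Return the timestamps of all lines that report a checkpoint."""
    times = []
    for line in log_text.splitlines():
        if "[checkpoint_debug]" in line and line.rstrip().endswith("checkpointed"):
            stamp = " ".join(line.split()[:2])  # date and time fields
            # Assumption: dates are day/month/year, as in the quoted log.
            times.append(datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"))
    return times


def long_gaps(times, expected=INTERVAL, slack=timedelta(seconds=30)):
    """Return (start, end, gap) for consecutive checkpoints too far apart."""
    return [(a, b, b - a) for a, b in zip(times, times[1:]) if b - a > expected + slack]


gaps = long_gaps(checkpoint_times(LOG))
for start, end, gap in gaps:
    missed = int(gap / INTERVAL) - 1  # intervals that passed without a checkpoint
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: gap {gap}, about {missed} checkpoints missed")
```

For the quoted lines this reports a single gap of 5:00:01 with about 59 missed checkpoints, which matches cycles 44 to 102 in the output file completing without a checkpoint.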
Application checkpoints every 5 minutes. | |
ID: 239 · Rating: 0 | |
It appears that the new app, Data collect version 3 v3.25 (nci), is failing to checkpoint properly: on a restart, the task starts from the beginning. My Windows host reports "Erreur assignation project_name (node project)" and then "checkpoint failed File exists" every minute in the stderr_txt output. | |
ID: 616 · Rating: 0 | |
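The app's source is not shown in this thread, so the following is not a diagnosis. One common way to get a "File exists" error on Windows is to write a checkpoint to a temporary file and then rename it over the previous checkpoint: on Windows a plain rename fails if the destination already exists, while on Linux it silently overwrites. Below is a minimal sketch of a checkpoint write that avoids this, assuming a text checkpoint. The function name `write_checkpoint` and the file name `state.ckpt` are hypothetical.

```python
import os
import tempfile


def write_checkpoint(path, data):
    """Write data to a temporary file, then move it over path in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final move
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk
        # os.replace overwrites an existing destination on all platforms;
        # os.rename would raise FileExistsError on Windows here.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage: a second checkpoint overwrites the first without an error.
with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "state.ckpt")  # hypothetical file name
    write_checkpoint(ckpt, "cycle43")
    write_checkpoint(ckpt, "cycle103")
    with open(ckpt) as f:
        print(f.read())
```

If the previous checkpoint is always replaced this way, a restart picks up the most recent state instead of starting from the beginning.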