WU aborted by server (finish file present too long) - why?

Profile Nikolay A. Saharov
   
Message 4249 - Posted: 21 Sep 2016, 4:46:17 UTC
Last modified: 21 Sep 2016, 4:52:30 UTC

Hello,
I have a strange error.

Task #63839471 was aborted by the server after running for 6 hours, with the error "finish file present too long".

Why?

Log:

21.09.2016 7:00:55 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:01:40 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:02:09 | WUProp@Home | Sending scheduler request: Requested by project.
21.09.2016 7:02:09 | WUProp@Home | Not requesting tasks: non CPU intensive
21.09.2016 7:02:10 | WUProp@Home | Scheduler request completed
21.09.2016 7:02:30 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:03:53 | WUProp@Home | [task] task_state=ABORT_PENDING for data_collect_v4_1474189777_61052_0 from request_abort
21.09.2016 7:03:53 | | request_abort(): PID 53416 has 1 descendants
21.09.2016 7:03:53 | | PID 30248
21.09.2016 7:03:53 | WUProp@Home | [task] result state=COMPUTE_ERROR for data_collect_v4_1474189777_61052_0 from CS::report_result_error
21.09.2016 7:03:53 | WUProp@Home | [task] result state=ABORTED for data_collect_v4_1474189777_61052_0 from abort_task
21.09.2016 7:03:57 | WUProp@Home | [task] Process for data_collect_v4_1474189777_61052_0 exited, exit code 0, task state 5
21.09.2016 7:03:57 | WUProp@Home | [task] task_state=ABORTED for data_collect_v4_1474189777_61052_0 from handle_exited_app
21.09.2016 7:04:02 | WUProp@Home | Computation for task data_collect_v4_1474189777_61052_0 finished
21.09.2016 7:04:02 | WUProp@Home | [task] result state=COMPUTE_ERROR for data_collect_v4_1474189777_61052_0 from CS::app_finished
21.09.2016 7:04:03 | WUProp@Home | Started upload of data_collect_v4_1474189777_61052_0_0
21.09.2016 7:04:11 | WUProp@Home | Finished upload of data_collect_v4_1474189777_61052_0_0
21.09.2016 7:04:17 | WUProp@Home | [error] Couldn't delete file projects/wuprop.boinc-af.org/data_collect_v4_1474189777_61052_0_0.gzt


Task details:

Name data_collect_v4_1474189777_61052_0
Workunit 61421895
Created 20 Sep 2016, 14:12:29 UTC
Sent 20 Sep 2016, 22:01:22 UTC
Report deadline 27 Sep 2016, 22:01:22 UTC
Received 21 Sep 2016, 4:31:15 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
Computer ID 3139
Run time 6 hours 1 min 54 sec
CPU time 1 min 41 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 0.03 GFLOPS
Application version Data collect version 4 v4.20 (nci)
Peak working set size 19.12 MB
Peak swap size 28.97 MB
Peak disk usage 0.01 MB


<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
finish file present too long
</message>
<stderr_txt>
No ATI library found.
No OpenCL library found
04:54:02 (53416): facteur correction: 0.571429 (4/7.403000)
04:57:02 (53416): facteur correction: 0.800000 (4/5.403000)
07:03:39 (53416): called boinc_finish

</stderr_txt>
]]>


Profile Pooh Bear 27
 
Message 4250 - Posted: 21 Sep 2016, 9:30:58 UTC
Last modified: 21 Sep 2016, 9:31:10 UTC

Did your Windows 10 update recently? I hear there are some problems with the latest update, and I am seeing reports of this on other forums. It mostly affects machines with NVIDIA cards, and you may need to reinstall or update your NVIDIA driver to fix it.

Profile BilBg
Message 4251 - Posted: 21 Sep 2016, 18:38:52 UTC - in response to Message 4249.  

This is a BOINC error (bug/feature), not an application error, and the task was not "aborted by server".

http://setiathome.berkeley.edu/forum_thread.php?id=78642
http://setiathome.berkeley.edu/forum_thread.php?id=79721
http://setiathome.berkeley.edu/forum_thread.php?id=78498



Profile Nikolay A. Saharov
   
Message 4256 - Posted: 21 Sep 2016, 21:16:11 UTC - in response to Message 4251.  
Last modified: 21 Sep 2016, 21:47:45 UTC

Thank you, that's interesting.

I didn't restart the computer at that time, though, so it is still strange.
It seems that this error has happened several times before.

Profile BilBg
Message 4259 - Posted: 22 Sep 2016, 14:52:36 UTC - in response to Message 4256.  
Last modified: 22 Sep 2016, 15:03:11 UTC

From your log:
"
21.09.2016 7:02:30 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:03:53 | WUProp@Home | [task] task_state=ABORT_PENDING for data_collect_v4_1474189777_61052_0 from request_abort
...
21.09.2016 7:03:57 | WUProp@Home | [task] Process for data_collect_v4_1474189777_61052_0 exited, exit code 0, task state 5
...
07:03:39 (53416): called boinc_finish
"

At 07:03:39 the application created the boinc_finish_called file and called boinc_finish(), "the last function it calls in application source code and that never returns back to app":
http://setiathome.berkeley.edu/forum_thread.php?id=79721&postid=1792646#1792646
(Raistmer is the developer of OpenCL SETI@home applications)

For some reason BOINC did not react to boinc_finish() and did not notice the boinc_finish_called file for 14 seconds (until 7:03:53).

The app exited at 7:03:57 (18 seconds after calling boinc_finish()) with "exit code 0", meaning no error.
I feel this BOINC safeguard against hung apps is too aggressive.

I think the hard-coded timeout was (is?) 10 seconds, after which BOINC decides to abort the task.
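
The arithmetic above can be checked with a short Python sketch. The three timestamps are taken from the stderr output and the event log quoted in this thread. The 10-second value is only the timeout guessed above, not a verified BOINC constant, and the helper name seconds_between is made up for this illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Return whole seconds from start to end (same day, HH:MM:SS)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the log excerpts in this thread.
finish_called = "07:03:39"   # stderr: called boinc_finish
abort_pending = "07:03:53"   # event log: task_state=ABORT_PENDING
process_exited = "07:03:57"  # event log: Process ... exited, exit code 0

# Assumption: the timeout value recalled in this post, not verified.
ASSUMED_TIMEOUT = 10

to_abort = seconds_between(finish_called, abort_pending)
to_exit = seconds_between(finish_called, process_exited)

print(f"boinc_finish to abort request: {to_abort} s")
print(f"boinc_finish to process exit:  {to_exit} s")
print(f"over assumed timeout: {to_abort > ASSUMED_TIMEOUT}")
```

If the timeout really was 10 seconds, both intervals are past it, which would match the abort seen in the log.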


This 14-second delay may have been caused by:
- An antivirus scan (locking files in the BOINC Data directory)
- High HDD activity (paging, defrag)
- Anything else that appears to 'hang' the computer for some time

I think the hard-coded timeout was increased (to 60 seconds?) in some BOINC version, but I don't know which one.
Richard Haselgrove or Ageless should know.
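
To illustrate why a longer timeout would matter in this case, here is a minimal Python sketch of this kind of check. It is not BOINC's actual code: the function finish_file_verdict, its parameters, and the rule "abort when the finish file has been present longer than the timeout while the process is still running" are assumptions based on the error text, and the 10 and 60 second values are the unverified guesses from this post.

```python
def finish_file_verdict(seconds_since_finish_file: float,
                        process_still_running: bool,
                        timeout: float) -> str:
    """Hypothetical model of a 'finish file present too long' check.

    Returns "finished" if the app has already exited, "abort" if the
    finish file has been present longer than the timeout while the
    process is still running, and "wait" otherwise.
    """
    if not process_still_running:
        return "finished"
    if seconds_since_finish_file > timeout:
        return "abort"
    return "wait"


# 14 s after boinc_finish the process was still running (it exited at 18 s).
print(finish_file_verdict(14, True, timeout=10))   # abort
print(finish_file_verdict(14, True, timeout=60))   # wait
print(finish_file_verdict(18, False, timeout=60))  # finished
```

Under these assumptions the task would have been left alone with a 60-second limit, since the app exited by itself 18 seconds after calling boinc_finish().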



Profile Nikolay A. Saharov
   
Message 4266 - Posted: 26 Sep 2016, 5:00:36 UTC - in response to Message 4259.  

Thanks!

Yes, my computer is sometimes unresponsive because multicore VMs or GPU BOINC tasks are running.

