WU aborted by server (finish file present too long) - why?


Profile Nikolay A. Saharov
   
Joined: 28 Mar 10
Posts: 12
Credit: 222,674
RAC: 37
Total hours: 238,895
Message 4249 - Posted: 21 Sep 2016, 4:46:17 UTC
Last modified: 21 Sep 2016, 4:52:30 UTC

Hello,
I have a strange error.

Task #63839471 was aborted by the server after running for 6 hours, with the error "finish file present too long".

Why?

Log:


21.09.2016 7:00:55 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:01:40 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:02:09 | WUProp@Home | Sending scheduler request: Requested by project.
21.09.2016 7:02:09 | WUProp@Home | Not requesting tasks: non CPU intensive
21.09.2016 7:02:10 | WUProp@Home | Scheduler request completed
21.09.2016 7:02:30 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:03:53 | WUProp@Home | [task] task_state=ABORT_PENDING for data_collect_v4_1474189777_61052_0 from request_abort
21.09.2016 7:03:53 | | request_abort(): PID 53416 has 1 descendants
21.09.2016 7:03:53 | | PID 30248
21.09.2016 7:03:53 | WUProp@Home | [task] result state=COMPUTE_ERROR for data_collect_v4_1474189777_61052_0 from CS::report_result_error
21.09.2016 7:03:53 | WUProp@Home | [task] result state=ABORTED for data_collect_v4_1474189777_61052_0 from abort_task
21.09.2016 7:03:57 | WUProp@Home | [task] Process for data_collect_v4_1474189777_61052_0 exited, exit code 0, task state 5
21.09.2016 7:03:57 | WUProp@Home | [task] task_state=ABORTED for data_collect_v4_1474189777_61052_0 from handle_exited_app
21.09.2016 7:04:02 | WUProp@Home | Computation for task data_collect_v4_1474189777_61052_0 finished
21.09.2016 7:04:02 | WUProp@Home | [task] result state=COMPUTE_ERROR for data_collect_v4_1474189777_61052_0 from CS::app_finished
21.09.2016 7:04:03 | WUProp@Home | Started upload of data_collect_v4_1474189777_61052_0_0
21.09.2016 7:04:11 | WUProp@Home | Finished upload of data_collect_v4_1474189777_61052_0_0
21.09.2016 7:04:17 | WUProp@Home | [error] Couldn't delete file projects/wuprop.boinc-af.org/data_collect_v4_1474189777_61052_0_0.gzt


Task details:

Name data_collect_v4_1474189777_61052_0
Workunit 61421895
Created 20 Sep 2016, 14:12:29 UTC
Sent 20 Sep 2016, 22:01:22 UTC
Report deadline 27 Sep 2016, 22:01:22 UTC
Received 21 Sep 2016, 4:31:15 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
Computer ID 3139
Run time 6 hours 1 min 54 sec
CPU time 1 min 41 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 0.03 GFLOPS
Application version Data collect version 4 v4.20 (nci)
Peak working set size 19.12 MB
Peak swap size 28.97 MB
Peak disk usage 0.01 MB


<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
finish file present too long
</message>
<stderr_txt>
No ATI library found.
No OpenCL library found
04:54:02 (53416): facteur correction: 0.571429 (4/7.403000)
04:57:02 (53416): facteur correction: 0.800000 (4/5.403000)
07:03:39 (53416): called boinc_finish

</stderr_txt>
]]>

____________

Profile Pooh Bear 27
 
Joined: 22 Jan 13
Posts: 92
Credit: 566,171
RAC: 270
Total hours: 1,149,353
Message 4250 - Posted: 21 Sep 2016, 9:30:58 UTC
Last modified: 21 Sep 2016, 9:31:10 UTC

Did your Windows 10 update recently? I hear there are some problems with the latest update, and I am seeing this reported on other forums. It mostly affects machines with NVidia cards, and you may need to reinstall or update your NVidia driver to fix it.
____________

Profile BilBg
Joined: 20 Jun 12
Posts: 63
Credit: 84,022
RAC: 18
Total hours: 97,636
Message 4251 - Posted: 21 Sep 2016, 18:38:52 UTC - in response to Message 4249.

This is a BOINC error (bug/feature), not an application error and not "aborted by server".

http://setiathome.berkeley.edu/forum_thread.php?id=78642
http://setiathome.berkeley.edu/forum_thread.php?id=79721
http://setiathome.berkeley.edu/forum_thread.php?id=78498
____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Profile Nikolay A. Saharov
   
Joined: 28 Mar 10
Posts: 12
Credit: 222,674
RAC: 37
Total hours: 238,895
Message 4256 - Posted: 21 Sep 2016, 21:16:11 UTC - in response to Message 4251.
Last modified: 21 Sep 2016, 21:47:45 UTC

Thank you, that's interesting.

However, I didn't restart the computer at that moment, so something strange is going on.
It seems that this error has happened several times before.
____________

Profile BilBg
Joined: 20 Jun 12
Posts: 63
Credit: 84,022
RAC: 18
Total hours: 97,636
Message 4259 - Posted: 22 Sep 2016, 14:52:36 UTC - in response to Message 4256.
Last modified: 22 Sep 2016, 15:03:11 UTC

From your log:
"
21.09.2016 7:02:30 | WUProp@Home | [task] result data_collect_v4_1474189777_61052_0 checkpointed
21.09.2016 7:03:53 | WUProp@Home | [task] task_state=ABORT_PENDING for data_collect_v4_1474189777_61052_0 from request_abort
...
21.09.2016 7:03:57 | WUProp@Home | [task] Process for data_collect_v4_1474189777_61052_0 exited, exit code 0, task state 5
...
07:03:39 (53416): called boinc_finish
"

At 07:03:39 the application created the boinc_finish_called file and called boinc_finish(), "the last function it calls in application source code and that never returns back to app":
http://setiathome.berkeley.edu/forum_thread.php?id=79721&postid=1792646#1792646
(Raistmer is the developer of the OpenCL SETI@home applications.)

For some reason BOINC did not react to boinc_finish() or to the boinc_finish_called file for 14 seconds (until 7:03:53), at which point it requested the abort.

The app exited at 7:03:57 (18 seconds after calling boinc_finish()) with "exit code 0" = no error.
I feel this BOINC check/precaution against hanging apps is too aggressive.

I think the hard-coded timeout was (is?) 10 seconds, after which BOINC decides to abort the task.
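The arithmetic above can be checked with a few lines of Python. This is only a sketch: the three timestamps are copied from the quoted log, while the 10-second value is the timeout as recalled in this post, not something taken from the BOINC source.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps copied from the log excerpts above (same day, local time).
finish_called = datetime.strptime("07:03:39", FMT)    # stderr: called boinc_finish
abort_requested = datetime.strptime("07:03:53", FMT)  # client: task_state=ABORT_PENDING
process_exited = datetime.strptime("07:03:57", FMT)   # client: Process ... exited

# Assumption: the timeout value recalled in this post, not verified.
ASSUMED_TIMEOUT_S = 10

delay_to_abort = (abort_requested - finish_called).total_seconds()
delay_to_exit = (process_exited - finish_called).total_seconds()

print(delay_to_abort)                      # 14.0
print(delay_to_exit)                       # 18.0
print(delay_to_abort > ASSUMED_TIMEOUT_S)  # True
```

So the 14-second gap is past a 10-second limit, which fits the abort being requested before the process exited on its own 4 seconds later.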


This delay of 14 seconds may be caused by:
- Antivirus scan (lock on the files in BOINC Data directory)
- High HDD activity (paging, defrag)
- Anything that appears to 'hang' the computer for some time

I think the hard-coded timeout was increased (to 60 seconds?) in some BOINC version, but I don't know which.
Richard Haselgrove or Ageless should know.
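
To illustrate why a longer timeout would matter here, below is a toy model of such a check. It is not BOINC's actual code: the function name `should_abort` and its parameters are hypothetical, and both timeout values (10 and 60 seconds) are the unverified guesses from this post. It simply reads "finish file present too long" as "the finish file has existed longer than the timeout while the process is still running".

```python
from typing import Optional


def should_abort(finish_file_seen_at: Optional[float],
                 now: float,
                 process_running: bool,
                 timeout_s: float) -> bool:
    """Hypothetical watchdog rule, not taken from the BOINC source.

    Returns True when the finish file has been present for more than
    timeout_s seconds and the process has still not exited.
    """
    if finish_file_seen_at is None or not process_running:
        return False
    return (now - finish_file_seen_at) > timeout_s


# Times are seconds counted from 07:03:39, when boinc_finish() was called.
# At 07:03:53 (t = 14) the process was still running.
print(should_abort(0.0, 14.0, True, timeout_s=10.0))  # True
print(should_abort(0.0, 14.0, True, timeout_s=60.0))  # False
```

Under this model the same 14-second stall is fatal with a 10-second limit but harmless with a 60-second one, since the process exited by itself at t = 18.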
____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Profile Nikolay A. Saharov
   
Joined: 28 Mar 10
Posts: 12
Credit: 222,674
RAC: 37
Total hours: 238,895
Message 4266 - Posted: 26 Sep 2016, 5:00:36 UTC - in response to Message 4259.

Thanks!

Yes, my computer is sometimes unresponsive because multicore VMs or GPU BOINC tasks are running.
____________




Copyright © 2017 Sebastien