WuProp is still killing active or suspended tasks...
Author | Message |
---|---|
Joined: 29 Jul 11 Posts: 334 Credit: 1,240,435 RAC: 321 |
Again, I am seeing WuProp kill active or suspended (in memory/disk) tasks, whether or not they have checkpointed, when it completes and reports a WuProp task or otherwise communicates with the WuProp servers. I have lost a cumulative total of over 40 hours on the new QMC tasks, which do not checkpoint for many hours, and had to choose between running WuProp or QMC on the affected PCs. It only seems to happen if the computer's clock is off by more than a few seconds from the WuProp servers' time when the WuProp program "phones home". EXAMPLE:
7/25/2013 3:07:02 PM | | [http] HTTP_OP::init_get(): http://wuprop.boinc-af.org/notices.py?userid=4482&auth=4482_23d09a8fd4c23574a90b20e69a3e802c
7/25/2013 3:07:02 PM | | [http] HTTP_OP::libcurl_exec(): ca-bundle set
7/25/2013 3:07:02 PM | | [http] [ID#0] Info: Connection #0 seems to be dead!
7/25/2013 3:07:02 PM | | [http] [ID#0] Info: Closing connection #0
7/25/2013 3:07:02 PM | | [http] [ID#0] Info: About to connect() to wuprop.boinc-af.org port 80 (#0)
7/25/2013 3:07:02 PM | | [http] [ID#0] Info: Trying 46.105.102.130...
7/25/2013 3:07:03 PM | | [http] [ID#0] Info: Connected to wuprop.boinc-af.org (46.105.102.130) port 80 (#0)
7/25/2013 3:07:03 PM | | [http] [ID#0] Sent header to server: GET /notices.py?userid=4482&auth=4482_23d09a8fd4c23574a90b20e69a3e802c HTTP/1.1
7/25/2013 3:07:03 PM | | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_intelx86 6.12.34)
7/25/2013 3:07:03 PM | | [http] [ID#0] Sent header to server: Host: wuprop.boinc-af.org
7/25/2013 3:07:03 PM | | [http] [ID#0] Sent header to server: Accept: */*
7/25/2013 3:07:03 PM | | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
7/25/2013 3:07:03 PM | | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
7/25/2013 3:07:03 PM | | [http] [ID#0] Sent header to server:
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server: HTTP/1.1 200 OK
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server: Date: Thu, 25 Jul 2013 20:05:53 GMT
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server: Server: Apache/2.2.16 (Debian)
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server: Expires: "Thu, 25 Jul 2013 21:05:53 GMT"
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server: Last-Modified: "Thu, 25 Jul 2013 20:05:53 GMT"
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server: Content-Length: 299
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server: Content-Type: application/xml
7/25/2013 3:07:03 PM | | [http] [ID#0] Received header from server:
7/25/2013 3:07:03 PM | | [http] [ID#0] Info: Connection #0 to host wuprop.boinc-af.org left intact
7/25/2013 3:12:21 PM | | System clock was turned backwards; clearing timeouts
7/25/2013 3:12:22 PM | QMC@HOME | Task korth001001b_10038070_1 exited with zero status but no 'finished' file
The computer's clock was NOT turned back as far as I can see; if it had been, there should be an entry about it in the computer's SYSTEM log.
This has been a known problem## with the WuProp task for some time now and needs to be fixed. ##See Message boards : Number crunching : Problem with Linux locking up machines??? --- ... --- specifically message #1131 |
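The log above already contains enough to estimate the clock skew: BOINC stamps each line with the client's local time, and the server's HTTP Date header gives the server's own time. A minimal Python sketch, assuming the client machine was on UTC-5 (the poster's time zone is not stated, so this offset is a labelled assumption):

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Values copied from the log above.
LOCAL_STAMP = "7/25/2013 3:07:03 PM"           # client-side timestamp of the response lines
SERVER_DATE = "Thu, 25 Jul 2013 20:05:53 GMT"  # HTTP Date header sent by the server

# ASSUMPTION: the client ran on UTC-5 (e.g. US Central Daylight Time).
LOCAL_TZ = timezone(timedelta(hours=-5))

def clock_skew_seconds(local_stamp: str, server_date: str, tz: timezone) -> float:
    """Return local clock minus server clock in seconds (positive = local clock ahead)."""
    local = datetime.strptime(local_stamp, "%m/%d/%Y %I:%M:%S %p").replace(tzinfo=tz)
    server = parsedate_to_datetime(server_date)  # timezone-aware, UTC for "GMT"
    return (local - server).total_seconds()

print(clock_skew_seconds(LOCAL_STAMP, SERVER_DATE, LOCAL_TZ))  # 70.0
```

Under that assumption the client clock was about 70 seconds ahead of the server at 3:07 PM, well past the "few seconds" mentioned above, and a later correction of such a clock would be logged by the client as the clock being "turned backwards".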
Joined: 20 Jun 12 Posts: 63 Credit: 94,685 RAC: 0 |
This is not WuProp's fault; it is a long-standing BOINC fault. I reported it on 4 Mar 2008 (and there may be older reports): Computer Clock back stops BOINC: http://setiathome.berkeley.edu/forum_thread.php?id=45717 If "the computer's clock is off by more than a few seconds" (more precisely, if the computer's clock is off by more than 30 seconds) and then, for whatever reason, the clock returns (synchronizes) to the proper current time, BOINC stops all apps and restarts them after the 'missing' period of time has passed. (Search Google for "No heartbeat from core client for 31 sec - exiting".) So "tasks that have not checkpointed" lose all progress, while tasks that have checkpointed restart from their last checkpoint. - ALF - "Find out what you don't do well ..... then don't do it!" :) |
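The mechanism described above can be illustrated with a toy model. This is not BOINC's code; it assumes, hypothetically, that the client schedules its next heartbeat by wall-clock time while the app counts real elapsed seconds and gives up after more than 30 seconds without a heartbeat. Stepping the wall clock back then delays the client's next heartbeat by the size of the step:

```python
from typing import Optional

def simulate(step_back_s: int, step_at_s: int = 10,
             timeout_s: int = 30, horizon_s: int = 200) -> Optional[int]:
    """Toy model (NOT BOINC's implementation).

    Returns the real second at which the app gives up waiting for a
    heartbeat, or None if it never does within horizon_s seconds.

    Hypothetical assumptions:
    - the client sends a heartbeat whenever its wall clock reaches next_due,
      then schedules the next one 1 s later by that same wall clock;
    - the app counts real elapsed seconds (unaffected by clock changes) and
      exits once more than timeout_s seconds pass without a heartbeat;
    - at real second step_at_s the wall clock is stepped back by step_back_s.
    """
    next_due = 0        # client's next heartbeat time, in wall-clock seconds
    last_beat_real = 0  # app's record of the last heartbeat, in real seconds
    for real in range(horizon_s):
        wall = real - (step_back_s if real >= step_at_s else 0)
        if wall >= next_due:
            next_due = wall + 1
            last_beat_real = real
        if real - last_beat_real > timeout_s:
            return real  # the app would report "No heartbeat ... - exiting" here
    return None

print(simulate(70))  # 40: last heartbeat at second 9, app exits 31 s later
print(simulate(20))  # None: heartbeats resume before the timeout expires
```

In this model a step back of 30 seconds or less is absorbed, while anything larger starves the app of heartbeats until it exits, which is why a task that has not checkpointed would lose its progress.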
Joined: 7 Sep 10 Posts: 453 Credit: 945,109 RAC: 0 |
I've had 3 errors in the last few days (all on the same system). I added a couple of GPUs (to test GPU detection):
http://wuprop.boinc-af.org/result.php?resultid=30850512
Name: wu_v3_1374391616_297861_0
Workunit: 30364010
Created: 28 Jul 2013 20:43:51 UTC
Sent: 28 Jul 2013 20:58:05 UTC
Received: 28 Jul 2013 21:54:23 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -2147483645 (0xffffffff80000003)
Computer ID: 44207
Report deadline: 7 Aug 2013 20:58:05 UTC
Run time: 780.33
CPU time: 0.36
Validate state: Invalid
Credit: 0.00
Application version: Data collect version 3 v3.52 (nci)
Stderr output:
<core_client_version>7.2.5</core_client_version>
<![CDATA[
<message>
One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)
</message>
<stderr_txt>
21:58:07 (5236): Non-conforming number of GPUs: 3
21:58:07 (5236): Error wu_terminee (WU already reported)
21:58:07 (5236): Error wu_terminee (WU already reported)
22:02:06 (5236): Error wu_terminee (WU already reported)
22:08:06 (5236): Error wu_terminee (WU upload in progress) 4
22:09:06 (5236): Error wu_terminee (WU already reported)
ERROR: Invalid parameter detected in function (null). File: (null) Line: 0
ERROR: Expression: (null)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x76413219
Engaging BOINC Windows Runtime Debugger...
Name: wu_v3_1374391616_205787_0
Workunit: 30271936
Created: 26 Jul 2013 11:28:20 UTC
Sent: 26 Jul 2013 11:30:29 UTC
Received: 26 Jul 2013 14:58:50 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -2147483645 (0xffffffff80000003)
Computer ID: 44207
Report deadline: 5 Aug 2013 11:30:29 UTC
Run time: 3,363.82
CPU time: 5.71
Validate state: Invalid
Credit: 0.00
Application version: Data collect version 3 v3.52 (nci)
Stderr output:
<core_client_version>7.2.5</core_client_version>
<![CDATA[
<message>
One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)
</message>
<stderr_txt>
12:30:20 (29912): Non-conforming number of GPUs: 2
12:30:20 (29912): Error wu_terminee (WU already reported)
12:41:20 (29912): Error wu_terminee (WU already reported)
12:47:20 (29912): Error wu_terminee (WU already reported)
12:51:20 (29912): Error wu_terminee (WU already reported)
12:56:20 (29912): Error wu_terminee (WU already reported)
12:57:20 (29912): Error wu_terminee (WU already reported)
12:57:20 (29912): Error wu_terminee (WU already reported)
13:04:32 (29912): Error wu_terminee (WU upload in progress) 4
13:05:21 (29912): Error wu_terminee (WU already reported)
13:14:23 (29912): Error wu_terminee (WU already reported)
13:19:23 (29912): Error wu_terminee (WU already reported)
13:26:23 (29912): Error wu_terminee (WU upload in progress) 3
13:26:23 (29912): Error wu_terminee (WU upload in progress) 3
13:26:23 (29912): Error wu_terminee (WU upload in progress) 3
13:26:23 (29912): Error wu_terminee (WU upload in progress) 3
13:26:23 (29912): Error wu_terminee (WU upload in progress) 3
13:26:27 (29912): Error wu_terminee (WU upload in progress) 3
13:26:28 (29912): Error wu_terminee (WU upload in progress) 3
13:26:29 (29912): Error wu_terminee (WU upload in progress) 3
13:26:30 (29912): Error wu_terminee (WU upload in progress) 3
13:26:31 (29912): Error wu_terminee (WU upload in progress) 3
13:26:32 (29912): Error wu_terminee (WU upload in progress) 3
13:26:33 (29912): Error wu_terminee (WU upload in progress) 3
13:26:33 (29912): Error wu_terminee (WU upload in progress) 3
13:26:34 (29912): Error wu_terminee (WU upload in progress) 3
13:26:34 (29912): Error wu_terminee (WU upload in progress) 3
13:26:35 (29912): Error wu_terminee (WU upload in progress) 3
13:26:35 (29912): Error wu_terminee (WU upload in progress) 3
13:26:36 (29912): Error wu_terminee (WU upload in progress) 3
13:26:36 (29912): Error wu_terminee (WU upload in progress) 3
13:26:36 (29912): Error wu_terminee (WU upload in progress) 3
13:26:37 (29912): Error wu_terminee (WU upload in progress) 3
13:26:37 (29912): Error wu_terminee (WU upload in progress) 3
13:26:37 (29912): Error wu_terminee (WU upload in progress) 3
13:26:38 (29912): Error wu_terminee (WU upload in progress) 3
13:26:39 (29912): Error wu_terminee (WU upload in progress) 3
13:26:39 (29912): Error wu_terminee (WU upload in progress) 3
13:26:40 (29912): Error wu_terminee (WU upload in progress) 3
13:26:40 (29912): Error wu_terminee (WU upload in progress) 3
13:26:40 (29912): Error wu_terminee (WU upload in progress) 3
13:26:41 (29912): Error wu_terminee (WU upload in progress) 3
13:26:41 (29912): Error wu_terminee (WU upload in progress) 3
13:26:41 (29912): Error wu_terminee (WU upload in progress) 3
13:26:41 (29912): Error wu_terminee (WU upload in progress) 3
13:40:10 (1264): Error receiving host_info
13:40:41 (1264): Going to sleep (last request was less than 30 seconds ago
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:41 (1264): Error wu_terminee (WU upload in progress) 3
13:40:42 (1264): Error wu_terminee (WU upload in progress) 3
13:40:42 (1264): Error wu_terminee (WU upload in progress) 3
13:40:42 (1264): Error wu_terminee (WU upload in progress) 3
13:40:42 (1264): Error wu_terminee (WU upload in progress) 3
13:40:43 (1264): Error wu_terminee (WU upload in progress) 3
13:40:43 (1264): Error wu_terminee (WU upload in progress) 3
13:40:43 (1264): Error wu_terminee (WU upload in progress) 3
13:40:43 (1264): Error wu_terminee (WU upload in progress) 3
13:40:44 (1264): Error wu_terminee (WU upload in progress) 3
13:40:44 (1264): Error wu_terminee (WU upload in progress) 3
13:40:44 (1264): Error wu_terminee (WU upload in progress) 3
13:40:44 (1264): Error wu_terminee (WU upload in progress) 3
13:40:45 (1264): Error wu_terminee (WU upload in progress) 3
13:40:45 (1264): Error wu_terminee (WU upload in progress) 3
13:40:45 (1264): Error wu_terminee (WU upload in progress) 3
13:40:45 (1264): Error wu_terminee (WU upload in progress) 3
13:40:46 (1264): Error wu_terminee (WU upload in progress) 3
13:40:46 (1264): Error wu_terminee (WU upload in progress) 3
13:40:46 (1264): Error wu_terminee (WU upload in progress) 3
13:40:46 (1264): Error wu_terminee (WU upload in progress) 3
13:40:47 (1264): Error wu_terminee (WU upload in progress) 3
13:40:47 (1264): Error wu_terminee (WU upload in progress) 3
13:40:47 (1264): Error wu_terminee (WU upload in progress) 3
13:40:47 (1264): Error wu_terminee (WU upload in progress) 3
13:40:48 (1264): Error wu_terminee (WU upload in progress) 3
13:40:49 (1264): Error wu_terminee (WU upload in progress) 3
13:40:49 (1264): Error wu_terminee (WU upload in progress) 3
13:40:49 (1264): Error wu_terminee (WU upload in progress) 3
ERROR: Invalid parameter detected in function (null). File: (null) Line: 0
ERROR: Expression: (null)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x762A3219
Engaging BOINC Windows Runtime Debugger...
http://wuprop.boinc-af.org/result.php?resultid=30815863
Name: wu_v3_1374391616_263505_0
Workunit: 30329654
Created: 27 Jul 2013 23:05:51 UTC
Sent: 27 Jul 2013 23:07:44 UTC
Received: 28 Jul 2013 8:56:58 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -2147483645 (0xffffffff80000003)
Computer ID: 44207
Report deadline: 6 Aug 2013 23:07:44 UTC
Run time: 9,060.93
CPU time: 4.48
Validate state: Invalid
Credit: 0.00
Application version: Data collect version 3 v3.52 (nci)
Stderr output:
<core_client_version>7.2.5</core_client_version>
<![CDATA[
<message>
One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)
</message>
<stderr_txt>
00:07:45 (1600): Non-conforming number of GPUs: 3
00:07:45 (1600): Error wu_terminee (WU already reported)
00:09:45 (1600): Error wu_terminee (WU upload in progress) 4
00:10:45 (1600): Error wu_terminee (WU already reported)
00:12:05 (1600): Error wu_terminee (WU upload in progress) 4
00:12:45 (1600): Error wu_terminee (WU already reported)
00:16:45 (1600): Error wu_terminee (WU already reported)
00:16:45 (1600): Error wu_terminee (WU already reported)
00:18:45 (1600): Error wu_terminee (WU upload in progress) 3
00:18:45 (1600): Error wu_terminee (WU upload in progress) 3
00:19:45 (1600): Error wu_terminee (WU upload in progress) 3
00:19:45 (1600): Error wu_terminee (WU upload in progress) 3
00:20:45 (1600): Error wu_terminee (WU upload in progress) 3
00:20:45 (1600): Error wu_terminee (WU upload in progress) 3
00:21:45 (1600): Error wu_terminee (WU already reported)
00:21:45 (1600): Error wu_terminee (WU already reported)
00:22:45 (1600): Error wu_terminee (WU already reported)
00:26:45 (1600): Error wu_terminee (WU already reported)
00:30:45 (1600): Error wu_terminee (WU upload in progress) 4
00:31:45 (1600): Error wu_terminee (WU upload in progress) 4
00:32:45 (1600): Error wu_terminee (WU upload in progress) 4
00:33:45 (1600): Error wu_terminee (WU upload in progress) 4
00:34:46 (1600): Error wu_terminee (WU upload in progress) 4
00:35:45 (1600): Error wu_terminee (WU upload in progress) 4
00:36:45 (1600): Error wu_terminee (WU upload in progress) 4
00:37:45 (1600): Error wu_terminee (WU upload in progress) 4
00:38:45 (1600): Error wu_terminee (WU upload in progress) 4
00:39:45 (1600): Error wu_terminee (WU upload in progress) 3
00:39:45 (1600): Error wu_terminee (WU already reported)
00:39:45 (1600): Error wu_terminee (WU upload in progress) 4
00:40:45 (1600): Error wu_terminee (WU already reported)
00:40:45 (1600): Error wu_terminee (WU already reported)
00:40:45 (1600): Error wu_terminee (WU already reported)
00:47:45 (1600): Error wu_terminee (WU already reported)
00:52:45 (1600): Error wu_terminee (WU already reported)
00:53:45 (1600): Error wu_terminee (WU already reported)
00:53:45 (1600): Error wu_terminee (WU already reported)
00:54:45 (1600): Error wu_terminee (WU already reported)
01:08:45 (1600): Error wu_terminee (WU already reported)
01:11:04 (1600): Error wu_terminee (WU upload in progress) 4
01:23:45 (1600): Error wu_terminee (WU already reported)
01:25:45 (1600): Error wu_terminee (WU already reported)
01:37:45 (1600): Error wu_terminee (WU already reported)
01:38:45 (1600): Error wu_terminee (WU already reported)
01:44:45 (1600): Error wu_terminee (WU already reported)
01:53:45 (1600): Error wu_terminee (WU already reported)
01:55:22 (1600): Error wu_terminee (WU upload in progress) 4
01:55:45 (1600): Error wu_terminee (WU already reported)
02:08:45 (1600): Error wu_terminee (WU already reported)
02:10:45 (1600): Error wu_terminee (WU upload in progress) 4
02:11:45 (1600): Error wu_terminee (WU already reported)
02:20:45 (1600): Error wu_terminee (WU upload in progress) 4
02:21:45 (1600): Error wu_terminee (WU already reported)
02:22:45 (1600): Error wu_terminee (WU already reported)
02:33:45 (1600): Error wu_terminee (WU already reported)
02:38:45 (1600): Error wu_terminee (WU already reported)
ERROR: Invalid parameter detected in function (null). File: (null) Line: 0
ERROR: Expression: (null)
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x75363219
Engaging BOINC Windows Runtime Debugger...
********************
BOINC Windows Runtime Debugger Version 6.12.34
Dump Timestamp : 07/28/13 09:56:20
Install Directory :
Data Directory : C:\ProgramData\BOINC
Project Symstore :
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\1;C:\ProgramData\BOINC\projects\wuprop.boinc-af.org
ModLoad: 0000000000370000 0000000000086000 C:\ProgramData\BOINC\projects\wuprop.boinc-af.org\data_collect_v3_3.52_windows_intelx86__nci.exe (-nosymbols- Symbols Loaded)
Linked PDB Filename : C:\Documents and Settings\SEBASTIEN\Bureau\data_collect\Release\data_collect.pdb |
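All three results report the same failure in three notations: the Windows status 0x80000003 that the debugger shows as "Breakpoint Encountered", the signed 32-bit exit code -2147483645, and the sign-extended 64-bit value 0xffffffff80000003 on the result page. A short Python check of that arithmetic (the helper names are made up for illustration; the constant name follows the Windows SDK's STATUS_BREAKPOINT):

```python
# Windows status code shown above as "Breakpoint Encountered (0x80000003)".
STATUS_BREAKPOINT = 0x80000003

def as_signed32(code: int) -> int:
    """Reinterpret the low 32 bits of code as a two's-complement signed integer."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code

def as_sign_extended64(code: int) -> int:
    """Sign-extend a 32-bit value to 64 bits and return it as an unsigned integer."""
    return as_signed32(code) & 0xFFFFFFFFFFFFFFFF

print(as_signed32(STATUS_BREAKPOINT))              # -2147483645
print(hex(as_sign_extended64(STATUS_BREAKPOINT)))  # 0xffffffff80000003
```

So the "Exit status" field and the "exit code" in the stderr header are the same breakpoint exception, not two separate errors.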
Joined: 7 Sep 10 Posts: 453 Credit: 945,109 RAC: 0 |
...and another. Each time Windows restarts and I log on, I see a WUProp error! http://wuprop.boinc-af.org/result.php?resultid=30875667 =sticky tape= |
Joined: 28 Mar 10 Posts: 2871 Credit: 538,685 RAC: 133 |
"...and another - Each time Windows restarts and when I log on I see a WUProp error!" When the application crashes, can you send me a copy of file cache at this e-mail address: sebastien[at]statseb[dot]fr |
©2024 Sébastien