Server availability at around 0000 UTC

Message boards : Number crunching : Server availability at around 0000 UTC

Thyme Lawn
Joined: 11 Apr 10
Posts: 54
Credit: 382,341
RAC: 0
Message 1271 - Posted: 1 Jun 2013, 12:53:30 UTC

I've just spotted that the upload server and scheduler seem to be inaccessible for a period around 0000 UTC every night (times in the log below are UTC+0100), and that when they become accessible again there's a period when the database can't be opened.

The significant events in the following message sequence (times translated to UTC) are:

  • Task completed at 23:50:36.
  • Upload fails at 23:51:01.
  • Scheduler request fails at 23:51:03.
  • Project servers are still inaccessible at 00:12:20 (the last scheduler request failure).
  • Scheduler request succeeds with a "Server can't open database" response at 00:25:29. The project forces a 1-hour project backoff at this point, but the BOINC backoff must have been longer.
  • Upload succeeds at 00:27:50.
  • Task reported and a new one allocated at 02:16:33 (i.e. 2:25:57 after the previous task was completed). I've no idea why this was done so long after the project backoff expired at 01:25:29.
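The UTC times in the list are the event-log stamps shifted back by one hour. A minimal Python sketch of that conversion and of the interval arithmetic, assuming the log's DD/MM/YYYY HH:MM:SS stamps are all at a fixed UTC+0100 offset (the helper `to_utc` is a hypothetical name, not part of BOINC):

```python
from datetime import datetime, timedelta, timezone

# Assumption: every stamp in the event log is local time at a fixed UTC+0100.
LOCAL = timezone(timedelta(hours=1))


def to_utc(stamp: str) -> datetime:
    """Parse a 'DD/MM/YYYY HH:MM:SS' event-log stamp and return it as UTC."""
    local = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S").replace(tzinfo=LOCAL)
    return local.astimezone(timezone.utc)


completed = to_utc("31/05/2013 00:50:36")  # "Computation for task ... finished"
db_error = to_utc("31/05/2013 01:25:29")   # "Server can't open database"
reported = to_utc("31/05/2013 03:16:33")   # "Scheduler request completed: got 1 new tasks"

# The post says the project forced a 1-hour backoff along with the database error.
backoff_expiry = db_error + timedelta(hours=1)

print(completed.strftime("%H:%M:%S"))       # 23:50:36
print(reported - completed)                 # 2:25:57
print(backoff_expiry.strftime("%H:%M:%S"))  # 01:25:29
print(reported - backoff_expiry)            # 0:51:04
```

Subtracting two timezone-aware datetimes gives the delay directly, so the midnight rollover between 30 and 31 May needs no special handling.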


31/05/2013 00:50:36	WUProp@Home	Computation for task wu_v3_1368949308_505910_0 finished
31/05/2013 00:50:39	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 00:50:40	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 00:50:40	WUProp@Home	Requesting new tasks
31/05/2013 00:51:01		Project communication failed: attempting access to reference site
31/05/2013 00:51:01	WUProp@Home	Temporarily failed upload of wu_v3_1368949308_505910_0_0: HTTP error
31/05/2013 00:51:01	WUProp@Home	Backing off 1 min 0 sec on upload of wu_v3_1368949308_505910_0_0
31/05/2013 00:51:03		Internet access OK - project servers may be temporarily down.
31/05/2013 00:51:03	WUProp@Home	Scheduler request failed: Failure when receiving data from the peer
31/05/2013 00:52:02	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 00:52:03	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 00:52:03	WUProp@Home	Requesting new tasks
31/05/2013 00:52:24		Project communication failed: attempting access to reference site
31/05/2013 00:52:24	WUProp@Home	Temporarily failed upload of wu_v3_1368949308_505910_0_0: HTTP error
31/05/2013 00:52:24	WUProp@Home	Backing off 1 min 0 sec on upload of wu_v3_1368949308_505910_0_0
31/05/2013 00:52:26		Internet access OK - project servers may be temporarily down.
31/05/2013 00:52:26	WUProp@Home	Scheduler request failed: Failure when receiving data from the peer
31/05/2013 00:53:25	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 00:53:27	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 00:53:27	WUProp@Home	Requesting new tasks
31/05/2013 00:53:49		Project communication failed: attempting access to reference site
31/05/2013 00:53:49	WUProp@Home	Scheduler request failed: Failure when receiving data from the peer
31/05/2013 00:53:50		Internet access OK - project servers may be temporarily down.
31/05/2013 00:54:49	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 00:54:49	WUProp@Home	Requesting new tasks
31/05/2013 00:55:10		Project communication failed: attempting access to reference site
31/05/2013 00:55:10	WUProp@Home	Scheduler request failed: Failure when receiving data from the peer
31/05/2013 00:55:12		Internet access OK - project servers may be temporarily down.
31/05/2013 00:56:11	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 00:56:11	WUProp@Home	Requesting new tasks
31/05/2013 00:57:21		Project communication failed: attempting access to reference site
31/05/2013 00:57:21	WUProp@Home	Scheduler request failed: Failure when receiving data from the peer
31/05/2013 00:57:22		Internet access OK - project servers may be temporarily down.
31/05/2013 00:58:28		Project communication failed: attempting access to reference site
31/05/2013 00:58:28	WUProp@Home	Temporarily failed upload of wu_v3_1368949308_505910_0_0: HTTP error
31/05/2013 00:58:28	WUProp@Home	Backing off 1 min 0 sec on upload of wu_v3_1368949308_505910_0_0
31/05/2013 00:58:29		Internet access OK - project servers may be temporarily down.
31/05/2013 00:59:29	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 00:59:37	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 00:59:37	WUProp@Home	Requesting new tasks
31/05/2013 00:59:59		Project communication failed: attempting access to reference site
31/05/2013 00:59:59	WUProp@Home	Scheduler request failed: Failure when receiving data from the peer
31/05/2013 01:00:01		Internet access OK - project servers may be temporarily down.
31/05/2013 01:00:39		Project communication failed: attempting access to reference site
31/05/2013 01:00:39	WUProp@Home	Temporarily failed upload of wu_v3_1368949308_505910_0_0: HTTP error
31/05/2013 01:00:39	WUProp@Home	Backing off 1 min 0 sec on upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:00:40		Internet access OK - project servers may be temporarily down.
31/05/2013 01:01:39	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:02:02		Project communication failed: attempting access to reference site
31/05/2013 01:02:02	WUProp@Home	Temporarily failed upload of wu_v3_1368949308_505910_0_0: HTTP error
31/05/2013 01:02:02	WUProp@Home	Backing off 1 min 41 sec on upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:02:03		Internet access OK - project servers may be temporarily down.
31/05/2013 01:03:51	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:04:07	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 01:04:07	WUProp@Home	Requesting new tasks
31/05/2013 01:04:14		Project communication failed: attempting access to reference site
31/05/2013 01:04:14	WUProp@Home	Temporarily failed upload of wu_v3_1368949308_505910_0_0: HTTP error
31/05/2013 01:04:14	WUProp@Home	Backing off 4 min 54 sec on upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:04:15		Internet access OK - project servers may be temporarily down.
31/05/2013 01:09:09	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:09:16		Project communication failed: attempting access to reference site
31/05/2013 01:09:16	WUProp@Home	Scheduler request failed: Timeout was reached
31/05/2013 01:09:17		Internet access OK - project servers may be temporarily down.
31/05/2013 01:09:31		Project communication failed: attempting access to reference site
31/05/2013 01:09:31	WUProp@Home	Temporarily failed upload of wu_v3_1368949308_505910_0_0: HTTP error
31/05/2013 01:09:31	WUProp@Home	Backing off 6 min 59 sec on upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:09:33		Internet access OK - project servers may be temporarily down.
31/05/2013 01:11:57	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 01:11:57	WUProp@Home	Requesting new tasks
31/05/2013 01:12:19		Project communication failed: attempting access to reference site
31/05/2013 01:12:19	WUProp@Home	Scheduler request failed: Failure when receiving data from the peer
31/05/2013 01:12:20		Internet access OK - project servers may be temporarily down.
31/05/2013 01:25:28	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 01:25:28	WUProp@Home	Requesting new tasks
31/05/2013 01:25:29	WUProp@Home	Scheduler request completed: got 0 new tasks
31/05/2013 01:25:29	WUProp@Home	Message from server: Server can't open database
31/05/2013 01:27:44	WUProp@Home	Started upload of wu_v3_1368949308_505910_0_0
31/05/2013 01:27:50	WUProp@Home	Finished upload of wu_v3_1368949308_505910_0_0
31/05/2013 03:16:31	WUProp@Home	Sending scheduler request: Requested by project.
31/05/2013 03:16:31	WUProp@Home	Reporting 1 completed tasks, requesting new tasks
31/05/2013 03:16:33	WUProp@Home	Scheduler request completed: got 1 new tasks
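To pull the outage window out of a log like this without reading every line, a short Python sketch can scan for the scheduler-request outcome lines. It assumes the tab-separated `date time<TAB>project<TAB>message` layout shown above and a fixed UTC+0100 offset; `scheduler_events` is a hypothetical helper, and only a few sample lines are embedded:

```python
from datetime import datetime, timedelta

# A few lines copied from the log above (tab-separated: stamp, project, message).
LOG = """\
31/05/2013 00:51:01\t\tProject communication failed: attempting access to reference site
31/05/2013 00:51:03\tWUProp@Home\tScheduler request failed: Failure when receiving data from the peer
31/05/2013 01:12:19\tWUProp@Home\tScheduler request failed: Failure when receiving data from the peer
31/05/2013 01:25:29\tWUProp@Home\tScheduler request completed: got 0 new tasks
"""

UTC_OFFSET = timedelta(hours=1)  # assumption: the log is in UTC+0100


def scheduler_events(text):
    """Yield (utc_time, ok) for each scheduler request outcome in the log."""
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        stamp, _project, message = parts
        if message.startswith("Scheduler request failed"):
            ok = False
        elif message.startswith("Scheduler request completed"):
            ok = True
        else:
            continue  # skip uploads, reference-site checks, etc.
        local = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        yield local - UTC_OFFSET, ok


events = list(scheduler_events(LOG))
failures = [t for t, ok in events if not ok]
first_ok = next(t for t, ok in events if ok)
print(failures[0].time(), failures[-1].time(), first_ok.time())
# 23:51:03 00:12:19 00:25:29
```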

"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
Thyme Lawn
Joined: 11 Apr 10
Posts: 54
Credit: 382,341
RAC: 0
Message 1272 - Posted: 1 Jun 2013, 13:03:09 UTC - in response to Message 1271.  

Checking back through the event log, I've also spotted an instance where the project-initiated scheduler request at 66.667% was made during the server outage.

The significant events in the following message sequence (times translated to UTC) are:

  • Project-initiated scheduler request at 66.667% fails at 23:59:48.
  • The scheduler was still inaccessible at 00:12:31 (last request failure).
  • Scheduler request succeeds with a "Server can't open database" response at 00:24:10.
  • Task completed at 01:01:11.
  • Upload completed at 01:01:18.
  • Task reported and a new one allocated at 01:24:18, 23 minutes after the task was ready to report.


30-May-2013 00:59:26 [WUProp@Home] Sending scheduler request: Requested by project.
30-May-2013 00:59:26 [WUProp@Home] Not reporting or requesting tasks
30-May-2013 00:59:48 [WUProp@Home] Scheduler request failed: Couldn't connect to server
30-May-2013 00:59:50 [---] Project communication failed: attempting access to reference site
30-May-2013 00:59:52 [---] Internet access OK - project servers may be temporarily down.
30-May-2013 01:00:58 [WUProp@Home] Sending scheduler request: Requested by project.
30-May-2013 01:00:58 [WUProp@Home] Not reporting or requesting tasks
30-May-2013 01:01:20 [WUProp@Home] Scheduler request failed: Couldn't connect to server
30-May-2013 01:01:21 [---] Project communication failed: attempting access to reference site
30-May-2013 01:01:22 [---] Internet access OK - project servers may be temporarily down.
30-May-2013 01:05:10 [WUProp@Home] Sending scheduler request: Requested by project.
30-May-2013 01:05:10 [WUProp@Home] Not reporting or requesting tasks
30-May-2013 01:05:32 [WUProp@Home] Scheduler request failed: Couldn't connect to server
30-May-2013 01:05:33 [---] Project communication failed: attempting access to reference site
30-May-2013 01:05:34 [---] Internet access OK - project servers may be temporarily down.
30-May-2013 01:12:04 [WUProp@Home] Sending scheduler request: Requested by project.
30-May-2013 01:12:04 [WUProp@Home] Not reporting or requesting tasks
30-May-2013 01:12:29 [WUProp@Home] Scheduler request failed: Failure when receiving data from the peer
30-May-2013 01:12:30 [---] Project communication failed: attempting access to reference site
30-May-2013 01:12:31 [---] Internet access OK - project servers may be temporarily down.
30-May-2013 01:23:57 [WUProp@Home] Sending scheduler request: Requested by project.
30-May-2013 01:23:57 [WUProp@Home] Not reporting or requesting tasks
30-May-2013 01:24:10 [WUProp@Home] Scheduler request completed
30-May-2013 01:24:10 [WUProp@Home] Server can't open database
30-May-2013 02:01:11 [WUProp@Home] Computation for task wu_v3_1368949308_463338_0 finished
30-May-2013 02:01:13 [WUProp@Home] Started upload of wu_v3_1368949308_463338_0_0
30-May-2013 02:01:18 [WUProp@Home] Finished upload of wu_v3_1368949308_463338_0_0
30-May-2013 02:24:14 [WUProp@Home] Sending scheduler request: Requested by project.
30-May-2013 02:24:14 [WUProp@Home] Reporting 1 completed tasks, requesting new tasks for CPU
30-May-2013 02:24:18 [WUProp@Home] Scheduler request completed: got 1 new tasks
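This log uses a different layout, `DD-Mon-YYYY HH:MM:SS [project] message`. A Python sketch, assuming exactly that layout (the regular expression and the `parse` helper are illustrative, not a BOINC tool), that measures how long the finished upload waited before the report went through:

```python
import re
from datetime import datetime

# Matches lines like "30-May-2013 02:01:18 [WUProp@Home] Finished upload of ..."
LINE_RE = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]+)\] (.*)$")

# A few lines copied from the log above (local time, UTC+0100).
LOG = """\
30-May-2013 01:24:10 [WUProp@Home] Scheduler request completed
30-May-2013 01:24:10 [WUProp@Home] Server can't open database
30-May-2013 02:01:18 [WUProp@Home] Finished upload of wu_v3_1368949308_463338_0_0
30-May-2013 02:24:14 [WUProp@Home] Reporting 1 completed tasks, requesting new tasks for CPU
30-May-2013 02:24:18 [WUProp@Home] Scheduler request completed: got 1 new tasks
"""


def parse(text):
    """Yield (local_time, project, message) for each line in the bracketed format."""
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamp, project, message = m.groups()
            yield datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S"), project, message


entries = list(parse(LOG))
uploaded = next(t for t, _, msg in entries if msg.startswith("Finished upload"))
# The report is confirmed by the last completed scheduler request in the sample.
reported = [t for t, _, msg in entries if msg.startswith("Scheduler request completed")][-1]
print(reported - uploaded)  # 0:23:00
```

Both stamps share the same local offset, so the difference needs no UTC conversion.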

skgiven
Joined: 7 Sep 10
Posts: 453
Credit: 945,109
RAC: 0
Message 1510 - Posted: 8 Aug 2013, 20:47:08 UTC

OT but too odd for a new thread.
After removing and reconnecting an Ethernet cable I got a WUProp backoff time of over 4 h.
I had to go to the Projects tab and click Update to report the WU; clicking Retry Now on the Transfers tab didn't work (Ubuntu rig).
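The growing, uneven upload delays in the first log (1 min, 1 min 41 s, 4 min 54 s, 6 min 59 s) are the kind of pattern a randomized exponential backoff produces. A generic Python sketch of that technique, not BOINC's actual code, shows how a run of consecutive failures can push the wait into hours; `backoff_delay`, the minimum, the cap and the jitter range are all assumptions for illustration:

```python
import random

# Illustrative constants only; they are assumptions, not BOINC's settings.
MIN_BACKOFF = 60.0            # seconds, nominal delay after the first failure
MAX_BACKOFF = 24 * 60 * 60.0  # seconds, cap on the nominal delay (1 day)


def backoff_delay(failures: int, rng: random.Random) -> float:
    """Return a randomized delay in seconds after `failures` consecutive failures.

    The nominal delay doubles with every failure, is capped at MAX_BACKOFF,
    and is then scaled by a random factor in [0.5, 1.0] so that clients
    retrying at the same moment spread out.
    """
    if failures < 1:
        raise ValueError("failures must be at least 1")
    nominal = min(MIN_BACKOFF * 2 ** (failures - 1), MAX_BACKOFF)
    return nominal * rng.uniform(0.5, 1.0)


rng = random.Random(2013)  # fixed seed so a run is repeatable
for n in range(1, 11):
    print(f"failure {n:2d}: retry in {backoff_delay(n, rng) / 3600:.2f} h")
```

With these assumed constants the nominal delay reaches 60 s × 2^8 = 15,360 s (about 4.3 h) at the ninth consecutive failure, so a short run of failed contacts is enough to produce a multi-hour wait.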

©2024 Sébastien