Why has the project been down for multiple consecutive hours in last number of days?

Author	Message
Dr Who Fan Joined: 29 Jul 11 Posts: 370 Credit: 1,357,659 RAC: 274	Message 9869 - Posted: 26 Mar 2023, 5:50:22 UTC
This was the 4th time in less than a week that the project was down for multiple consecutive hours without any explanation. And now it's back online with all the projects I have worked on in the last day reporting negative hours. Did the project have another database problem, or did something else break?

Paul Joined: 9 Feb 14 Posts: 32 Credit: 1,190,478 RAC: 95	Message 9870 - Posted: 26 Mar 2023, 15:28:37 UTC - in response to Message 9869. Last modified: 26 Mar 2023, 16:24:30 UTC
My hours are not increasing but no negatives this time.
Edit: Now incrementing. Better to return to 6-hour work units to reduce server load.
Paul.

Link Joined: 20 Jun 12 Posts: 163 Credit: 368,988 RAC: 72	Message 9905 - Posted: 16 Apr 2023, 14:15:10 UTC Last modified: 16 Apr 2023, 14:20:17 UTC
It seems to have been down for about 7 hours, and this time I have negative hours. I agree, better to return to the 6-hour WUs (or even better: let us choose, like Rosetta). That will not only reduce the server load, we will also have a better chance of making it through an outage without losing any hours.

mikey Joined: 20 May 10 Posts: 638 Credit: 2,183,269 RAC: 871	Message 9906 - Posted: 16 Apr 2023, 20:09:45 UTC - in response to Message 9905.
> It seems to have been down for about 7 hours, and this time I have negative hours. I agree, better to return to the 6-hour WUs (or even better: let us choose, like Rosetta). That will not only reduce the server load, we will also have a better chance of making it through an outage without losing any hours.
I like this idea, as it might make things a lot better for Android devices with the smaller tasks, while desktops, servers and laptops can easily handle the 6-hour tasks with a lot less bandwidth and wear and tear on the WUProp hardware. BUT what I'd really like to know is whether there is anything we users can do to help alleviate any future unplanned outages, i.e. help with newer and bigger hard drives, more memory for the server or PC(s) it runs on, more bandwidth, etc.

Dr Who Fan Joined: 29 Jul 11 Posts: 370 Credit: 1,357,659 RAC: 274	Message 9907 - Posted: 19 Apr 2023, 7:59:33 UTC - in response to Message 9906. Last modified: 19 Apr 2023, 8:02:29 UTC
Another 6 to 8 hour outage today. The multi-hour outages appear to come in waves, approx. two days apart within a week. Losing many hours on apps due to the extended outages. Agreed, time to go back to a 6-hour task across the board.
* EDIT TO ADD >>> AND THE NEGATIVE HOURS ARE BACK ALSO <<< *

mikey Joined: 20 May 10 Posts: 638 Credit: 2,183,269 RAC: 871	Message 9908 - Posted: 19 Apr 2023, 11:15:20 UTC - in response to Message 9907.
> Another 6 to 8 hour outage today. The multi-hour outages appear to come in waves, approx. two days apart within a week. Losing many hours on apps due to the extended outages. Agreed, time to go back to a 6-hour task across the board.
> * EDIT TO ADD >>> AND THE NEGATIVE HOURS ARE BACK ALSO <<< *
Another thought I had was to treat the tasks like other BOINC projects and just send out several at a time. That way, if the server is down, we just move to the next task, and then the task after that, etc. BOINC uses a first-in, first-out basis for its tasks, which can then be affected by return times, but instead of banging the server we would just hold the completed tasks and not lose any time until the server is ready to take them all back again.

Link Joined: 20 Jun 12 Posts: 163 Credit: 368,988 RAC: 72	Message 9909 - Posted: 19 Apr 2023, 16:08:01 UTC - in response to Message 9908.
> BOINC uses a first-in, first-out basis for its tasks, which can then be affected by return times, but instead of banging the server we would just hold the completed tasks and not lose any time until the server is ready to take them all back again.
Won't it run them all at once, since they are NCI? Goofyxgrid also sent just one task per app, and they were all running concurrently.

Dr Who Fan Joined: 29 Jul 11 Posts: 370 Credit: 1,357,659 RAC: 274	Message 9910 - Posted: 19 Apr 2023, 16:27:29 UTC - in response to Message 9909.
>> BOINC uses a first-in, first-out basis for its tasks, which can then be affected by return times, but instead of banging the server we would just hold the completed tasks and not lose any time until the server is ready to take them all back again.
> Won't it run them all at once, since they are NCI? Goofyxgrid also sent just one task per app, and they were all running concurrently.
More than likely. I don't think BOINC NCI works that way - all or nothing run. Every NCI project I have attached to and completed work on has only sent one task per computer, except that the old GoofyGrid would send multiples at times, and occasionally I get multiple tasks running at once on iThena's main NCI project.

Dr Who Fan Joined: 29 Jul 11 Posts: 370 Credit: 1,357,659 RAC: 274	Message 9912 - Posted: 20 Apr 2023, 5:03:42 UTC
And we're back from another multi-hour outage, along with negative hours for the third day now.....

Werinbert Joined: 9 May 13 Posts: 100 Credit: 863,235 RAC: 283	Message 9913 - Posted: 20 Apr 2023, 5:36:46 UTC
I am happy that we have this project even if it is occasionally intermittent. Complaining each and every time that the server hiccups doesn't do anyone any good. Sebastien needs sleep just like the rest of us, cut him some slack.

mikey Joined: 20 May 10 Posts: 638 Credit: 2,183,269 RAC: 871	Message 9914 - Posted: 20 Apr 2023, 10:24:42 UTC - in response to Message 9913.
> I am happy that we have this project even if it is occasionally intermittent. Complaining each and every time that the server hiccups doesn't do anyone any good. Sebastien needs sleep just like the rest of us, cut him some slack.
I don't think it's the complaining so much as that there's been no explanation of why, or of whether any one of us can help solve the problem. I know Sebastien got some helpers a while back when the project was on the verge of collapsing; does he need more? Does he need some hardware that keeps failing? Is it a software problem? Is it an ISP problem? In short, people are trying to figure out if they can help, but with no word coming from "the Team", it's kinda hard.

Link Joined: 20 Jun 12 Posts: 163 Credit: 368,988 RAC: 72	Message 9915 - Posted: 20 Apr 2023, 14:27:26 UTC - in response to Message 9913. Last modified: 20 Apr 2023, 14:29:22 UTC
> Complaining each and every time that the server hiccups doesn't do anyone any good.
We are not complaining, at least I'm not, just reporting an issue, which he might not even notice otherwise if it's always fixing itself after a couple of hours. I don't see anything wrong with reporting bugs or other issues to the admin/developer. In most cases that's how admins/devs get to know there's something wrong at all with their servers/software/whatever; they might not notice the issue from their end without the reports.

Steve Dodd Joined: 28 Jan 13 Posts: 41 Credit: 1,609,980 RAC: 468	Message 9916 - Posted: 20 Apr 2023, 16:09:12 UTC - in response to Message 9869.
Getting a couple of projects reporting negative hours myself.

mikey Joined: 20 May 10 Posts: 638 Credit: 2,183,269 RAC: 871	Message 9918 - Posted: 20 Apr 2023, 23:18:49 UTC - in response to Message 9916.
> Getting a couple of projects reporting negative hours myself.
I just got a huge update on some of my projects!!

Steve Dodd Joined: 28 Jan 13 Posts: 41 Credit: 1,609,980 RAC: 468	Message 9919 - Posted: 20 Apr 2023, 23:25:43 UTC - in response to Message 9918.
Same, Mikey

Dr Who Fan Joined: 29 Jul 11 Posts: 370 Credit: 1,357,659 RAC: 274	Message 9920 - Posted: 21 Apr 2023, 0:25:35 UTC - in response to Message 9918.
Same here. Based on my "last 24 hours per device" page, it appears the "extra" hours account for about 2.5 to 3 calendar days of back-reported work.

WezH Joined: 8 Oct 12 Posts: 37 Credit: 2,162,028 RAC: 657	Message 9922 - Posted: 21 Apr 2023, 12:19:45 UTC Last modified: 21 Apr 2023, 12:20:58 UTC
Now it is "Server error: feeder not running"
EDIT: and back online again

Link Joined: 20 Jun 12 Posts: 163 Credit: 368,988 RAC: 72	Message 9940 - Posted: 29 Apr 2023, 11:14:15 UTC - in response to Message 9907.
> Agreed, time to go back to a 6-hour task across the board.
If server load caused by too many clients connecting to the scheduler is the issue, then perhaps increasing <next_rpc_delay> might help at least a bit. Currently I see this stupid behavior for each WU:

29/04/2023 12:38:19 | WUProp@Home | Sending scheduler request: Requested by project.
29/04/2023 12:38:19 | WUProp@Home | Not requesting tasks: non CPU intensive
29/04/2023 12:38:20 | WUProp@Home | Scheduler request completed
29/04/2023 12:38:31 | WUProp@Home | Computation for task data_collect_v4_1682702101_52217_0 finished
29/04/2023 12:38:33 | WUProp@Home | Started upload of data_collect_v4_1682702101_52217_0_0
29/04/2023 12:38:35 | WUProp@Home | Finished upload of data_collect_v4_1682702101_52217_0_0
29/04/2023 12:38:35 | WUProp@Home | Sending scheduler request: To report completed tasks.
29/04/2023 12:38:35 | WUProp@Home | Reporting 1 completed tasks
29/04/2023 12:38:35 | WUProp@Home | Requesting new tasks for CPU
29/04/2023 12:38:36 | WUProp@Home | Scheduler request completed: got 1 new tasks
29/04/2023 12:38:38 | WUProp@Home | Started download of data_collect_v4_1682702101_43298
29/04/2023 12:38:39 | WUProp@Home | Finished download of data_collect_v4_1682702101_43298
29/04/2023 12:38:39 | WUProp@Home | Starting task data_collect_v4_1682702101_43298_1

The first request is completely unnecessary: it comes only seconds before the request that reports the finished task. Without <report_results_immediately/> in app_config.xml it also slows down getting a new WU by a few seconds. To avoid this while still keeping the function of forced scheduler requests, <next_rpc_delay> should be increased from the current 3600 to 3700 seconds.
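The timing argument in Message 9940 can be checked against the posted log. The sketch below is only an illustration: it takes three lines from that log, measures how long before the task finished the forced request fired, and then asks whether a 3700-second delay would have pushed the forced request past the normal report. It assumes that the previous scheduler contact happened exactly 3600 seconds before the forced request and that the delay is counted from the last scheduler contact; neither is confirmed in the thread. All variable names are made up for the example.

```python
from datetime import datetime, timedelta

# Three lines copied from the log in Message 9940.
LOG = """\
29/04/2023 12:38:19 | WUProp@Home | Sending scheduler request: Requested by project.
29/04/2023 12:38:31 | WUProp@Home | Computation for task data_collect_v4_1682702101_52217_0 finished
29/04/2023 12:38:35 | WUProp@Home | Sending scheduler request: To report completed tasks.
"""


def parse(line):
    """Split one log line into (timestamp, message)."""
    stamp, _project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), message


events = [parse(line) for line in LOG.splitlines()]
forced = next(t for t, m in events if "Requested by project" in m)
finished = next(t for t, m in events if m.startswith("Computation for task"))
report = next(t for t, m in events if "To report completed tasks" in m)

# How early the forced request fired relative to the task finishing and being reported.
gap_to_finish = (finished - forced).total_seconds()
gap_to_report = (report - forced).total_seconds()

# ASSUMPTION: the previous scheduler contact was exactly CURRENT_DELAY seconds
# before the forced request, and the delay is measured from that contact.
CURRENT_DELAY = 3600   # value quoted in the post
PROPOSED_DELAY = 3700  # value proposed in the post
previous_rpc = forced - timedelta(seconds=CURRENT_DELAY)
proposed_due = previous_rpc + timedelta(seconds=PROPOSED_DELAY)

# If the forced request would only fall due after the normal report,
# the report gets there first and the extra request is not needed.
forced_request_avoided = proposed_due > report

print(f"forced request fired {gap_to_finish:.0f} s before the task finished")
print(f"forced request fired {gap_to_report:.0f} s before the report request")
print(f"avoided with {PROPOSED_DELAY} s delay: {forced_request_avoided}")
```

Under those assumptions, the extra 100 seconds is comfortably larger than the gap seen in the log, which is the margin the post is relying on.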

Dr Who Fan Joined: 29 Jul 11 Posts: 370 Credit: 1,357,659 RAC: 274	Message 9941 - Posted: 29 Apr 2023, 16:26:20 UTC
I see we had two "short" outages in about 48 hours. The WUProp database is back up, along with the usual NEGATIVE HOURS

marmot Joined: 13 Dec 15 Posts: 182 Credit: 2,357,518 RAC: 192	Message 9952 - Posted: 2 May 2023, 10:32:38 UTC
I'm just glad the project is still with us. Is there a Patreon donation link?