Problem with Linux locking up machines???
Author | Message |
---|---|
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I have a problem running WUProp on a Linux box: it generates compute errors and LOCKS UP my router, clobbering every box on it. I've had this problem for months and blamed Linux, but I've discovered it doesn't seem to be a generic Linux problem after all. Since I took WUProp off the Linux machines, I've had no more problems. Is something buggy in the Linux version? It seems to be related to uncontrolled polling of the LAN; it affects all machines at once and locks them up, generating a compute error on each at the same time (seen after the machines are reset). It is much more repeatable when running a SINGLE PROJECT with long WUs on one multi-core CPU. In other words, it seems to happen less often when running several different projects with shorter and longer tasks mixed in. It doesn't matter if one Linux box is running or six... it will lock up my Linksys router, generate a compute error and clobber my LAN, requiring me to reset all the machines. It does this with 2 different routers... I have no idea if it is something inside Linux that WUProp causes to happen or vice versa... Who knows; it's hard to tell when the boxes and LAN are hung up and there is no way to access their current state without a reset... but after months, removing WUProp cured the problem... HELP!!! 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I was reading other posts looking for similar problems and wonder if the long-run bug (or whatever it is) could also be a factor... In all the other WUs I run, when a WU finishes, it stops running and enters the upload step. If it can't upload or the network is down, what happens in Linux? As I stated before, when this happens, it clobbers the entire LAN and locks up every machine, AND I run this on every box... well, not on the Linux boxes as of a few days ago. So far, everything is fine. But I can verify that in the past, I dedicated ALL my Linux boxes to ONE project and this lockup problem ALWAYS happened... (running Optima@home at the time). I stopped running that project AND Linux on those boxes for a while because of this highly repeatable problem, and only recently, with the new kernel updates, tried Linux again. I'm running only Asteroids@home now on 64-bit Linux boxes and it works perfectly for days and days so long as WUProp isn't running! It buggers up if I let WUProp run on those Linux boxes... but I NEVER had a problem on 64-bit Windows 7 machines and still don't. 8-) |
Send message Joined: 28 Mar 10 Posts: 2875 Credit: 539,231 RAC: 136 |
Could you test this application? For testing the application:
|
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Could you test this application? I would be happy to test it... but not sure how??? I see it includes an app_info file as well... Can you give this Linux novice some detailed help on testing it? Thanks! 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Umm, I got it figured out... just installed it in the /var/lib/boinc-client/projects/wuprop.boinc-af.org directory... I'm testing it now... It's on a machine as before, runs one project on all 6 cores and nothing else.. I must admit, took me a bit to chmod the permissions to get access... but no problemo after that. I'll let you know if I experience more probs... 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Normally by now, I would have had some sort of error, but nothing has happened yet. I installed the files on a second Linux box also running a single project on 6 cores... so far so good. If it runs well the next couple of days, I would call it good. I can Remote desktop into the computers fine, do LAN I/O fine, all running fine so far. Crossing fingers! 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I don't know what you did, but I haven't had a single problem so far, and I am positive I would have by now... In this instance, maybe it is safe to call it good? Working perfectly, no hangups or weird LAN polling or anything... THANKS! 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Early this morning, all eight of my Linux boxes crashed, taking out several tasks in progress and locking up the network. It is definitely something to do with internet interruptions. The cable service I have goes out once in a while late at night for maintenance, or else just dies... When the internet connection is lost, WUProp goes nuts and locks up BOINC, and then I observed tasks starting and stopping... Nothing else could run until the net came back up; then every box had a WUProp computation error and also many crashed WUs from the projects. It didn't matter if the box was on the wired LAN or wireless, and it only happened on Linux systems, not the Windows 7 boxes... This is very annoying... it has been happening less often with the last update, but early this morning it really clobbered everything badly. I thought this was fixed... guess not. |
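One way to test the link between internet outages and the crashes is to log when name resolution fails and compare those times against the BOINC event log. The following is a minimal Python sketch, not part of BOINC or WUProp. It assumes the project host is `wuprop.boinc-af.org` (taken from the project directory name mentioned earlier in the thread); the function names and the 60-second interval are illustrative.

```python
import socket
import time
from datetime import datetime

# Assumed host name, taken from the project directory name in this thread.
PROJECT_HOST = "wuprop.boinc-af.org"


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if `host` resolves; the resolver is injectable for testing."""
    try:
        resolver(host, 80)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def status_line(host, ok, now=None):
    """Format one timestamped line describing the probe result."""
    now = now or datetime.now()
    state = "resolves" if ok else "CANNOT RESOLVE"
    return f"{now:%Y-%m-%d %H:%M:%S} {host} {state}"


def watch(host=PROJECT_HOST, interval=60, rounds=None):
    """Print a status line every `interval` seconds (forever if rounds is None)."""
    done = 0
    while rounds is None or done < rounds:
        print(status_line(host, can_resolve(host)))
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Calling `watch()` in a terminal on one of the Linux boxes and redirecting the output to a file would give a record of outage times to line up against the computation errors.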
Send message Joined: 28 Mar 10 Posts: 588 Credit: 1,221,647 RAC: 237 |
G'Day Tex, Just wondering if updating your kernel to a later version may be of assistance to you working out your problem? I run Linux (Fedora 16 64 bit) with a kernel of 3.6, whereas yours is version 3.0. Just a thought as I am not having the issues you are having, I use a Netgear router which seems to work fine with both my Linux and Windows computers. Conan |
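To compare kernel versions across several boxes, as suggested above, something like the following Python sketch could be run on each machine. The 3.6 threshold comes from the post; the helper names are made up for illustration, and the parsing only assumes the release string starts with `major.minor`.

```python
import platform
import re


def parse_kernel(release):
    """Turn a release string such as '3.0.0-32-generic' into (3, 0, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_at_least(release, minimum):
    """True if `release` is the same as or newer than the `minimum` tuple."""
    return parse_kernel(release) >= tuple(minimum)


current = platform.release()  # the same string that `uname -r` prints on Linux
try:
    print(current, "is at least 3.6:", is_at_least(current, (3, 6, 0)))
except ValueError as err:
    print(err)
```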
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Possibly that may work. The problem is, I'm a total noob to Linux and if it doesn't install by itself, I have no clue what to do. To date, I've tried about 25 different distros and the only one that works on all my boxes is Ubuntu... much as I hate it. I tried Fedora 18 in a virtual box and tried to get their 7.x.x version of the BOINC client going, and no luck there either... but it was close!!! Arch is one I've been holding back on because it's all manual... but I may try that. The only common thing on the Linux boxes is the version of Linux and the BOINC client... so maybe something there. 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I tried several things and was able to get the kernel updated a couple of times, but in both instances it ended up breaking the Nvidia driver, with no cure that worked. I tried Fedora 18 and couldn't get any BOINC running. I tried Arch and it's a major pain in a VM, so I gave up on that. I've tried 25 flavors of 64-bit Linux and can't find a single one that will work properly... So, I suppose I'm stuck for now... But I did like the new kernels! They fix a lot of bugs, especially with regard to lm-sensors operation and such... Thing is, I am only TESTING GPUs on Linux for now... so I could update the kernel later maybe... if it will drive the onboard video properly... Anyway, Linux burns me out... I would PAY someone to make me a custom version in the future if I can't get what I want going properly... 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Had it happen in front of me today... first one system, then slowly it started to kill the other systems one by one. It seems a project update request started the trouble, but I can't be certain... The scenario looks like this. I'm not sure at this point which machine errored first; I think it was the first set. Right after a server update request, it jams up... meanwhile, all other projects keep working fine until the LAN locks up...
6143 WUProp@Home 3/23/2013 1:05:44 PM Sending scheduler request: Requested by project.
6144 WUProp@Home 3/23/2013 1:05:44 PM Not reporting or requesting tasks
6145 WUProp@Home 3/23/2013 1:05:46 PM Scheduler request completed
6303 WUProp@Home 3/23/2013 1:33:40 PM Computation for task wu_v3_1363211664_405728_0 finished
6310 WUProp@Home 3/23/2013 1:33:42 PM Started upload of wu_v3_1363211664_405728_0_0
6314 WUProp@Home 3/23/2013 1:35:03 PM Temporarily failed upload of wu_v3_1363211664_405728_0_0: can't resolve hostname
6315 WUProp@Home 3/23/2013 1:35:03 PM Backing off 14 min 23 sec on upload of wu_v3_1363211664_405728_0_0

3975 WUProp@Home 3/23/2013 1:28:24 PM Computation for task wu_v3_1363211664_405860_0 finished
3985 WUProp@Home 3/23/2013 1:29:48 PM Temporarily failed upload of wu_v3_1363211664_405860_0_0: can't resolve hostname
3986 WUProp@Home 3/23/2013 1:29:48 PM Backing off 13 min 57 sec on upload of wu_v3_1363211664_405860_0_0
3993 WUProp@Home 3/23/2013 1:31:09 PM Scheduler request failed: Couldn't resolve host name
4012 3/23/2013 1:31:39 PM Project communication failed: attempting access to reference site
4015 3/23/2013 1:33:01 PM BOINC can't access Internet - check network connection or proxy configuration.

Notice that the second machine was supposed to back off for 13 minutes but did not! It's like the app itself is forcing communications or something... then it locks up the LAN somehow... Look how they all get computation errors at the same time!
WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405860_0 02:15:55 (00:00:03) 3/23/2013 1:40:38 PM 3/23/2013 1:41:22 PM 0.04 Reported: Computation error (11,) Linux-2600K
WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405850_0 02:20:24 (00:00:06) 3/23/2013 1:40:38 PM 3/23/2013 1:47:54 PM 0.07 Reported: Computation error (11,) Linux-F1
WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405825_0 02:20:55 (00:00:00) 3/23/2013 1:40:38 PM 3/23/2013 1:41:22 PM 0.00 Reported: Computation error (11,) Linux-F12
WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405728_0 02:26:36 (00:00:06) 3/23/2013 1:40:38 PM 3/23/2013 1:42:12 PM 0.07 Reported: Computation error (11,) Linux-F13
I'll keep checking, and I don't think it's my kernel now, since others run kernels from 2.6 up to 3.4... 8-) |
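Whether an upload was retried before its back-off expired can be checked mechanically from the event log. The Python sketch below assumes the log format seen in this post (sequence number, optional project name, US-style timestamp, message) and the message wording "Backing off N min M sec on upload of FILE" and "Started upload of FILE"; the helper names are invented for illustration, and longer back-offs that include hours are not handled.

```python
import re
from datetime import datetime, timedelta

# Sequence number, optional project name, timestamp, message.
ENTRY = re.compile(
    r"^(\d+)\s+(?:(\S+)\s+)?"
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)$"
)
BACKOFF = re.compile(r"Backing off (?:(\d+) min )?(\d+) sec on upload of (\S+)")
STARTED = re.compile(r"Started upload of (\S+)")


def parse_entry(line):
    """Return (timestamp, project, message), or None if the line does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    _seq, project, stamp, message = match.groups()
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, message


def early_upload_retries(lines):
    """List (file, attempt_time, not_before) for uploads started inside a back-off."""
    not_before = {}
    early = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        when, _project, message = entry
        backoff = BACKOFF.search(message)
        started = STARTED.search(message)
        if backoff:
            minutes, seconds, name = backoff.groups()
            delay = timedelta(minutes=int(minutes or 0), seconds=int(seconds))
            not_before[name] = when + delay
        elif started:
            name = started.group(1)
            if name in not_before and when < not_before[name]:
                early.append((name, when, not_before[name]))
    return early


log = [
    "3985 WUProp@Home 3/23/2013 1:29:48 PM Temporarily failed upload of wu_v3_1363211664_405860_0_0: can't resolve hostname",
    "3986 WUProp@Home 3/23/2013 1:29:48 PM Backing off 13 min 57 sec on upload of wu_v3_1363211664_405860_0_0",
    "3993 WUProp@Home 3/23/2013 1:31:09 PM Scheduler request failed: Couldn't resolve host name",
]
print(early_upload_retries(log))
```

For these three lines from the second machine the sketch reports no early upload retry, because the 1:31:09 PM entry is labelled as a scheduler request rather than a new upload attempt. Whether BOINC applies the upload back-off to scheduler requests as well is not something this thread establishes.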
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Not certain if this is the first, but this is how BOINC crashes and restarts tasks. Notice the network is fine until the WUProp WU completes (it actually crashed with a computation error). After the crash, the network is killed... then BOINC goes crazy restarting tasks...
Linux-F13
6296 OPTIMA@HOME 3/23/2013 1:23:03 PM Sending scheduler request: To fetch work.
6297 OPTIMA@HOME 3/23/2013 1:23:03 PM Requesting new tasks for CPU
6298 OPTIMA@HOME 3/23/2013 1:23:05 PM Scheduler request completed: got 0 new tasks
6299 OPTIMA@HOME 3/23/2013 1:23:05 PM No work sent
6300 OPTIMA@HOME 3/23/2013 1:23:05 PM (reached limit of 40 tasks)
6301 OPTIMA@HOME 3/23/2013 1:32:18 PM Sending scheduler request: To fetch work.
6302 OPTIMA@HOME 3/23/2013 1:32:18 PM Requesting new tasks for CPU
6303 WUProp@Home 3/23/2013 1:33:40 PM Computation for task wu_v3_1363211664_405728_0 finished
6304 OPTIMA@HOME 3/23/2013 1:33:40 PM Scheduler request failed: Couldn't resolve host name
6305 Einstein@Home 3/23/2013 1:33:41 PM Task p2030.20121223.G202.81-01.04.C.b5s0g0.00000_3128_1 exited with zero status but no 'finished' file
6306 Einstein@Home 3/23/2013 1:33:41 PM If this happens repeatedly you may need to reset the project.
6307 Einstein@Home 3/23/2013 1:33:41 PM Restarting task p2030.20121223.G202.81-01.04.C.b5s0g0.00000_3128_1 using einsteinbinary_BRP4 version 133
6308 Einstein@Home 3/23/2013 1:33:42 PM Task p2030.20121223.G202.81-01.04.C.b4s0g0.00000_3152_1 exited with zero status but no 'finished' file
6309 Einstein@Home 3/23/2013 1:33:42 PM If this happens repeatedly you may need to reset the project.
6310 WUProp@Home 3/23/2013 1:33:42 PM Started upload of wu_v3_1363211664_405728_0_0
6311 Einstein@Home 3/23/2013 1:33:42 PM Restarting task p2030.20121223.G202.81-01.04.C.b4s0g0.00000_3152_1 using einsteinbinary_BRP4 version 133
6312 OPTIMA@HOME 3/23/2013 1:35:03 PM Task smallexp_s1_ss3_120_2_n58973_0 exited with zero status but no 'finished' file
6313 OPTIMA@HOME 3/23/2013 1:35:03 PM If this happens repeatedly you may need to reset the project.
6314 WUProp@Home 3/23/2013 1:35:03 PM Temporarily failed upload of wu_v3_1363211664_405728_0_0: can't resolve hostname
6315 WUProp@Home 3/23/2013 1:35:03 PM Backing off 14 min 23 sec on upload of wu_v3_1363211664_405728_0_0
6316 OPTIMA@HOME 3/23/2013 1:35:03 PM Restarting task smallexp_s1_ss3_120_2_n58973_0 using smallexp version 103 |
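To see which projects lose tasks during one of these episodes, the "exited with zero status but no 'finished' file" entries can be tallied per project. A small Python sketch, assuming the same log layout as in this post; `abnormal_exits` is an illustrative name, not a BOINC tool.

```python
import re
from collections import Counter

# The optional project name sits between the sequence number and the date.
ENTRY = re.compile(
    r"^\d+\s+(?:(\S+)\s+)?\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M\s+(.*)$"
)
MARKER = "exited with zero status but no 'finished' file"


def abnormal_exits(lines):
    """Count, per project, the tasks that exited without a 'finished' file."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and MARKER in match.group(2):
            counts[match.group(1) or "(no project)"] += 1
    return counts


sample = [
    "6305 Einstein@Home 3/23/2013 1:33:41 PM Task p2030.20121223.G202.81-01.04.C.b5s0g0.00000_3128_1 exited with zero status but no 'finished' file",
    "6308 Einstein@Home 3/23/2013 1:33:42 PM Task p2030.20121223.G202.81-01.04.C.b4s0g0.00000_3152_1 exited with zero status but no 'finished' file",
    "6312 OPTIMA@HOME 3/23/2013 1:35:03 PM Task smallexp_s1_ss3_120_2_n58973_0 exited with zero status but no 'finished' file",
    "6315 WUProp@Home 3/23/2013 1:35:03 PM Backing off 14 min 23 sec on upload of wu_v3_1363211664_405728_0_0",
]
for project, count in abnormal_exits(sample).most_common():
    print(project, count)
```

Run over the full event log of each box, the same tally would show whether the restarts are confined to a few projects or hit everything running at the time.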
Send message Joined: 28 Mar 10 Posts: 672 Credit: 3,991,805 RAC: 698 |
Don't know if this will work, but try running this in your terminal >>> sudo apt-get install openssh-server+ gdebi+ libwxgtk2.8-0+ libXss1+ freeglut3+ gnome-applets+ cpufrequtils+ ia32-libs+ It was given to me by someone else and I use it for every install of Linux and BOINC; it will download and install everything needed to run BOINC properly for Ubuntu 12.10 or 12.04 ... https://signature.statseb.fr/sig-1323.png https://stats.free-dc.org/badgesbanner.php?cpid=13a87c3a303bcdca4ba0ed600daebb6b |
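In the command above, the trailing `+` on each name is apt-get's own notation for "install this package" (a trailing `-` would mean remove). Before running it, it can help to see which of those packages are already present. The Python sketch below is one way to do that on a Debian-based system; it assumes `dpkg-query` is available, lowercases the names because Debian package names are lowercase, and uses function names made up for illustration. Package availability varies by release, so some names may not exist on newer systems.

```python
import subprocess

# Package names from the command in this post, with the trailing "+" removed.
PACKAGES = [
    "openssh-server", "gdebi", "libwxgtk2.8-0", "libXss1",
    "freeglut3", "gnome-applets", "cpufrequtils", "ia32-libs",
]


def is_installed(package):
    """Ask dpkg whether `package` is installed (needs a Debian-based system)."""
    try:
        result = subprocess.run(
            ["dpkg-query", "-W", "--showformat=${Status}", package],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # dpkg-query is not available on this system
        return False
    return result.returncode == 0 and result.stdout.strip() == "install ok installed"


def missing(packages, check=is_installed):
    """Return the packages that `check` reports as not installed."""
    # Debian package names are lowercase, so normalise before querying.
    return [name for name in packages if not check(name.lower())]


print("Not installed:", missing(PACKAGES))
```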
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Don't know if this will work but try running this in your terminal >>> sudo apt-get install openssh-server+ gdebi+ libwxgtk2.8-0+ libXss1+ freeglut3+ gnome-applets+ cpufrequtils+ ia32-libs+ I'll try that in a bit... Thanks! I am trying Ubuntu Server 12.04 now. I got it running in a VM with BOINC sorta running. Problem is, graphic windows are messed up with the Gnome desktop I installed. Possibly it's vnc4server conflicting with desktop sharing??? Don't know yet... BUT, I like 12.04 server!! No crap, basic, works... I'll let you know. Next is to uninstall the 7.0.27 and try the 7.0.28! 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Don't know if this will work but try running this in your terminal >>> sudo apt-get install openssh-server+ gdebi+ libwxgtk2.8-0+ libXss1+ freeglut3+ gnome-applets+ cpufrequtils+ ia32-libs+ STE\/E [BADger] You are a lifesaver!!! BLESS YOU!!!! I've tried for a YEAR to make BOINC work on something other than Ubuntu because I wanted to run updated versions for OpenCL support and other obvious reasons. NOBODY could help me with the library problem, and weeks of Google searching turned up nothing. For the first time ever, I've got it working on something besides Ubuntu Desktop!!! Fifty cheers and 75 virgins to you!!!! GAWD I can't believe it! IT WORKS!!! I'm playing now with a couple of things; the server version may be my easiest method... but I want an updated kernel too, so maybe Fedora or Arch or something... in any case, phase one is to use what works, which is Ubuntu Server 12.04 with the Gnome GUI... THANK YOU A MILLION TIMES!!! 8-) PS: And this may solve the WUProp problems!!! |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Okay, Ubuntu broke the server releases in that they won't detect or use my Asus WiFi USB dongle... so I tried Linux Mint MATE and that was bad... Tried a few others and they had various problems... Another thing is that Ubuntu 11.10 and lower are missing a certain library that has to be compiled against the kernel, and that's not an easy task... so something like kernel 3.3 or higher is needed. I loaded Linux Mint 14 "Nadia" Cinnamon because it HAD the proper features (like desktop sharing etc.) that were broken in the MATE version, and that works great... late-version kernel and all!!! I have various flavors of BOINC (7.0.28, 7.0.56, 7.0.58) running on 4 boxes now under Mint... We will see how things go... if no problems, I'll upgrade the other 4 boxes... Thanks for the help! So far so good! 9-) |
Send message Joined: 28 Mar 10 Posts: 672 Credit: 3,991,805 RAC: 698 |
:) ... Glad it worked for you, the line I gave you is a Life Saver for me too, Thanks goes to Zombie for that ... |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
All the Linux boxes are now upgraded and so far not a hint of any problem... So it would appear the combination of an older kernel and an older BOINC client may have caused the problem... Anyway, so far so good... Thanks again! 8-) |
Send message Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I consider the problem solved. Everything is working perfectly, the remote desktop is 10 times faster connecting and updating than with anything Ubuntu... All the 7.x.xx clients I have tried work! My post about this experience is HERE. Thanks again to Ste\/e and I suggest everyone having problems with a BOINC targeted Linux install use Linux Mint "Nadia" 64b version! Enjoy! 8-) |
©2024 Sébastien