1)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1539)
Posted 31 Aug 2013 by Tex1954 Post: As of today, I still do not run WUProp and have experienced ZERO LAN/Internet/system crashes... So not running this app cures all my problems, BUT I see others still have a LOT of problems... Guess I'm done with this... until I see zero problems... 8-)
2)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1125)
Posted 31 Mar 2013 by Tex1954 Post: ZERO problems, ZERO glitches, ZERO weird things happening since I stopped running WUProp tasks... What a breath of fresh air... no more random WiFi dropouts, no more random errors... everything running so perfectly and stable that I can't believe it... I can finally stop monitoring the systems so much... they all work flawlessly now. 8-)
3)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1124)
Posted 29 Mar 2013 by Tex1954 Post: I'm not running it anymore and will not. If you trace the IPs it connects to when it runs, it goes to the statistics website, where I am sure there is some code to talk to it. But even so, even if the net crashes or the ISP goes down or the router breaks, no other projects crash! They just keep on crunching with delayed UL/DL status as WUs are completed. Not so with WUProp... it clobbers BOINC and the LAN for some reason whenever a realtime network glitch happens or something times out. It clobbers things so badly that I bet there are some code errors in there, like references to && or *p without external defines or something. Really stinks of memory leak/execution pathway/stack corruption... I won't tolerate that anymore at all... some LONG WUs like The Lattice Project and RNA can take a week or more to complete and WUProp trashes them. Not a happy camper anymore with this project on Linux. 8-(
4)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1118)
Posted 29 Mar 2013 by Tex1954 Post: I give up... It is still generating errors that clobber other projects' WUs, and it seems related to the internet speed. Two routers, stock and with DD-WRT installed, and I get the same problems. There is something going on that may be time related... maybe a response is required either from the WU or from the website it reports to in a certain time frame or something, but whatever it is, it surely HATES WiFi connections. I have 7 systems connected with WiFi and have no problems with any other project... just WUProp... I give up now and have set NNT on all the systems. Maybe later when I get them rack mounted on a switch it will work better, but for now, I just give up.
In a nutshell, this is what I have done:
Installed/upgraded 7 systems with a 3.5.x kernel
Installed several flavors of BOINC from 7.0.28 to 7.0.58 on 8 systems
Tried every good WiFi channel (1, 6, 11) plus various bandwidth and power settings
Tried 2 different WiFi routers with stock and DD-WRT firmware with each setup
Rebooted 7 systems 60+ times at least... by hand with my ONE portable monitor
In all this, it also randomly fails on the 2 Windows boxes, but that seems to happen only when the network gets clobbered by another box. WUProp is for sure identified as the nasty WU that clobbers my systems. I won't run it anymore until a positive reason for this behavior is deduced and a positive fix implemented. 8-)
5)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1114)
Posted 28 Mar 2013 by Tex1954 Post: Update: ONE system had an error with a WUProp task during my continued fiddling, http://wuprop.boinc-af.org/result.php?resultid=25668779 BUT it didn't lock up the network as before and there were no other failures... so DD-WRT seems to fix something in the WRT320N router. I'm in the process of upgrading the Asus RT-N66U router with DD-WRT now...
6)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1113)
Posted 28 Mar 2013 by Tex1954 Post: Okay, new experiment... The router I've been using is a new ASUS RT-N66U with stock firmware. I went back to my Linksys WRT320N with DD-WRT firmware, restarted all the computer links, restarted all the WUProp stuff, then set about trying to MAKE IT screw up. I've reset the router, reset the modem, changed the Global MTU and changed Bandwidth from 20 to 40 and back to 20, and NOTHING is messing up! Soo, now I'm going to see about putting DD-WRT onto the new router and see if that makes a difference... I'll report back. 8-)
7)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1112)
Posted 27 Mar 2013 by Tex1954 Post: I had internet issues as well just a while ago... had several WUProp WUs error out and lock up the network again... I thought it was fixed, guess not. NO OTHER PROJECT WORK UNITS LOCK UP OR ERROR OUT!!!!!! WUProp has GOT to fix this... I'm letting what is there run out... but I have set NNT on all machines now... Sigh... like 16 compute errors all today... http://wuprop.boinc-af.org/results.php?userid=4388&offset=0&show_names=0&state=5 99% of the time it's the Linux boxes that have the problem, but today one Windows 7 box did it too. I've done everything I can on my end; it's up to the folks here to fix it now. I can't imagine WHY an internet interruption would cause a WU failure... doesn't make sense!
8)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1108)
Posted 26 Mar 2013 by Tex1954 Post: I consider the problem solved. Everything is working perfectly, and the remote desktop is 10 times faster connecting and updating than with anything Ubuntu... All the 7.x.xx clients I have tried work! My post about this experience is HERE. Thanks again to Ste\/e, and I suggest everyone having problems with a BOINC targeted Linux install use the Linux Mint "Nadia" 64-bit version! Enjoy! 8-)
9)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1104)
Posted 26 Mar 2013 by Tex1954 Post: All the Linux boxes are now upgraded and so far not a hint of any problem... Sooo, it would appear the combination of an older kernel and older BOINC client may have caused the problem... Anyways, so far so good... Thanks again! 8-)
10)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1100)
Posted 25 Mar 2013 by Tex1954 Post: Okay, Ubuntu broke the server releases in that they won't detect/use my Asus WiFi USB dongle... so I tried Linux Mint MATE and that was bad... Tried a few others and they had various problems... Another thing is that Ubuntu 11.10 and lower are missing a certain library that has to be compiled against the kernel, and it's not an easy task... so something like kernel 3.3 or higher is needed... I loaded Linux Mint 14 "Nadia" Cinnamon because it HAD the proper features (like desktop sharing etc.) that were broken in the "MATE" version, and that works great... late version kernel and all!!! I have various flavors of BOINC (7.0.28, 7.0.56, 7.0.58) running on 4 boxes now under Mint... We will see how things go... if no problems, I'll upgrade the other 4 boxes... Thanks for the help! So far so good! 8-)
11)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1099)
Posted 24 Mar 2013 by Tex1954 Post: "Don't know if this will work but try running this in your terminal >>> sudo apt-get install openssh-server+ gdebi+ libwxgtk2.8-0+ libXss1+ freeglut3+ gnome-applets+ cpufrequtils+ ia32-libs+"
STE\/E [BADger] You are a lifesaver!!! BLESS YOU!!!! I've tried for a YEAR to make BOINC work on something other than Ubuntu because I wanted to run updated versions for OpenCL support and other obvious reasons. NOBODY could help me with the library problem thing and weeks of Google searching turned up nothing. For the first time ever, I've got it working on something besides Ubuntu Desktop!!! Fifty cheers and 75 virgins to you!!!! GAWD I can't believe it! IT WORKS!!! I'm playing now with a couple of things; the server version may be my easiest method... but I want an updated kernel too, so maybe Fedora or Arch or something... in any case, phase one is to use what works, which is Ubuntu Server 12.04 with the Gnome GUI... THANK YOU A MILLION TIMES!!! 8-) PS: And this may solve the WUProp problems!!!
12)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1097)
Posted 23 Mar 2013 by Tex1954 Post: "Don't know if this will work but try running this in your terminal >>> sudo apt-get install openssh-server+ gdebi+ libwxgtk2.8-0+ libXss1+ freeglut3+ gnome-applets+ cpufrequtils+ ia32-libs+"
I'll try that in a bit... Thanks! I am trying Ubuntu Server 12.04 now. I got it running in a VM with BOINC sorta running. Problem is, graphic windows are messed up with the Gnome desktop I installed. Possibly it's vnc4server conflicting with desktop sharing??? Don't know yet... BUT I like 12.04 server!! No crap, basic, works... I'll let you know. Next is to uninstall the 7.0.27 and try the 7.0.28! 8-)
13)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1095)
Posted 23 Mar 2013 by Tex1954 Post: Not certain if this is the first, but this is how BOINC crashes restarting tasks... Notice the network is fine until the WUProp WU completes (it actually crashed with a computation error). After the crash, the network is killed... then BOINC goes crazy restarting tasks...

Linux-F13
6296 OPTIMA@HOME 3/23/2013 1:23:03 PM Sending scheduler request: To fetch work.
6297 OPTIMA@HOME 3/23/2013 1:23:03 PM Requesting new tasks for CPU
6298 OPTIMA@HOME 3/23/2013 1:23:05 PM Scheduler request completed: got 0 new tasks
6299 OPTIMA@HOME 3/23/2013 1:23:05 PM No work sent
6300 OPTIMA@HOME 3/23/2013 1:23:05 PM (reached limit of 40 tasks)
6301 OPTIMA@HOME 3/23/2013 1:32:18 PM Sending scheduler request: To fetch work.
6302 OPTIMA@HOME 3/23/2013 1:32:18 PM Requesting new tasks for CPU
6303 WUProp@Home 3/23/2013 1:33:40 PM Computation for task wu_v3_1363211664_405728_0 finished
6304 OPTIMA@HOME 3/23/2013 1:33:40 PM Scheduler request failed: Couldn't resolve host name
6305 Einstein@Home 3/23/2013 1:33:41 PM Task p2030.20121223.G202.81-01.04.C.b5s0g0.00000_3128_1 exited with zero status but no 'finished' file
6306 Einstein@Home 3/23/2013 1:33:41 PM If this happens repeatedly you may need to reset the project.
6307 Einstein@Home 3/23/2013 1:33:41 PM Restarting task p2030.20121223.G202.81-01.04.C.b5s0g0.00000_3128_1 using einsteinbinary_BRP4 version 133
6308 Einstein@Home 3/23/2013 1:33:42 PM Task p2030.20121223.G202.81-01.04.C.b4s0g0.00000_3152_1 exited with zero status but no 'finished' file
6309 Einstein@Home 3/23/2013 1:33:42 PM If this happens repeatedly you may need to reset the project.
6310 WUProp@Home 3/23/2013 1:33:42 PM Started upload of wu_v3_1363211664_405728_0_0
6311 Einstein@Home 3/23/2013 1:33:42 PM Restarting task p2030.20121223.G202.81-01.04.C.b4s0g0.00000_3152_1 using einsteinbinary_BRP4 version 133
6312 OPTIMA@HOME 3/23/2013 1:35:03 PM Task smallexp_s1_ss3_120_2_n58973_0 exited with zero status but no 'finished' file
6313 OPTIMA@HOME 3/23/2013 1:35:03 PM If this happens repeatedly you may need to reset the project.
6314 WUProp@Home 3/23/2013 1:35:03 PM Temporarily failed upload of wu_v3_1363211664_405728_0_0: can't resolve hostname
6315 WUProp@Home 3/23/2013 1:35:03 PM Backing off 14 min 23 sec on upload of wu_v3_1363211664_405728_0_0
6316 OPTIMA@HOME 3/23/2013 1:35:03 PM Restarting task smallexp_s1_ss3_120_2_n58973_0 using smallexp version 103
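The pattern described in this post (a WUProp task finishing, then tasks from other projects exiting with zero status but no 'finished' file) can be searched for in a saved event log. What follows is only a minimal Python sketch, not a BOINC tool. The function names, the two-minute window, and the assumption that every line looks like "sequence project date time message" as pasted above are illustrative choices, not anything taken from BOINC itself.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout, taken from the pasted log:
#   <seq> [<project>] <m/d/yyyy h:mm:ss AM|PM> <message>
LINE_RE = re.compile(
    r"^(\d+)\s+(?:(\S+)\s+)?"
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)$"
)


def parse_line(line):
    """Return (seq, project, when, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    seq, project, stamp, message = m.groups()
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return int(seq), project, when, message


def restarts_after_finish(lines, project="WUProp@Home",
                          window=timedelta(minutes=2)):
    """List 'exited with zero status' events that occur within `window`
    after a 'Computation for task ... finished' line from `project`."""
    events = [e for e in map(parse_line, lines) if e is not None]
    finished = [when for _, proj, when, msg in events
                if proj == project and msg.startswith("Computation for task")]
    hits = []
    for seq, proj, when, msg in events:
        if "exited with zero status" not in msg:
            continue
        if any(timedelta(0) <= when - f <= window for f in finished):
            hits.append((seq, proj, when))
    return hits


# A few lines copied from the log above.
SAMPLE = [
    "6303 WUProp@Home 3/23/2013 1:33:40 PM Computation for task wu_v3_1363211664_405728_0 finished",
    "6304 OPTIMA@HOME 3/23/2013 1:33:40 PM Scheduler request failed: Couldn't resolve host name",
    "6305 Einstein@Home 3/23/2013 1:33:41 PM Task p2030.20121223.G202.81-01.04.C.b5s0g0.00000_3128_1 exited with zero status but no 'finished' file",
    "6308 Einstein@Home 3/23/2013 1:33:42 PM Task p2030.20121223.G202.81-01.04.C.b4s0g0.00000_3152_1 exited with zero status but no 'finished' file",
    "6312 OPTIMA@HOME 3/23/2013 1:35:03 PM Task smallexp_s1_ss3_120_2_n58973_0 exited with zero status but no 'finished' file",
]

if __name__ == "__main__":
    for seq, proj, when in restarts_after_finish(SAMPLE):
        print(seq, proj, when.time())
```

Finding such events close together in time only shows correlation. It does not by itself show that the WUProp task caused the other tasks to exit.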
14)
Message boards :
Number crunching :
Computation errors
(Message 1094)
Posted 23 Mar 2013 by Tex1954 Post: I've been trying to find a stable later-version Linux to substitute for Ubuntu that works... no joy so far. I always try to install/run BOINC in the /Home directory, and every time I run it I get some sort of library error... This morning I tried Fubuntu (which I sorta like though it's bloated), added the wxWidgets libs and such, and still get more missing library errors... I've tried for weeks to get ONE 7.0.28 BOINC Linux version running on some flavor of Linux (25 of them?), following tons of web instructions, error fixes etc. with ZERO luck. The ONLY thing I've ever been able to run it on is Ubuntu 11.10... And people wonder why folks don't LOVE Linux... Sheesh... 8-)
15)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1093)
Posted 23 Mar 2013 by Tex1954 Post: Had it happen in front of me today... first one system, then slowly it started to kill the other systems one by one. Seems there was a project update request that started the trouble, but I can't be certain... the scenario looks like this... I'm not sure at this point which machine errored first; I think it was the first set... right after a server update request, it jams up... meanwhile, all other projects were working fine until the LAN locked up...

6143 WUProp@Home 3/23/2013 1:05:44 PM Sending scheduler request: Requested by project.
6144 WUProp@Home 3/23/2013 1:05:44 PM Not reporting or requesting tasks
6145 WUProp@Home 3/23/2013 1:05:46 PM Scheduler request completed
6303 WUProp@Home 3/23/2013 1:33:40 PM Computation for task wu_v3_1363211664_405728_0 finished
6310 WUProp@Home 3/23/2013 1:33:42 PM Started upload of wu_v3_1363211664_405728_0_0
6314 WUProp@Home 3/23/2013 1:35:03 PM Temporarily failed upload of wu_v3_1363211664_405728_0_0: can't resolve hostname
6315 WUProp@Home 3/23/2013 1:35:03 PM Backing off 14 min 23 sec on upload of wu_v3_1363211664_405728_0_0

3975 WUProp@Home 3/23/2013 1:28:24 PM Computation for task wu_v3_1363211664_405860_0 finished
3985 WUProp@Home 3/23/2013 1:29:48 PM Temporarily failed upload of wu_v3_1363211664_405860_0_0: can't resolve hostname
3986 WUProp@Home 3/23/2013 1:29:48 PM Backing off 13 min 57 sec on upload of wu_v3_1363211664_405860_0_0
3993 WUProp@Home 3/23/2013 1:31:09 PM Scheduler request failed: Couldn't resolve host name
4012 3/23/2013 1:31:39 PM Project communication failed: attempting access to reference site
4015 3/23/2013 1:33:01 PM BOINC can't access Internet - check network connection or proxy configuration.

Notice it was supposed to back off 13 minutes on the second machine but did not! It's like the app itself is forcing communications or something... then it locks up the LAN somehow... Look how they all get computation errors at the same time!

WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405860_0 02:15:55 (00:00:03) 3/23/2013 1:40:38 PM 3/23/2013 1:41:22 PM 0.04 Reported: Computation error (11,) Linux-2600K
WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405850_0 02:20:24 (00:00:06) 3/23/2013 1:40:38 PM 3/23/2013 1:47:54 PM 0.07 Reported: Computation error (11,) Linux-F1
WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405825_0 02:20:55 (00:00:00) 3/23/2013 1:40:38 PM 3/23/2013 1:41:22 PM 0.00 Reported: Computation error (11,) Linux-F12
WUProp@Home 3.42 Data collect version 3 (nci) wu_v3_1363211664_405728_0 02:26:36 (00:00:06) 3/23/2013 1:40:38 PM 3/23/2013 1:42:12 PM 0.07 Reported: Computation error (11,) Linux-F13

I'll keep checking, and I don't think it's my kernel now since others run kernels 2.6 up to 3.4... 8-)
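To check whether a backoff was actually honored, it helps to work out the earliest time a retry should happen. The Python sketch below is only an illustration: the function name and the "N min N sec" message format are assumptions read off the log lines in this post, not a documented BOINC interface. Note also that the "Backing off" message refers to the upload of one file; as far as is known, scheduler requests run on a separate timer, so the 1:31:09 PM scheduler request may not by itself show that the upload backoff was ignored.

```python
import re
from datetime import datetime, timedelta

# Assumed message format, as seen in the log: "Backing off 13 min 57 sec on upload of ..."
BACKOFF_RE = re.compile(r"Backing off (?:(\d+) min )?(\d+) sec")


def retry_not_before(stamp, message):
    """Return the earliest retry time implied by a 'Backing off' log line,
    or None if the message is not a backoff message."""
    m = BACKOFF_RE.search(message)
    if m is None:
        return None
    minutes = int(m.group(1) or 0)
    seconds = int(m.group(2))
    start = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return start + timedelta(minutes=minutes, seconds=seconds)


if __name__ == "__main__":
    deadline = retry_not_before(
        "3/23/2013 1:29:48 PM",
        "Backing off 13 min 57 sec on upload of wu_v3_1363211664_405860_0_0",
    )
    # 1:29:48 PM plus 13 min 57 sec is 1:43:45 PM
    print(deadline.strftime("%I:%M:%S %p"))
```

Any "Started upload" line for the same file with a timestamp earlier than that result would be the thing to look for in the full log.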
16)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1091)
Posted 23 Mar 2013 by Tex1954 Post: I tried several things and was able to get the kernel updated a couple of times, but in both instances it ended up breaking the Nvidia driver with no cure working. I tried Fedora 18 and couldn't get any BOINC running. I tried Arch and it's a major pain in a VM, so gave up on that. I've tried 25 flavors of 64-bit Linux and can't find a single one that will work properly... Soo, I suppose I'm stuck for now... But I did like the new kernels! They fix a lot of bugs, especially with regards to lm-sensors operation and such... Thing is, I am only TESTING GPUs on Linux for now... so I could update the kernel later maybe... if it will drive the onboard video properly... Anyways, Linux burns me out... I would PAY someone to make me a custom version in the future if I can't get what I want going properly... 8-)
17)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1089)
Posted 21 Mar 2013 by Tex1954 Post: Possibly that may work. The problem is, I'm a total noob to Linux and if it doesn't install by itself, I have no clue what to do. To date, I've tried about 25 other distros and the only one that works on all my boxes is Ubuntu... much as I hate it. I tried Fedora 18 in a virtual box and tried to get their 7.x.x version of the BOINC client going and had no luck there either... but it was close!!! Arch is one I've been holding back on because it's all manual... but I may try that. The only common thing on the Linux boxes is the version of Linux and the BOINC client... so maybe something there... 8-)
18)
Message boards :
Number crunching :
Computation errors
(Message 1088)
Posted 21 Mar 2013 by Tex1954 Post: Well, I also bought a new router recently to test that, an Asus RT-N66U, and it hasn't helped either. My friend put DD-WRT on my other Linksys router and that didn't help. Also, I have 4 boxes on wired LAN and the rest on WiFi and it makes no difference where they are; all the Linux boxes mess up. Someone suggested I try to upgrade the kernel in my boxes since they all run Ubuntu 11.10... I may try that and try to upgrade the version of the BOINC client too... 8-)
19)
Message boards :
Number crunching :
Problem with Linux locking up machines???
(Message 1083)
Posted 21 Mar 2013 by Tex1954 Post: Early this morning, all eight of my Linux boxes crashed, taking out several tasks in progress and locking up the network. It is definitely something that has to do with internet interruptions. The cable service I have goes out once in a while late at night for maintenance, or else just dies... When the internet connection is lost, WUProp goes nuts and locks up BOINC, and then I observed tasks starting and stopping... Nothing else could run until the net came back up; then every box had a WUProp computation error and also many crashed WUs from other projects. It didn't matter if the box was on the wired LAN or wireless, and it only happened on Linux systems, not the Windows 7 boxes... This is very annoying... it has been happening less often since the last update, but early this morning it really clobbered everything badly. I thought this was fixed... guess not...
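Since the errors in the logs are name-resolution failures ("Couldn't resolve host name"), one way to pin down when the connection actually drops is to record DNS up/down transitions with timestamps and compare them against the BOINC event log. This is a minimal Python sketch under stated assumptions: the function names, the 30-second interval, and the choice of wuprop.boinc-af.org (taken from the result URLs in this thread) as the host to probe are all illustrative, and the `resolver`, `sleep`, and `now` parameters exist only so the loop can be exercised without a real network.

```python
import socket
import time
from datetime import datetime


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on any resolution or network error."""
    try:
        resolver(host, 80)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def watch(host, checks, interval=30.0, resolver=socket.getaddrinfo,
          sleep=time.sleep, now=datetime.now):
    """Probe `host` `checks` times and return a list of (timestamp, state)
    entries, one for each time the state changes between 'up' and 'down'."""
    transitions = []
    last = None
    for i in range(checks):
        state = can_resolve(host, resolver)
        if state != last:
            transitions.append((now(), "up" if state else "down"))
            last = state
        if i + 1 < checks:
            sleep(interval)
    return transitions


if __name__ == "__main__":
    # Probe for about an hour; host name is an assumption taken from the URLs above.
    for when, state in watch("wuprop.boinc-af.org", checks=120):
        print(when.strftime("%m/%d/%Y %I:%M:%S %p"), state)
```

Running this on one Linux box and one Windows box at the same time would show whether both see the outage at the same moment, which is the comparison the post is making.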
20)
Message boards :
Number crunching :
Computation errors
(Message 1082)
Posted 21 Mar 2013 by Tex1954 Post: Early this morning, all eight of my Linux boxes crashed, taking out several tasks in progress and locking up the network. It is definitely something that has to do with internet interruptions. The cable service I have goes out once in a while late at night for maintenance, or else just dies... When the internet connection is lost, WUProp goes nuts and locks up BOINC, and then I observed tasks starting and stopping... Nothing else could run until the net came back up; then every box had a WUProp computation error and also many crashed WUs from other projects. It didn't matter if the box was on the wired LAN or wireless, and it only happened on Linux systems, not the Windows 7 boxes... This is very annoying... it has been happening less often since the last update, but early this morning it really clobbered everything badly.