Problem with Linux locking up machines???
Author | Message |
---|---|
Joined: 28 Mar 10 Posts: 671 Credit: 3,989,197 RAC: 685 |
lol ... One of my Linux boxes locked up several times this morning because of Internet issues, and I lost about 150 hours of SLinCA WUs to computation errors when it locked up. I moved the wireless pickup on the box and it seems okay for now ... |
Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I had internet issues as well just a while ago... several WUProp WUs errored out and locked up the network again... I thought it was fixed, guess not. NO OTHER PROJECT'S WORK UNITS LOCK UP OR ERROR OUT! WUProp has GOT to fix this... I'm letting what is there run out, but I've set NNT (No New Tasks) on all machines now... Sigh... about 16 compute errors today alone... http://wuprop.boinc-af.org/results.php?userid=4388&offset=0&show_names=0&state=5 99% of the time it's the Linux boxes that have the problem, but today one Windows 7 box did it too. I've done everything I can on my end; it's up to the folks here to fix it now. I can't imagine WHY an internet interruption would cause a WU failure... it doesn't make sense! |
Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Okay, new experiment... The router I've been using is a new ASUS RT-N66U with stock firmware. I went back to my Linksys WRT320N with DD-WRT firmware, restarted all the computer links, restarted all the WUProp stuff, then set about trying to MAKE IT screw up. I've reset the router, reset the modem, changed the global MTU, and changed the bandwidth from 20 to 40 and back to 20, and NOTHING is messing up! So now I'm going to see about putting DD-WRT on the new router and see if that makes a difference... I'll report back. 8-) |
Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
Update: ONE system had an error with a WUProp task during my continued fiddling, http://wuprop.boinc-af.org/result.php?resultid=25668779 BUT it didn't lock up the network as before, and there were no other failures... so DD-WRT seems to fix something in the WRT320N router. I'm in the process of upgrading the ASUS RT-N66U router to DD-WRT now... |
Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I give up... It is still generating errors that clobber other projects' WUs, and it seems related to the internet speed. Two routers, each with stock and with DD-WRT firmware, and I get the same problems. There is something going on that may be time related... maybe a response is required within a certain time frame, either from the WU or from the website it reports to, but whatever it is, it surely HATES WiFi connections. I have 7 systems connected over WiFi and have no problems with any other project... just WUProp... I give up now and have set NNT on all the systems. Maybe later, when I get them rack mounted on a switch, it will work better, but for now I just give up. In a nutshell, this is what I have done: (1) installed/upgraded 7 systems with a 3.5.x kernel; (2) installed several flavors of BOINC, from 7.0.28 to 7.0.58, on 8 systems; (3) tried every good WiFi channel (1, 6, 11) plus various bandwidth and power settings; (4) tried 2 different WiFi routers, with stock and DD-WRT firmware on each; (5) rebooted 7 systems 60+ times at least... by hand, with my ONE portable monitor. In all this, it also randomly fails on the 2 Windows boxes, but that seems to happen only when the network gets clobbered by another box. WUProp is for sure identified as the nasty WU that clobbers my systems. I won't run it anymore until a definite reason for this behavior is found and a definite fix is implemented. 8-) |
Joined: 28 Mar 10 Posts: 671 Credit: 3,989,197 RAC: 685 |
Same here Tex >>> http://wuprop.boinc-af.org/forum_thread.php?id=176 I've lost over 300 hours of SLinCA WUs this morning due to the FUBAR WUProp WU erroring out and taking every running WU with it on my Linux boxes ... It's like I have to run 1000 hours of WUs and hope 100 hours of them make it without the WUProp WU freaking everything up. I'm getting to the point where it's not worth running this project any more, with all the lost work it creates on the other projects ... |
Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
I'm not running it anymore and will not. If you trace the IPs it connects to when it runs, it goes to the statistics website, where I am sure there is some code that talks to it. But even so, even if the net crashes or the ISP goes down or the router breaks, no other projects crash! They just keep on crunching, with delayed UL/DL status as WUs are completed. Not so with WUProp... it clobbers BOINC and the LAN for some reason whenever a real-time network glitch happens or something times out. It clobbers things so badly that I bet there are some code errors in there, like references to && or *p without external defines or something. It really stinks of memory leak/execution pathway/stack corruption... I won't tolerate that anymore at all... some LONG WUs, like those from The Lattice Project and RNA, can take a week or more to complete, and WUProp trashes them. Not a happy camper anymore with this project on Linux. 8-( |
Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
ZERO problems, ZERO glitches, ZERO weird things happening since I stopped running WUProp tasks.. What a breath of fresh air... no more random WiFi dropouts, no more random errors... everything running so perfectly and stable that I can't believe it... I can finally stop monitoring the systems so much... they all work flawlessly now. 8-) |
Joined: 29 Jul 11 Posts: 334 Credit: 1,240,513 RAC: 321 |
I am also seeing some STRANGE problems with this project trashing work-in-progress and TURNING BACK THE CLOCK. It is *only* happening on WINDOWS PCs. EXAMPLE: 3/31/2013 7:59:58 PM | WUProp@Home | Sending scheduler request: To fetch work. |
Joined: 20 Feb 13 Posts: 34 Credit: 653,713 RAC: 0 |
Same problem here on WCG tasks. |
Joined: 1 Jul 11 Posts: 29 Credit: 126,167 RAC: 0 |
As of today, I still do not run WUProp anymore and have experienced ZERO LAN/Internet/system crashes... So not running this app cures all my problems, BUT, I see others still have a LOT of problems... Guess I'm done on this... Until I see zero problems... 8-) |
©2024 Sébastien