Bug in version 2.36 for Windows

Message boards : News : Bug in version 2.36 for Windows

[AF>WildWildWest] Sebastien
Dictator
Joined: 28 Mar 10
Posts: 2677
Credit: 513,691
RAC: 95
Total hours: 1,427,163
Message 470 - Posted: 17 Feb 2011, 17:14:22 UTC

There is a bug in version 2.36 for Windows.
The application slows down or blocks the network.
You should abort the workunit if you run this version.

Sorry for the inconvenience.
____________

Bok
           
Joined: 28 Mar 10
Posts: 132
Credit: 782,198
RAC: 0
Total hours: 2,666,641
Message 472 - Posted: 17 Feb 2011, 17:44:56 UTC - in response to Message 470.

Can you provide any details at all?

I say this because I had some problems with my networking yesterday on two Windows PCs and hadn't quite figured out the issue, though I did work around it.

Falconet
 
Joined: 2 May 10
Posts: 5
Credit: 312,776
RAC: 30
Total hours: 672,240
Message 473 - Posted: 17 Feb 2011, 18:55:23 UTC

Oh, I'm glad to have found out about this. I was about to format Windows because repairing it didn't work.

ebahapo
   
Joined: 6 Apr 10
Posts: 41
Credit: 471,109
RAC: 0
Total hours: 351,238
Message 474 - Posted: 17 Feb 2011, 19:28:03 UTC

It would be best if the project would request the clients to abort all such WUs.

TIA

____________

DrBob
Joined: 10 Apr 10
Posts: 1
Credit: 977,167
RAC: 0
Total hours: 1,496,796
Message 475 - Posted: 17 Feb 2011, 19:59:21 UTC - in response to Message 474.

ebahapo wrote:
It would be best if the project would request the clients to abort all such WUs.

TIA

That would be nice, but most clients running the bad app have no network access unless the user has rebooted their machine.



I would like to hear more details from the Project Admin on what exactly the 2.36 application did to our boxen to mess up our networks as badly as it did.

WUProp set to NNW for now.

skgiven
Joined: 7 Sep 10
Posts: 453
Credit: 945,109
RAC: 0
Total hours: 2,101,570
Message 476 - Posted: 17 Feb 2011, 20:01:49 UTC - in response to Message 474.

On one system I am running a Data collect 2.31 (nci) work unit: wu_1295715071_140338_0

It has run for 216h and says it has another 109h to go.

A different app (2.34) on another system will take about as long.

They all used to run for 12h and now it's about 335h. That is some change, and not one I welcome.

Note my settings:
Run test applications? no

[AF>WildWildWest] Sebastien wrote:
There is a bug in version 2.36 for Windows.
The application slows down or blocks the network.
You should abort the workunit if you run this version.


It is not good that I find my networks blocked because of such problems, especially as I did not subscribe to running test applications! The least you could have done was email people to make them aware of this serious problem.

Not a happy cruncher :(

ebahapo
   
Joined: 6 Apr 10
Posts: 41
Credit: 471,109
RAC: 0
Total hours: 351,238
Message 477 - Posted: 17 Feb 2011, 20:06:38 UTC - in response to Message 475.

Actually, it seems to have intermittent network access when the issue occurs. But, yes, this is a major snafu.

____________

[AF>WildWildWest] Sebastien
Dictator
Joined: 28 Mar 10
Posts: 2677
Credit: 513,691
RAC: 95
Total hours: 1,427,163
Message 478 - Posted: 17 Feb 2011, 20:50:38 UTC - in response to Message 472.

Bok wrote:
Can you provide any details at all?

I say this because I had some problems with my networking yesterday on two Windows PCs and hadn't quite figured out the issue, though I did work around it.


I made an error in the code.

Normally, the application opens a connection to the BOINC client when it starts and closes the connection when it finishes. Every second, the application checks for file transfers.

The buggy application opened a connection to the BOINC client, checked for file transfers, and closed the connection, all of it every second.
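
To make the difference concrete, here is a minimal sketch of the two polling patterns described above. It is not the project's code: `CountingConnection` is a hypothetical stand-in for whatever the application uses to talk to the local BOINC client, and it only counts how many connections get opened.

```python
class CountingConnection:
    """Hypothetical stand-in for a connection to the local BOINC client."""

    opened = 0  # total connections opened so far, shared by all instances

    def __init__(self):
        CountingConnection.opened += 1
        self.closed = False

    def file_transfers(self):
        # A real client would send a request here; the stub returns nothing.
        if self.closed:
            raise RuntimeError("connection is closed")
        return []

    def close(self):
        self.closed = True


def poll_persistent(ticks):
    """Intended behaviour: one connection for the whole run."""
    conn = CountingConnection()
    try:
        for _ in range(ticks):
            conn.file_transfers()  # one check per tick (one per second in the app)
    finally:
        conn.close()


def poll_reconnecting(ticks):
    """Behaviour described for 2.36: a new connection on every tick."""
    for _ in range(ticks):
        conn = CountingConnection()
        conn.file_transfers()
        conn.close()


def connections_used(poller, ticks):
    """Run a poller for the given number of ticks and report connections opened."""
    CountingConnection.opened = 0
    poller(ticks)
    return CountingConnection.opened
```

Over one hour of one-second ticks, `connections_used(poll_persistent, 3600)` opens a single connection, while `connections_used(poll_reconnecting, 3600)` opens 3600 of them.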
____________

Bok
           
Joined: 28 Mar 10
Posts: 132
Credit: 782,198
RAC: 0
Total hours: 2,666,641
Message 479 - Posted: 17 Feb 2011, 21:13:59 UTC

So, in effect it was saturating the network?

skgiven
Joined: 7 Sep 10
Posts: 453
Credit: 945,109
RAC: 0
Total hours: 2,101,570
Message 482 - Posted: 17 Feb 2011, 23:09:44 UTC - in response to Message 479.

What is the situation with the present app?

Falconet
 
Joined: 2 May 10
Posts: 5
Credit: 312,776
RAC: 30
Total hours: 672,240
Message 486 - Posted: 18 Feb 2011, 14:17:40 UTC - in response to Message 482.

Version 2.37 is problem-free on my computer.

ChertseyAl
 
Joined: 30 Mar 10
Posts: 13
Credit: 597,344
RAC: 47
Total hours: 595,815
Message 490 - Posted: 18 Feb 2011, 18:43:18 UTC - in response to Message 470.


[AF>WildWildWest] Sebastien wrote:
The application slows down or blocks the network.


Arg! I spent hours yesterday trying to diagnose what was killing port 80 on one of my machines. Got as far as proving it was only happening when BOINC was running and then ran out of time. This morning everything was back to normal.

At least it's nice to know what was causing it :)

Al.

Pepo
Joined: 10 May 10
Posts: 15
Credit: 55,797
RAC: 0
Total hours: 8,885
Message 493 - Posted: 18 Feb 2011, 22:01:45 UTC - in response to Message 476.
Last modified: 18 Feb 2011, 22:04:28 UTC

skgiven wrote:
On one system I am running a Data collect 2.31 (nci) work unit: wu_1295715071_140338_0
It has run for 216h and says it has another 109h to go.
A different app (2.34) on another system will take about as long.
They all used to run for 12h and now it's about 335h. That is some change, and not one I welcome.

The bug is hopefully eliminated. See also the "Run on tasks" thread, which is about this.

Peter

Thyme Lawn
 
Joined: 11 Apr 10
Posts: 54
Credit: 382,341
RAC: 0
Total hours: 639,907
Message 494 - Posted: 19 Feb 2011, 2:14:29 UTC - in response to Message 490.


ChertseyAl wrote:
[AF>WildWildWest] Sebastien wrote:
The application slows down or blocks the network.

Arg! I spent hours yesterday trying to diagnose what was killing port 80 on one of my machines. Got as far as proving it was only happening when BOINC was running and then ran out of time. This morning everything was back to normal.

At least it's nice to know what was causing it :)

Same here. I lost count of the number of times I repaired my network connections and rebooted my systems during the 24 hours they were running v2.36.

Unfortunately that clashed with me testing a major change to the inter-process communications mechanism in some of my software and I spent all day trying to work out why I'd introduced a totally unexpected set of problems (couldn't connect to MySQL, blank or incomplete pages in Firefox, SMTP connection failures, traffic which should have gone directly to my ISP was being routed down a VPN connection). Fortunately I didn't yell at the office networking guru for changing the rules pushed down by the VPN server, but I did resort to reinstalling the VPN software (to no effect) after I came to the conclusion that had to be the cause.

All of the problems seem to be resolved running v2.37.

BobCat13
 
Joined: 6 Apr 10
Posts: 27
Credit: 520,180
RAC: 47
Total hours: 545,117
Message 495 - Posted: 19 Feb 2011, 4:47:19 UTC - in response to Message 479.

Bok wrote:
So, in effect it was saturating the network?

What it was doing was opening multiple local TCP/IP connections every second or two, adding up to thousands of open connections after an hour or so. Windows XP is limited to ports 1025-5000 for ephemeral (short-lived) connections. Once all of those ports had been used, other programs were pushed to ports above 5000 and failed with Winsock error 10055 (WSAENOBUFS, "no buffer space available").
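
As a rough check on those numbers, the sketch below works out how long it would take to use up the 1025-5000 range. It assumes, purely for illustration, that every new connection holds one ephemeral port and that none is released; the function names are made up for this example.

```python
def ephemeral_port_count(first=1025, last=5000):
    """Number of ports in an inclusive range."""
    return last - first + 1


def minutes_until_exhausted(ports, seconds_per_connection):
    """Minutes to consume every port if one port is taken per interval and none is freed."""
    return ports * seconds_per_connection / 60


ports = ephemeral_port_count()
fastest = minutes_until_exhausted(ports, 1)  # one new connection per second
slowest = minutes_until_exhausted(ports, 2)  # one new connection every two seconds
print(f"{ports} ports, exhausted after {fastest:.0f} to {slowest:.0f} minutes")
```

Under that assumption the range holds 3976 ports and runs out after roughly one to two hours, which is in line with "after an hour or so".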

Browsers did not display this error; they just wouldn't connect to anything. My NNTP client was the one that displayed the 10055 error, which led me to run netstat, where I discovered what the problem was.
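
For anyone wanting to repeat that netstat check, here is a small sketch that counts TCP entries per state whose local port falls in the 1025-5000 range. It assumes the usual `netstat -an` column layout (protocol, local address, foreign address, state). The sample rows are made up for illustration, including the arbitrary remote port, and are not captured from an affected machine.

```python
from collections import Counter

# Ephemeral range quoted in the post for Windows XP.
EPHEMERAL_FIRST, EPHEMERAL_LAST = 1025, 5000


def ephemeral_usage(netstat_text):
    """Count TCP entries per state whose local port is in the ephemeral range."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows look like: TCP <local addr:port> <remote addr:port> <state>
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        port_text = fields[1].rsplit(":", 1)[-1]
        if not port_text.isdigit():
            continue
        if EPHEMERAL_FIRST <= int(port_text) <= EPHEMERAL_LAST:
            states[fields[3]] += 1
    return states


# Made-up sample rows for illustration only.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    127.0.0.1:1031         127.0.0.1:40000        ESTABLISHED
  TCP    127.0.0.1:1032         127.0.0.1:40000        ESTABLISHED
  TCP    127.0.0.1:1033         127.0.0.1:40000        TIME_WAIT
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
  UDP    0.0.0.0:445            *:*
"""

usage = ephemeral_usage(SAMPLE)
```

On a live machine the text could come from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`. A count that climbs into the thousands would match the symptom described above.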

skgiven
Joined: 7 Sep 10
Posts: 453
Credit: 945,109
RAC: 0
Total hours: 2,101,570
Message 501 - Posted: 22 Feb 2011, 15:08:28 UTC - in response to Message 495.

[AF>WildWildWest], 2.37 also runs long: hundreds of hours. You should server-abort any remaining 2.37 tasks.

You should have reported this problem to Berkeley, as network outages impact on all other projects.




Copyright © 2024 Sebastien