Bug in version 2.36 for Windows

[AF>WildWildWest] Sébastie...
Project administrator
Joined: 28 Mar 10
Posts: 2847
Credit: 534,313
RAC: 130
Message 470 - Posted: 17 Feb 2011, 17:14:22 UTC

There is a bug in the version 2.36 for Windows.
The application slows down or blocks the network.
You should abort the workunit if you run this version.

Sorry for the inconvenience.
ID: 470 · Rating: 0
Bok
Joined: 28 Mar 10
Posts: 132
Credit: 784,594
RAC: 44
Message 472 - Posted: 17 Feb 2011, 17:44:56 UTC - in response to Message 470.  

Can you provide any details at all?

I say this because I had some problems with my networking yesterday on two Windows PCs and hadn't quite figured out the issue, though I did work around it.
ID: 472 · Rating: 0
Falconet
Help desk expert
Joined: 2 May 10
Posts: 5
Credit: 315,776
RAC: 0
Message 473 - Posted: 17 Feb 2011, 18:55:23 UTC

Oh, I'm glad to have found out about this. I was about to format Windows because repairing it didn't work.
ID: 473 · Rating: 0
ebahapo
Joined: 6 Apr 10
Posts: 41
Credit: 471,539
RAC: 0
Message 474 - Posted: 17 Feb 2011, 19:28:03 UTC

It would be best if the project would request the clients to abort all such WUs.

TIA

ID: 474 · Rating: 0
DrBob
Joined: 10 Apr 10
Posts: 1
Credit: 977,167
RAC: 0
Message 475 - Posted: 17 Feb 2011, 19:59:21 UTC - in response to Message 474.  

ebahapo wrote:
It would be best if the project would request the clients to abort all such WUs.

TIA

That would be nice, but most clients running the bad app have no network access unless the user has rebooted their machine.

I would like to hear more details from the Project Admin on what exactly the 2.36 application did to our boxen to mess up our networks as badly as it did.

WUProp set to NNW for now.
ID: 475 · Rating: 0
skgiven
Joined: 7 Sep 10
Posts: 453
Credit: 945,109
RAC: 0
Message 476 - Posted: 17 Feb 2011, 20:01:49 UTC - in response to Message 474.  

On one system I am running a Data collect 2.31 (nci) work unit; wu_1295715071_140338_0

It has run for 216h and says it has another 109h to go.

A different app (2.34) on another system will take about as long.

They all used to run for 12h and now it's about 335h; that is some change, and not one I welcome.

Note my settings:
Run test applications? no

[AF>WildWildWest] Sébastie... wrote:
There is a bug in the version 2.36 for Windows.
The application slows down or blocks the network.
You should abort the workunit if you run this version.

It is not good to find my networks blocked because of such problems, especially as I did not opt in to running test applications! The least you could have done was email people to make them aware of this serious problem.

Not a happy cruncher :(
ID: 476 · Rating: 0
ebahapo
Joined: 6 Apr 10
Posts: 41
Credit: 471,539
RAC: 0
Message 477 - Posted: 17 Feb 2011, 20:06:38 UTC - in response to Message 475.  

Actually, it seems to have intermittent network access when the issue occurs. But, yes, this is a major snafu.

ID: 477 · Rating: 0
[AF>WildWildWest] Sébastie...
Project administrator
Joined: 28 Mar 10
Posts: 2847
Credit: 534,313
RAC: 130
Message 478 - Posted: 17 Feb 2011, 20:50:38 UTC - in response to Message 472.  

Bok wrote:
Can you provide any details at all?

I say this because I had some problems with my networking yesterday on two Windows PCs and hadn't quite figured out the issue, though I did work around it.


I made an error in the code.

Normally, the application opens a connection to the BOINC client when it starts and closes the connection when it finishes. In between, it checks for file transfers every second.

The buggy application opened a connection to the BOINC client, checked for file transfers, and closed the connection again every second.
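
To make the difference concrete, here is a minimal Python sketch of the two patterns. It is not the project's actual code: `CountingServer` is a hypothetical local TCP server standing in for the BOINC client, and `intended_pattern`, `buggy_pattern` and the `b"transfers?"` request are invented names for illustration. The one-second wait between polls is left out so the example finishes immediately.

```python
import socket
import threading


class CountingServer:
    """Throwaway local TCP server standing in for the BOINC client (hypothetical)."""

    def __init__(self):
        self.connections = 0  # number of connections accepted so far
        self._sock = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick
        self.address = self._sock.getsockname()
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self):
        while True:
            conn, _ = self._sock.accept()
            self.connections += 1
            threading.Thread(target=self._reply, args=(conn,), daemon=True).start()

    @staticmethod
    def _reply(conn):
        with conn:
            while conn.recv(64):       # answer every poll until the peer closes
                conn.sendall(b"none")  # made-up reply: "no file transfers"


def poll(sock):
    """Ask whether any file transfer is in progress (made-up request format)."""
    sock.sendall(b"transfers?")
    return sock.recv(64)


def intended_pattern(address, polls):
    """Open one connection, poll over it repeatedly, close it at the end."""
    with socket.create_connection(address) as sock:
        for _ in range(polls):
            poll(sock)  # the real application would wait about a second here


def buggy_pattern(address, polls):
    """Open a connection, poll once and close it again, on every iteration."""
    for _ in range(polls):
        with socket.create_connection(address) as sock:
            poll(sock)


if __name__ == "__main__":
    server = CountingServer()
    intended_pattern(server.address, 5)
    print("intended pattern:", server.connections, "connection(s)")
    before = server.connections
    buggy_pattern(server.address, 5)
    print("buggy pattern:", server.connections - before, "connection(s)")
```

In the second pattern every poll takes a new local port from the operating system, so the number of ports consumed grows with the running time instead of staying at one.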
ID: 478 · Rating: 0
Bok
Joined: 28 Mar 10
Posts: 132
Credit: 784,594
RAC: 44
Message 479 - Posted: 17 Feb 2011, 21:13:59 UTC

So, in effect it was saturating the network?

ID: 479 · Rating: 0
skgiven
Joined: 7 Sep 10
Posts: 453
Credit: 945,109
RAC: 0
Message 482 - Posted: 17 Feb 2011, 23:09:44 UTC - in response to Message 479.  

What is the situation with the present app?
ID: 482 · Rating: 0
Falconet
Help desk expert
Joined: 2 May 10
Posts: 5
Credit: 315,776
RAC: 0
Message 486 - Posted: 18 Feb 2011, 14:17:40 UTC - in response to Message 482.  

Version 2.37 is problem-free on my computer.
ID: 486 · Rating: 0
ChertseyAl
Joined: 30 Mar 10
Posts: 13
Credit: 606,696
RAC: 48
Message 490 - Posted: 18 Feb 2011, 18:43:18 UTC - in response to Message 470.  


[AF>WildWildWest] Sébastie... wrote:
The application slows down or blocks the network.

Arg! I spent hours yesterday trying to diagnose what was killing port 80 on one of my machines. Got as far as proving it was only happening when BOINC was running and then ran out of time. This morning everything was back to normal.

At least it's nice to know what was causing it :)

Al.
ID: 490 · Rating: 0
Pepo
Joined: 10 May 10
Posts: 15
Credit: 55,797
RAC: 0
Message 493 - Posted: 18 Feb 2011, 22:01:45 UTC - in response to Message 476.  
Last modified: 18 Feb 2011, 22:04:28 UTC

skgiven wrote:
On one system I am running a Data collect 2.31 (nci) work unit; wu_1295715071_140338_0
It has run for 216h and says it has another 109h to go.
A different app (2.34) on another system will take about as long.
They all used to run for 12h and now it's about 335h; that is some change, and not one I welcome.

The bug is hopefully eliminated. See also the Run on tasks thread, which is about this issue.

Peter
ID: 493 · Rating: 0
Thyme Lawn
Joined: 11 Apr 10
Posts: 54
Credit: 382,341
RAC: 0
Message 494 - Posted: 19 Feb 2011, 2:14:29 UTC - in response to Message 490.  


ChertseyAl wrote:
The application slows down or blocks the network.

Arg! I spent hours yesterday trying to diagnose what was killing port 80 on one of my machines. Got as far as proving it was only happening when BOINC was running and then ran out of time. This morning everything was back to normal.

At least it's nice to know what was causing it :)

Same here. I lost count of the number of times I repaired my network connections and rebooted my systems during the 24 hours they were running v2.36.

Unfortunately that clashed with me testing a major change to the inter-process communications mechanism in some of my software and I spent all day trying to work out why I'd introduced a totally unexpected set of problems (couldn't connect to MySQL, blank or incomplete pages in Firefox, SMTP connection failures, traffic which should have gone directly to my ISP was being routed down a VPN connection). Fortunately I didn't yell at the office networking guru for changing the rules pushed down by the VPN server, but I did resort to reinstalling the VPN software (to no effect) after I came to the conclusion that had to be the cause.

All of the problems seem to be resolved running v2.37.
ID: 494 · Rating: 0
BobCat13
Joined: 6 Apr 10
Posts: 27
Credit: 528,326
RAC: 47
Message 495 - Posted: 19 Feb 2011, 4:47:19 UTC - in response to Message 479.  

Bok wrote:
So, in effect it was saturating the network?

What it was doing was opening multiple local TCP/IP connections every second or two, adding up to thousands of open connections after an hour or so. Windows XP is limited to ports 1025-5000 for ephemeral (short-lived) connections. When all of those ports had been used, other programs could not get a port within that range and failed with Winsock error 10055 (WSAENOBUFS, "no buffer space available").

Browsers were not displaying this error; they just wouldn't connect to anything. My NNTP client was the one that displayed the 10055 error, which led me to use netstat, where I discovered what the problem was.
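
As a rough check on those numbers, the Python sketch below works out how long the 1025-5000 range lasts if ports are used up at a steady rate, and counts TCP connections per state in `netstat -an` style output. The steady rate is an assumption (the post only says "every second or two"), the function names are made up for this example, and the netstat lines in `SAMPLE` are invented sample data, including the peer port.

```python
from collections import Counter

# Default Windows XP ephemeral port range, as described in the post above.
XP_EPHEMERAL_FIRST = 1025
XP_EPHEMERAL_LAST = 5000


def ephemeral_port_count(first=XP_EPHEMERAL_FIRST, last=XP_EPHEMERAL_LAST):
    """Number of ports in the inclusive range first..last."""
    return last - first + 1


def minutes_to_exhaust(ports_used_per_second):
    """Minutes until the range is used up, assuming no port is ever released."""
    return ephemeral_port_count() / ports_used_per_second / 60


def count_tcp_states(netstat_output):
    """Count TCP lines per connection state in `netstat -an` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows look like: Proto, Local Address, Foreign Address, State
        if len(fields) >= 4 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states


# Invented sample data, not captured from an affected machine.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    127.0.0.1:1030         127.0.0.1:31416        TIME_WAIT
  TCP    127.0.0.1:1031         127.0.0.1:31416        ESTABLISHED
  TCP    127.0.0.1:1032         127.0.0.1:31416        TIME_WAIT
  UDP    0.0.0.0:445            *:*
"""

if __name__ == "__main__":
    print(ephemeral_port_count(), "ephemeral ports")
    print(round(minutes_to_exhaust(1.0)), "minutes at one port per second")
    print(round(minutes_to_exhaust(0.5)), "minutes at one port every two seconds")
    print(dict(count_tcp_states(SAMPLE)))
```

With 3976 ports in the range, one port per second uses them up in a little over an hour, which is consistent with the "after an hour or so" observation.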
ID: 495 · Rating: 0
skgiven
Joined: 7 Sep 10
Posts: 453
Credit: 945,109
RAC: 0
Message 501 - Posted: 22 Feb 2011, 15:08:28 UTC - in response to Message 495.  

[AF>WildWildWest], 2.37 also runs long: hundreds of hours. You should server-abort any remaining 2.37 tasks.

You should have reported this problem to Berkeley, as network outages affect all other projects.
ID: 501 · Rating: 0


©2024 Sébastien