Bug in version 2.36 for Windows


Message boards : News : Bug in version 2.36 for Windows

[AF>WildWildWest]Sebastien
Project administrator
Joined: Mar 28 10
Posts: 155
Credit: 128,429
RAC: 124
Message 470 - Posted 17 Feb 2011 17:14:22 UTC

    There is a bug in the version 2.36 for Windows.
    The application slows down or blocks the network.
    You should abort the workunit if you run this version.

    Sorry for the inconvenience.
    ____________

    Bok
    Joined: Mar 28 10
    Posts: 4
    Credit: 113,497
    RAC: 98
    Message 472 - Posted 17 Feb 2011 17:44:56 UTC - in response to Message 470.

      Can you provide any details at all?

      I say this because I had some problems with my networking yesterday on two Windows PCs and hadn't quite figured out the issue, though I did work around it.

      Falconet
      Volunteer translator
      Joined: May 2 10
      Posts: 5
      Credit: 23,647
      RAC: 24
      Message 473 - Posted 17 Feb 2011 18:55:23 UTC

        Oh, I'm glad to have found out about this. I was about to reformat Windows because repairing it didn't work.

        Augustine
        Joined: Apr 6 10
        Posts: 37
        Credit: 124,282
        RAC: 160
        Message 474 - Posted 17 Feb 2011 19:28:03 UTC

          It would be best if the project would request the clients to abort all such WUs.

          TIA

          ____________

          DrBob
          Joined: Apr 10 10
          Posts: 1
          Credit: 403,828
          RAC: 382
          Message 475 - Posted 17 Feb 2011 19:59:21 UTC - in response to Message 474.

            Augustine wrote:
            It would be best if the project would request the clients to abort all such WUs.

            TIA

            That would be nice, but most clients running the bad app have no network access unless the user has rebooted their machine.



            I would like to hear more details from the Project Admin on what exactly the 2.36 application did to our boxen to mess up our networks as badly as it did.

            WUProp set to NNW (no new work) for now.

            skgiven
            Joined: Sep 7 10
            Posts: 58
            Credit: 410,430
            RAC: 465
            Message 476 - Posted 17 Feb 2011 20:01:49 UTC - in response to Message 474.

              On one system I am running a Data collect 2.31 (nci) work unit: wu_1295715071_140338_0

              It has run for 216h and says it has another 109h to go.

              A different app (2.34) on another system will take about as long.

              They all used to run for 12h and now it's about 335h. That is some change, and not one I welcome.

              Note my settings:
              Run test applications? no

              [AF>WildWildWest]Sebastien wrote:
              There is a bug in the version 2.36 for Windows.
              The application slows down or blocks the network.
              You should abort the workunit if you run this version.


              It is not good that I find my networks blocked because of such problems, especially as I did not subscribe to running test applications! The least you could have done was email people to make them aware of this serious problem.

              Not a happy cruncher :(

              Augustine
              Joined: Apr 6 10
              Posts: 37
              Credit: 124,282
              RAC: 160
              Message 477 - Posted 17 Feb 2011 20:06:38 UTC - in response to Message 475.

                Actually, it seems to have intermittent network access when the issue occurs. But, yes, this is a major snafu.

                ____________

                [AF>WildWildWest]Sebastien
                Project administrator
                Joined: Mar 28 10
                Posts: 155
                Credit: 128,429
                RAC: 124
                Message 478 - Posted 17 Feb 2011 20:50:38 UTC - in response to Message 472.

                  Bok wrote:
                  Can you provide any details at all?

                  I say this because I had some problems with my networking yesterday on two Windows PCs and hadn't quite figured out the issue, though I did work around it.


                  I made an error in the code.

                  Normally, the application opens a connection to the BOINC client when it starts and closes the connection when it finishes. Every second, the application checks for file transfers.

                  The buggy application opened a connection to the BOINC client, checked for file transfers, and closed the connection again, all once every second.
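
A minimal Python sketch of the difference described above, not the project's actual code. It assumes a plain local TCP listener as a stand-in for the BOINC client; the helper names (`start_counting_server`, `poll_with_one_connection`, `poll_with_reconnect`) and the `poll`/`ok` exchange are hypothetical, and the one-second wait between polls is left out so the example runs quickly.

```python
import socket
import threading


def start_counting_server():
    """Start a local TCP listener that counts accepted connections.

    Hypothetical stand-in for the BOINC client; its real protocol is not modelled.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(16)
    counter = {"accepted": 0}

    def handle(conn):
        # Answer every poll until the peer closes the connection.
        with conn:
            try:
                while conn.recv(64):
                    conn.sendall(b"ok\n")
            except OSError:
                pass  # peer went away mid-exchange

    def serve():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:  # listener was closed
                return
            counter["accepted"] += 1
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    threading.Thread(target=serve, daemon=True).start()
    return server, counter


def poll_with_one_connection(port, polls):
    """Intended behaviour: connect once, poll repeatedly, close at the end."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        for _ in range(polls):
            conn.sendall(b"poll\n")
            conn.recv(64)  # wait for the reply before the next poll


def poll_with_reconnect(port, polls):
    """Behaviour described for 2.36: connect, poll and close on every cycle."""
    for _ in range(polls):
        with socket.create_connection(("127.0.0.1", port)) as conn:
            conn.sendall(b"poll\n")
            conn.recv(64)
```

With the same number of polls, the first function uses one local port for the whole run, while the second takes a fresh ephemeral port on every cycle.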
                  ____________

                  Bok
                  Joined: Mar 28 10
                  Posts: 4
                  Credit: 113,497
                  RAC: 98
                  Message 479 - Posted 17 Feb 2011 21:13:59 UTC

                    So, in effect it was saturating the network?

                    skgiven
                    Joined: Sep 7 10
                    Posts: 58
                    Credit: 410,430
                    RAC: 465
                    Message 482 - Posted 17 Feb 2011 23:09:44 UTC - in response to Message 479.

                      What is the situation with the present app?

                      Falconet
                      Volunteer translator
                      Joined: May 2 10
                      Posts: 5
                      Credit: 23,647
                      RAC: 24
                      Message 486 - Posted 18 Feb 2011 14:17:40 UTC - in response to Message 482.

                        Version 2.37 is problem-free on my computer

                        ChertseyAl
                        Joined: Mar 30 10
                        Posts: 6
                        Credit: 74,445
                        RAC: 103
                        Message 490 - Posted 18 Feb 2011 18:43:18 UTC - in response to Message 470.


                          [AF>WildWildWest]Sebastien wrote:
                          The application slows down or blocks the network.


                          Arg! I spent hours yesterday trying to diagnose what was killing port 80 on one of my machines. Got as far as proving it was only happening when BOINC was running and then ran out of time. This morning everything was back to normal.

                          At least it's nice to know what was causing it :)

                          Al.

                          Pepo
                          Joined: May 10 10
                          Posts: 15
                          Credit: 49,025
                          RAC: 0
                          Message 493 - Posted 18 Feb 2011 22:01:45 UTC - in response to Message 476.

                            Last modified: 18 Feb 2011 22:04:28 UTC

                            skgiven wrote:
                            On one system I am running a Data collect 2.31 (nci) work unit: wu_1295715071_140338_0
                            It has run for 216h and says it has another 109h to go.
                            A different app (2.34) on another system will take about as long.
                            They all used to run for 12h and now it's about 335h. That is some change, and not one I welcome.

                            The bug is hopefully eliminated. See also the Run on tasks thread, which is about this.

                            Peter

                            Thyme Lawn
                            Joined: Apr 11 10
                            Posts: 12
                            Credit: 119,115
                            RAC: 101
                            Message 494 - Posted 19 Feb 2011 2:14:29 UTC - in response to Message 490.


                              [AF>WildWildWest]Sebastien wrote:
                              The application slows down or blocks the network.

                              ChertseyAl wrote:
                              Arg! I spent hours yesterday trying to diagnose what was killing port 80 on one of my machines. Got as far as proving it was only happening when BOINC was running and then ran out of time. This morning everything was back to normal.

                              At least it's nice to know what was causing it :)

                              Same here. I lost count of the number of times I repaired my network connections and rebooted my systems during the 24 hours they were running v2.36.

                              Unfortunately that clashed with me testing a major change to the inter-process communications mechanism in some of my software and I spent all day trying to work out why I'd introduced a totally unexpected set of problems (couldn't connect to MySQL, blank or incomplete pages in Firefox, SMTP connection failures, traffic which should have gone directly to my ISP was being routed down a VPN connection). Fortunately I didn't yell at the office networking guru for changing the rules pushed down by the VPN server, but I did resort to reinstalling the VPN software (to no effect) after I came to the conclusion that had to be the cause.

                              All of the problems seem to be resolved running v2.37.

                              BobCat13

                              Joined: Apr 6 10
                              Posts: 5
                              Credit: 117,892
                              RAC: 91
                              Message 495 - Posted 19 Feb 2011 4:47:19 UTC - in response to Message 479.

                                Bok wrote:
                                So, in effect it was saturating the network?

                                What it was doing was opening multiple local TCP/IP connections every second or two, adding up to thousands of open connections after an hour or so. Windows XP is limited to ports 1025-5000 for ephemeral (short-lived) connections. When all of those ports had been used, it caused other programs to use ports above 5000, and they were coming up with Winsock error 10055 (WSAENOBUFS).
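
As a rough check on the "hour or so" figure, here is a small Python sketch of the arithmetic. It assumes, as a simplification, that no port is ever released and that nothing else is using the 1025-5000 range; the function names are made up for this example.

```python
FIRST_EPHEMERAL_PORT = 1025  # lower bound quoted above for Windows XP
LAST_EPHEMERAL_PORT = 5000   # upper bound quoted above for Windows XP


def ephemeral_port_count(first=FIRST_EPHEMERAL_PORT, last=LAST_EPHEMERAL_PORT):
    """Number of ports in the inclusive range first..last."""
    return last - first + 1


def minutes_until_exhausted(ports, connections_per_second):
    """Minutes until every port is taken, assuming none is ever freed."""
    return ports / connections_per_second / 60


ports = ephemeral_port_count()                        # 3976 ports
one_per_second = minutes_until_exhausted(ports, 1.0)  # about 66 minutes
one_per_two_s = minutes_until_exhausted(ports, 0.5)   # about 133 minutes
```

At one new connection per second the range lasts a little over an hour, which is in line with the timing reported above.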

                                Browsers were not displaying this error; they just wouldn't connect to anything. My NNTP client was the one that displayed the 10055 error, which led me to use netstat, where I discovered what the problem was.
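
For anyone wanting to repeat the netstat check, the following Python sketch summarises `netstat -an` style output. It assumes the four-column TCP layout (protocol, local address, foreign address, state); the sample text, its addresses and ports, and the function names are made up for illustration and are not taken from an affected machine.

```python
from collections import Counter


def tcp_rows(netstat_output):
    """Yield (local, remote, state) for each four-column TCP line."""
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].upper() == "TCP":
            yield fields[1], fields[2], fields[3]


def count_tcp_states(netstat_output):
    """Count TCP connections per state (ESTABLISHED, TIME_WAIT, ...)."""
    return Counter(state for _, _, state in tcp_rows(netstat_output))


def local_ports_in_range(netstat_output, first=1025, last=5000):
    """Return the set of local TCP ports that fall inside first..last."""
    ports = set()
    for local, _, _ in tcp_rows(netstat_output):
        port_text = local.rsplit(":", 1)[-1]
        if port_text.isdigit() and first <= int(port_text) <= last:
            ports.add(int(port_text))
    return ports


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    127.0.0.1:1025         127.0.0.1:40000        ESTABLISHED
  TCP    127.0.0.1:1026         127.0.0.1:40000        TIME_WAIT
  TCP    127.0.0.1:1027         127.0.0.1:40000        TIME_WAIT
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
"""

states = count_tcp_states(SAMPLE)
used_ports = local_ports_in_range(SAMPLE)
```

A count of used local ports approaching the size of the 1025-5000 range would point to the kind of exhaustion described above.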

                                skgiven
                                Joined: Sep 7 10
                                Posts: 58
                                Credit: 410,430
                                RAC: 465
                                Message 501 - Posted 22 Feb 2011 15:08:28 UTC - in response to Message 495.

                                  [AF>WildWildWest], 2.37 also runs long: hundreds of hours. You should server-abort any remaining 2.37 tasks.

                                  You should have reported this problem to Berkeley, as network outages impact all other projects.



                                  Copyright © 2013 Sebastien