Invalid WU's

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 245 - Posted: 14 Jun 2010, 14:13:26 UTC

I started three computers on this project and two are working just fine. The third, computer ID 5522, keeps having its results marked "Completed, marked as invalid".

This is a partial listing of the stderr output:
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<stderr_txt>
SIGPIPE: write on a pipe with no reader
19:30:42 (2573): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
19:30:43 (2573): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
19:30:44 (2573): Interrogation impossible Broken pipe.

...

SIGPIPE: write on a pipe with no reader
07:33:03 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
07:33:05 (14831): Interrogation impossible Broken pipe.
07:33:06 (14831): Deconnection impossible Transport endpoint is not connected.
07:33:06 (14831): Connection impossible Connection refused.
SIGPIPE: write on a pipe with no reader
07:33:06 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader

...

SIGPIPE: write on a pipe with no reader
07:43:03 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
07:43:04 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
07:43:05 (14831): Interrogation impossible Broken pipe.
07:43:06 (14831): Deconnection impossible Transport endpoint is not connected.
07:43:06 (14831): called boinc_finish

</stderr_txt>
]]>
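
A long stderr like this is easier to read once the repeats are collapsed. The following is a minimal Python sketch, not something the project provides: the function name `summarize` and the sample text are made up for illustration, and the line format (time, PID in parentheses, message) is assumed from the excerpt above.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt: "HH:MM:SS (pid): message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def summarize(stderr_text: str) -> Counter:
    """Count how often each distinct message appears, ignoring time and PID."""
    counts = Counter()
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue  # skip blank lines and elision markers
        match = LINE_RE.match(line)
        # Lines without a timestamp (e.g. the SIGPIPE notices) are counted as-is.
        message = match.group(3) if match else line
        counts[message] += 1
    return counts


# Hypothetical sample built from a few lines of the log above.
SAMPLE = """\
SIGPIPE: write on a pipe with no reader
19:30:42 (2573): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
19:30:43 (2573): Interrogation impossible Broken pipe.
07:33:06 (14831): Connection impossible Connection refused.
"""

for message, count in summarize(SAMPLE).most_common():
    print(f"{count:4d}  {message}")
```

Collapsed this way, the rare lines such as "Connection refused" stand out from the repeated "Broken pipe" noise.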

Can someone explain what is going on here, and whether there is anything I should be doing on my end to fix the problem? This computer is running several projects and only CPU WUs.
____________

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 288 - Posted: 16 Aug 2010, 20:22:41 UTC

I tried a few of the newest WU's and got the same messages again. Can anyone help me figure this out?

Thanks

John
____________

Profile Conan
       
Joined: 28 Mar 10
Posts: 565
Credit: 1,173,915
RAC: 186
Total hours: 3,243,046
Message 289 - Posted: 17 Aug 2010, 8:59:55 UTC - in response to Message 245.
Last modified: 17 Aug 2010, 9:01:20 UTC

I had something like this on a remote computer and on my two home Windows computers. I found that the antivirus programme I am running had blocked the WU when the application was updated and the version number in the file name changed from 1.37 to 1.38.

This happens each time the version number changes: the WU sits there spinning its wheels, increasing run time but not actually running, because the antivirus programme is waiting for me to say the programme is safe to run.

It buggered up on three computers this time because the version number changed twice very quickly, from 1.36 to 1.37 and then to 1.38.

The Linux computers have been unaffected by this issue.
So I will have to check my antivirus programme (ZoneAlarm).

Conan.
____________

Profile [AF>WildWildWest] Sebastien
     
Dictator
Joined: 28 Mar 10
Posts: 2678
Credit: 513,787
RAC: 95
Total hours: 1,427,760
Message 290 - Posted: 18 Aug 2010, 5:08:50 UTC - in response to Message 288.

I tried a few of the newest WU's and got the same messages again. Can anyone help me figure this out?

Thanks

John


Which distribution are you using?
Are you using SELinux?

I will try to reproduce the problem.
____________

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 292 - Posted: 18 Aug 2010, 15:30:25 UTC - in response to Message 290.

This machine is running Debian Linux. I will get the exact details when I get home tonight and let you know what they are. I do know it is a very recent version of Debian.

Thanks.

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 293 - Posted: 18 Aug 2010, 16:39:47 UTC - in response to Message 292.

OK, a few more details. The OS that is failing is Debian Sid, with either kernel 2.6.32-trunk-686 (Debian 2.6.32-5) or 2.6.32-5-686 (Debian 2.6.32-20); both were used at various times and I'm not sure which was used when. The machine is now running the 2.6.32-5-686-bigmem kernel, so I will enable this project again and see what it does.

I will keep you informed of the progress.

jna

Profile [AF>WildWildWest] Sebastien
     
Dictator
Joined: 28 Mar 10
Posts: 2678
Credit: 513,787
RAC: 95
Total hours: 1,427,760
Message 295 - Posted: 18 Aug 2010, 19:36:50 UTC - in response to Message 293.

I installed 32-bit Debian Sid in a virtual machine.
The data collection runs fine there.
____________

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 298 - Posted: 20 Aug 2010, 15:24:35 UTC - in response to Message 295.

Started running WUs again and got the same error messages. This machine is running Debian sid and Linux kernel 2.6.32-5-686-bigmem. So there is something different about this machine that causes problems. Are there any ports that this application uses that I could be blocking that could cause this problem?

Thanks

jna

Profile [AF>WildWildWest] Sebastien
     
Dictator
Avatar
Send message
Joined: 28 Mar 10
Posts: 2678
Credit: 513,787
RAC: 95
Total hours: 1,427,760
Message 300 - Posted: 20 Aug 2010, 18:13:25 UTC - in response to Message 298.

The application uses the same port as the BOINC Manager (31416).
Could you type the commands below in a shell?

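telnet 127.0.0.1 31416
<get_cc_status>

You should obtain:

<boinc_gui_rpc_reply>
<cc_status>
<network_status>2</network_status>
<ams_password_error>0</ams_password_error>
<task_suspend_reason>0</task_suspend_reason>
<network_suspend_reason>0</network_suspend_reason>
<task_mode>2</task_mode>
<task_mode_perm>2</task_mode_perm>
<task_mode_delay>0.000000</task_mode_delay>
<gpu_mode>2</gpu_mode>
<gpu_mode_perm>2</gpu_mode_perm>
<gpu_mode_delay>0.000000</gpu_mode_delay>
<network_mode>2</network_mode>
<network_mode_perm>2</network_mode_perm>
<network_mode_delay>0.000000</network_mode_delay>
<disallow_attach>0</disallow_attach>
<simple_gui_only>0s</simple_gui_only>
</cc_status>
</boinc_gui_rpc_reply>
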
____________
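
The telnet check in this post can also be scripted. This is a minimal Python sketch, not part of the project's application: the function name `port_accepts_connections` is made up for illustration, and the two port numbers are simply the ones that come up in this thread. It only tests whether something is listening; it does not send any request to the client.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" as well as timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 31416 is the port named in this post; 31415 is only an example of a
    # non-standard choice.
    for port in (31416, 31415):
        if port_accepts_connections("127.0.0.1", port):
            print(f"127.0.0.1:{port} accepts connections")
        else:
            print(f"127.0.0.1:{port} refused or unreachable")
```

A "refused" result on 31416 together with an accepted connection on another port would suggest the client is listening on a non-default port.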

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 301 - Posted: 21 Aug 2010, 1:00:27 UTC - in response to Message 300.

Here is the result of the Telnet command:

telnet 127.0.0.1 31416
(UNKNOWN) [127.0.0.1] 31416 (?) : Connection refused

telnet 127.0.0.1 31415
<get_cc_status>
<boinc_gui_rpc_reply>
<cc_status>
<network_status>2</network_status>
<ams_password_error>0</ams_password_error>
<task_suspend_reason>0</task_suspend_reason>
<network_suspend_reason>0</network_suspend_reason>
<task_mode>2</task_mode>
<task_mode_perm>2</task_mode_perm>
<task_mode_delay>0.000000</task_mode_delay>
<gpu_mode>2</gpu_mode>
<gpu_mode_perm>2</gpu_mode_perm>
<gpu_mode_delay>0.000000</gpu_mode_delay>
<network_mode>2</network_mode>
<network_mode_perm>2</network_mode_perm>
<network_mode_delay>0.000000</network_mode_delay>
<disallow_attach>0</disallow_attach>
<simple_gui_only>0s</simple_gui_only>
</cc_status>
</boinc_gui_rpc_reply>
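
A reply like the one above is plain XML, so it can be turned into a dictionary for easier comparison between machines. This is only a sketch under assumptions: the helper name `parse_cc_status` is hypothetical, the sample is a shortened copy of the reply above, and it assumes you already have the reply text as a clean string.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the reply shown above, used only as sample input.
REPLY = """<boinc_gui_rpc_reply>
<cc_status>
<network_status>2</network_status>
<task_mode>2</task_mode>
<network_mode>2</network_mode>
</cc_status>
</boinc_gui_rpc_reply>"""


def parse_cc_status(reply_xml: str) -> dict:
    """Map each child tag of <cc_status> to its text value."""
    root = ET.fromstring(reply_xml)
    status = root.find("cc_status")
    if status is None:
        raise ValueError("no <cc_status> element in reply")
    return {child.tag: (child.text or "").strip() for child in status}


for tag, value in parse_cc_status(REPLY).items():
    print(f"{tag} = {value}")
```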

It just occurred to me that I am using a non-standard port for BOINC. This is probably the cause of all the problems. I'm not sure whether you want to change your app to handle non-standard ports, but I will wait for your reply before trying the standard port.

jna

Profile [AF>WildWildWest] Sebastien
     
Dictator
Joined: 28 Mar 10
Posts: 2678
Credit: 513,787
RAC: 95
Total hours: 1,427,760
Message 302 - Posted: 21 Aug 2010, 19:06:09 UTC - in response to Message 301.

I have modified the application.
To use a non-standard port for BOINC, abort the current WU so that version 1.39 is downloaded. You should also add a file named app_info.xml to the directory wuprop.boinc-af.org. The content of app_info.xml should be similar to this:



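<app_info>
<app>
<name>data_collect</name>
<user_friendly_name>Data Collect</user_friendly_name>
</app>
<file_info>
<name>data_collect_1.39_i686-pc-linux-gnu__nci</name>
<executable/>
</file_info>
<app_version>
<app_name>data_collect</app_name>
<version_num>139</version_num>
<avg_ncpus>0.01</avg_ncpus>
<max_ncpus>0.01</max_ncpus>
<cmdline>31415</cmdline>
<file_ref>
<file_name>data_collect_1.39_i686-pc-linux-gnu__nci</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>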




You should restart the BOINC client.
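
A hand-written app_info.xml with a typo in it is an easy way to lose more time, so it may be worth checking that the file at least parses before restarting the client. The following is a minimal Python sketch, not a project tool: the function name `xml_problem` is made up for illustration, and it only checks that the text is well-formed XML, not that the elements are the ones the client expects.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def xml_problem(text: str) -> Optional[str]:
    """Return None if text is well-formed XML, else the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Two tiny hypothetical inputs: the second has an unclosed <name> element.
for sample in ("<app><name>x</name></app>", "<app><name>x</app>"):
    problem = xml_problem(sample)
    print("well-formed" if problem is None else f"not well-formed: {problem}")
```

To check a real file, read its contents into a string and pass that to `xml_problem`; the file's location depends on where your BOINC data directory is.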
____________

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 303 - Posted: 22 Aug 2010, 3:39:36 UTC - in response to Message 302.

It works!

The file stderr.txt contains:

Utilisation du port 31415 (that is, "Using port 31415")

after 15 minutes of runtime! Yay!

Thank you very much!
(jna's girlfriend)

Profile mscharmack
 
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Total hours: 128,584
Message 308 - Posted: 23 Aug 2010, 4:47:54 UTC

All I can say is that it is a friggen waste of computer time to have a work unit trudge along for 12 hours of run time, only to have it come up as invalid. Again, a friggen waste of my computer time. Can I get a rebate on my electric bill, please?

Profile mscharmack
 
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Total hours: 128,584
Message 309 - Posted: 23 Aug 2010, 4:48:31 UTC
Last modified: 23 Aug 2010, 4:52:14 UTC

Yes

Profile mscharmack
 
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Total hours: 128,584
Message 310 - Posted: 23 Aug 2010, 4:48:32 UTC
Last modified: 23 Aug 2010, 4:53:01 UTC

YES, a complete waste of my time.

ai5000
 
Joined: 4 May 10
Posts: 8
Credit: 238,189
RAC: 0
Total hours: 184,352
Message 311 - Posted: 23 Aug 2010, 18:23:54 UTC - in response to Message 308.

All I can say is that it is a friggen waste of computer time to have a work unit trudge along for 12 hours of run time, only to have it come up as invalid. Again, a friggen waste of my computer time. Can I get a rebate on my electric bill, please?


Well, considering that these work units are non-CPU-intensive, their impact on your electric bill is minimal, to say the least. What other CPU projects were you running on this computer at the time?

Consider yourself lucky that you never ran an Orbit work unit for 12 days and then had it come up invalid.

Profile Sorceress
Joined: 25 Jul 10
Posts: 13
Credit: 33,946
RAC: 0
Total hours: 469
Message 312 - Posted: 24 Aug 2010, 0:51:54 UTC - in response to Message 311.
Last modified: 24 Aug 2010, 0:52:40 UTC



Consider yourself lucky that you never ran an Orbit work unit for 12 days and then had it come up invalid.


I ran 3 Orbit WUs and all three crashed. Must be Orbit's std MOD. They're off my radar for that. lol.

Profile mscharmack
 
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Total hours: 128,584
Message 313 - Posted: 24 Aug 2010, 2:56:51 UTC

The work unit doesn't get done when the computer is off and, dumbfoundingly, when you turn your computer off in the middle of a work unit and then pass that 12-hour mark, the result is sent in no matter how far along you got the previous day. So: power on uses electricity, and the work unit trudges along to completion after 12 hours of work. Hopefully it passes muster and you get your 25 credits. The same electricity is used for those 12 hours only to get an "INVALID" upon completion. Any way you read it, the computer was powered up for 12 hours that went down the drain. A total waste of the computer's processing time and electricity.

ai5000
 
Joined: 4 May 10
Posts: 8
Credit: 238,189
RAC: 0
Total hours: 184,352
Message 314 - Posted: 24 Aug 2010, 15:54:37 UTC - in response to Message 313.

The work unit doesn't get done when the computer is off and, dumbfoundingly, when you turn your computer off in the middle of a work unit and then pass that 12-hour mark, the result is sent in no matter how far along you got the previous day. So: power on uses electricity, and the work unit trudges along to completion after 12 hours of work. Hopefully it passes muster and you get your 25 credits. The same electricity is used for those 12 hours only to get an "INVALID" upon completion. Any way you read it, the computer was powered up for 12 hours that went down the drain. A total waste of the computer's processing time and electricity.


Please don't tell me this computer was turned on for the sole purpose of crunching work units here.

Jnargus
 
Joined: 11 Jun 10
Posts: 12
Credit: 985,897
RAC: 47
Total hours: 1,536,729
Message 315 - Posted: 26 Aug 2010, 18:11:15 UTC

I'm not sure what mscharmack is talking about, but the machine I was having problems with was running 6 other projects. So the fact that this project's WUs were coming back invalid did not bother me in the least, and I did not consider it a waste of electricity. (OK, it bothered me enough to try to find out why they were coming back invalid and fix the problem.)

If mscharmack is running just this project on his computers, that IS a waste of time and electricity, as it will not give credit if no other projects are running. Besides, this project does not award large amounts of credit per day anyway. You are much better off with just about any other project if all you want is credit.

I try to find projects that are doing something interesting that I like first, and then I try to maximize credit after. All I need to really start bringing in the credit is a GPU, but that will have to wait until I finish renovating my house.

jna



Copyright © 2024 Sebastien