Invalid WU's

Message boards : Number crunching : Invalid WU's

Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 245 - Posted: 14 Jun 2010, 14:13:26 UTC

I started three computers on this project and two are working just fine. The third, with computer ID: 5522, keeps getting marked "Completed, marked as invalid".

This is a partial listing of the stderr output:
<core_client_version>6.10.17</core_client_version>
<![CDATA[
<stderr_txt>
SIGPIPE: write on a pipe with no reader
19:30:42 (2573): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
19:30:43 (2573): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
19:30:44 (2573): Interrogation impossible Broken pipe.

...

SIGPIPE: write on a pipe with no reader
07:33:03 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
07:33:05 (14831): Interrogation impossible Broken pipe.
07:33:06 (14831): Deconnection impossible Transport endpoint is not connected.
07:33:06 (14831): Connection impossible Connection refused.
SIGPIPE: write on a pipe with no reader
07:33:06 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader

...

SIGPIPE: write on a pipe with no reader
07:43:03 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
07:43:04 (14831): Interrogation impossible Broken pipe.
SIGPIPE: write on a pipe with no reader
07:43:05 (14831): Interrogation impossible Broken pipe.
07:43:06 (14831): Deconnection impossible Transport endpoint is not connected.
07:43:06 (14831): called boinc_finish

</stderr_txt>
]]>

Can someone explain what is going on here, and whether there is anything I should be doing on my end to fix the problem? This computer is running several projects and only CPU WUs.

Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 288 - Posted: 16 Aug 2010, 20:22:41 UTC

I tried a few of the newest WU's and got the same messages again. Can anyone help me figure this out?

Thanks

John

Conan
Joined: 28 Mar 10
Posts: 588
Credit: 1,220,621
RAC: 237
Message 289 - Posted: 17 Aug 2010, 8:59:55 UTC - in response to Message 245.  
Last modified: 17 Aug 2010, 9:01:20 UTC

I had something like this on a remote computer and on my two home Windows computers. I found that the antivirus programme I am running had blocked the WU when the application was updated and the version in the file name changed from 1.37 to 1.38.

This happens each time the version number changes: the WU sits there spinning its wheels, increasing its run time but not actually running, because the antivirus programme is waiting for me to say the programme is safe to run.

It buggered up on three computers this time as the version number changed twice very quickly from 1.36 to 1.37 then to 1.38.

The Linux computers have been unaffected by this issue.
So I will have to check out my Anti Virus programme (Zone Alarm).

Conan.

[AF>WildWildWest] Sébastie...
Project administrator
Joined: 28 Mar 10
Posts: 2871
Credit: 538,613
RAC: 132
Message 290 - Posted: 18 Aug 2010, 5:08:50 UTC - in response to Message 288.  

I tried a few of the newest WU's and got the same messages again. Can anyone help me figure this out?

Thanks

John


Which distribution are you using?
Are you using SELinux?

I will try to reproduce the problem.

Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 292 - Posted: 18 Aug 2010, 15:30:25 UTC - in response to Message 290.  

This machine is running Debian Linux. I will get the exact details when I get home tonight and let you know what they are. I do know it is a very recent version of Debian.

Thanks.

Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 293 - Posted: 18 Aug 2010, 16:39:47 UTC - in response to Message 292.  

OK, a few more details. The failing setup is Debian Sid with either kernel 2.6.32-trunk-686 (Debian 2.6.32-5) or 2.6.32-5-686 (Debian 2.6.32-20); both were used at various times and I'm not sure which was used when. This machine is now running the 2.6.32-5-686-bigmem kernel, so I will enable this project again and see what it does.

I will keep you informed of the progress.

jna

[AF>WildWildWest] Sébastie...
Project administrator
Joined: 28 Mar 10
Posts: 2871
Credit: 538,613
RAC: 132
Message 295 - Posted: 18 Aug 2010, 19:36:50 UTC - in response to Message 293.  

I installed 32-bit Debian Sid in a virtual machine.
The data collection runs fine there.

Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 298 - Posted: 20 Aug 2010, 15:24:35 UTC - in response to Message 295.  

Started running WUs again and got the same error messages. This machine is running Debian sid and Linux kernel 2.6.32-5-686-bigmem. So there is something different about this machine that causes problems. Are there any ports that this application uses that I could be blocking that could cause this problem?

Thanks

jna

[AF>WildWildWest] Sébastie...
Project administrator
Joined: 28 Mar 10
Posts: 2871
Credit: 538,613
RAC: 132
Message 300 - Posted: 20 Aug 2010, 18:13:25 UTC - in response to Message 298.  

The application uses the same port as the BOINC Manager (31416).
Could you type the commands below in a shell?
telnet 127.0.0.1 31416
<get_cc_status>

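You should obtain:

<boinc_gui_rpc_reply>
<cc_status>
<network_status>2</network_status>
<ams_password_error>0</ams_password_error>
<task_suspend_reason>0</task_suspend_reason>
<network_suspend_reason>0</network_suspend_reason>
<task_mode>2</task_mode>
<task_mode_perm>2</task_mode_perm>
<task_mode_delay>0.000000</task_mode_delay>
<gpu_mode>2</gpu_mode>
<gpu_mode_perm>2</gpu_mode_perm>
<gpu_mode_delay>0.000000</gpu_mode_delay>
<network_mode>2</network_mode>
<network_mode_perm>2</network_mode_perm>
<network_mode_delay>0.000000</network_mode_delay>
<disallow_attach>0</disallow_attach>
<simple_gui_only>0s</simple_gui_only>
</cc_status>
</boinc_gui_rpc_reply>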
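
For anyone who would rather script this check than use telnet, here is a minimal Python sketch. It only tests whether something accepts TCP connections on the port; it does not speak the BOINC protocol. The function names and the candidate list (31416 first, then 31415 as an example of a non-standard port) are illustrative assumptions, not part of the project's application.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts.
        return False


def find_gui_rpc_port(candidates=(31416, 31415), host="127.0.0.1"):
    """Return the first candidate port that accepts connections, or None.

    The candidate ports are assumptions chosen for illustration.
    """
    for port in candidates:
        if port_is_open(host, port):
            return port
    return None


if __name__ == "__main__":
    print("GUI RPC port found:", find_gui_rpc_port())
```

A "Connection refused" on 31416 together with a success on another port would suggest that the client is listening on a non-standard port.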




Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 301 - Posted: 21 Aug 2010, 1:00:27 UTC - in response to Message 300.  

Here is the result of the Telnet command:

telnet 127.0.0.1 31416
(UNKNOWN) [127.0.0.1] 31416 (?) : Connection refused

telnet 127.0.0.1 31415
<get_cc_status>
<boinc_gui_rpc_reply>
<cc_status>
<network_status>2</network_status>
<ams_password_error>0</ams_password_error>
<task_suspend_reason>0</task_suspend_reason>
<network_suspend_reason>0</network_suspend_reason>
<task_mode>2</task_mode>
<task_mode_perm>2</task_mode_perm>
<task_mode_delay>0.000000</task_mode_delay>
<gpu_mode>2</gpu_mode>
<gpu_mode_perm>2</gpu_mode_perm>
<gpu_mode_delay>0.000000</gpu_mode_delay>
<network_mode>2</network_mode>
<network_mode_perm>2</network_mode_perm>
<network_mode_delay>0.000000</network_mode_delay>
<disallow_attach>0</disallow_attach>
<simple_gui_only>0s</simple_gui_only>
</cc_status>
</boinc_gui_rpc_reply>

It just occurred to me that I am using a non-standard port for BOINC. This is probably the cause of all the problems. I'm not sure whether you want to fix your app to handle non-standard ports, but I will wait for your reply before trying the standard port.
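
The reply shown above can also be read programmatically. The following is a rough Python sketch that turns a `<cc_status>` reply into a dictionary; the function name and the abridged sample are illustrative assumptions. Stripping a trailing `\x03` byte is included because BOINC's GUI RPC terminates messages with that character, which an XML parser would otherwise reject.

```python
import xml.etree.ElementTree as ET

# Abridged sample based on the reply pasted in this thread.
SAMPLE_REPLY = """
<boinc_gui_rpc_reply>
<cc_status>
<network_status>2</network_status>
<task_mode>2</task_mode>
<simple_gui_only>0s</simple_gui_only>
</cc_status>
</boinc_gui_rpc_reply>
"""


def parse_cc_status(reply: str) -> dict:
    """Return the children of <cc_status> as a {tag: text} dictionary."""
    # Remove the end-of-message marker, if present, before parsing.
    root = ET.fromstring(reply.replace("\x03", "").strip())
    status = root if root.tag == "cc_status" else root.find("cc_status")
    if status is None:
        raise ValueError("no <cc_status> element in reply")
    # Values are kept as strings because not all of them are numeric.
    return {child.tag: (child.text or "").strip() for child in status}


if __name__ == "__main__":
    for tag, value in parse_cc_status(SAMPLE_REPLY).items():
        print(tag, "=", value)
```

Values are left as strings on purpose, since the pasted reply contains at least one entry ("0s") that is not a plain number.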

jna

[AF>WildWildWest] Sébastie...
Project administrator
Joined: 28 Mar 10
Posts: 2871
Credit: 538,613
RAC: 132
Message 302 - Posted: 21 Aug 2010, 19:06:09 UTC - in response to Message 301.  

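I have modified the application.
To use a non-standard port for BOINC, abort the current WU so that version 1.39 is downloaded. You should also add a file named app_info.xml in the directory wuprop.boinc-af.org. The content of app_info.xml should be similar to this:

<app_info>
<app>
<name>data_collect</name>
<user_friendly_name>Data Collect</user_friendly_name>
</app>
<file_info>
<name>data_collect_1.39_i686-pc-linux-gnu__nci</name>
<executable/>
</file_info>
<app_version>
<app_name>data_collect</app_name>
<version_num>139</version_num>
<avg_ncpus>0.01</avg_ncpus>
<max_ncpus>0.01</max_ncpus>
<cmdline>31415</cmdline>
<file_ref>
<file_name>data_collect_1.39_i686-pc-linux-gnu__nci</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
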
You should restart the BOINC client.
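
If you need the same file for a different port, a small Python sketch like the one below could generate it. This is only an illustration under assumptions: the element names follow BOINC's usual app_info.xml layout, and passing the port through `<cmdline>` is inferred from the values listed above rather than confirmed. The function name and defaults are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_info(port: int, version: str = "1.39",
                   platform: str = "i686-pc-linux-gnu") -> str:
    """Render an app_info.xml for data_collect that passes `port` as cmdline."""
    exe = f"data_collect_{version}_{platform}__nci"
    root = ET.Element("app_info")

    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "data_collect"
    ET.SubElement(app, "user_friendly_name").text = "Data Collect"

    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe
    ET.SubElement(file_info, "executable")

    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = "data_collect"
    # "1.39" becomes "139", matching the version number listed above.
    ET.SubElement(app_version, "version_num").text = version.replace(".", "")
    ET.SubElement(app_version, "avg_ncpus").text = "0.01"
    ET.SubElement(app_version, "max_ncpus").text = "0.01"
    # Assumption: the application reads the GUI RPC port from its command line.
    ET.SubElement(app_version, "cmdline").text = str(port)

    file_ref = ET.SubElement(app_version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = exe
    ET.SubElement(file_ref, "main_program")

    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_info(31415))
```

The output would be saved as app_info.xml in the project directory mentioned above, followed by a restart of the BOINC client.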

Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 303 - Posted: 22 Aug 2010, 3:39:36 UTC - in response to Message 302.  

It works!

The file stderr.txt contains:

Utilisation du port 31415

(that is, "Using port 31415") after 15 minutes of runtime! Yay!

Thank you very much!
(jna's girlfriend)

mscharmack
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Message 308 - Posted: 23 Aug 2010, 4:47:54 UTC

All I can say is that it is a friggen waste of computer time to have a work unit trudge along, only to have it come up as invalid after 12 hours of run time. Again, a friggen waste of my computer time. Can I get a rebate on my electric bill, please!

mscharmack
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Message 309 - Posted: 23 Aug 2010, 4:48:31 UTC
Last modified: 23 Aug 2010, 4:52:14 UTC

Yes

mscharmack
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Message 310 - Posted: 23 Aug 2010, 4:48:32 UTC
Last modified: 23 Aug 2010, 4:53:01 UTC

YES, a complete waste of my time.

ai5000
Joined: 4 May 10
Posts: 8
Credit: 238,189
RAC: 0
Message 311 - Posted: 23 Aug 2010, 18:23:54 UTC - in response to Message 308.  

All I can say is that it is a friggen waste of computer time to have a work unit trudge along, only to have it come up as invalid after 12 hours of run time. Again, a friggen waste of my computer time. Can I get a rebate on my electric bill, please!


Well, considering that these work units are non-CPU-intensive, their impact on your electric bill is quite minimal, to say the least. What other CPU projects were you running at that time on this computer?

Consider yourself lucky that you never ran an Orbit work unit for 12 days only to have it come up invalid.

Sorceress
Joined: 25 Jul 10
Posts: 13
Credit: 33,946
RAC: 0
Message 312 - Posted: 24 Aug 2010, 0:51:54 UTC - in response to Message 311.  
Last modified: 24 Aug 2010, 0:52:40 UTC



Consider yourself lucky that you never ran an Orbit work unit for 12 days only to have it come up invalid.


I ran 3 Orbit WUs and all three crashed. Must be Orbit's std MOD. They're off my radar for that. lol.

mscharmack
Joined: 14 May 10
Posts: 5
Credit: 96,576
RAC: 0
Message 313 - Posted: 24 Aug 2010, 2:56:51 UTC

The work unit doesn't get done when the computer is off and, dumbfoundingly, when you turn your computer off in the middle of a work unit and you pass that 12-hour mark, the result is sent in no matter how far along you got the previous day. So, power on uses electricity and the work unit trudges along to completion after 12 hours of work. Hopefully it passes muster and you get your 25 credits. The same power is being used for 12 hours only to get an "INVALID" upon completion. Any way you read it, the computer was powered up for those 12 hours of time that went down the drain. A total waste of the computer's processing time and electricity.

ai5000
Joined: 4 May 10
Posts: 8
Credit: 238,189
RAC: 0
Message 314 - Posted: 24 Aug 2010, 15:54:37 UTC - in response to Message 313.  

The work unit doesn't get done when the computer is off and, dumbfoundingly, when you turn your computer off in the middle of a work unit and you pass that 12-hour mark, the result is sent in no matter how far along you got the previous day. So, power on uses electricity and the work unit trudges along to completion after 12 hours of work. Hopefully it passes muster and you get your 25 credits. The same power is being used for 12 hours only to get an "INVALID" upon completion. Any way you read it, the computer was powered up for those 12 hours of time that went down the drain. A total waste of the computer's processing time and electricity.


Please don't tell me this computer was turned on for the sole purpose of crunching work units here.

Jnargus
Joined: 11 Jun 10
Posts: 12
Credit: 996,347
RAC: 48
Message 315 - Posted: 26 Aug 2010, 18:11:15 UTC

I'm not sure what mscharmack is talking about, but the machine I was having problems with was running 6 other projects. So the fact that this project's WUs were coming back invalid did not bother me in the least, and I did not consider it a waste of electricity. (OK, it bothered me enough to try to find out why they were coming back invalid and fix the problem.)

If mscharmack is running just this project on his computers, that IS a waste of time and electricity, as it will not give credit if no other projects are running. Besides, this project does not award large amounts of credit per day anyway. You are much better off with just about any other project if all you want is credit.

I try to find projects that are doing something interesting that I like first and then I try to maximize credit after. All I need to really start bringing on the credit is a GPU but that will have to wait until I finish renovating my house.

jna

©2024 Sébastien