Bad workunits

Message boards : Number crunching : Bad workunits

hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 11 Aug 17
Posts: 626
Credit: 20,794,455
RAC: 8,636
Message 939 - Posted: 1 May 2019, 19:23:52 UTC - in response to Message 936.  
Last modified: 1 May 2019, 19:26:30 UTC

Hello Dingo!
Still coming through all my work is aborting: ..

I should add some explanation. Take two workunits as examples: R9_020263562 and R9_020266751. Both were initially generated from "broken files" several days ago. The first workunit was regenerated yesterday, but that happened before we detected the group of workunits based on incorrect files. (Many files were generated normally; the mistakes arose during workunit generation, after the base files had been generated.) The new replica of the first workunit (the one made yesterday) therefore also produces tasks that fail immediately after the start of computation. Tasks will keep being generated for a workunit like this until its error count reaches 8. Once a workunit has 8 failed tasks, it is marked as completed with an error. After that, we can find these workunits in the database, correctly delete their records, and create a new copy from a correctly generated workunit file. The second workunit listed above is an example of a correctly regenerated workunit.
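
As a rough illustration of the lifecycle described above, here is a minimal Python sketch. It is not the project's or BOINC's actual server code: the `Workunit` class, the `source_file_ok` flag, and the function names are hypothetical, and it simplifies things by treating a single successful task as enough to complete a workunit. Only the error limit of 8 comes from the post.

```python
from dataclasses import dataclass

MAX_ERROR_TASKS = 8  # error limit quoted in the post above


@dataclass
class Workunit:
    name: str
    source_file_ok: bool   # hypothetical flag: was the input file generated correctly?
    error_count: int = 0
    state: str = "active"  # "active", "done", or "error"


def run_task(wu: Workunit) -> bool:
    """Simulate one task: it succeeds only if the workunit's file is correct."""
    return wu.source_file_ok


def process(wu: Workunit) -> Workunit:
    """Keep issuing tasks until one succeeds or the error limit is reached."""
    while wu.state == "active":
        if run_task(wu):
            wu.state = "done"
        else:
            wu.error_count += 1
            if wu.error_count >= MAX_ERROR_TASKS:
                wu.state = "error"  # completed with an error; can now be replaced
    return wu


def replace_failed(wu: Workunit) -> Workunit:
    """Create a fresh copy of a failed workunit from a correctly generated file."""
    if wu.state != "error":
        raise ValueError("only workunits that finished with an error are replaced")
    return Workunit(name=wu.name, source_file_ok=True)


broken = process(Workunit("R9_020263562", source_file_ok=False))
fixed = process(replace_failed(broken))
print(broken.state, broken.error_count)  # error 8
print(fixed.state, fixed.error_count)    # done 0
```

The sketch shows why a broken workunit cannot be replaced right away: it has to collect all 8 failed tasks before it leaves the "active" state.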

The project database currently contains about 5400 workunits generated from incorrect files (since my previous post, more than 200 workunits have completed their lifecycle). Most of them are close to 8 task errors and will complete their lifecycle in the next 2 or 3 days, after which we can correctly replace them with new replicas. During these days, incorrect tasks may still arrive on participants' computers. However, incorrect tasks do not consume CPU time; they fail immediately after starting. The most correct approach now is simply to wait until the workunits with incorrect tasks reach the end of their life. My computers receive incorrect tasks too. :)

At the same time, new correct workunits are being added to the project database and are sent to computers as well.

Thank you for your attention!
pschoefer

Joined: 1 Jan 19
Posts: 4
Credit: 32,381,006
RAC: 12,838
Message 940 - Posted: 1 May 2019, 19:26:36 UTC

I spotted another kind of bad WU the other day: https://rake.boincfast.ru/rakesearch/workunit.php?wuid=21236432

It caught my attention because it was already running for more than 2 hours on my fastest computer. On all other hosts, it crashed with EXIT_TIME_LIMIT_EXCEEDED after more than a day, so I decided to abort it.
hoarfrost

Joined: 11 Aug 17
Posts: 626
Credit: 20,794,455
RAC: 8,636
Message 941 - Posted: 1 May 2019, 20:42:09 UTC - in response to Message 940.  

I spotted another kind of bad WU the other day: https://rake.boincfast.ru/rakesearch/workunit.php?wuid=21236432

It caught my attention because it was already running for more than 2 hours on my fastest computer. On all other hosts, it crashed with EXIT_TIME_LIMIT_EXCEEDED after more than a day, so I decided to abort it.

You did absolutely the right thing! We found only one(!) workunit like this in the project database, and we have marked it as invalid for future recreation.

Thank you for your attention!
jozef j

Joined: 11 Sep 17
Posts: 51
Credit: 193,078,564
RAC: 2,335
Message 942 - Posted: 2 May 2019, 10:50:35 UTC

Hi, good work on trying to fix this problem.
I am now getting some: https://rake.boincfast.ru/rakesearch/results.php?userid=67&offset=0&show_names=0&state=6&appid=
Can you see those failed WUs and their stderr?
hoarfrost

Joined: 11 Aug 17
Posts: 626
Credit: 20,794,455
RAC: 8,636
Message 946 - Posted: 2 May 2019, 13:12:02 UTC - in response to Message 942.  

Hi, good work on trying to fix this problem.
I am now getting some: https://rake.boincfast.ru/rakesearch/results.php?userid=67&offset=0&show_names=0&state=6&appid=
Can you see those failed WUs and their stderr?

Hi!

We looked at the list of the 2990WX's tasks with errors. They were produced by workunits from incorrect files; most of those workunits are near the end of their life and will be ready for replacement in the next 1-2 days.

Thank you!
JugNut

Joined: 6 Jan 18
Posts: 7
Credit: 16,825,117
RAC: 9,311
Message 948 - Posted: 2 May 2019, 19:45:50 UTC - in response to Message 946.  

They certainly are a pain. I just noticed this box that received 56 bad WU's in a row. Because of all the Comp errors Boinc chucked a hissy fit and put in a 24 hr delay.
When I just now found it, the PC had already been idle for 10 hrs. :( https://rake.boincfast.ru/rakesearch/results.php?hostid=2197&offset=0&show_names=0&state=6&appid=
hoarfrost

Joined: 11 Aug 17
Posts: 626
Credit: 20,794,455
RAC: 8,636
Message 949 - Posted: 3 May 2019, 15:18:18 UTC

The problem is mostly solved. A few hundred incorrect workunits are still not completed, but that is due to 1 or 2 results that were sent to computers and have not been reported yet.
If you see a large batch of errors for new tasks, please post in this thread.

©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences