Posts by pschoefer

1) Message boards : Number crunching : Bad workunits (Message 940)
Posted 1 May 2019 by pschoefer
Post:
I spotted another kind of bad WU the other day: https://rake.boincfast.ru/rakesearch/workunit.php?wuid=21236432

It caught my attention because it had already been running for more than two hours on my fastest computer. On all other hosts, it crashed with EXIT_TIME_LIMIT_EXCEEDED after more than a day, so I decided to abort it.
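For context, the BOINC client derives a per-task runtime limit from the workunit's floating-point operation bound and the host's estimated speed, and kills the task with EXIT_TIME_LIMIT_EXCEEDED once that limit is exceeded. Here is a simplified sketch of that rule; the numbers are made-up examples, not values from the workunit above.

```python
def runtime_limit_seconds(rsc_fpops_bound: float, host_flops: float) -> float:
    """Abort threshold: the task is killed once its elapsed time exceeds
    the workunit's fpops bound divided by the host's estimated speed."""
    return rsc_fpops_bound / host_flops

# Illustrative values only: a WU bounded at 1e15 fpops gets a much
# shorter wall-clock limit on a faster host.
fast_host_limit = runtime_limit_seconds(1e15, 1e10)  # 100,000 s at 10 GFLOPS
slow_host_limit = runtime_limit_seconds(1e15, 5e9)   # 200,000 s at 5 GFLOPS
```

This is why the same bad workunit can churn for very different lengths of time on different hosts before being aborted.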
2) Message boards : Number crunching : Bad workunits (Message 915)
Posted 25 Apr 2019 by pschoefer
Post:
There's another bunch of bad workunits being distributed right now. Almost all tasks crash immediately on my computers (e.g., https://rake.boincfast.ru/rakesearch/results.php?hostid=5553&offset=0&show_names=0&state=6), and they fail with the same 0x80000003 exception on other hosts.
3) Message boards : Number crunching : upload problem (Message 877)
Posted 13 Apr 2019 by pschoefer
Post:
Well, from a purely scientific point of view, it doesn't really matter who crunches the tasks and when. It would only be a problem if a project's server went down completely for such extended periods of time that the net amount of work done became negative. Yes, those who were crunching along here before the sprint started (like you and me) are doing less work now than they used to, and those who (re-)joined the project because of the sprint are not doing as much work as they potentially could. But this is not a net loss of computing power, and yesterday was in fact the best day in the history of RakeSearch.

You might argue that those server problems are bad in the long term, as regular participants might decide to quit the project. But I don't think there has ever been any evidence of this happening to a significant extent (just look at how many stability issues SETI@home has even during normal operation without any "stupid challenges", and still they have a very solid user base). On the other hand, the "stupid challenges" attract new participants who would not have joined otherwise, and some of them might stick with the project after the competition to finish a milestone, collect run time at another app for WUProp, earn a badge, or simply because they like the project. I don't have proof that this really leads to a significant long-term increase in the user base, either (although I have seen it cause higher overall throughput for at least a few days after a competition), so my best guess is that the long-term effects of the "stupid challenges" are negligible.

So all in all, the impact of the "stupid challenges" is more work done for a short period of time and nothing significant in the long run, so there's no real downside. This probably explains why the project administrators do not share your view and rather see them as a chance to identify problems and increase the stability of the project even under very high-load conditions (see hoarfrost's latest comment, and I know from private communication with other project administrators that they hold a similar opinion).
4) Message boards : News : Formula BOINC Sprint 2019 (Message 866)
Posted 12 Apr 2019 by pschoefer
Post:
There are still lots of failing file transfers (transient HTTP errors). This can cause computers to run dry, as BOINC does not request new tasks from a project with stalled downloads or too many uploads in progress.
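To illustrate the "running dry" effect described above, here is a hypothetical, simplified sketch of the client's gating rule: no new work is requested from a project while it has stalled downloads or too many uploads in flight. The names and the upload threshold are illustrative, not the actual BOINC client internals.

```python
from dataclasses import dataclass

MAX_UPLOADS_IN_PROGRESS = 2  # illustrative threshold, not the real client value

@dataclass
class ProjectState:
    stalled_downloads: int
    uploads_in_progress: int

def may_request_work(p: ProjectState) -> bool:
    """Return True only if no transfer condition blocks a work request."""
    if p.stalled_downloads > 0:
        return False  # a stalled download blocks all new work fetches
    if p.uploads_in_progress > MAX_UPLOADS_IN_PROGRESS:
        return False  # too many pending uploads also blocks work fetch
    return True

# A host with even one stalled download gets no new tasks and runs dry:
print(may_request_work(ProjectState(stalled_downloads=1, uploads_in_progress=0)))  # False
```

So a burst of transient HTTP errors on the download server can idle otherwise healthy hosts, even when the scheduler itself is responding fine.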

On the other hand, there doesn't appear to be a significant validator backlog, and task generation is running much faster than tasks are being sent out. You might even consider slowing it down a bit, as all the extra unsent tasks only inflate the database.

All in all, the situation is far from perfect, but I have seen worse. When PrimeGrid started their Challenge Series back in 2008, it took a few tries until they had figured out which parameters to adjust, and in which direction, to keep their server alive and their participants happy. And, unlike you, they knew about each challenge well in advance, so they had a chance to open some glass doors instead of just picking up the pieces when it was already too late.

Edit: Sorry for posting this multiple times. I got "504 timeout" errors and retried without checking if it was posted.

©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences