Posts by [B@P] Daniel

1) Message boards : News : Joint search of ODLS9 with Gerasim project (Message 1443)
Posted 20 Oct 2020 by Profile [B@P] Daniel
Hello Jürgen!
When will there be a Linux-application ?


This is currently unknown. It is possible that it will only exist under Windows.

I checked your app and found that it was written in Delphi. I googled for Delphi compilers for Linux and found this: One of them is Lazarus, which looks promising. I also found that it can be built with LLVM support. LLVM is used as a backend by the clang compiler, so the compiled code should be well optimized. Please take a look at this; I am also waiting for a Linux app.
2) Message boards : News : Some news about next search (Message 1422)
Posted 29 Sep 2020 by Profile [B@P] Daniel
Hello Conan!

This one I will have to sit out as I have no 64 bit Windows machine. Maybe a Linux version may pop up. I doubt it though as I didn't run Gerasim as it was only a Windows project.
Will keep waiting for the next one.

Having only Windows version for this app is a problem, but we can't solve it right now.

Is the source code available somewhere? Updating it and compiling it for other platforms should not be that hard.
3) Message boards : Number crunching : Optimized RakeSearch app for rank 10 (Message 1355)
Posted 15 Aug 2020 by Profile [B@P] Daniel
I wonder if [B@P] Daniel would be willing and able to try to compile optimized apps for the current search, SAT-based search for orthogonal pairs of DLS of order 10.
If optimized apps were to be made available, I'm sure the members here would highly appreciate it.

I am going to do this. Unfortunately, the source code for the new app has not been released yet.
4) Message boards : News : First workunits of new search will be generated soon (Message 1332)
Posted 20 Jul 2020 by Profile [B@P] Daniel
Good to see you back!
5) Message boards : News : The last bunch of tasks were generated (Message 1317)
Posted 6 Jun 2020 by Profile [B@P] Daniel
It is sad to see a good project ending. I hoped that a new search for Rank 11 would be started soon. When I was working on the optimized app for R10, I found another possible optimization: check for and eliminate rows with duplicated pairs early, in a similar way to how the app eliminates duplicate values on the diagonal. I created a small proof of concept using the __uint128_t type. When the app was looking for fully orthogonal pairs, it was 2.6 times faster (this could probably be improved). Unfortunately, for R10 the app looks for partially orthogonal pairs, so another approach was required, and it was 30% slower. I recall that you wrote that it is easier to find orthogonal pairs for odd ranks than for even ones, so I hoped that I could use it for an R11 search. I also started work on a GPU app. So far the results are not spectacular (about 3 times faster on an Nvidia 1070), but there is room for optimization. I hope that you will reconsider running an R11 search :)
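
To illustrate the "eliminate rows with duplicated pairs early" idea, here is a minimal sketch in Python. It is not the proof of concept mentioned above (that one used __uint128_t in the app's own code); the function name and the bit layout are assumptions made for this example. Each pair (a, b) gets one bit in a mask, and the check stops at the first row that repeats a pair. For rank 11 there are 121 possible pairs, so such a mask would fit in a single 128-bit integer.

```python
def first_row_with_duplicate_pair(a, b):
    """Return the index of the first row where the pair (a[i][j], b[i][j])
    repeats an earlier pair, or None if the two squares are orthogonal.

    Hypothetical helper for illustration; squares are lists of rows
    holding values 0..n-1.
    """
    n = len(a)
    seen = 0  # bit (x * n + y) is set once the pair (x, y) has been seen
    for i in range(n):
        for j in range(n):
            bit = 1 << (a[i][j] * n + b[i][j])
            if seen & bit:
                return i  # this row can be rejected without checking the rest
            seen |= bit
    return None


# Two orthogonal Latin squares of order 3 (small example, not diagonal ones).
A = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
B = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]

print(first_row_with_duplicate_pair(A, B))  # None: all 9 pairs are distinct
print(first_row_with_duplicate_pair(A, A))  # 1: pair (1, 1) repeats in row 1
```

In a real search the mask would be updated row by row while the square is being filled, so a bad row is dropped before any deeper cells are generated.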
6) Message boards : News : Future of the RakeSearch project (Message 1293)
Posted 26 May 2020 by Profile [B@P] Daniel
Maybe i'm wrong; but looking at the current server status (94.5%) and the current rate (> 0.5% per day), R10 search will be finished within 2 weeks?

Something like this. Current progress is about 1% per day, so I expect that all WUs will be sent out by the end of this week. Some of them will time out or will need to be processed by a 3rd person, so it will take about 2 weeks until all results are returned successfully.
7) Questions and Answers : Getting started : hundreds of hosts ready to work, need ARM support (Message 1291)
Posted 21 May 2020 by Profile [B@P] Daniel
I have hundreds (and potentially thousands) of ARM-based compute machines ready to work, but when I try to attach the first one, I get this message:
21-May-2020 14:54:17 [Rake search of diagonal Latin squares] This project doesn't support computers of type aarch64-unknown-linux-gnu

Can somebody help me get started? Boinc generally works fine on other projects, including rosetta@home and seti.

I recommend installing the optimized apps I created; here is the topic with more info:

The official app is built for 32-bit ARM only. BOINC on AARCH64 is not configured to use such apps by default. You would need to install 32-bit ARM libraries and add ARM as an alternative platform for BOINC. Here are instructions on how to do this:
8) Message boards : Number crunching : Optimized RakeSearch app for rank 10 (Message 1275)
Posted 19 Apr 2020 by Profile [B@P] Daniel
I set proper permissions on the files before creating the archives, so tar should set them properly when unpacking. It should be enough to verify them with "ls -l".
9) Message boards : Number crunching : Optimized RakeSearch app for rank 10 (Message 1269)
Posted 18 Apr 2020 by Profile [B@P] Daniel
Hi all,
I have found some free time and prepared an optimized version of the app for RakeSearch Rank 10. The results surprised me positively: the optimized app is about 3.5 times faster than the original one :). Here are results from my machine with an Intel Xeon E5-2683 v3 and Linux:

Original 64-bit app: 3m 39.425s
SSE2:  1m 4.225s
SSSE3: 1m 2.555s
AVX:   1m 2.554s
AVX2:  1m 0.665s

And here are times for the optimized 32-bit apps on the same machine. Unfortunately there is no official 32-bit Linux app for comparison:

NoSSE: 2m 40.668s
SSE2:  1m 11.251s
SSSE3: 1m 11.203s

As you can see, the 32-bit SSE apps are 2.3 times faster than the non-SSE one. This can explain why the optimized 64-bit apps are much faster than the original one: most of the speedup is gained thanks to SSE/AVX instructions.
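
For anyone who wants to check the ratios, here is a small Python sketch that recomputes them from the Linux timings listed above. The helper name is made up for this example; the timing strings are copied from the tables.

```python
def seconds(timing):
    """Convert a timing like '3m 39.425s' to seconds (hypothetical helper)."""
    minutes, rest = timing.replace("s", "").split("m")
    return int(minutes) * 60 + float(rest)


original_64 = seconds("3m 39.425s")

# 64-bit Linux apps compared with the original 64-bit app.
for name, timing in [("SSE2", "1m 4.225s"), ("AVX2", "1m 0.665s")]:
    print(name, round(original_64 / seconds(timing), 1))  # SSE2 3.4, AVX2 3.6

# 32-bit Linux apps: NoSSE compared with SSE2.
print(round(seconds("2m 40.668s") / seconds("1m 11.251s"), 1))  # 2.3
```

So "about 3.5 times" covers a range of roughly 3.4 to 3.6 depending on the instruction set.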

I also checked results on Windows running on i7-2600K machine and got this:
Original 64-bit app: 2m 35.167s
SSE2:  0m 53.271s
SSSE3: 0m 50.523s
AVX:   0m 51.195s

And here are results for 32-bit apps:

Original 32-bit app: 2m 58.987s
NoSSE: 2m  5.546s
SSE2:  1m  2.453s
SSSE3: 0m 57.760s

ARM and AARCH64 apps are also available. Here are results for apps tested on an Odroid XU4 board with an ARM CPU:

Original app: 6m 51.444s
ARMv7:        4m 27.991s
ARMv7 NEON:   2m 44.874s
ARMv6:        4m 28.414s

And here are results for apps tested on an Odroid C2 board with an AARCH64 CPU:

Original app: 9m 20.327s
AARCH64:      3m 41.197s
ARMv7:        6m 12.147s
ARMv7 NEON:   3m 40.030s
ARMv6:        6m  9.990s

I also found a way to better approximate computation progress - you will no longer see WUs which end at 50% or are stuck at 100% for hours. I did this by generating all possible prefixes for the 9 initial square cells and storing them. Later I compare the initial 9 cells of the current square with the list of prefixes, and report the position of the prefix on the list as the progress percentage.
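
The prefix-based progress estimate can be sketched like this in Python. This is only an illustration under simplifying assumptions: the function names are made up, and the prefix list here is just all ordered selections of distinct symbols, while the real app would store only the prefixes that are valid for the given workunit, in the order the search visits them.

```python
from bisect import bisect_left
from itertools import permutations


def build_prefix_table(symbols, prefix_len):
    """All candidate prefixes in search order (assumed lexicographic here)."""
    return sorted(permutations(range(symbols), prefix_len))


def progress_percent(table, current_prefix):
    """Position of the current prefix in the table, as a percentage."""
    position = bisect_left(table, tuple(current_prefix))
    return 100.0 * position / len(table)


table = build_prefix_table(symbols=4, prefix_len=3)  # 24 prefixes
print(progress_percent(table, (0, 1, 2)))  # 0.0
print(progress_percent(table, (2, 0, 1)))  # 50.0
```

Because the table is sorted in search order, a binary search is enough to find the position, so reporting progress stays cheap even for a long list.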

This app version also supports bigger squares, up to rank 16. It is enough to change the Rank constant and recompile all apps. One exception is the ARM NEON app, which supports squares up to rank 12. However, the update for ranks 13..16 is pretty straightforward; it is easy copy/paste/update stuff. I also added static_asserts in the code in places which will need an update for higher ranks (mostly 17+), so the compiler will tell you what needs updating.

The optimized app can be downloaded from GitHub: There are multiple app versions, compiled with support for different instruction sets. If you are not sure what your CPU supports, on Windows use CPU-Z, and on Linux check "flags" in the /proc/cpuinfo file.
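
On Linux the check of "flags" can also be scripted. Below is a rough Python sketch; the function names and the mapping from app version to required flags are assumptions based on the summary of app versions in this post, not something taken from the app itself. AVX512 is left out because the post does not say which AVX-512 subsets are needed. It works for x86 only (on ARM the line in /proc/cpuinfo is named differently).

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# Assumed requirements, from the most to the least demanding app version.
REQUIRED_FLAGS = [
    ("AVX2", {"avx2", "bmi2"}),
    ("AVX", {"avx"}),
    ("SSSE3", {"ssse3"}),
    ("SSE2", {"sse2"}),
]


def best_app_version(flags):
    """Pick the first app version whose required flags are all present."""
    for name, needed in REQUIRED_FLAGS:
        if needed <= flags:
            return name
    return "NoSSE"


sample = "model name : Example CPU\nflags : fpu sse sse2 ssse3 avx\n"
print(best_app_version(cpu_flags(sample)))  # AVX
```

On a real machine you would pass the contents of /proc/cpuinfo, for example `best_app_version(cpu_flags(open("/proc/cpuinfo").read()))`.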

In order to install this app, perform these steps:
- close BOINC (config reload will not work);
- unpack the archive to the project directory - on Windows it is a path like "C:\Users\All Users\BOINC\projects\rake.boincfast.ru_rakesearch", on Linux /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ . On Linux please also make sure that the rakesearch10 file is executable, and that both rakesearch10 and app_info.xml are owned by the boinc/boinc user/group;
- start BOINC again.

After doing this, you should see an entry for RakeSearch in the event log like "Found app_info.xml; using anonymous platform". Additionally you should see (Opti v1.0) in the app name displayed in BOINC Mgr.

All app versions check whether the CPU and OS support the required instruction sets. If they do not, the app will print an appropriate error message and exit with code 1.

AVX/AVX2 app versions require at least Windows 7 SP1, Windows Server 2008 R2 SP1 or Linux with kernel 2.6.30.
AVX512 app versions require at least Windows 10, Windows Server 2016 or Linux with kernel 3.15. I am not sure about the Windows versions; you can try whether earlier versions can run it too.

Short summary of each app version:
- SSE2 - this is the base app version with SSE support;
- SSSE3 (triple S) - it added a shuffle instruction which allows bitmask calculations to be optimized. This is something of a workaround and may be a bit slower than the approach used in the SSE2 app, as I saw for the 32-bit app;
- AVX - this instruction set adds longer vectors and floating-point instructions which use them. There are also some new logic instructions which looked promising. Unfortunately, the CPU frequency throttling caused by AVX register use was too big, so the app was slower when using them. Fortunately, AVX also added some improvements for old SSE instructions, so the AVX app is faster than the SSSE3 one;
- AVX2 - it added support for integer instructions, which I used. Additionally, there is finally a "shift by vector" instruction, so I could replace the SSSE3 workaround with it. This app version also uses BMI2 instructions, which improve speed a bit further;
- AVX512 - this instruction set once more adds longer vectors, but I do not use them in the app. Instead I use the improved version of the vector compare instruction, which returns the result directly as a bitmask, without the extra instructions needed in earlier SSE/AVX apps.
10) Message boards : Number crunching : Rakesearch appears as duplicate projects in BOINC manager (Message 1253)
Posted 13 Feb 2020 by Profile [B@P] Daniel
I had the same issue: BOINC reported that I was attached twice. I am attached to the project via BAM!, which probably did something wrong. I suspect that it was changing the project address from http to https, and for some time computers were attached twice.
11) Message boards : Science : Question about 10 X 10 squares (Message 1117)
Posted 18 Jul 2019 by Profile [B@P] Daniel
Since every 10 X 10 Latin square has an embedded 3 X 3 Latin square, could we seed three rows, three columns, and their intersections as the 3 X 3 Latin square?
Could your algorithm be modified to start with this additional information?

Could you elaborate more on this? It is unclear where exactly these 3x3 squares should be placed.

I thought about the possibility of reusing existing squares and found another promising approach. Start with an existing ODLS pair of rank 8. Take the first square from the pair and extend it to rank 10 by appending rows and columns around it. Then permute rows 2-9 of the new square in the same way as in the 2nd square from the pair. You can also swap the first and last rows. This looks like a promising way to find a rank 10 ODLS pair.

Here is an example of how to turn a rank 3 square into a rank 5 one:
           O O O O O
A A A      O A A A O
B B B  =>  O B B B O
C C C      O C C C O
           O O O O O

Square after applying row permutation from 2nd square:
12) Message boards : News : R10 search temporary stopped! (Message 1095)
Posted 13 Jul 2019 by Profile [B@P] Daniel
So, how's Daniel doing on that code review? :)

I found one more issue and sent suggestions on how to fix it. It looks like we have to wait a bit more until it is implemented and tested.
13) Message boards : Science : Why is the time required to complete each WU different? (Message 1068)
Posted 21 Jun 2019 by Profile [B@P] Daniel
Diagonal Latin Squares are a bit like sudoku - every square must have unique values in each row, each column and both diagonals. The workunit file provides a square with values in the 1st row, both diagonals and some cells of the 2nd/3rd rows. All other square cells are filled in by the app. It is unknown how many squares can be generated from a given partially filled initial square, hence an estimate has to be used.
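
To make the "unique values in rows, columns and diagonals" rule concrete, here is a short Python check. It is just an illustration of the definition with a made-up function name, not code from the app.

```python
def is_diagonal_latin_square(square):
    """True if every row, every column and both diagonals hold 0..n-1 once."""
    n = len(square)
    full = set(range(n))
    rows_ok = all(set(row) == full for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == full for j in range(n))
    main_ok = {square[i][i] for i in range(n)} == full
    anti_ok = {square[i][n - 1 - i] for i in range(n)} == full
    return rows_ok and cols_ok and main_ok and anti_ok


dls4 = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
    [1, 0, 3, 2],
]
cyclic3 = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]

print(is_diagonal_latin_square(dls4))     # True
print(is_diagonal_latin_square(cyclic3))  # False: anti-diagonal is 2, 2, 2
```

The second example is a valid Latin square but not a diagonal one, which shows that the diagonal condition is an extra restriction on top of the row and column rules.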
14) Message boards : News : Future of the RakeSearch project (Message 1035)
Posted 7 Jun 2019 by Profile [B@P] Daniel
Started crunching R10 WUs yesterday! So far they work fine and take 3 hours per WU, more or less.

So the whole R10 search space is 7 million times bigger than R9? Yeah, to complete that we will need more people and hopefully a GPU app, otherwise it's impossible. But something is better than nothing! So... who will find the first R10 ODLS?

It's not very feasible. I released the 1st optimized app about 1.5 years ago. With a search space 7 million times bigger than R9, this means that a search of the whole rank 10 space would take roughly ten million years at the current speed. Assuming that Moore's law would still be in effect, a full search of the rank 10 space would require about 34 years. Assuming that more people and a GPU app were available now and allowed us to crunch a thousand times faster, it would still require about 19 years.
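
The 34-year and 19-year figures can be reproduced with a simple model. The sketch below assumes that "Moore's law" here means computing speed doubles every 1.5 years; that doubling period is an assumption of this example, as is the function name.

```python
import math

DOUBLING_PERIOD_YEARS = 1.5  # assumption: speed doubles every 1.5 years


def years_with_doubling(work_years, period=DOUBLING_PERIOD_YEARS):
    """Years needed to finish `work_years` of work (measured at today's speed)
    if the speed doubles after every `period` years.

    After n periods the work done is period * (1 + 2 + ... + 2**(n - 1)),
    which equals period * (2**n - 1); solve that for n.
    """
    periods = math.log2(work_years / period + 1)
    return period * periods


print(round(years_with_doubling(10_000_000)))  # 34
print(round(years_with_doubling(10_000)))      # 19 (a thousand times faster start)
```

The thousandfold speedup removes only about 15 years because every doubling period already halves the remaining problem, so a factor of 1000 is worth about ten doublings.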
15) Message boards : News : Future of the RakeSearch project (Message 1027)
Posted 5 Jun 2019 by Profile [B@P] Daniel
... Are you telling me that the current rank 10 app release (which requires run times of up to 4 hrs for some of the tasks I completed - compared to 20 min. for rank 9 tasks on the same machine) already employs the same optimizations incl. an autodetection module to select the appropriate SSE/AVX code? ...

Yes. Not all of them (because some optimizations were tied to the previous structure of the application), but the most effective ones. Without these optimizations the computations would require several times more time. This does not exclude additional optimizations. The amount of work in an "average" workunit for rank 10, as we see now, has increased several times. And as for rank 9, the amount of work in different workunits can vary by ~3-5-7 times for most workunits and by 20-30 times for very small and very large workunits.

This is also important, as stated in
In the new workunits (for rank 10) there are many more squares per 1% - 10 million versus 2.75 million in workunits for rank 9. And "making" a square of rank 10 also needs more work than a square of rank 9.
16) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 1008)
Posted 3 Jun 2019 by Profile [B@P] Daniel
Hi, the run time on the last R9 tasks is also getting slower, I don't know why... I am running the latest optimized AVX app from Daniel. But the whole Rake team does good work; I hope an optimized R10 app will be coming soon.

I had a chance to peek at the new app code. It already uses many of the optimizations I implemented in the rank 9 app. There is still room for some optimizations (SSE/AVX can be added for sure), but do not hold your breath - possible speedups will not be as spectacular as for the rank 9 app.
17) Message boards : Number crunching : Processing both R9 & R10 tasks on the same machine (Message 1007)
Posted 3 Jun 2019 by Profile [B@P] Daniel
When you do it this way, the replaced binary will be used until BOINC is restarted - after that BOINC will download the official binary again. Note that the official binary is not able to properly load checkpoint files from the optimized app, so it will not work properly.

If you want to use both apps, you need to add the rank 10 app to app_info.xml. To do this, you need to find the appropriate tags for the new app in your client_state.xml and copy them to app_info.xml. Of course you will also need binaries for both apps.
18) Message boards : News : Future of the RakeSearch project (Message 995)
Posted 2 Jun 2019 by Profile [B@P] Daniel
Is there a way to just get the new 10 work units. I have deleted the app_config file but I am still getting the old files as well as the new ones and the old ones take two or three times the time to complete.

Does the new application have a new app_name ?

In RakeSearch preferences you can choose which apps you want to run. By default all apps are enabled.
19) Message boards : News : Future of the RakeSearch project (Message 973)
Posted 27 May 2019 by Profile [B@P] Daniel
Do we need to remove Daniel's optimized app in order for these to run properly?

Yes, otherwise BOINC will not download the new app. You will have to remove app_info.xml and restart BOINC. When you do this, BOINC will also download the current official app if you have some WUs for it.

Before you do this, make sure you finish all downloaded and started WUs, or abort them. The optimized app uses a slightly different checkpoint file format, which is not compatible with the official app. The official app will not work properly if it loads such a file.
You can keep WUs which have not been started.
20) Message boards : Number crunching : Congratulations, we are over 50%! (Message 955)
Posted 7 May 2019 by Profile [B@P] Daniel
Progress passed 90%!


Now we have about a month left before all WUs are sent out. Could you reveal something about your next plans?


©2023 The searchers team, Karelian Research Center of the Russian Academy of Sciences