Posts by [B@P] Daniel

1) Message boards : News : The last bunch of tasks were generated (Message 1317)
Posted 6 Jun 2020 by Profile [B@P] Daniel
It is sad to see a good project ending. I had hoped that a new search for rank 11 would start soon. While working on the optimized app for R10, I found another possible optimization: check for and eliminate rows with duplicated pairs early, in a similar way to how the app eliminates duplicate values on the diagonals. I created a small proof of concept using the __uint128_t type. When the app was looking for fully orthogonal pairs, it was 2.6 times faster (this could probably be improved). Unfortunately, for R10 the app looks for partially orthogonal pairs, so another approach was required, and it was 30% slower. I recall you wrote that it is easier to find orthogonal pairs for odd ranks than for even ones, so I hoped I could use this for an R11 search. I also started work on a GPU app. So far the results are not spectacular (about 3 times faster on an Nvidia 1070), but there is room for optimization. I hope you will reconsider running an R11 search :)
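A minimal sketch of an early duplicate-pair check, assuming the fully orthogonal case: two squares are orthogonal when all n*n ordered pairs (A[i][j], B[i][j]) are distinct, and for n <= 11 those pairs fit in one 128-bit mask, so a row that repeats a pair can be rejected before the rest of the square is built. The names PairMask and add_row_pairs are hypothetical, not taken from the app, and __uint128_t is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GCC/Clang extension: a 128-bit unsigned integer, enough for n*n <= 128 pairs (n <= 11).
using PairMask = __uint128_t;

// Hypothetical helper: adds the ordered pairs (rowA[j], rowB[j]) of one row to `used`.
// Returns false as soon as a pair repeats (within the row or with earlier rows),
// leaving `used` unchanged, so a row that breaks orthogonality is rejected early.
bool add_row_pairs(PairMask& used, const std::vector<int>& rowA,
                   const std::vector<int>& rowB, int n) {
    assert(n * n <= 128 && rowA.size() == rowB.size());
    PairMask row_bits = 0;
    for (std::size_t j = 0; j < rowA.size(); ++j) {
        const int idx = rowA[j] * n + rowB[j];  // values are 0..n-1, so idx is 0..n*n-1
        const PairMask bit = static_cast<PairMask>(1) << idx;
        if ((used | row_bits) & bit) return false;  // duplicated pair found
        row_bits |= bit;
    }
    used |= row_bits;  // commit the row only when all its pairs are new
    return true;
}
```

Each cell costs one shift, OR and AND, which is why such a check can pay off when it prunes rows early; the partially orthogonal R10 search would need a different test.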
2) Message boards : News : Future of the RakeSearch project (Message 1293)
Posted 26 May 2020 by Profile [B@P] Daniel
Maybe I'm wrong, but looking at the current server status (94.5%) and the current rate (> 0.5% per day), will the R10 search be finished within 2 weeks?

Something like this. Current progress is about 1% per day, so I expect that all WUs will be sent out by the end of this week. Some of them will time out or will need to be processed by a third person, so it will take about 2 weeks until all results are returned successfully.
3) Questions and Answers : Getting started : hundreds of hosts ready to work, need ARM support (Message 1291)
Posted 21 May 2020 by Profile [B@P] Daniel
I have hundreds (and potentially thousands) of ARM-based compute machines ready to work, but when I try to attach the first one, I get this message:
21-May-2020 14:54:17 [Rake search of diagonal Latin squares] This project doesn't support computers of type aarch64-unknown-linux-gnu

Can somebody help me get started? Boinc generally works fine on other projects, including rosetta@home and seti.

I recommend installing the optimized apps I created; here is a topic with more info:

The official app is available for 32-bit ARM only, and BOINC on AARCH64 is not configured to use such apps by default. You would need to install 32-bit ARM libraries and add ARM as an alternative platform for BOINC. Here are instructions on how to do this:
4) Message boards : Number crunching : Optimized RakeSearch app for rank 10 (Message 1275)
Posted 19 Apr 2020 by Profile [B@P] Daniel
I set proper permissions on the files before creating the archives, so tar should set them properly when unpacking. It should be enough to verify them with "ls -l".
5) Message boards : Number crunching : Optimized RakeSearch app for rank 10 (Message 1269)
Posted 18 Apr 2020 by Profile [B@P] Daniel
Hi all,
I have found some free time and prepared an optimized version of my app for RakeSearch rank 10. The results surprised me positively: the optimized app is 3.5 times faster than the original one :). Here are results from my machine with an Intel Xeon E5-2683 v3 and Linux:

Original 64-bit app: 3m 39.425s
SSE2:  1m 4.225s
SSSE3: 1m 2.555s
AVX:   1m 2.554s
AVX2:  1m 0.665s

And here are times for the optimized 32-bit apps on the same machine. Unfortunately, there is no official 32-bit Linux app for comparison:

NoSSE: 2m 40.668s
SSE2:  1m 11.251s
SSSE3: 1m 11.203s

As you can see, the 32-bit SSE apps are 2.3 times faster than the non-SSE ones. This helps explain why the optimized 64-bit apps are much faster than the original one: most of the speedup comes from SSE/AVX instructions.
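The speedups quoted here are just ratios of wall-clock times. A small helper for checking them, assuming the "XmY.Zs" format used in these listings (the function names are illustrative):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parse a time like "3m 39.425s" into seconds; returns -1 if the format does not match.
// The space in the sscanf format also matches zero spaces, so "12m49.368s" parses too.
double parse_minutes_seconds(const std::string& text) {
    int minutes = 0;
    double seconds = 0.0;
    if (std::sscanf(text.c_str(), "%dm %lfs", &minutes, &seconds) != 2) return -1.0;
    return minutes * 60.0 + seconds;
}

// Speedup of an optimized run relative to a baseline run (ratio of wall-clock times).
double speedup(const std::string& baseline, const std::string& optimized) {
    const double base = parse_minutes_seconds(baseline);
    const double opt = parse_minutes_seconds(optimized);
    assert(base > 0 && opt > 0);
    return base / opt;
}
```

For the Xeon numbers this gives about 3.4 for SSE2 and 3.6 for AVX2 against the original 64-bit app, and about 2.3 for 32-bit SSE2 against NoSSE.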

I also checked results on Windows running on an i7-2600K machine and got this:
Original 64-bit app: 2m 35.167s
SSE2:  0m 53.271s
SSSE3: 0m 50.523s
AVX:   0m 51.195s

And here are results for 32-bit apps:

Original 32-bit app: 2m 58.987s
NoSSE: 2m  5.546s
SSE2:  1m  2.453s
SSSE3: 0m 57.760s

ARM and AARCH64 apps are also available. Here are results for apps tested on an Odroid XU4 board with an ARM CPU:

Original app: 6m 51.444s
ARMv7:        4m 27.991s
ARMv7 NEON:   2m 44.874s
ARMv6:        4m 28.414s

And here are results for apps tested on an Odroid C2 board with an AARCH64 CPU:

Original app: 9m 20.327s
AARCH64:      3m 41.197s
ARMv7:        6m 12.147s
ARMv7 NEON:   3m 40.030s
ARMv6:        6m  9.990s

I also found a way to better approximate computation progress: you will no longer see WUs which end at 50% or are stuck at 100% for hours. I did this by generating all possible prefixes for the 9 initial square cells and storing them. Later I compare the initial 9 cells of the current square with the list of prefixes, and report the position of the prefix on the list as the progress percentage.
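The progress trick can be sketched as a lookup in the stored prefix list. This assumes the prefixes are stored in the same order in which the search visits them and that this order is lexicographic, so a binary search finds the position; Prefix and progress_percent are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical: a prefix holds the values of the 9 initial cells filled by the search.
using Prefix = std::array<int, 9>;

// Progress estimate: position of the current prefix in the precomputed list of all
// possible prefixes, as a percentage. Precondition (assumed): the list is sorted
// lexicographically and this matches the order in which the search visits prefixes.
double progress_percent(const std::vector<Prefix>& sorted_prefixes, const Prefix& current) {
    assert(!sorted_prefixes.empty());
    const auto it = std::lower_bound(sorted_prefixes.begin(), sorted_prefixes.end(), current);
    const auto pos = static_cast<double>(it - sorted_prefixes.begin());
    return 100.0 * pos / static_cast<double>(sorted_prefixes.size());
}
```

Because progress depends only on which prefix is being processed, it advances steadily even when the number of squares per prefix varies a lot.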

This app version also supports bigger squares, up to rank 16. It is enough to change the Rank constant and recompile all apps. The one exception is the ARM NEON app, which supports squares up to rank 12. However, updating it for ranks 13..16 is pretty straightforward: it is easy copy/paste/update work. I also added static_asserts in the code in places which will need updates for higher ranks (mostly 17+), so the compiler will tell you what needs updating.
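A sketch of the kind of compile-time guard described above. The Rank constant comes from the post; the 16-bit per-cell candidate mask (CellMask) is an assumption made here to illustrate why rank 16 could be a natural limit, not a description of the app's actual data layout.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Compile-time rank, as in the post; change it and recompile.
constexpr int Rank = 10;

// Assumption for illustration: one bit per candidate value in a 16-bit mask per cell.
using CellMask = std::uint16_t;

// Fails to compile for Rank 17+, pointing at the code that needs updating.
static_assert(Rank <= std::numeric_limits<CellMask>::digits,
              "Rank > 16: widen CellMask and update the code that depends on it");

// Mask with all values 0..Rank-1 available (computed in 32-bit unsigned to avoid overflow).
constexpr CellMask full_mask() {
    return static_cast<CellMask>((1u << Rank) - 1u);
}
```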

The optimized app can be downloaded from GitHub: There are multiple app versions, compiled with support for different instruction sets. If you are not sure what your CPU supports, on Windows use CPU-Z, and on Linux check the "flags" in the /proc/cpuinfo file.
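On Linux the flags check can also be done from code. A minimal sketch that looks for a whole-word flag (e.g. avx2, ssse3, avx512f) in the "flags" line of /proc/cpuinfo; the helper names are hypothetical, and on ARM kernels the equivalent line is called "Features" rather than "flags".

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole word after the ':' of a cpuinfo "flags" line,
// so "sse3" does not falsely match "ssse3".
bool has_cpu_flag(const std::string& flags_line, const std::string& flag) {
    const auto colon = flags_line.find(':');
    std::istringstream words(colon == std::string::npos ? flags_line
                                                        : flags_line.substr(colon + 1));
    std::string word;
    while (words >> word) {
        if (word == flag) return true;
    }
    return false;
}

// Linux x86: returns the first line starting with "flags" (empty if none or unreadable).
std::string read_cpu_flags_line(const std::string& path = "/proc/cpuinfo") {
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("flags", 0) == 0) return line;
    }
    return {};
}
```

For example, has_cpu_flag(read_cpu_flags_line(), "avx2") tells you whether the AVX2 build is worth trying.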

In order to install this app, perform these steps:
- close BOINC (config reload will not work);
- unpack the archive to the project directory: on Windows it is a path like "C:\Users\All Users\BOINC\projects\rake.boincfast.ru_rakesearch", on Linux /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ . On Linux, also make sure that the rakesearch10 file is executable, and that both rakesearch10 and app_info.xml are owned by the boinc/boinc user/group;
- start BOINC again.
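For the Linux permission check in the second step, std::filesystem (C++17) can test the execute bit that "ls -l" shows. A minimal sketch with hypothetical helper names; ownership (boinc/boinc) is not covered by std::filesystem, so "ls -l" (or POSIX stat()) is still needed for that part.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// True if the owner-execute bit (the first 'x' in "ls -l" output) is set.
bool owner_can_execute(fs::perms p) {
    return (p & fs::perms::owner_exec) != fs::perms::none;
}

// True if `file` exists, is a regular file (symlinks followed) and is owner-executable.
bool is_owner_executable(const fs::path& file) {
    std::error_code ec;
    const fs::file_status st = fs::status(file, ec);  // non-throwing overload
    if (ec || !fs::is_regular_file(st)) return false;
    return owner_can_execute(st.permissions());
}
```

Usage, from the project directory: is_owner_executable("rakesearch10").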

After doing this, you should see an entry for RakeSearch in the event log like "Found app_info.xml; using anonymous platform". Additionally, you should see (Opti v1.0) in the app name displayed in BOINC Manager.

All app versions check whether the CPU and OS support the required instruction sets. If they do not, the app prints an appropriate error message and exits with code 1.

AVX/AVX2 app versions require at least Windows 7 SP1, Windows Server 2008 R2 SP1 or Linux with kernel 2.6.30.
AVX512 app versions require at least Windows 10, Windows Server 2016 or Linux with kernel 3.15. I am not sure about the Windows versions; you can try whether earlier versions can run it too.

Short summary of each app version:
- SSE2 - this is the base app version with SSE support;
- SSSE3 (triple S) - it adds a shuffle instruction which allows optimizing bitmask calculations. This is a workaround and may be a bit slower than the approach used in the SSE2 app, which is what I saw for the 32-bit app;
- AVX - this instruction set adds longer vectors and floating-point instructions which use them. There are also some new logic instructions which looked promising. Unfortunately, the CPU frequency throttling caused by using AVX registers was too big, so the app was slower. Fortunately, AVX also added some improvements for old SSE instructions, so the AVX app is faster than the SSSE3 one;
- AVX2 - it adds integer instructions for the longer vectors, which I used. Additionally, there is finally a "shift by vector" instruction, so I could replace the SSSE3 workaround with it. This app version also uses BMI2 instructions, which improve speed a bit more;
- AVX512 - this instruction set once again adds longer vectors, but I do not use them in the app. Instead I use an improved version of the vector compare instruction, which returns the result directly as a bitmask, without the extra instructions needed in the earlier SSE/AVX apps.
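To illustrate the difference between the SSE/AVX compare and the AVX512 one, here is a minimal sketch, not the app's code, that builds a 16-bit mask of the cells equal to a given value. With SSE2 the compare yields a vector of 0xFF/0x00 bytes and a separate _mm_movemask_epi8 turns it into an integer mask; with AVX-512BW/VL, _mm_cmpeq_epi8_mask yields the mask in one step. A scalar fallback keeps the sketch portable.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Bit i of the result is set when cells[i] == value, for 16 byte-sized cells.
std::uint16_t equal_mask(const std::uint8_t* cells, std::uint8_t value) {
#if defined(__SSE2__)
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(cells));
    // Compare gives 0xFF/0x00 per byte; movemask is the extra step that packs the
    // top bit of each byte into an int. AVX-512BW/VL's _mm_cmpeq_epi8_mask does both.
    const __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8(static_cast<char>(value)));
    return static_cast<std::uint16_t>(_mm_movemask_epi8(eq));
#else
    std::uint16_t mask = 0;  // portable scalar fallback with the same result
    for (int i = 0; i < 16; ++i)
        if (cells[i] == value) mask |= static_cast<std::uint16_t>(1u << i);
    return mask;
#endif
}
```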
6) Message boards : Number crunching : Rakesearch appears as duplicate projects in BOINC manager (Message 1253)
Posted 13 Feb 2020 by Profile [B@P] Daniel
I had the same issue; BOINC reported that I was attached twice. I am attached to the project via BAM!, and it probably did something wrong. I suspect that it was changing the project address from http to https, and for some time computers were attached twice.
7) Message boards : Science : Question about 10 X 10 squares (Message 1117)
Posted 18 Jul 2019 by Profile [B@P] Daniel
Since every 10 X 10 Latin square has an embedded 3 X 3 Latin square, could we seed three rows, three columns, and their intersections as the 3 X 3 Latin square?
Could your algorithm be modified to start with this additional information?

Could you elaborate on this? It is unclear where exactly these 3x3 squares should be placed.

I thought about the possibility of reusing existing squares and found another promising approach. Start with an existing ODLS pair of rank 8. Take the first square from the pair and extend it to rank 10 by appending rows and columns around it. Then permute rows 2-9 of the new square in the same way as in the 2nd square of the pair. You can also swap the first and last rows. This looks like a promising way to find a rank 10 ODLS pair.

Here is an example of how to turn a rank 3 square into a rank 5 one:
           O O O O O
A A A      O A A A O
B B B  =>  O B B B O
C C C      O C C C O
           O O O O O

Square after applying row permutation from 2nd square:
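The bordering and row-permutation steps can be sketched as follows, assuming (as the description implies) that the 2nd square of the rank 8 pair is a row permutation of the 1st, given here as perm. Border cells get a placeholder value, like the O cells in the diagram; choosing real values for them and the optional swap of the first and last rows are not covered. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Square = std::vector<std::vector<int>>;

// Embed an n x n square in the centre of an (n+2) x (n+2) grid; border cells get
// the placeholder `border`, like the O cells in the diagram.
Square add_border(const Square& inner, int border = 0) {
    const std::size_t n = inner.size();
    Square out(n + 2, std::vector<int>(n + 2, border));
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            out[r + 1][c + 1] = inner[r][c];
    return out;
}

// Permute only the inner rows (rows 2..n+1 in 1-based terms): new inner row i becomes
// old inner row perm[i]. The first and last rows stay in place.
Square permute_inner_rows(const Square& sq, const std::vector<int>& perm) {
    assert(perm.size() + 2 == sq.size());
    Square out = sq;
    for (std::size_t i = 0; i < perm.size(); ++i)
        out[i + 1] = sq[static_cast<std::size_t>(perm[i]) + 1];
    return out;
}
```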
8) Message boards : News : R10 search temporary stopped! (Message 1095)
Posted 13 Jul 2019 by Profile [B@P] Daniel
So, how's Daniel doing on that code review? :)

I found one more issue and sent suggestions on how to fix it. It looks like we have to wait a bit longer until it is implemented and tested.
9) Message boards : Science : Why is the time required to complete each WU different? (Message 1068)
Posted 21 Jun 2019 by Profile [B@P] Daniel
Diagonal Latin squares are a bit like sudoku: every square must have unique values in its rows, columns and both diagonals. The workunit file provides a square with values in the 1st row, both diagonals and some cells of the 2nd/3rd rows. All other cells are filled in by the app. It is unknown how many squares can be generated from a given partially filled initial square, hence an estimate has to be used.
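The rules can be written as a small checker. This is a sketch for a fully filled square with values 0..n-1, not how the app fills cells; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checks that each value 0..n-1 appears exactly once in every row, every column,
// the main diagonal and the anti-diagonal.
bool is_diagonal_latin_square(const std::vector<std::vector<int>>& sq) {
    const std::size_t n = sq.size();
    // True if at(0), ..., at(n-1) is a permutation of 0..n-1.
    auto all_distinct = [n](auto at) {
        std::vector<bool> seen(n, false);
        for (std::size_t k = 0; k < n; ++k) {
            const int v = at(k);
            if (v < 0 || static_cast<std::size_t>(v) >= n || seen[v]) return false;
            seen[v] = true;
        }
        return true;
    };
    for (std::size_t i = 0; i < n; ++i)
        if (sq[i].size() != n) return false;  // must be square
    for (std::size_t i = 0; i < n; ++i) {
        if (!all_distinct([&](std::size_t k) { return sq[i][k]; })) return false;  // row i
        if (!all_distinct([&](std::size_t k) { return sq[k][i]; })) return false;  // column i
    }
    return all_distinct([&](std::size_t k) { return sq[k][k]; }) &&        // main diagonal
           all_distinct([&](std::size_t k) { return sq[k][n - 1 - k]; }); // anti-diagonal
}
```

For example, the cyclic rank 3 square {{0,1,2},{1,2,0},{2,0,1}} is Latin but not diagonal, because its anti-diagonal is 2, 2, 2.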
10) Message boards : News : Future of the RakeSearch project (Message 1035)
Posted 7 Jun 2019 by Profile [B@P] Daniel
Started crunching R10 WUs yesterday! So far they work fine, taking about 3 hours per WU.

So the whole R10 search space is 7 million times bigger than R9? Yeah, to complete that we will need more people and hopefully a GPU app, otherwise it's impossible. But something is better than nothing! So... who will find the first R10 ODLS?

It's not very feasible. I released the 1st optimized app about 1.5 years ago, and rank 9 has been crunched at roughly that speed since then; with a search space about 7 million times bigger, searching the whole rank 10 space would take roughly ten million years at the current speed (1.5 × 7 million ≈ 10 million). Assuming that Moore's law still holds, a full search of the rank 10 space would require about 34 years. Even if more people and a GPU app were available now and allowed crunching a thousand times faster, it would still require about 19 years.
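The post does not say which growth model was used. One way to reproduce the order of magnitude, assuming speed doubles every 18 months and integrating the growing speed over time, is sketched below; the doubling period and the continuous model are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Years needed to finish `work_years` of computation (measured at today's speed) if the
// speed doubles every `doubling_years`. With speed(t) = 2^(t/T), the work done by time t
// is T/ln2 * (2^(t/T) - 1); setting that equal to the work and solving gives
// t = T * log2(1 + work * ln2 / T).
double years_to_finish(double work_years, double doubling_years) {
    assert(work_years > 0 && doubling_years > 0);
    const double ln2 = std::log(2.0);
    return doubling_years * std::log2(1.0 + work_years * ln2 / doubling_years);
}
```

With 10 million years of work this gives about 33 years, and with a thousand times less work (about 10,000 years) about 18 years, close to the figures quoted above.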
11) Message boards : News : Future of the RakeSearch project (Message 1027)
Posted 5 Jun 2019 by Profile [B@P] Daniel
... Are you telling me that the current rank 10 app release (which requires run times of up to 4 hrs for some of the tasks I completed - compared to 20 min. for rank 9 tasks on the same machine) already employs the same optimizations incl. an autodetection module to select the appropriate SSE/AVX code? ...

Yes. Not all of them (because some optimizations were tied to the previous structure of the application), but the most effective ones. Without these optimizations, computations would require several times more time. This does not rule out additional optimizations. The amount of work in an "average" workunit for rank 10, as we see now, has increased several times. And as for rank 9, the amount of work in different workunits can vary by ~3-5-7 times for most workunits and by 20-30 times for very small and very large workunits.

This is also important, as stated in
In the new workunits (for rank 10) there are many more squares per 1%: 10 million versus 2.75 million in workunits for rank 9. And "making" a square of rank 10 also requires more work than a square of rank 9.
12) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 1008)
Posted 3 Jun 2019 by Profile [B@P] Daniel
Hi, the run time on the last R9 tasks is also getting slower, I don't know why... I'm running the latest optimised AVX app from Daniel. But the whole Rake team does good work; I hope an optimised R10 app will be coming soon.

I had a chance to peek at the new app code. It already uses many of the optimizations I implemented in the rank 9 app. There is still room for some optimizations (SSE/AVX can certainly be added), but do not hold your breath: possible speedups will not be as spectacular as for the rank 9 app.
13) Message boards : Number crunching : Processing both R9 & R10 tasks on the same machine (Message 1007)
Posted 3 Jun 2019 by Profile [B@P] Daniel
When you do it this way, the replaced binary will be used until BOINC restarts; after that, BOINC will download the official binary again. Note that the official binary is not able to load checkpoint files from the optimized app, so it will not work properly.

If you want to use both apps, you need to add the rank 10 app to app_info.xml. To do this, find the appropriate tags for the new app in your client_state.xml and copy them to app_info.xml. Of course, you will also need binaries for both apps.
14) Message boards : News : Future of the RakeSearch project (Message 995)
Posted 2 Jun 2019 by Profile [B@P] Daniel
Is there a way to get just the new R10 work units? I have deleted the app_config file but I am still getting the old files as well as the new ones, and the old ones take two or three times as long to complete.

Does the new application have a new app_name ?

In RakeSearch preferences you can choose which apps you want to run. By default all apps are enabled.
15) Message boards : News : Future of the RakeSearch project (Message 973)
Posted 27 May 2019 by Profile [B@P] Daniel
Do we need to remove Daniel's optimized app in order for these to run properly?

Yes, otherwise BOINC will not download the new app. You will have to remove app_info.xml and restart BOINC. When you do this, BOINC will also download the current official app if you have some WUs for it.

Before you do this, make sure you finish all downloaded and started WUs, or abort them. The optimized app uses a slightly different checkpoint file format, which is not compatible with the official app, and the official app will not work properly if it loads such a file.
You can keep WUs which have not been started.
16) Message boards : Number crunching : Congratulations, we are over 50%! (Message 955)
Posted 7 May 2019 by Profile [B@P] Daniel
Progress passed 90%!


Now we have about a month left before all WUs are sent out. Could you reveal something about your next plans?
17) Questions and Answers : Unix/Linux : Tasks failing with code 193 (Message 938)
Posted 1 May 2019 by Profile [B@P] Daniel
During the last Formula BOINC challenge some workunits were generated incorrectly, and they cause crashes like this one. More details are here:
18) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 838)
Posted 8 Apr 2019 by Profile [B@P] Daniel
Have you planned building ARMv8 apps?
Because the Raspberry Pi 3 has an ARMv8 processor.

This board uses a 64-bit CPU, so please use the AARCH64 app. It works on an Odroid C2 with an ARMv8 CPU, so it should work for you too.
19) Message boards : Number crunching : Congratulations, we are over 50%! (Message 833)
Posted 2 Apr 2019 by Profile [B@P] Daniel
70% passed!
Rake search of diagonal Latin squares of rank 9 (%) 70.311
20) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 797)
Posted 4 Mar 2019 by Profile [B@P] Daniel
Hi all.
I have created apps for ARM CPUs. There are new app versions for ARMv7 (with and without NEON instructions) and for AARCH64. Additionally, I created an app for ARMv6, which was requested in the past.

Here are results for the new apps, measured on an Odroid XU4 (ARM apps) and an Odroid C2 (AARCH64 app):

ARM:           12m49.368s
ARM+NEON:       9m56.425s
AARCH64, NEON: 13m54.945s

For comparison, here are results for previous version:

ARM:           20m35.665s
ARM+NEON:      15m57.060s
AARCH64, NEON: 20m52.180s

The ARMv6 app is a bit slower than the ARMv7 one, so make sure you use the ARMv7 app on an ARMv7 CPU.

During my work I also found a bug in the non-SSE 32-bit v1.1 apps for Windows and Linux. On ARM the app hangs because of this bug, but on x86 it seems to work, thanks to the undefined behavior of one assembler instruction. If you are using these apps (32-bit non-SSE for Windows or Linux), I strongly advise downloading and installing the app again. The old version may hang or produce wrong results. This bug affects only the non-SSE apps; the SSE and AVX ones are OK.


©2020 The searchers team, Karelian Research Center of the Russian Academy of Sciences