Posts by [B@P] Daniel

1) Message boards : Science : Question about 10 X 10 squares (Message 1117)
Posted 18 Jul 2019 by Profile [B@P] Daniel
Post:
Since every 10 X 10 Latin square has an embedded 3 X 3 Latin square, could we seed three rows, three columns, and their intersections as the 3 X 3 Latin square?
Could your algorithm be modified to start with this additional information?


Could you elaborate on this? It is unclear where exactly these 3 X 3 squares should be placed.

I thought about the possibility of reusing existing squares and found another promising approach. Start with an existing ODLS pair of rank 8. Take the first square from the pair and extend it to rank 10 by appending rows and columns around it. Then permute rows 2-9 of the new square in the same way as in the 2nd square of the pair. You can also swap the first and last rows. This looks like a promising way to find a rank 10 ODLS pair.

Here is an example of how to turn a rank 3 square into a rank 5 one:
           O O O O O
A A A      O A A A O
B B B  =>  O B B B O
C C C      O C C C O
           O O O O O

The square after applying the row permutation from the 2nd square:
O O O O O
O C C C O
O B B B O
O A A A O
O O O O O
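
The two steps in the example above can be sketched in Python. This is only an illustration of the picture, not the project's code: the function names are made up here, "O" stands for a border cell that still has to be filled, and the row order [2, 1, 0] is simply the one used in the example (in the real approach it would be taken from the 2nd square of the pair).

```python
def add_border(square, filler="O"):
    """Surround an n x n square with one ring of filler cells: rank n -> rank n + 2."""
    n = len(square)
    top = [filler] * (n + 2)
    middle = [[filler] + list(row) + [filler] for row in square]
    return [top] + middle + [list(top)]


def permute_inner_rows(square, order):
    """Reorder the inner rows; order[i] is the inner row that moves to position i."""
    inner = square[1:-1]
    return [square[0]] + [inner[i] for i in order] + [square[-1]]


rank3 = [list("AAA"), list("BBB"), list("CCC")]
rank5 = add_border(rank3)
# Hypothetical permutation: reverse the inner rows, as in the example.
permuted = permute_inner_rows(rank5, [2, 1, 0])

for row in permuted:
    print(" ".join(row))
```

The border rows stay in place here; swapping the first and last rows would be a separate step.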
2) Message boards : News : R10 search temporary stopped! (Message 1095)
Posted 13 Jul 2019 by Profile [B@P] Daniel
Post:
So, how's Daniel doing on that code review? :)

I found one more issue and sent suggestions on how to fix it. It looks like we will have to wait a bit longer until the fix is implemented and tested.
3) Message boards : Science : Why is the time required to complete each WU different? (Message 1068)
Posted 21 Jun 2019 by Profile [B@P] Daniel
Post:
Diagonal Latin squares are a bit like sudoku: every square must have unique values in each row, each column and both diagonals. The workunit file provides a square with values in the 1st row, both diagonals and some cells of the 2nd/3rd rows. All other cells are filled in by the app. It is unknown how many squares can be generated from a given partially filled square, hence an estimate has to be used.
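
To make the definition concrete, here is a minimal Python check for it. It is a sketch written for this post, not code from the app; the function name and the two sample squares are my own.

```python
def is_diagonal_latin_square(square):
    """True if every row, every column and both diagonals contain each symbol once."""
    n = len(square)
    if any(len(row) != n for row in square):
        return False
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    main_ok = {square[i][i] for i in range(n)} == symbols
    anti_ok = {square[i][n - 1 - i] for i in range(n)} == symbols
    return rows_ok and cols_ok and main_ok and anti_ok


# A rank 4 diagonal Latin square.
good = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
    [1, 0, 3, 2],
]

# A cyclic Latin square: rows and columns are fine, the diagonals are not.
cyclic = [
    [0, 1, 2, 3],
    [1, 2, 3, 0],
    [2, 3, 0, 1],
    [3, 0, 1, 2],
]

print(is_diagonal_latin_square(good), is_diagonal_latin_square(cyclic))
```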
4) Message boards : News : Future of the RakeSearch project (Message 1035)
Posted 7 Jun 2019 by Profile [B@P] Daniel
Post:
Started crunching R10 wus yesterday! So far they works fine, takes 3 hours per wu more or less.

So the whole R10 search space is 7 millions bigger than R9? Yeah to complete that we will need more people and hopefully a GPU app, otherwise it's impossible. But something is better than nothing! So... who will find the first R10 ODLS?

It's not very feasible. I released the 1st optimized app about 1.5 years ago. Taking that as the time scale for rank 9 and the factor of 7 million mentioned above, a search of the whole rank 10 space would take roughly ten million years at the current speed. Assuming Moore's law remains in effect, a full search of the rank 10 space would require about 34 years. Even if more people and a GPU app were available now and allowed crunching a thousand times faster, it would still require about 19 years.
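
The numbers above can be roughly reproduced with a short calculation. This is a sketch under assumptions that are not stated in the post: rank 9 is taken to need about 1.5 years, speed is assumed to double every 1.5 years, and growth is modelled as continuous. A stepwise model gives slightly larger values.

```python
import math


def years_with_doubling(work_years, doubling_period):
    """Years to finish `work_years` of today's-speed work if speed doubles
    every `doubling_period` years (continuous exponential growth).

    Solves  integral_0^T 2**(t / d) dt = W  for T.
    """
    return doubling_period * math.log2(
        work_years * math.log(2) / doubling_period + 1
    )


R9_YEARS = 1.5          # assumption: time scale of the rank 9 search
SIZE_RATIO = 7_000_000  # "7 millions bigger", from the quoted post
DOUBLING = 1.5          # assumption: Moore's law doubling period in years

total = R9_YEARS * SIZE_RATIO
print(f"constant speed:       {total:,.0f} years")
print(f"with doubling:        {years_with_doubling(total, DOUBLING):.1f} years")
print(f"1000x faster + doubling: {years_with_doubling(total / 1000, DOUBLING):.1f} years")
```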
5) Message boards : News : Future of the RakeSearch project (Message 1027)
Posted 5 Jun 2019 by Profile [B@P] Daniel
Post:
... Are you telling me that the current rank 10 app release (which requires run times of up to 4 hrs for some of the tasks I completed - compared to 20 min. for rank 9 tasks on the same machine) already employs the same optimizations incl. an autodetection module to select the appropriate SSE/AVX code? ...

Yes. Not all of them (because some optimizations are tied to the previous structure of the application), but the most effective ones. Without these optimizations the computations would require several times more time. This does not exclude further optimizations. The amount of work in an "average" workunit for rank 10, as we see now, has increased several times. And as with rank 9, the amount of work in different workunits can vary by a factor of ~3-5-7 for most workunits and by 20-30 for very small and very large workunits.

This is also important, as stated in https://rake.boincfast.ru/rakesearch/forum_thread.php?id=39&postid=1011:
The new workunits (for rank 10) contain many more squares per 1%: 10 million versus 2.75 million in rank 9 workunits. "Making" a rank 10 square also needs more work than a rank 9 square.
6) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 1008)
Posted 3 Jun 2019 by Profile [B@P] Daniel
Post:
Hi, run times on the latest R9 tasks are also getting slower, I don't know why... I am running the latest optimised AVX app from Daniel. But the whole Rake team does good work, and I hope an optimised R10 app will be coming soon.

I had a chance to peek at the new app's code. It already uses many of the optimizations I implemented in the rank 9 app. There is still room for some optimizations (SSE/AVX can be added for sure), but do not hold your breath: possible speedups will not be as spectacular as for the rank 9 app.
7) Message boards : Number crunching : Processing both R9 & R10 tasks on the same machine (Message 1007)
Posted 3 Jun 2019 by Profile [B@P] Daniel
Post:
When you do it this way, the replaced binary will be used until BOINC restarts; after that BOINC will download the official binary again. Note that the official binary cannot properly load checkpoint files from the optimized app, so it will not work properly.

If you want to use both apps, you need to add the rank 10 app to app_info.xml. To do this, find the appropriate tags for the new app in your client_state.xml and copy them to app_info.xml. Of course you will also need binaries for both apps.
8) Message boards : News : Future of the RakeSearch project (Message 995)
Posted 2 Jun 2019 by Profile [B@P] Daniel
Post:
Is there a way to just get the new 10 work units. I have deleted the app_config file but I am still getting the old files as well as the new ones and the old ones take two or three times the time to complete.

Does the new application have a new app_name ?

In RakeSearch preferences you can choose which apps you want to run. By default all apps are enabled.
9) Message boards : News : Future of the RakeSearch project (Message 973)
Posted 27 May 2019 by Profile [B@P] Daniel
Post:
Do we need to remove Daniel's optimized app in order for these to run properly?

Yes, otherwise BOINC will not download the new app. You will have to remove app_info.xml and restart BOINC. When you do this, BOINC will also download the current official app if you have some WUs for it.

Before you do this, make sure you finish all downloaded and started WUs, or abort them. The optimized app uses a slightly different checkpoint file format, which is not compatible with the official app. The official app will not work properly if it loads such a file.
You can keep WUs which have not been started.
10) Message boards : Number crunching : Congratulations, we are over 50%! (Message 955)
Posted 7 May 2019 by Profile [B@P] Daniel
Post:
Progress passed 90%!

Congratulations!

Now we have about a month left before all WUs are sent out. Could you reveal something about your next plans?
11) Questions and Answers : Unix/Linux : Tasks failing with code 193 (Message 938)
Posted 1 May 2019 by Profile [B@P] Daniel
Post:
During the last Formula BOINC challenge some workunits were generated incorrectly, and they cause crashes like this one. More details are here: https://rake.boincfast.ru/rakesearch/forum_thread.php?id=165#928
12) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 838)
Posted 8 Apr 2019 by Profile [B@P] Daniel
Post:
have you planned building ARMv8 apps?
because raspberry pi 3 has an ARMv8 processor

This board uses a 64-bit CPU, so please use the AARCH64 app. It works on an Odroid C2 with an ARMv8 CPU, so it should work for you too.
13) Message boards : Number crunching : Congratulations, we are over 50%! (Message 833)
Posted 2 Apr 2019 by Profile [B@P] Daniel
Post:
70% passed!
Rake search of diagonal Latin squares of rank 9 (%) 70.311
14) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 797)
Posted 4 Mar 2019 by Profile [B@P] Daniel
Post:
Hi all.
I have created apps for ARM CPUs. There are new app versions for ARMv7 (with and without NEON instructions) and for AARCH64. Additionally I also created an app for ARMv6, which was requested in the past.

Here are results for the new apps, measured on an Odroid XU4 (ARM apps) and an Odroid C2 (AARCH64 one):

ARM:           12m49.368s
ARM+NEON:       9m56.425s
AARCH64, NEON: 13m54.945s


For comparison, here are results for previous version:

ARM:           20m35.665s
ARM+NEON:      15m57.060s
AARCH64, NEON: 20m52.180s


The ARMv6 app is a bit slower than the ARMv7 one, so make sure you use the ARMv7 app on an ARMv7 CPU.

During my work I also found a bug in the non-SSE 32-bit v1.1 apps for Windows and Linux. On ARM the app with this bug hangs, but on x86 it seems to work, thanks to the undefined behavior of one assembler instruction. If you are using these apps (32-bit non-SSE for Windows or Linux) I strongly advise you to download and install the app again. The old version may hang or produce wrong results. This bug affects only the non-SSE apps; the SSE and AVX ones are OK.
15) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 776)
Posted 17 Feb 2019 by Profile [B@P] Daniel
Post:
Thanks Daniel,

I will install 64 bit Windows.

And how to make such benchmarks?

You can download the sample data file and the shell script used to start the test from https://github.com/sirzooro/RakeSearch/tree/boinc/RakeDiagSearch/RakeDiagSearch/test. It can be run directly on Linux. On Windows you will need to install Cygwin. You can also try MinGW or MSYS; they should work too, but I have not tried them. I prefer Cygwin.
16) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 773)
Posted 17 Feb 2019 by Profile [B@P] Daniel
Post:
I have a computer with 3 gigabytes of memory, an Intel Pentium Dual Core E2220 processor and a new empty hard drive.
I plan to install Windows 7 on it.
Which version is better to install, 32-bit or 64-bit?
Will the 32-bit or the 64-bit SSSE3-optimized RakeSearch application crunch faster?
Has crunching speed been measured on the same computer under both 32-bit and 64-bit Windows?

Please use a 64-bit OS and app. 64-bit software can use more registers and has SSE2 by default, so it is usually faster than 32-bit software.

All my previous benchmarks were done on 64-bit Linux.

Windows does not keep CPU-intensive apps on one CPU core like Linux does; they constantly float between cores. This adds extra overhead because of context switching, so Windows results are usually a few percent worse than Linux ones.

I did some benchmarking to see how the 32-bit apps perform on my Haswell Xeon. This was also done on 64-bit Linux. CPU-intensive apps like this one do not have to perform many syscalls or use system libraries a lot, so results for 32-bit apps should be similar on 32-bit and 64-bit systems.

SSSE3 64-bit:
real    4m2.163s
user    4m0.198s
sys     0m0.018s

SSE2 64-bit:
real    4m8.098s
user    4m6.121s
sys     0m0.032s

SSSE3 32-bit:
real    4m37.972s
user    4m36.001s
sys     0m0.032s

SSE2 32-bit:
real    4m56.755s
user    4m54.779s
sys     0m0.040s

Non-SSE 32-bit:
real    4m55.787s
user    4m53.806s
sys     0m0.044s


As you can see, 32-bit apps are slower than 64-bit ones. The result for the non-SSE app is a bit surprising; I suspect that limitations of 32-bit software combined with various CPU hardware optimizations are responsible for this. It would be interesting to see some benchmark results from old CPUs like yours; unfortunately I do not have such a machine.
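
For anyone who wants to compare such runs numerically, here is a small Python sketch that converts the "real" values printed by `time` into seconds and reports how much slower each build is than the fastest one. The helper name and the dictionary are mine; the timings are copied from the results above.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '4m2.163s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times from the benchmark above.
real_times = {
    "SSSE3 64-bit": "4m2.163s",
    "SSE2 64-bit": "4m8.098s",
    "SSSE3 32-bit": "4m37.972s",
    "SSE2 32-bit": "4m56.755s",
    "Non-SSE 32-bit": "4m55.787s",
}

baseline = to_seconds(real_times["SSSE3 64-bit"])
slowdown = {
    name: to_seconds(stamp) / baseline - 1 for name, stamp in real_times.items()
}

for name, extra in slowdown.items():
    print(f"{name:<16} +{extra:.1%}")
```

This is a single run per build, so small differences (like SSE2 versus non-SSE on 32-bit) may be within measurement noise.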
17) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 764)
Posted 12 Feb 2019 by Profile [B@P] Daniel
Post:
@ Daniel

Will there be also a "AVX2 NOPEXT" v1.1 for AMD? :thumbsup:

No. The new optimized app does not use the PEXT instruction, so there is no need to build a separate app version. Please use the AVX2 version.
18) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 761)
Posted 11 Feb 2019 by Profile [B@P] Daniel
Post:
I have uploaded a fixed SSE2 version and an SSSE3 version (notice the triple S here, it is Supplemental SSE3). It turned out that with the new compilation options these versions are a bit faster than the previous "SSE2" version:

SSE2:
real    4m8.098s
user    4m6.121s
sys     0m0.032s

SSSE3:
real    4m2.163s
user    4m0.198s
sys     0m0.018s

Previous "SSE2":
real    4m14.850s
user    4m12.858s
sys     0m0.047s
19) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 732)
Posted 5 Feb 2019 by Profile [B@P] Daniel
Post:
I have checked this and found a bug in the compilation options. The SSE2 app versions use SSSE3 instructions, which are not supported by your CPU, so they crash with the error/signal "Illegal Instruction". I will release fixed app versions later today. Until then please use the previous app version or the non-SSE one.
20) Message boards : Number crunching : Congratulations, we are over 50%! (Message 724)
Posted 4 Feb 2019 by Profile [B@P] Daniel
Post:
We are still running rank 9 at this moment, correct (the 50% completion mark)? All the ODLS pairs I have found are rank 9.

So there is a possibility to run a rank 10 search in the future?

Yes, the search in the space of diagonal Latin squares of rank 9 is being performed now. A search in the rank 10 space is also possible and interesting, but for now it is in the far future.

Happy crunching! :)

Not so far. My rough estimate is that with the new optimized app released yesterday the search for rank 9 will be finished this year, probably sometime between 6 and 9 months from now. Both numbers are based on the assumption that during the last 8 days (since creation of this thread) the project progressed by something between 1% and 1.5%, and that the new app is 30% faster.

BTW, do you have an example of a rank 10 pair? My app is not ready for rank 10 yet. I would like to fix it, and I need some test data to make sure it works properly.
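
The estimate can be re-derived with a few lines of Python. This is a sketch under my own reading of the numbers: about 50% of the search is assumed to remain, "30% faster" is read as 30% less run time per workunit, and every host is assumed to switch to the new app. With other readings the range shifts somewhat.

```python
def days_to_finish(remaining_percent, percent_per_period, period_days, speedup):
    """Days needed to cover the remaining percentage at the given progress rate."""
    return remaining_percent / (percent_per_period * speedup) * period_days


REMAINING = 50.0         # assumption: the thread was opened at the 50% mark
PERIOD_DAYS = 8          # observation window from the post
SPEEDUP = 1 / (1 - 0.3)  # assumption: 30% less run time, about 1.43x throughput
DAYS_PER_MONTH = 30.44   # average month length

months = {
    rate: days_to_finish(REMAINING, rate, PERIOD_DAYS, SPEEDUP) / DAYS_PER_MONTH
    for rate in (1.0, 1.5)
}

for rate, value in months.items():
    print(f"{rate}% per {PERIOD_DAYS} days -> about {value:.1f} months")
```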



