Optimized RakeSearch app for rank 10

[B@P] Daniel
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 1269 - Posted: 18 Apr 2020, 11:52:46 UTC

Hi all,
I found some free time and prepared an optimized version of my app for RakeSearch Rank 10. The results surprised me positively: the optimized app is about 3.5 times faster than the original one :). Here are the results from my machine with an Intel Xeon E5-2683 v3 running Linux:

Original 64-bit app: 3m 39.425s
SSE2:  1m 4.225s
SSSE3: 1m 2.555s
AVX:   1m 2.554s
AVX2:  1m 0.665s


And here are the times for the optimized 32-bit apps on the same machine. Unfortunately, there is no official 32-bit Linux app for comparison:

NoSSE: 2m 40.668s
SSE2:  1m 11.251s
SSSE3: 1m 11.203s


As you can see, the 32-bit SSE apps are about 2.3 times faster than the non-SSE one. This can explain why the optimized 64-bit apps are much faster than the original one: most of the speedup comes from the SSE/AVX instructions.
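The ratios quoted above can be rechecked from the posted timings. This is a minimal Python sketch, not part of the app; the helper names `to_seconds` and `speedup` are made up for illustration, and only the timing values come from the tables above.

```python
import re

def to_seconds(timing):
    """Convert a timing such as '3m 39.425s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*([\d.]+)s\s*", timing)
    if match is None:
        raise ValueError(f"unrecognized timing: {timing!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Timings copied from the Linux tables above (Intel Xeon E5-2683 v3).
linux64 = {"Original": "3m 39.425s", "SSE2": "1m 4.225s", "AVX2": "1m 0.665s"}
linux32 = {"NoSSE": "2m 40.668s", "SSE2": "1m 11.251s"}

def speedup(table, baseline, variant):
    """How many times faster `variant` is than `baseline`."""
    return to_seconds(table[baseline]) / to_seconds(table[variant])

if __name__ == "__main__":
    print(f"64-bit SSE2 vs original: {speedup(linux64, 'Original', 'SSE2'):.2f}x")
    print(f"64-bit AVX2 vs original: {speedup(linux64, 'Original', 'AVX2'):.2f}x")
    print(f"32-bit SSE2 vs NoSSE:    {speedup(linux32, 'NoSSE', 'SSE2'):.2f}x")
```

With these numbers the 64-bit builds come out at roughly 3.4x (SSE2) to 3.6x (AVX2) over the original, and the 32-bit SSE2 build at roughly 2.3x over NoSSE, which matches the figures in the post.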

I also checked the results on Windows running on an i7-2600K machine and got this:
Original 64-bit app: 2m 35.167s
SSE2:  0m 53.271s
SSSE3: 0m 50.523s
AVX:   0m 51.195s


And here are results for 32-bit apps:

Original 32-bit app: 2m 58.987s
NoSSE: 2m  5.546s
SSE2:  1m  2.453s
SSSE3: 0m 57.760s


ARM and AARCH64 apps are also available. Here are the results for apps tested on an Odroid XU4 board with an ARM CPU:

Original app: 6m 51.444s
ARMv7:        4m 27.991s
ARMv7 NEON:   2m 44.874s
ARMv6:        4m 28.414s


And here are the results for apps tested on an Odroid CU2 board with an AARCH64 CPU:

Original app: 9m 20.327s
AARCH64:      3m 41.197s
ARMv7:        6m 12.147s
ARMv7 NEON:   3m 40.030s
ARMv6:        6m  9.990s


I also found a way to better approximate computation progress: you will no longer see WUs which end at 50% or are stuck at 100% for hours. I did this by generating all possible prefixes for the 9 initial square cells and storing them. Later I compare the initial 9 cells of the current square with the list of prefixes, and report the position of the matching prefix on the list as the progress percentage.
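The prefix idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the post does not say which 9 cells are used, in what order the search visits them, or which prefixes are valid, so `build_prefixes` below is a hypothetical stand-in that simply enumerates permutations of a tiny symbol set in sorted order.

```python
from bisect import bisect_left
from itertools import permutations

def build_prefixes(symbols, prefix_len):
    """Hypothetical stand-in for the real prefix generator.

    The real app would list every valid assignment of the first 9 cells
    in the order the search visits them. Here we just take all
    permutations of a small symbol set, sorted lexicographically.
    """
    return sorted(permutations(symbols, prefix_len))

def progress_percent(prefixes, current_prefix):
    """Report the position of the current prefix in the list as a percentage."""
    position = bisect_left(prefixes, tuple(current_prefix))
    return 100.0 * position / len(prefixes)

# Toy example: 3-cell prefixes over 4 symbols (24 prefixes in total).
prefixes = build_prefixes(range(4), 3)

if __name__ == "__main__":
    for cells in [(0, 1, 2), (2, 0, 1), (3, 2, 1)]:
        print(cells, f"{progress_percent(prefixes, cells):.1f}%")
```

Because the list is sorted in the same order the search is assumed to advance, the reported value can only grow as the search moves on, which is what avoids the "stuck at 100%" effect described above.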

This app version also supports bigger squares, up to rank 16. It is enough to change the Rank constant and recompile all apps. One exception is the ARM NEON app, which supports squares up to rank 12. However, the update for ranks 13..16 is pretty straightforward: it is easy copy/paste/update work. I also added static_asserts in the code in the places which will need updating for higher ranks (mostly 17+), so the compiler will tell you what needs to change.

The optimized app can be downloaded from GitHub (https://github.com/sirzooro/RakeSearch/releases/tag/RakeSearch10.v1.0). There are multiple app versions, compiled with support for different instruction sets. If you are not sure what your CPU supports, use CPU-Z on Windows, and on Linux check "flags" in the /proc/cpuinfo file.
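For Linux on x86, the "flags" check can be automated. The sketch below is an assumption-laden helper, not something shipped with the app: the function names are made up, and the exact flags each build needs are guessed from the summary in this post (the AVX2 build is said to use BMI2; `avx512f` as the AVX512 requirement is an assumption). It also does not check OS support, which the apps verify themselves at startup.

```python
import os

def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

# Ordered from most to least demanding. Flag names are the ones Linux prints;
# the required sets are assumptions based on the post, not taken from the app.
BUILDS = [
    ("AVX512", {"avx512f"}),
    ("AVX2", {"avx2", "bmi2"}),
    ("AVX", {"avx"}),
    ("SSSE3", {"ssse3"}),
    ("SSE2", {"sse2"}),
]

def best_build(flags):
    """Pick the most demanding build whose required flags are all present."""
    for name, required in BUILDS:
        if required <= flags:
            return name
    return None  # no SSE2: only the 32-bit NoSSE build would apply

if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as handle:
            print(best_build(parse_flags(handle.read())))
```

On ARM boards /proc/cpuinfo uses a "Features" line instead of "flags", so this sketch returns None there.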

To install this app, perform these steps:
- close BOINC (a config reload will not work);
- unpack the archive into the project directory. On Windows this is a path like "C:\Users\All Users\BOINC\projects\rake.boincfast.ru_rakesearch", on Linux /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ . On Linux, please also make sure that the rakesearch10 file is executable, and that both rakesearch10 and app_info.xml are owned by the boinc/boinc user/group;
- start BOINC again.
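On Linux, the file checks from the steps above (files present, rakesearch10 executable, owner is boinc) can be scripted. This is a minimal sketch with a made-up helper name, `check_install`; it checks only the owning user, not the group, and the default owner name "boinc" is taken from the post and may differ on your distribution.

```python
import os
import pwd  # Unix only
import stat

def check_install(project_dir, expected_owner="boinc"):
    """Return a list of problems found with the two installed files."""
    problems = []
    for name in ("rakesearch10", "app_info.xml"):
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
            continue
        info = os.stat(path)
        # Only the binary needs the executable bit.
        if name == "rakesearch10" and not info.st_mode & stat.S_IXUSR:
            problems.append(f"{name}: not executable")
        # Pass expected_owner=None to skip the ownership check.
        if expected_owner is not None:
            try:
                owner = pwd.getpwuid(info.st_uid).pw_name
            except KeyError:
                owner = str(info.st_uid)
            if owner != expected_owner:
                problems.append(f"{name}: owned by {owner}, expected {expected_owner}")
    return problems

if __name__ == "__main__":
    project = "/var/lib/boinc/projects/rake.boincfast.ru_rakesearch"
    for problem in check_install(project):
        print(problem)
```

An empty result means the two files look right; it does not replace checking the BOINC event log after the restart.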

After doing this, you should see an entry for RakeSearch in the event log like "Found app_info.xml; using anonymous platform". Additionally, you should see (Opti v1.0) in the app name displayed in BOINC Manager.

All app versions check whether the CPU and OS support the required instruction sets. If they do not, the app prints an appropriate error message and exits with code 1.

The AVX/AVX2 app versions require at least Windows 7 SP1, Windows Server 2008 R2 SP1 or Linux with kernel 2.6.30.
The AVX512 app versions require at least Windows 10, Windows Server 2016 or Linux with kernel 3.15. I am not sure about the Windows versions; you can check whether earlier versions can run them too.

Short summary of each app version:
- SSE2 - this is the base app version with SSE support;
- SSSE3 (triple S) - it added a shuffle instruction which makes it possible to optimize the bitmask calculations. This is a workaround and may be a bit slower than the approach used in the SSE2 app, which is what I saw with the 32-bit app;
- AVX - this instruction set adds longer vectors and floating-point instructions which use them. There are also some new logic instructions which looked promising. Unfortunately, the CPU frequency throttling caused by using the AVX registers was too large, so the app was slower with them. Fortunately, AVX also added some improvements for the old SSE instructions, so the AVX app is faster than the SSSE3 one;
- AVX2 - it added support for integer instructions, which I used. Additionally, there is finally a "shift by vector" instruction, so I could replace the SSSE3 workaround with it. This app version also uses BMI2 instructions, which improves speed a bit more;
- AVX512 - this instruction set once again adds longer vectors, but I do not use them in the app. Instead I use the improved version of the vector compare instruction, which returns the result directly as a bitmask, without the extra instructions needed in the earlier SSE/AVX apps.

jozef j
Joined: 11 Sep 17
Posts: 51
Credit: 192,994,472
RAC: 1,363
Message 1270 - Posted: 19 Apr 2020, 13:54:21 UTC

Very nice, thank you.
But I have a problem setting permissions on MX Linux:
$ chmod 777 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/
chmod: changing permissions of '/var/lib/boinc/projects/rake.boincfast.ru_rakesearch/': Operation not permitted
Does anybody know how to fix this?

scole of TSBT
Joined: 8 Sep 17
Posts: 3
Credit: 18,637,963
RAC: 5,635
Message 1272 - Posted: 19 Apr 2020, 17:22:45 UTC

sudo chmod -R 777 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/

scole of TSBT
Joined: 8 Sep 17
Posts: 3
Credit: 18,637,963
RAC: 5,635
Message 1273 - Posted: 19 Apr 2020, 17:24:51 UTC
Last modified: 19 Apr 2020, 17:26:09 UTC

oops

scole of TSBT
Joined: 8 Sep 17
Posts: 3
Credit: 18,637,963
RAC: 5,635
Message 1274 - Posted: 19 Apr 2020, 17:25:15 UTC
Last modified: 19 Apr 2020, 17:26:39 UTC

oops

[B@P] Daniel
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 1275 - Posted: 19 Apr 2020, 17:34:03 UTC

I set the proper permissions on the files before creating the archives, so tar should set them properly when unpacking. It should be enough to verify them with "ls -l".

jozef j
Joined: 11 Sep 17
Posts: 51
Credit: 192,994,472
RAC: 1,363
Message 1278 - Posted: 20 Apr 2020, 10:01:11 UTC

I see we are doing 3-4 times more work, but credit has nearly stalled. Is that because validation is slow on the server, or something else? Also, the upcoming challenges will need some preparation on the project side. I still don't see anybody from the project here.
Thank you for the help with Linux.

[AF>EDLS]zOU
Joined: 18 Feb 20
Posts: 1
Credit: 1,906,837
RAC: 0
Message 1281 - Posted: 24 Apr 2020, 5:38:14 UTC - in response to Message 1269.  

Brilliant, thank you!!!

This is the command line I use for a quick install on my ARM systems.

sudo wget https://github.com/sirzooro/RakeSearch/releases/download/RakeSearch10.v1.0/rakesearch10_armv7_neon_v10.tgz && tar xvzf rakesearch10_armv7_neon_v10.tgz && cp /root/app_info.xml /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/app_info.xml && cp /root/rakesearch10 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/rakesearch10 && /etc/init.d/boinc-client restart


This is how I did it:

1- verified instructions supported
cat /proc/cpuinfo
cat /proc/cpuinfo |grep -i flags <== to display only flags
cat /proc/cpuinfo |grep -i avx <== to highlight AVX
cat /proc/cpuinfo |grep -i sse <== to highlight SSE


2- get the correct archive (SSSE3 here for my old Intel servers)
wget https://github.com/sirzooro/RakeSearch/releases/download/RakeSearch10.v1.0/rakesearch10_linux64_ssse3_v10.tgz

3- extract
tar xvzf rakesearch10_linux64_ssse3_v10.tgz

4- check where you are :)
pwd

5- copy the 2 files into the project folder/directory
cp /root/app_info.xml /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/app_info.xml
cp /root/rakesearch10 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/rakesearch10

6- restart BOINC
/etc/init.d/boinc-client restart

bfromcolo
Joined: 13 Sep 17
Posts: 2
Credit: 1,964,599
RAC: 462
Message 1345 - Posted: 2 Aug 2020, 13:18:51 UTC

8/2/2020 7:01:20 AM | Rake search of diagonal Latin squares | Message from server: Your app_info.xml file doesn't have a usable version of SAT-based search for orthogonal pairs of DLS of order 10.

I have tried the AVX and AVX2 versions in Win 10 and get this error. Processor supports both. app_info looks fine to me. On startup BOINC does see it.

8/2/2020 6:55:09 AM | Rake search of diagonal Latin squares | Found app_info.xml; using anonymous platform

<app_info>
    <app>
        <name>rakesearch10</name>
        <user_friendly_name>RakeSearch for rank 10 (Opti v1.0)</user_friendly_name>
    </app>
    <file_info>
        <name>rakesearch10.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>rakesearch10</app_name>
        <version_num>301</version_num>
        <api_version>7.9.0</api_version>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>rakesearch10.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

bfromcolo
Joined: 13 Sep 17
Posts: 2
Credit: 1,964,599
RAC: 462
Message 1346 - Posted: 2 Aug 2020, 13:19:59 UTC
Last modified: 2 Aug 2020, 13:47:56 UTC

I reset the project and it's working now.

Millenium
Joined: 27 Jun 18
Posts: 47
Credit: 9,875,775
RAC: 0
Message 1348 - Posted: 4 Aug 2020, 13:47:44 UTC

This is an old thread, the Rank 10 search ended months ago.

The current search, the SAT-based one, is a different search, and the current app is different too.

M0CZY
Joined: 24 Aug 18
Posts: 6
Credit: 104,687
RAC: 26
Message 1353 - Posted: 14 Aug 2020, 16:21:11 UTC

I wonder if [B@P] Daniel would be willing and able to try to compile optimized apps for the current search, SAT-based search for orthogonal pairs of DLS of order 10.
If optimized apps were to be made available, I'm sure the members here would highly appreciate it.

M0CZY
Joined: 24 Aug 18
Posts: 6
Credit: 104,687
RAC: 26
Message 1354 - Posted: 14 Aug 2020, 16:22:18 UTC
Last modified: 14 Aug 2020, 16:24:25 UTC

Sorry, double post. I only clicked it once, honest!

[B@P] Daniel
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 1355 - Posted: 15 Aug 2020, 11:25:32 UTC - in response to Message 1353.  

I wonder if [B@P] Daniel would be willing and able to try to compile optimized apps for the current search, SAT-based search for orthogonal pairs of DLS of order 10.
If optimized apps were to be made available, I'm sure the members here would highly appreciate it.

I am going to do this. Unfortunately, the source code for the new app has not been released yet.

jozef j
Joined: 11 Sep 17
Posts: 51
Credit: 192,994,472
RAC: 1,363
Message 1361 - Posted: 24 Aug 2020, 17:36:56 UTC

This app is so bizarre and unoptimized. I cannot run it on my AMD: the CPU is not heavily loaded (60 C), yet it "brakes" GPU apps so that they run 10 times longer, and the whole PC becomes slow and "seek".
They really need to do something about this as soon as possible.

jozef j
Joined: 11 Sep 17
Posts: 51
Credit: 192,994,472
RAC: 1,363
Message 1395 - Posted: 6 Sep 2020, 3:30:13 UTC

Hi, I cannot get tasks on MX Linux. Can anybody look into this problem?

©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences