Optimized RakeSearch app for rank 10

Message boards : Number crunching : Optimized RakeSearch app for rank 10

[B@P] Daniel
Joined: 8 Sep 17
Posts: 95
Credit: 401,721,096
RAC: 7,401
Message 1269 - Posted: 18 Apr 2020, 11:52:46 UTC

Hi all,
I found some free time and prepared an optimized version of my app for RakeSearch rank 10. The results were a pleasant surprise: the optimized app is about 3.5 times faster than the original one :). Here are the results from my machine with an Intel Xeon E5-2683 v3 running Linux:

Original 64-bit app: 3m 39.425s
SSE2:  1m 4.225s
SSSE3: 1m 2.555s
AVX:   1m 2.554s
AVX2:  1m 0.665s


And here are the times for the optimized 32-bit apps on the same machine. Unfortunately there is no official 32-bit Linux app for comparison:

NoSSE: 2m 40.668s
SSE2:  1m 11.251s
SSSE3: 1m 11.203s


As you can see, the 32-bit SSE apps are 2.3 times faster than the non-SSE one. This may explain why the optimized 64-bit apps are so much faster than the original one: most of the speedup comes from the SSE/AVX instructions.

I also checked the results on Windows on an i7-2600K machine and got this:
Original 64-bit app: 2m 35.167s
SSE2:  0m 53.271s
SSSE3: 0m 50.523s
AVX:   0m 51.195s


And here are the results for the 32-bit apps:

Original 32-bit app: 2m 58.987s
NoSSE: 2m  5.546s
SSE2:  1m  2.453s
SSSE3: 0m 57.760s


ARM and AARCH64 apps are also available. Here are the results for the apps tested on an Odroid XU4 board with an ARM CPU:

Original app: 6m 51.444s
ARMv7:        4m 27.991s
ARMv7 NEON:   2m 44.874s
ARMv6:        4m 28.414s


And here are the results for the apps tested on an Odroid CU2 board with an AARCH64 CPU:

Original app: 9m 20.327s
AARCH64:      3m 41.197s
ARMv7:        6m 12.147s
ARMv7 NEON:   3m 40.030s
ARMv6:        6m  9.990s


I also found a way to better approximate the computation progress: you will no longer see WUs that end at 50% or are stuck at 100% for hours. I did this by generating all possible prefixes for the first 9 cells of the square and storing them. Later I compare the first 9 cells of the current square with the list of prefixes, and report the position of the matching prefix on the list as the progress percentage.
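The lookup-by-position idea can be illustrated with a small shell sketch. This is not the app's code (the app does this internally in compiled code); it assumes, hypothetically, that the prefixes are stored one per line in a text file, in the order the search visits them, and that the current prefix is written in the same format. The function name prefix_progress and the file layout are illustrative only.

```shell
# prefix_progress FILE PREFIX  (hypothetical helper, illustration only)
#   FILE   - all possible 9-cell prefixes, one per line, in search order
#   PREFIX - the first 9 cells of the current square, same format as FILE
# Prints the progress as an integer percentage (0..99).
prefix_progress() {
    file=$1
    prefix=$2
    # $(( )) strips the padding that some wc implementations print
    total=$(( $(wc -l < "$file") ))
    # -n: show the line number, -x: whole-line match,
    # -F: treat PREFIX as a fixed string, not a regular expression
    pos=$(grep -n -x -F -- "$prefix" "$file" | head -n 1 | cut -d: -f1)
    if [ "$total" -eq 0 ] || [ -z "$pos" ]; then
        echo 0
        return 1
    fi
    # A prefix on line N means that N-1 prefixes have been fully searched
    echo $(( (pos - 1) * 100 / total ))
}
```

Counting only the fully searched prefixes means the reported value grows steadily from 0 and never shows 100% before the WU is really finished.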

This app version also supports bigger squares, up to rank 16. It is enough to change the Rank constant and recompile all apps. The one exception is the ARM NEON app, which supports squares up to rank 12. However, updating it for ranks 13..16 is pretty straightforward; it is mostly copy/paste/adapt work. I also added static_asserts in the places in the code that will need updating for higher ranks (mostly 17+), so the compiler will tell you what needs to change.

The optimized app can be downloaded from GitHub: https://github.com/sirzooro/RakeSearch/releases/tag/RakeSearch10.v1.0. There are multiple app versions, compiled with support for different instruction sets. If you are not sure what your CPU supports, use CPU-Z on Windows, or check the "flags" line in /proc/cpuinfo on Linux.
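On Linux the flag check can be scripted. The sketch below only suggests which archive to try first; it assumes the Linux flag names sse2, ssse3, avx, avx2, bmi2 and avx512f, that the AVX2 build also needs BMI2 (as the summary of app versions says), and that avx512f is enough for the AVX512 build (the exact AVX-512 subset the app uses is an assumption). The function names are hypothetical.

```shell
# has_flag FLAGS NAME: succeeds if NAME is one of the space-separated FLAGS
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# pick_variant FLAGS: prints the most advanced x86 build the flags allow.
# The flag-to-build mapping is an assumption, not taken from the app itself.
pick_variant() {
    if has_flag "$1" avx512f; then echo avx512
    elif has_flag "$1" avx2 && has_flag "$1" bmi2; then echo avx2
    elif has_flag "$1" avx; then echo avx
    elif has_flag "$1" ssse3; then echo ssse3
    elif has_flag "$1" sse2; then echo sse2
    else echo nosse   # only offered as a 32-bit build
    fi
}
```

Usage: pick_variant "$(grep -m 1 '^flags' /proc/cpuinfo | cut -d: -f2)". The most advanced build is not guaranteed to be the fastest (the timings above show SSSE3 and AVX very close), so timing a few WUs with neighbouring versions is still worthwhile. On ARM the corresponding /proc/cpuinfo line is called "Features" and uses different names.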

To install this app, perform these steps:
- close BOINC (reloading the config files will not work);
- unpack the archive into the project directory: on Windows this is a path like "C:\Users\All Users\BOINC\projects\rake.boincfast.ru_rakesearch", on Linux /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ . On Linux, also make sure that the rakesearch10 file is executable, and that both rakesearch10 and app_info.xml are owned by the boinc user and group;
- start BOINC again.
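On Linux, the unpacking and permission steps can be combined into one helper. This is a minimal sketch, assuming the archive contains app_info.xml and rakesearch10 at its top level (as in the archives used later in this thread) and that the BOINC user and group are both called boinc; the function name install_opt_app is hypothetical.

```shell
# install_opt_app ARCHIVE PROJECT_DIR  (hypothetical helper)
# Unpacks an optimized-app archive into the BOINC project directory and
# fixes permissions. Stop BOINC before running it and start it afterwards,
# e.g. "sudo /etc/init.d/boinc-client stop" and "... start" (the service
# name is an assumption and differs between distributions).
install_opt_app() {
    archive=$1
    dir=$2
    tar -xzf "$archive" -C "$dir" || return 1
    chmod 755 "$dir/rakesearch10" || return 1
    chmod 644 "$dir/app_info.xml" || return 1
    # Only root can change the owner; skip this when there is no boinc user
    if [ "$(id -u)" -eq 0 ] && id boinc >/dev/null 2>&1; then
        chown boinc:boinc "$dir/rakesearch10" "$dir/app_info.xml" || return 1
    fi
}
```

For example, as root: install_opt_app rakesearch10_linux64_ssse3_v10.tgz /var/lib/boinc/projects/rake.boincfast.ru_rakesearch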

After doing this, you should see an entry for RakeSearch in the event log like "Found app_info.xml; using anonymous platform". Additionally, you should see (Opti v1.0) in the app name displayed in BOINC Manager.
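On a headless Linux machine, the event-log check can be done from the command line. This sketch only searches the message text quoted above; it assumes the log is read with boinccmd --get_messages (you may need to run it from the BOINC data directory or pass --passwd), and the function name is hypothetical. It does not filter by project, so with several anonymous-platform projects it only shows that some app_info.xml was found.

```shell
# uses_anonymous_platform: reads BOINC event-log text on stdin and succeeds
# if it contains the message BOINC prints when it picks up app_info.xml
uses_anonymous_platform() {
    grep -q -F 'Found app_info.xml; using anonymous platform'
}
```

Usage: boinccmd --get_messages | uses_anonymous_platform && echo "app_info.xml was loaded"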

All app versions check whether the CPU and OS support the required instruction sets. If they do not, the app prints an appropriate error message and exits with code 1.

The AVX/AVX2 app versions require at least Windows 7 SP1, Windows Server 2008 R2 SP1 or Linux with kernel 2.6.30.
The AVX512 app versions require at least Windows 10, Windows Server 2016 or Linux with kernel 3.15. I am not sure about the Windows versions; you can try whether earlier versions can run it too.
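On Linux, the kernel requirement can be checked with a short script. This is a sketch that compares the numeric part of uname -r with the minimum versions given above (2.6.30 for AVX/AVX2, 3.15 for AVX512); it relies on GNU sort -V, and the function name kernel_at_least is hypothetical.

```shell
# kernel_at_least MIN [VERSION]  (hypothetical helper)
# Succeeds if VERSION (default: the running kernel from "uname -r") is at
# least MIN. Relies on GNU "sort -V" for version ordering.
kernel_at_least() {
    min=$1
    ver=${2:-$(uname -r)}
    # Drop distribution suffixes such as "-42-generic" or "+" so only the
    # numeric part is compared
    ver=${ver%%[-+]*}
    lowest=$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}
```

Usage: kernel_at_least 3.15 && echo "kernel is new enough for AVX512"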

A short summary of each app version:
- SSE2 - this is the base app version with SSE support;
- SSSE3 (triple S) - this set adds a shuffle instruction, which makes it possible to optimize the bitmask calculations. This is a workaround and may be a bit slower than the approach used in the SSE2 app, which is what I saw for the 32-bit app;
- AVX - this instruction set adds longer vectors and floating-point instructions that use them. There are also some new logic instructions which looked promising. Unfortunately, the CPU frequency throttling caused by using the AVX registers was too large, so the app was slower. Fortunately, AVX also added some improvements to the old SSE instructions, so the AVX app is faster than the SSSE3 one;
- AVX2 - this set adds integer instructions for the longer vectors, which I used. It also finally provides a "shift by vector" instruction, so I could replace the SSSE3 workaround with it. This app version also uses BMI2 instructions, which improve speed a bit further;
- AVX512 - this instruction set once again adds longer vectors, but I do not use them in the app. Instead I use the improved vector compare instruction, which returns its result directly as a bitmask, without the extra instructions needed in the earlier SSE/AVX apps.
ID: 1269
jozef j
Joined: 11 Sep 17
Posts: 44
Credit: 186,911,851
RAC: 3,156
Message 1270 - Posted: 19 Apr 2020, 13:54:21 UTC

Very nice, thank you.
But I have problems with permissions on MX Linux:
$ chmod 777 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/
chmod: changing permissions of '/var/lib/boinc/projects/rake.boincfast.ru_rakesearch/': Operation not permitted
Does anybody know how to fix this?
ID: 1270
scole of TSBT
Joined: 8 Sep 17
Posts: 3
Credit: 12,538,001
RAC: 115
Message 1272 - Posted: 19 Apr 2020, 17:22:45 UTC

sudo chmod -R 777 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/
ID: 1272
[B@P] Daniel
Joined: 8 Sep 17
Posts: 95
Credit: 401,721,096
RAC: 7,401
Message 1275 - Posted: 19 Apr 2020, 17:34:03 UTC

I set the proper permissions on the files before creating the archives, so tar should restore them when unpacking. It should be enough to verify them with "ls -l".
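For those who prefer a script to reading "ls -l" output, the sketch below checks the two things the installation steps ask for: that rakesearch10 is executable and that both files have the expected owner. It assumes GNU stat (the -c format differs on BSD/macOS); the owner defaults to boinc:boinc as in the instructions, and the function name check_install is hypothetical.

```shell
# check_install DIR [OWNER]  (hypothetical helper)
# Prints one line per problem found; succeeds only if there are none.
check_install() {
    dir=$1
    owner=${2:-boinc:boinc}
    status=0
    for f in rakesearch10 app_info.xml; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f"
            status=1
            continue
        fi
        # %U:%G = owner user name and group name (GNU stat)
        actual=$(stat -c '%U:%G' "$dir/$f")
        if [ "$actual" != "$owner" ]; then
            echo "wrong owner: $dir/$f is $actual, expected $owner"
            status=1
        fi
    done
    if [ -e "$dir/rakesearch10" ] && [ ! -x "$dir/rakesearch10" ]; then
        echo "not executable: $dir/rakesearch10"
        status=1
    fi
    return $status
}
```

Usage: check_install /var/lib/boinc/projects/rake.boincfast.ru_rakesearch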
ID: 1275
jozef j
Joined: 11 Sep 17
Posts: 44
Credit: 186,911,851
RAC: 3,156
Message 1278 - Posted: 20 Apr 2020, 10:01:11 UTC

I see we are doing 3-4 times more work, but credit has nearly stalled. Is that because validation on the server is slow, or something else? The upcoming challenges will also need some preparation on the project side, and I still don't see anybody from the project here.
Thank you for the help with Linux.
ID: 1278
[AF>EDLS]zOU
Joined: 18 Feb 20
Posts: 1
Credit: 1,906,837
RAC: 374
Message 1281 - Posted: 24 Apr 2020, 5:38:14 UTC - in response to Message 1269.  

Brilliant, thank you!!!

This is the command line I use for a quick install on my ARM systems.

wget https://github.com/sirzooro/RakeSearch/releases/download/RakeSearch10.v1.0/rakesearch10_armv7_neon_v10.tgz && tar xvzf rakesearch10_armv7_neon_v10.tgz && sudo cp ./app_info.xml ./rakesearch10 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ && sudo chown boinc:boinc /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/app_info.xml /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/rakesearch10 && sudo /etc/init.d/boinc-client restart


This is how I did it:

1- check which instruction sets are supported
cat /proc/cpuinfo
cat /proc/cpuinfo |grep -i flags <== to display only flags (on ARM the line is called "Features")
cat /proc/cpuinfo |grep -i avx <== to highlight AVX
cat /proc/cpuinfo |grep -i sse <== to highlight SSE


2- get the correct archive (SSSE3 here, for my old Intel servers)
wget https://github.com/sirzooro/RakeSearch/releases/download/RakeSearch10.v1.0/rakesearch10_linux64_ssse3_v10.tgz

3- extract
tar xvzf rakesearch10_linux64_ssse3_v10.tgz

4- check where you are :)
pwd

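5- copy the 2 files into the project folder/directory (from the directory where you extracted the archive) and give them to the boinc user/group
sudo cp ./app_info.xml /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/app_info.xml
sudo cp ./rakesearch10 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/rakesearch10
sudo chown boinc:boinc /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/app_info.xml /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/rakesearch10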

6- restart BOINC
sudo /etc/init.d/boinc-client restart
ID: 1281


©2020 The searchers team, Karelian Research Center of the Russian Academy of Sciences