Optimized RakeSearch app for rank 10
Author | Message |
---|---|
Joined: 8 Sep 17 Posts: 99 Credit: 402,603,726 RAC: 0 |
Hi all,

I found some free time and prepared an optimized version of my app for RakeSearch Rank 10. The results surprised me positively: the optimized app is about 3.5 times faster than the original one :).

Here are the results from my machine with an Intel Xeon E5-2683 v3 and Linux:
- Original 64-bit app: 3m 39.425s
- SSE2: 1m 4.225s
- SSSE3: 1m 2.555s
- AVX: 1m 2.554s
- AVX2: 1m 0.665s

And here are the times for the optimized 32-bit apps on the same machine. Unfortunately there is no official 32-bit Linux app for comparison:
- NoSSE: 2m 40.668s
- SSE2: 1m 11.251s
- SSSE3: 1m 11.203s

As you can see, the 32-bit SSE apps are about 2.3 times faster than the non-SSE one. This can explain why the optimized 64-bit apps are much faster than the original one: most of the speedup comes from SSE/AVX instructions.

I also checked the results on Windows running on an i7-2600K machine and got this:
- Original 64-bit app: 2m 35.167s
- SSE2: 0m 53.271s
- SSSE3: 0m 50.523s
- AVX: 0m 51.195s

And here are the results for the 32-bit apps:
- Original 32-bit app: 2m 58.987s
- NoSSE: 2m 5.546s
- SSE2: 1m 2.453s
- SSSE3: 0m 57.760s

ARM and AARCH64 apps are also available. Here are the results for apps tested on an Odroid XU4 board with an ARM CPU:
- Original app: 6m 51.444s
- ARMv7: 4m 27.991s
- ARMv7 NEON: 2m 44.874s
- ARMv6: 4m 28.414s

And here are the results for apps tested on an Odroid C2 board with an AARCH64 CPU:
- Original app: 9m 20.327s
- AARCH64: 3m 41.197s
- ARMv7: 6m 12.147s
- ARMv7 NEON: 3m 40.030s
- ARMv6: 6m 9.990s

I also found a way to better approximate computation progress: you will no longer see WUs which end at 50% or are stuck at 100% for hours. I did this by generating all possible prefixes for the 9 initial square cells and storing them. Later I compare the initial 9 cells of the current square with the list of prefixes, and report the position of the matching prefix on the list as the progress percentage.

This app version also supports bigger squares, up to rank 16. It is enough to change the Rank constant and recompile all apps. The one exception is the ARM NEON app, which supports squares up to rank 12. However, the update for ranks 13..16 is pretty straightforward; it is easy copy/paste/update work. I also added static_asserts in the code in places which will need an update for higher ranks (mostly 17+), so the compiler will tell you what needs updating.

The optimized app can be downloaded from GitHub: https://github.com/sirzooro/RakeSearch/releases/tag/RakeSearch10.v1.0. There are multiple app versions, compiled with support for different instruction sets. If you are not sure what your CPU supports, use CPU-Z on Windows, and on Linux check "flags" in the /proc/cpuinfo file.

To install this app, perform these steps:
- close BOINC (config reload will not work);
- unpack the archive to the project directory - on Windows it is a path like "C:\Users\All Users\BOINC\projects\rake.boincfast.ru_rakesearch", on Linux /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ . On Linux please also make sure that the rakesearch10 file is executable, and that both rakesearch10 and app_info.xml are owned by the boinc/boinc user/group;
- start BOINC again.

After doing this, you should see an entry for RakeSearch in the event log like "Found app_info.xml; using anonymous platform". Additionally you should see (Opti v1.0) in the app name displayed in BOINC Mgr.

All app versions check whether the CPU and OS support the required instruction sets. If they do not, the app prints an appropriate error message and exits with code 1. AVX/AVX2 app versions require at least Windows 7 SP1, Windows Server 2008 R2 SP1 or Linux with kernel 2.6.30. AVX512 app versions require at least Windows 10, Windows Server 2016 or Linux with kernel 3.15. I am not sure about the Windows versions; you can check whether earlier versions can run them too.

Short summary of each app version:
- SSE2 - this is the base app version with SSE support;
- SSSE3 (triple S) - it added a shuffle instruction, which allows optimizing bitmask calculations. This is a workaround and may be a bit slower than the approach used in the SSE2 app, which is what I saw for the 32-bit app;
- AVX - this instruction set adds longer vectors and floating-point instructions which use them. There are also some new logic instructions which looked promising. Unfortunately the CPU frequency throttling caused by using AVX registers was too severe, so the app was slower with them. Fortunately AVX also added some improvements for the old SSE instructions, so the AVX app is faster than the SSSE3 one;
- AVX2 - it added support for integer instructions, which I use. Additionally there is finally a "shift by vector" instruction, so I could replace the SSSE3 workaround with it. This app version also uses BMI2 instructions, which improves speed a bit more;
- AVX512 - this instruction set once again adds longer vectors, but I do not use them in the app. Instead I use the improved version of the vector compare instruction, which returns the result directly as a bitmask, without the extra instructions needed in the earlier SSE/AVX apps. |
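The progress estimate described above (look up the current 9-cell prefix in a stored list and report its position) can be sketched roughly as follows. This is only an illustration in Python, not the app's actual code: the thread does not give the real prefix set or cell order, so plain permutations of a few symbols stand in for them, and `build_prefixes` and `progress_percent` are made-up names.

```python
from bisect import bisect_left
from itertools import permutations


def build_prefixes(symbols, prefix_len):
    """Return every toy prefix, sorted in the order the search visits them.

    Assumption: the search enumerates prefixes lexicographically. A prefix here
    is just `prefix_len` distinct symbols; the real app's rules are stricter.
    """
    return list(permutations(range(symbols), prefix_len))


def progress_percent(prefixes, current_prefix):
    """Map the current prefix to a percentage via its position in the list."""
    index = bisect_left(prefixes, tuple(current_prefix))
    return 100.0 * index / len(prefixes)


if __name__ == "__main__":
    prefixes = build_prefixes(symbols=4, prefix_len=2)  # 12 toy prefixes
    for prefix in [(0, 1), (2, 0), (3, 2)]:
        print(prefix, f"{progress_percent(prefixes, prefix):.1f}%")
```

Because the list is sorted in search order, the reported value can only grow as the search advances, which is presumably why the estimate no longer stalls or jumps.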
Joined: 11 Sep 17 Posts: 51 Credit: 194,388,032 RAC: 3,439 |
Very nice, thank you. But I have a problem setting permissions on MX Linux:
$ chmod 777 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/
chmod: changing permissions of '/var/lib/boinc/projects/rake.boincfast.ru_rakesearch/': Operation not permitted
Does anybody know how to fix this? |
Joined: 8 Sep 17 Posts: 3 Credit: 19,577,960 RAC: 16,011 |
sudo chmod -R 777 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/ |
Joined: 8 Sep 17 Posts: 3 Credit: 19,577,960 RAC: 16,011 |
oops |
Joined: 8 Sep 17 Posts: 99 Credit: 402,603,726 RAC: 0 |
I set proper permissions on the files before creating the archives, so tar should set them properly when unpacking. It should be enough to verify them with "ls -l". |
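For anyone who prefers a scripted check over reading "ls -l" output, here is a minimal sketch of the same verification. It assumes the file names and the boinc/boinc ownership mentioned in the first post; `owner_and_group` and `install_problems` are made-up helper names, not part of BOINC.

```python
import grp
import os
import pwd
import stat


def owner_and_group(path):
    """Return (user, group) names for path, falling back to numeric ids."""
    info = os.stat(path)
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        user = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)
    return user, group


def install_problems(project_dir, exe_name="rakesearch10",
                     expected_owner=("boinc", "boinc")):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for name in (exe_name, "app_info.xml"):
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name} is missing")
            continue
        if owner_and_group(path) != tuple(expected_owner):
            problems.append(
                f"{name} is not owned by {expected_owner[0]}/{expected_owner[1]}")
        # Only the application binary needs the executable bit.
        if name == exe_name and not os.stat(path).st_mode & stat.S_IXUSR:
            problems.append(f"{name} is not executable by its owner")
    return problems
```

Calling `install_problems("/var/lib/boinc/projects/rake.boincfast.ru_rakesearch")` should return an empty list on a correct install. Reading the project directory may itself require root or membership in the boinc group, depending on the distribution.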
Joined: 11 Sep 17 Posts: 51 Credit: 194,388,032 RAC: 3,439 |
I see we are doing 3-4 times more work, but credit has nearly stalled. Is that because validation is slow on the server, or is it something else? Also, the upcoming challenges will need some preparation on the project side. I still do not see anybody from the project here. Thank you for the help with Linux. |
Joined: 18 Feb 20 Posts: 1 Credit: 1,906,837 RAC: 0 |
Brilliant, thank you!!!

This is the command line I use for a quick install on my ARM systems:
sudo wget https://github.com/sirzooro/RakeSearch/releases/download/RakeSearch10.v1.0/rakesearch10_armv7_neon_v10.tgz && tar xvzf rakesearch10_armv7_neon_v10.tgz && cp /root/app_info.xml /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/app_info.xml && cp /root/rakesearch10 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/rakesearch10 && /etc/init.d/boinc-client restart

This is how I did it:
1- verify the supported instructions
cat /proc/cpuinfo
cat /proc/cpuinfo |grep -i flags <== to display only the flags
cat /proc/cpuinfo |grep -i avx <== to highlight AVX
cat /proc/cpuinfo |grep -i sse <== to highlight SSE
2- get the correct archive (SSSE3 here for my old Intel servers)
wget https://github.com/sirzooro/RakeSearch/releases/download/RakeSearch10.v1.0/rakesearch10_linux64_ssse3_v10.tgz
3- extract
tar xvzf rakesearch10_linux64_ssse3_v10.tgz
4- check where you are :)
pwd
5- copy the 2 files into the project folder/directory
cp /root/app_info.xml /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/app_info.xml
cp /root/rakesearch10 /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/rakesearch10
6- restart BOINC
/etc/init.d/boinc-client restart |
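Step 1 above (reading the flags from /proc/cpuinfo) can also be automated. The sketch below is an assumption-laden illustration for x86 Linux only: `cpu_flags` and `best_variant` are made-up names, the AVX2 entry also requires `bmi2` because the first post says that app version uses BMI2, and AVX512 is left out because the thread does not say which AVX-512 subsets the app needs.

```python
def cpu_flags(cpuinfo_text):
    """Collect flag names from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# App versions from the thread, newest instruction set first.
# AVX512 is deliberately omitted: its exact flag requirements are not stated.
VARIANTS = [
    ("AVX2", {"avx2", "bmi2"}),
    ("AVX", {"avx"}),
    ("SSSE3", {"ssse3"}),
    ("SSE2", {"sse2"}),
]


def best_variant(flags):
    """Return the newest app version whose required flags are all present."""
    for name, required in VARIANTS:
        if required <= flags:
            return name
    return None
```

On a Linux box this would be used as `best_variant(cpu_flags(open("/proc/cpuinfo").read()))`. Newest does not always mean fastest: in the i7-2600K timings from the first post, SSSE3 was slightly ahead of AVX, so it can be worth timing both.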
Joined: 13 Sep 17 Posts: 2 Credit: 1,964,599 RAC: 462 |
8/2/2020 7:01:20 AM | Rake search of diagonal Latin squares | Message from server: Your app_info.xml file doesn't have a usable version of SAT-based search for orthogonal pairs of DLS of order 10.

I have tried the AVX and AVX2 versions in Win 10 and get this error. Processor supports both. app_info looks fine to me. On startup BOINC does see it.

8/2/2020 6:55:09 AM | Rake search of diagonal Latin squares | Found app_info.xml; using anonymous platform

<app_info>
  <app>
    <name>rakesearch10</name>
    <user_friendly_name>RakeSearch for rank 10 (Opti v1.0)</user_friendly_name>
  </app>
  <file_info>
    <name>rakesearch10.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>rakesearch10</app_name>
    <version_num>301</version_num>
    <api_version>7.9.0</api_version>
    <platform>windows_x86_64</platform>
    <file_ref>
      <file_name>rakesearch10.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info> |
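The server message above names a different application ("SAT-based search for orthogonal pairs of DLS of order 10") than the one the quoted app_info.xml declares (rakesearch10). A quick way to see what an app_info.xml actually declares is to list its `<app_version>` entries. This is a small sketch using Python's standard XML parser; `declared_versions` is a made-up helper name, and `SAMPLE` is a shortened copy of the file quoted in the post.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the app_info.xml quoted above.
SAMPLE = """
<app_info>
  <app>
    <name>rakesearch10</name>
    <user_friendly_name>RakeSearch for rank 10 (Opti v1.0)</user_friendly_name>
  </app>
  <file_info>
    <name>rakesearch10.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>rakesearch10</app_name>
    <version_num>301</version_num>
    <api_version>7.9.0</api_version>
    <platform>windows_x86_64</platform>
    <file_ref>
      <file_name>rakesearch10.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""


def declared_versions(app_info_xml):
    """List (app_name, version_num, platform) for every <app_version> entry."""
    root = ET.fromstring(app_info_xml)
    return [
        (
            version.findtext("app_name", default="").strip(),
            version.findtext("version_num", default="").strip(),
            version.findtext("platform", default="").strip(),
        )
        for version in root.iter("app_version")
    ]


if __name__ == "__main__":
    for entry in declared_versions(SAMPLE):
        print(entry)
```

If the only app name listed is `rakesearch10`, the file has no entry for whatever other application the server is trying to send work for, which would match the message in the log.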
Joined: 13 Sep 17 Posts: 2 Credit: 1,964,599 RAC: 462 |
I reset the project and it's working now. |
Joined: 27 Jun 18 Posts: 47 Credit: 9,875,775 RAC: 0 |
This is an old thread; the Rank 10 search ended months ago. The current SAT-based search is a different one and uses a different app. |
Joined: 24 Aug 18 Posts: 6 Credit: 104,687 RAC: 26 |
I wonder if [B@P] Daniel would be willing and able to try to compile optimized apps for the current search, SAT-based search for orthogonal pairs of DLS of order 10. If optimized apps were to be made available, I'm sure the members here would highly appreciate it. |
Joined: 24 Aug 18 Posts: 6 Credit: 104,687 RAC: 26 |
Sorry, double post. I only clicked it once, honest! |
Joined: 8 Sep 17 Posts: 99 Credit: 402,603,726 RAC: 0 |
"I wonder if [B@P] Daniel would be willing and able to try to compile optimized apps for the current search, SAT-based search for orthogonal pairs of DLS of order 10."

I am going to do this. Unfortunately the source code for the new app has not been released yet. |
Joined: 11 Sep 17 Posts: 51 Credit: 194,388,032 RAC: 3,439 |
This app is so bizarre and unoptimized that I cannot run it on my AMD machine. The CPU is not heavily loaded (60 °C), yet it "brakes" GPU apps so that they run 10 times longer, and the whole PC becomes slow and "sick". They really need to do something about this as soon as possible. |
Joined: 11 Sep 17 Posts: 51 Credit: 194,388,032 RAC: 3,439 |
Hi, I cannot get tasks on MX Linux. Can anybody look into this problem? |
©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences