21)
Questions and Answers :
Unix/Linux :
Tasks failing with code 193
(Message 938)
Posted 1 May 2019 by [B@P] Daniel Post: During the last Formula BOINC challenge some workunits were generated incorrectly, which causes crashes like this one. More details are here: https://rake.boincfast.ru/rakesearch/forum_thread.php?id=165#928 |
22)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 838)
Posted 8 Apr 2019 by [B@P] Daniel Post: have you planned building ARMv8 apps? This board uses a 64-bit CPU, so please use the AARCH64 app. It works on the Odroid C2 with its ARMv8 CPU, so it should work for you too. |
23)
Message boards :
Number crunching :
Congratulations, we are over 50%!
(Message 833)
Posted 2 Apr 2019 by [B@P] Daniel Post: 70% passed! Rake search of diagonal Latin squares of rank 9 (%): 70.311 |
24)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 797)
Posted 4 Mar 2019 by [B@P] Daniel Post: Hi all. I have created apps for ARM CPUs. There are new app versions for ARMv7 (with and without NEON instructions) and for AARCH64. Additionally, I created an app for ARMv6, which was requested in the past. Here are results for the new apps, measured on Odroid XU4 (ARM apps) and Odroid C2 (AARCH64 one):
ARM: 12m49.368s
ARM+NEON: 9m56.425s
AARCH64, NEON: 13m54.945s
For comparison, here are results for the previous version:
ARM: 20m35.665s
ARM+NEON: 15m57.060s
AARCH64, NEON: 20m52.180s
The app for ARMv6 is a bit slower than the ARMv7 one, so make sure you use the ARMv7 app on an ARMv7 CPU. During my work I also found a bug in the non-SSE 32-bit v1.1 apps for Windows and Linux. On ARM the app with this bug hangs, but on x86 it seems to work, thanks to the undefined behavior of one assembler instruction. If you are using these apps (32-bit non-SSE for Windows or Linux), I strongly advise you to download and install the app again. The old version may hang or produce wrong results. This bug affects only non-SSE apps; SSE and AVX ones are OK. |
25)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 776)
Posted 17 Feb 2019 by [B@P] Daniel Post: Thanks Daniel, You can download a sample data file and the shell script used to start the test from https://github.com/sirzooro/RakeSearch/tree/boinc/RakeDiagSearch/RakeDiagSearch/test. It can be run directly on Linux. On Windows you will need to install Cygwin. You can also try MinGW or MSYS; they should also work, but I have not tried them, as I prefer Cygwin. |
26)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 773)
Posted 17 Feb 2019 by [B@P] Daniel Post: I have a computer with 3 gigabytes of memory, an Intel Pentium Dual Core E2220 processor, and a new empty hard drive. Please use a 64-bit OS and app. 64-bit software can use more registers and has SSE2 by default, so it is usually faster than 32-bit software. All my previous benchmarks were done on 64-bit Linux. Windows does not pin CPU-intensive apps to one CPU core like Linux does; they constantly float between cores. This adds extra overhead because of context switching, so Windows results are usually a few percent worse than Linux ones. I did some benchmarking to see how 32-bit apps perform on my Haswell Xeon. This was also done on 64-bit Linux. CPU-intensive apps like this one do not have to perform many syscalls or use system libraries a lot, so results for 32-bit apps should be similar on 32-bit and 64-bit systems.
SSSE3 64-bit: real 4m2.163s, user 4m0.198s, sys 0m0.018s
SSE2 64-bit: real 4m8.098s, user 4m6.121s, sys 0m0.032s
SSSE3 32-bit: real 4m37.972s, user 4m36.001s, sys 0m0.032s
SSE2 32-bit: real 4m56.755s, user 4m54.779s, sys 0m0.040s
Non-SSE 32-bit: real 4m55.787s, user 4m53.806s, sys 0m0.044s
As you can see, 32-bit apps are slower than 64-bit ones. The result for the non-SSE app is a bit surprising; I suspect that limitations of 32-bit software combined with various CPU hardware optimizations are responsible for this. It would be interesting to see some benchmark results from old CPUs like yours; unfortunately I do not have such a machine. |
27)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 764)
Posted 12 Feb 2019 by [B@P] Daniel Post: @ Daniel No. The new optimized app does not use the PEXT instruction, so there is no need to build a separate app version. Please use the AVX2 version. |
28)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 761)
Posted 11 Feb 2019 by [B@P] Daniel Post: I have uploaded a fixed SSE2 version and an SSSE3 version (notice the triple S here, it is Supplemental SSE3). It turned out that with the new compilation options these versions are a bit faster than the previous "SSE2" version:
SSE2: real 4m8.098s, user 4m6.121s, sys 0m0.032s
SSSE3: real 4m2.163s, user 4m0.198s, sys 0m0.018s
Previous "SSE2": real 4m14.850s, user 4m12.858s, sys 0m0.047s |
29)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 732)
Posted 5 Feb 2019 by [B@P] Daniel Post: I have checked this and found a bug in the compilation options. The SSE2 app versions use SSSE3 instructions, which are not supported by your CPU, so they crash with the error/signal "Illegal Instruction". I will release fixed app versions later today. Until then please use the previous app version, or the non-SSE one. |
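To illustrate the kind of check that avoids this "Illegal Instruction" crash, here is a minimal Python sketch that reads the feature flags from Linux /proc/cpuinfo-style text and picks the most demanding app variant the CPU supports. The function names and the variant list are hypothetical and only for illustration; the flag names (sse2, ssse3, avx, avx2) are the ones Linux reports on x86, and AVX512 is left out because the post does not say which AVX512 subsets the app needs.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# Ordered from most to least demanding instruction set.
# The labels are illustrative, not the project's official file names.
VARIANTS = [("AVX2", "avx2"), ("AVX", "avx"), ("SSSE3", "ssse3"), ("SSE2", "sse2")]


def best_variant(flags):
    """Pick the first variant whose required flag is present, else non-SSE."""
    for name, flag in VARIANTS:
        if flag in flags:
            return name
    return "non-SSE"
```

On a Linux machine it could be used as `best_variant(cpu_flags(open("/proc/cpuinfo").read()))`. A CPU that reports sse2 but not ssse3 gets "SSE2" here, which is exactly the case where a build that accidentally contains SSSE3 instructions would crash.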
30)
Message boards :
Number crunching :
Congratulations, we are over 50%!
(Message 724)
Posted 4 Feb 2019 by [B@P] Daniel Post: We are still running rank 9 at this moment, correct (the 50% completion mark)? All my ODSL pairs found are rank 9. Not so far. My rough estimate is that, with the new optimized app released yesterday, the search for rank 9 will be finished this year, probably sometime between 6 and 9 months from now. Both numbers are based on the assumption that during the last 8 days (since the creation of this thread) the project progressed by between 1% and 1.5%, and that the new app is 30% faster. BTW, do you have an example of a rank 10 pair? My app is not ready for rank 10 yet. I would like to fix it, and I need some test data to make sure it will work properly. |
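The 6 to 9 month range can be reproduced with a short calculation. This is only a sketch under assumptions that are not stated in the post: about 50% of the search remains (taken from the thread title), every host switches to the new app, and "30% faster" means each workunit takes 30% less time. The function name and the 30.44 days-per-month constant are illustrative.

```python
def months_to_finish(remaining_pct, pct_per_8_days, time_reduction=0.30):
    """Estimate months left, given progress observed over the last 8 days."""
    old_rate = pct_per_8_days / 8.0                # percent per day, old app
    new_rate = old_rate / (1.0 - time_reduction)   # same work in 70% of the time
    days = remaining_pct / new_rate
    return days / 30.44                            # average days per month


slow = months_to_finish(50.0, 1.0)   # pessimistic: 1% per 8 days
fast = months_to_finish(50.0, 1.5)   # optimistic: 1.5% per 8 days
print(f"between {fast:.1f} and {slow:.1f} months")
```

Under these assumptions the two bounds come out at roughly 6 and 9 months, in line with the estimate in the post.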
31)
Questions and Answers :
Web site :
504 Gateway Time-out when posing on forum
(Message 723)
Posted 4 Feb 2019 by [B@P] Daniel Post: When I add a reply to a post on this forum, I get the error 504 Gateway Time-out from nginx. Fortunately the reply is added despite this error. One small inconvenience is that the main forum page is not updated immediately (the last post date is not updated); the update happens a few minutes later. This happened to me when I posted a reply to the "Optimized RakeSearch app" thread. |
32)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 722)
Posted 4 Feb 2019 by [B@P] Daniel Post: Thanks Daniel for the updates! Have the AVX running on a couple older machines and the AVX2 on a Ryzen. Unfortunately Panda AV flagged both exe files on all 3 machines as a virus (and deleted them) and I had to exclude them to get them to run. Never had this happen with the older versions. Is it possible to clue in the Panda people concerning this? Hmm, interesting. I tried scanning them with Virus Total, which lets you scan a file with 69 scanners (including Panda), and they are clean. MetaDefender (37 scanners) also confirmed this. Anyway, I sent info to Panda about this false positive. |
33)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 715)
Posted 3 Feb 2019 by [B@P] Daniel Post: Hi all, I have (finally!) released a new version of my optimized RakeSearch app, Opti v1.1. It can be downloaded from here: https://github.com/sirzooro/RakeSearch/releases/tag/v1.1. The installation instructions are the same as before, so please refer to the 1st post in this thread for details. There are 4 versions available as before: SSE2, AVX, AVX2 and AVX512. There are also apps for 32-bit Windows and Linux. For comparison, here are results for the previous (Opti v1.0) version:
SSE2: real 6m2.431s, user 6m0.451s, sys 0m0.030s
AVX: real 5m45.740s, user 5m43.759s, sys 0m0.026s
AVX2: real 5m24.624s, user 5m22.626s, sys 0m0.042s
And this is for the new version:
SSE2: real 4m14.850s, user 4m12.858s, sys 0m0.047s
AVX: real 3m58.809s, user 3m56.813s, sys 0m0.035s
AVX2: real 3m51.881s, user 3m49.885s, sys 0m0.040s
As you can see, the new app version is about 30% faster than the previous one. The new AVX2 app does not use the PEXT instruction. This means that on AMD Ryzen and Threadripper this app version will also be faster than the AVX one. I also changed the AVX512 app version a bit; it now uses new instructions which operate on old (SSE/AVX) registers only. This means that it will not suffer from the CPU frequency throttling related to use of the new AVX512 registers. I have some results captured for this app version. Unfortunately this machine had other things running on it, so the results look high in comparison to the ones above. Anyway, you can see that the AVX512 app is the fastest:
SSE2: real 6m47,701s, user 0m0,000s, sys 0m0,046s
AVX: real 6m22,165s, user 0m0,015s, sys 0m0,062s
AVX2: real 6m14,271s, user 0m0,000s, sys 0m0,031s
AVX512: real 6m12,458s, user 0m0,000s, sys 0m0,062s
I have not created ARM/AARCH64 app versions yet; I am going to release them soon. |
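The "about 30% faster" figure can be checked directly against the "real" times quoted for Opti v1.0 and v1.1. The Python sketch below parses the `time`-style stamps (accepting either a dot or a comma as the decimal separator, since both appear in the post) and reports how much less wall-clock time the new version needs. The helper names are hypothetical.

```python
import re


def to_seconds(stamp):
    """Convert a stamp like '6m2.431s' or '6m47,701s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, secs, frac = match.groups()
    return int(minutes) * 60 + float(f"{secs}.{frac}")


# "real" times copied from the post above.
OLD = {"SSE2": "6m2.431s", "AVX": "5m45.740s", "AVX2": "5m24.624s"}
NEW = {"SSE2": "4m14.850s", "AVX": "3m58.809s", "AVX2": "3m51.881s"}


def time_reduction(old, new):
    """Fraction of wall-clock time saved by the new version."""
    return 1.0 - to_seconds(new) / to_seconds(old)


for name in OLD:
    print(f"{name}: {time_reduction(OLD[name], NEW[name]):.1%} less wall-clock time")
```

All three variants land close to a 30% reduction in run time, which corresponds to roughly 1.4 times the throughput.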
34)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 711)
Posted 30 Jan 2019 by [B@P] Daniel Post: Hi daniel, how is about new app? cant wait.. )) Things look good, most things are ready now. If everything goes well, I will release the new version this week. |
35)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 696)
Posted 15 Jan 2019 by [B@P] Daniel Post: I looked at the list of optimized apps, but I couldn't see the one that my computer needs, which is Linux 32-bit with SSE2. Hi, I am working on a new version of my optimized app. I am going to release it before the end of January. I will add 32-bit Linux versions together with the other ones. |
36)
Message boards :
News :
Closing web registration
(Message 682)
Posted 3 Jan 2019 by [B@P] Daniel Post: This will not stop them. You need to enable an invitation code and post it somewhere, e.g. on the front page or in a pinned thread here. You can also provide it on request via email, but other projects usually do not require this extra step. Edit: I am not sure if the invitation code can be entered when registering via the client. You may need to disable registration via the client if the code cannot be entered there. |
37)
Message boards :
News :
Happy birthday to RakeSearch project!
(Message 558)
Posted 16 Aug 2018 by [B@P] Daniel Post: I wonder if it's possible to have the current code to run on GPUs, from what I know it should be doable: the memory requirements of the project are very low and the search is about comparing latin squares, nothing too complex. Square checking can be moved to the GPU quite easily, and it is blazing fast there. Unfortunately the square generator running on the CPU is too slow to feed the GPU and keep it loaded: I tested my code on a 1080 and its load was only about 3%. So the generator must also be moved to the GPU, which is more complicated, as you have to run multiple generators in parallel. Here things get complicated: the search space has to be divided somehow into smaller pieces so that a few hundred of them can run in parallel. Also keep in mind that every generator instance needs some memory (about 370 bytes), plus some memory for an output buffer. And all of this must somehow fit into local memory on the GPU, which is very limited (only 32KB per compute unit is guaranteed). |
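To make the memory constraint concrete, here is a small Python sketch of the budget per compute unit, using the two figures from the post (about 370 bytes of state per generator, 32KB of guaranteed local memory). The output buffer size is not given in the post, so the 142-byte value in the second call is purely a hypothetical placeholder.

```python
LOCAL_MEM_BYTES = 32 * 1024      # guaranteed local memory per compute unit
GENERATOR_STATE_BYTES = 370      # approximate state of one generator instance


def max_generators(output_buffer_bytes=0,
                   local_mem=LOCAL_MEM_BYTES,
                   state=GENERATOR_STATE_BYTES):
    """Upper bound on generator instances that fit in one compute unit."""
    per_instance = state + output_buffer_bytes
    return local_mem // per_instance


print(max_generators())      # state only, no output buffer
print(max_generators(142))   # with a hypothetical 142-byte output buffer
```

With state alone, fewer than 90 generators fit in one compute unit, and any output buffer lowers that further. Reaching a few hundred parallel generators therefore depends on spreading them over several compute units, which is why the way the search space is divided matters.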
38)
Message boards :
Science :
Source code of the project application
(Message 437)
Posted 8 Jun 2018 by [B@P] Daniel Post: Hi, Please use the MinGW compiler shipped with Cygwin. The Makefiles created by me automatically use it when you pass the option MinGW=1 when calling make. BTW, you will need to install both the 32-bit and 64-bit versions of Cygwin (in separate directories), as the 64-bit version does not have the 32-bit libs needed for linking the final app. You will also need to recompile the BOINC libs, as you wrote. This is a bit tricky, as you have to use boinc/lib/Makefile.mingw instead of the one generated by the configure script. You can also use mine; you can download them from here: https://bitbucket.org/sirzooro/boinc-stuff/downloads/. |
39)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 346)
Posted 26 Mar 2018 by [B@P] Daniel Post: Hi Daniel, thanks for working on this! To clarify/summarize this long thread: Hi, Most information about this optimized app is provided in my first post in this thread, please check it. The new app is about 10 times faster than the original version (for the AVX2 version running on an Intel CPU). Other app versions for older CPUs are slower, but still a lot faster than the original one, e.g. the SSE2 version is about 9 times faster. It turned out that the AVX2+BMI2 app on AMD Ryzen/Threadripper is slower than the AVX one. I created a new AVX2 app without the PEXT instruction to address this. I did not get any feedback about its speed on AMD CPUs, so I do not know if it is really faster there (on Intel it is a bit faster than the AVX app). Some time ago the project admins announced that the current optimized app will be released as an official one. They are going to do this after finishing other planned tasks here. The optimized app can be installed manually as an "anonymous platform", so this is not the highest priority for them now. I am still working on a new version of the optimized app. The x86 app version is ready; I still have some work to do for the ARM versions. |
40)
Message boards :
Number crunching :
Optimized RakeSearch app for rank 9 (computations finished)
(Message 332)
Posted 13 Mar 2018 by [B@P] Daniel Post: Thanks Daniel. I grep'ed for sse2 on the Phenom, didn't think to grep for neon on the Arms. Thanks for the info. The ARM app on the Pi Zero crashed after receiving signal 4, that is SIGILL (illegal instruction). It looks like there should be a separate app for ARMv6, or the non-NEON one should have some instruction sets disabled. I will look into this when I find some free time. |
©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences