Optimized RakeSearch app for rank 9 (computations finished)
Author | Message |
---|---|
Joined: 8 Sep 17 Posts: 6 Credit: 1,174,843 RAC: 0 |
Additional test. The same two computers. This was to be expected: in 32-bit mode only the first half of the registers is available, so only in rare cases will 32-bit be faster than 64-bit when using SSE code. |
Joined: 8 Sep 17 Posts: 99 Credit: 402,603,726 RAC: 0 |
Hi all. I have created apps for ARM CPUs. There are new app versions for ARMv7 (with and without NEON instructions) and for AARCH64. I also created an app for ARMv6, which was requested in the past. Here are results for the new apps, measured on an Odroid XU4 (ARM apps) and an Odroid C2 (AARCH64 app): ARM: 12m49.368s; ARM+NEON: 9m56.425s; AARCH64, NEON: 13m54.945s. For comparison, here are results for the previous version: ARM: 20m35.665s; ARM+NEON: 15m57.060s; AARCH64, NEON: 20m52.180s. The ARMv6 app is a bit slower than the ARMv7 one, so make sure you use the ARMv7 app on an ARMv7 CPU. During my work I also found a bug in the non-SSE 32-bit v1.1 apps for Windows and Linux. On ARM, an app with this bug hangs, but on x86 it seems to work thanks to the undefined behavior of one assembler instruction. If you are using these apps (32-bit non-SSE for Windows or Linux), I strongly advise you to download and install the app again. The old version may hang or produce wrong results. This bug affects only non-SSE apps; SSE and AVX ones are OK. |
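To put the timings above in perspective, here is a minimal sketch that converts the quoted run times to seconds and computes the speedup of each new build over the previous version. The helper names (`to_seconds`, `speedups`) are made up for illustration and are not part of the project's tooling; the numbers are copied from the post.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style string such as '12m49.368s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Timings copied from the post above: (previous version, new version).
RESULTS = {
    "ARM": ("20m35.665s", "12m49.368s"),
    "ARM+NEON": ("15m57.060s", "9m56.425s"),
    "AARCH64, NEON": ("20m52.180s", "13m54.945s"),
}


def speedups(results):
    """Return old_time / new_time for every build in `results`."""
    return {
        name: to_seconds(old) / to_seconds(new)
        for name, (old, new) in results.items()
    }


if __name__ == "__main__":
    for name, factor in speedups(RESULTS).items():
        print(f"{name}: {factor:.2f}x faster than the previous version")
```

By this arithmetic the two 32-bit builds come out roughly 1.6x faster and the AARCH64 build about 1.5x faster. The 32-bit and 64-bit figures were measured on different boards, so they should not be compared with each other.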
Joined: 14 Dec 17 Posts: 11 Credit: 3,282,877 RAC: 959 |
Have you planned to build ARMv8 apps? I ask because the Raspberry Pi 3 has an ARMv8 processor. |
Joined: 8 Sep 17 Posts: 99 Credit: 402,603,726 RAC: 0 |
Have you planned to build ARMv8 apps? This board uses a 64-bit CPU, so please use the AARCH64 app. It works on an Odroid C2 with an ARMv8 CPU, so it should work for you too. |
Joined: 18 Jan 18 Posts: 4 Credit: 52,193,414 RAC: 0 |
It will depend on the OS your Pi is using... If you're using Raspbian on your Pi 3 then you'll need the 32-bit ARM app, as it's a 32-bit OS and the CPU is running as an ARMv7 CPU. If you're running a 64-bit Linux (I use the 64-bit Ubuntu 18.04 on my Pi 3s) then you need the 64-bit ARM app (although it will also run the 32-bit ARM app if you've installed the 32-bit ARM OS architecture). Jon |
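The rule of thumb above (pick the app by what the OS runs, not by what the CPU could run) can be sketched in a few lines. This is only an illustration: `suggest_app` and the labels it returns are hypothetical names, not the project's actual download names, and `platform.machine()` reports the architecture the kernel presents, so a 32-bit userland on a 64-bit kernel still needs a manual check.

```python
import platform


def suggest_app(machine: str) -> str:
    """Map a `uname -m` style string to an app family (hypothetical labels)."""
    m = machine.lower()
    if m in ("aarch64", "arm64"):
        # 64-bit OS on an ARMv8 CPU, e.g. 64-bit Ubuntu on a Pi 3.
        return "AARCH64"
    if m.startswith("armv7") or m.startswith("armv8"):
        # 32-bit OS; 'armv8l' is how a 32-bit personality on ARMv8 shows up.
        return "ARMv7"
    if m.startswith("armv6"):
        return "ARMv6"
    return "unknown"


if __name__ == "__main__":
    reported = platform.machine()
    print(f"OS reports '{reported}', suggested app family: {suggest_app(reported)}")
```

Whether to take the NEON or non-NEON ARMv7 build cannot be decided from this string alone; that depends on the CPU features.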
Joined: 14 Dec 17 Posts: 11 Credit: 3,282,877 RAC: 959 |
thanks |
Joined: 16 Apr 18 Posts: 2 Credit: 313,873 RAC: 0 |
Edit: problem solved |
Joined: 15 Jan 18 Posts: 6 Credit: 1,237,249 RAC: 0 |
Are there any plans for an optimized version of the new app, "RakeSearch for rank 10 v1.00"? I have two of the new WUs still running after 100 minutes. |
Joined: 11 Sep 17 Posts: 51 Credit: 194,406,895 RAC: 2,340 |
Hi, run times on the latest R9 units are also getting slower, and I don't know why. I'm running the latest optimised AVX app from Daniel. But the whole Rake team is doing good work, and I hope an optimised R10 app will come soon. |
Joined: 8 Sep 17 Posts: 99 Credit: 402,603,726 RAC: 0 |
Hi, run times on the latest R9 units are also getting slower, and I don't know why. I'm running the latest optimised AVX app from Daniel. But the whole Rake team is doing good work, and I hope an optimised R10 app will come soon. I had a chance to peek at the new app's code. It already uses many of the optimizations I implemented in the rank 9 app. There is still room for some optimizations (SSE/AVX can certainly be added), but do not hold your breath: the possible speedups will not be as spectacular as for the rank 9 app. |
Joined: 15 Jan 18 Posts: 6 Credit: 1,237,249 RAC: 0 |
Anything you can do would be very much appreciated. Some of the new units I've returned have taken over 2 hours to complete.... |
Joined: 15 Feb 18 Posts: 1 Credit: 56,278,640 RAC: 75 |
If you have time, ARMv6 and ARMv7 apps would be great. I'm running a bunch of Raspberry Pis here. Thanks a lot for your effort. |
Joined: 11 Aug 17 Posts: 648 Credit: 22,556,630 RAC: 13,441 |
Hi folks! The new workunits (for rank 10) contain many more squares per 1% of progress: 10 million versus 2.75 million in the rank 9 workunits. "Making" a square of rank 10 also takes more work than a square of rank 9. |
Joined: 8 Sep 17 Posts: 22 Credit: 19,171,868 RAC: 12,035 |
Anything you can do would be very much appreciated. Run time is irrelevant for credit. |
Joined: 15 Jan 18 Posts: 6 Credit: 1,237,249 RAC: 0 |
I'm not that fussed about credit, it's the spare time that's limited.... |
Joined: 27 Jun 18 Posts: 47 Credit: 9,875,775 RAC: 0 |
So, if there are 10 million squares per 1%, then each WU contains 1 billion squares? |
Joined: 11 Aug 17 Posts: 648 Credit: 22,556,630 RAC: 13,441 |
So, if there are 10 million squares per 1%, then each WU contains 1 billion squares? 1 billion squares per workunit is a very rough estimate used for the computation progress display. The actual number of squares in a workunit is between ~70 000 000 and ~1 700 000 000. |
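The arithmetic behind these figures can be written out as a short sketch. It uses only numbers quoted in this thread; the constant and function names are illustrative, and nothing here describes how the app itself computes or displays progress.

```python
# Figures quoted in this thread.
RANK10_SQUARES_PER_PERCENT = 10_000_000
RANK9_SQUARES_PER_PERCENT = 2_750_000

# Nominal workunit size implied by "10 million squares per 1%".
NOMINAL_SQUARES = RANK10_SQUARES_PER_PERCENT * 100  # 1 000 000 000


def relative_size(actual_squares: int) -> float:
    """Actual workunit size as a fraction of the nominal 1 billion squares."""
    return actual_squares / NOMINAL_SQUARES


if __name__ == "__main__":
    print(f"nominal squares per rank 10 workunit: {NOMINAL_SQUARES:,}")
    for actual in (70_000_000, 1_700_000_000):
        print(f"{actual:>13,} squares = {relative_size(actual):.2f} x nominal")
    ratio = RANK10_SQUARES_PER_PERCENT / RANK9_SQUARES_PER_PERCENT
    print(f"squares per 1%: rank 10 is about {ratio:.1f}x rank 9")
```

The smallest quoted workunit is therefore about 7% of the nominal size and the largest about 170%, which is why the 1 billion figure is only a rough guide.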
Joined: 11 Sep 17 Posts: 51 Credit: 194,406,895 RAC: 2,340 |
Hi, since yesterday I can only get R10 units, so it looks like R9 is done, or the web server stats are just a bit behind the real run. With the optimisation from Daniel we did fantastic work and finished R9 in the first half of 2019. Thanks to all. After a long run on this project I can say: R9 with optimisation was more intensive on the CPU, for example on the VRM phases on the X399 chipset, which hit 120 °C last summer, but in winter it also helped heat the rooms (open case, cheap AIO cooling). R10 is now less intensive on the CPU and far less demanding on the VRM phases, which is good for the upcoming summer. Still, we hope for some small optimisation of R10. I run a 2990WX at 3700 MHz and 1.00 V Vcore, which is the best ratio, because above 3800 MHz the Threadripper becomes a heat- and power-hungry monster at 24/7 100% load; for normal desktop use 4000 MHz+ is not a problem. On R9 I used the AVX2 app from Daniel with Coreprio on Windows 10, which ran really fast and gave my biggest daily RAC. In the long term I will try to migrate this 2990WX to Linux, for several reasons. As for run times on R10, they are not so bad, because R10 is a bit different, as I read in another thread here, but we will see. I'm optimistic. |
Joined: 27 Jun 18 Posts: 47 Credit: 9,875,775 RAC: 0 |
So, if there are 10 million squares per 1%, then each WU contains 1 billion squares? Just noticed I missed that answer. But is there any reason why each WU is so different? |
©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences