Optimized RakeSearch app for rank 9 (computations finished)

frankhagen

Joined: 8 Sep 17
Posts: 6
Credit: 1,174,843
RAC: 0
Message 783 - Posted: 22 Feb 2019, 19:29:38 UTC - in response to Message 782.  

Additional test. The same two computers.
32 bit SSE2 rakesearch.exe

real    6m15,250s
user    0m0,000s
sys     0m0,015s

---------------------------

64 bit SSE2 rakesearch.exe

real    5m28,938s
user    0m0,000s
sys     0m0,015s


This was to be expected: in 32-bit mode only half of the SSE registers are available (8 XMM registers instead of 16), so only in rare cases will 32-bit SSE code run faster than 64-bit.
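To put a number on the difference, here is a minimal sketch that converts the two "real" timings quoted above into seconds and compares them. The helper name `parse_time` is made up for this illustration; it only handles the `XmY,ZZZs` / `XmY.ZZZs` form shown in this thread.

```python
import re

def parse_time(text):
    """Convert a `time`-style duration such as '6m15,250s' to seconds."""
    # Accept either a comma or a dot as the decimal separator.
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")

t32 = parse_time("6m15,250s")   # 32-bit SSE2 build, "real" time from the post
t64 = parse_time("5m28,938s")   # 64-bit SSE2 build, "real" time from the post
speedup = t32 / t64
print(f"32-bit: {t32:.3f} s, 64-bit: {t64:.3f} s, ratio: {speedup:.2f}")
```

On these two runs the 32-bit build needs roughly 1.14 times as long as the 64-bit one. This is a single measurement per build, so treat the ratio as indicative only.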

[B@P] Daniel
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 797 - Posted: 4 Mar 2019, 22:49:28 UTC

Hi all.
I have created apps for ARM CPUs. There are new app versions for ARMv7 (with and without NEON instructions) and for AARCH64. I also created an app for ARMv6, which was requested in the past.

Here are the results for the new apps, measured on an Odroid XU4 (ARM apps) and an Odroid C2 (AARCH64 app):

ARM:           12m49.368s
ARM+NEON:       9m56.425s
AARCH64, NEON: 13m54.945s


For comparison, here are the results for the previous version:

ARM:           20m35.665s
ARM+NEON:      15m57.060s
AARCH64, NEON: 20m52.180s
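
As a rough illustration, the speedup of each new build over the previous one can be worked out from the two sets of timings above. This is only a sketch: the numbers are copied from this post, and the dictionary and function names are made up for the example.

```python
# Timings copied from the post above, as (minutes, seconds).
previous = {
    "ARM": (20, 35.665),
    "ARM+NEON": (15, 57.060),
    "AARCH64, NEON": (20, 52.180),
}
current = {
    "ARM": (12, 49.368),
    "ARM+NEON": (9, 56.425),
    "AARCH64, NEON": (13, 54.945),
}

def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to seconds."""
    return minutes * 60 + seconds

# Speedup factor = old run time / new run time, per build.
speedups = {
    name: to_seconds(*previous[name]) / to_seconds(*current[name])
    for name in current
}
for name, factor in speedups.items():
    print(f"{name:<14} {factor:.2f}x")
```

With these figures both 32-bit ARM builds come out about 1.6 times faster than before, and the AARCH64 build about 1.5 times faster.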


The ARMv6 app is a bit slower than the ARMv7 one, so make sure you use the ARMv7 app on an ARMv7 CPU.

During this work I also found a bug in the non-SSE 32-bit v1.1 apps for Windows and Linux. On ARM, the app with this bug hangs; on x86 it seems to work, but only thanks to the undefined behavior of one assembler instruction. If you are using these apps (32-bit non-SSE for Windows or Linux), I strongly advise you to download and install the app again. The old version may hang or produce wrong results. This bug affects only the non-SSE apps; the SSE and AVX ones are OK.

[SG]Felix
Joined: 14 Dec 17
Posts: 11
Credit: 3,282,877
RAC: 959
Message 837 - Posted: 7 Apr 2019, 10:09:08 UTC

Do you plan to build ARMv8 apps?
The Raspberry Pi 3 has an ARMv8 processor.

[B@P] Daniel
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 838 - Posted: 8 Apr 2019, 6:52:47 UTC - in response to Message 837.  

Do you plan to build ARMv8 apps?
The Raspberry Pi 3 has an ARMv8 processor.

This board uses a 64-bit CPU, so please use the AARCH64 app. It works on the Odroid C2 with its ARMv8 CPU, so it should work for you too.

JonS
Joined: 18 Jan 18
Posts: 4
Credit: 52,193,414
RAC: 0
Message 839 - Posted: 8 Apr 2019, 13:36:58 UTC - in response to Message 837.  

It will depend on the OS your Pi is using...

If you're using Raspbian on your Pi 3 then you'll need the 32-bit ARM app, as it's a 32-bit OS and the CPU is running as an ARMv7 CPU.

If you're running a 64-bit Linux (I use the 64-bit Ubuntu 18.04 on my Pi 3s) then you need the 64-bit ARM app (although it will also run the 32-bit ARM app if you've installed the 32-bit ARM OS architecture).

Jon
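
A quick way to see which case applies is to ask the OS what architecture it reports. The sketch below uses Python's standard `platform.machine()`. The mapping and the app labels are assumptions made for this illustration, not the project's actual download names, and it does not check for NEON support.

```python
import platform

# Hypothetical labels: the real download names are not given in this thread.
APP_BY_MACHINE = {
    "aarch64": "AARCH64 app",   # 64-bit ARM Linux
    "arm64": "AARCH64 app",     # alternative name used by some systems
    "armv7l": "ARMv7 app",      # 32-bit OS, e.g. Raspbian on a Pi 3
    "armv6l": "ARMv6 app",      # older 32-bit ARM boards
}

def pick_app(machine):
    """Map the architecture string reported by the OS to an app label."""
    return APP_BY_MACHINE.get(machine.lower(), "unknown: check manually")

print(platform.machine(), "->", pick_app(platform.machine()))
```

The point is that the answer depends on what the running OS reports, not on what the CPU could do: a Pi 3 under a 32-bit OS would report a 32-bit ARM architecture even though the hardware is ARMv8.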

[SG]Felix
Joined: 14 Dec 17
Posts: 11
Credit: 3,282,877
RAC: 959
Message 844 - Posted: 9 Apr 2019, 5:24:48 UTC

thanks

Magiceye04
Joined: 16 Apr 18
Posts: 2
Credit: 313,873
RAC: 0
Message 850 - Posted: 10 Apr 2019, 19:38:14 UTC
Last modified: 10 Apr 2019, 19:52:38 UTC

Edit: problem solved

entigy
Joined: 15 Jan 18
Posts: 6
Credit: 1,237,249
RAC: 0
Message 1000 - Posted: 2 Jun 2019, 17:38:19 UTC

Are there any plans for an optimized version of the new app, "RakeSearch for rank 10 v1.00"?

I have two of the new WUs still running after 100 minutes.....

jozef j
Joined: 11 Sep 17
Posts: 51
Credit: 193,012,102
RAC: 1,899
Message 1006 - Posted: 3 Jun 2019, 9:01:02 UTC

Hi, run times on the latest R9 units are also getting slower, and I don't know why. I'm running the latest optimised AVX app from Daniel. The whole Rake team is doing good work, though, and I hope an optimised R10 app is coming soon.

[B@P] Daniel
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 1008 - Posted: 3 Jun 2019, 10:54:27 UTC - in response to Message 1006.  

Hi, run times on the latest R9 units are also getting slower, and I don't know why. I'm running the latest optimised AVX app from Daniel. The whole Rake team is doing good work, though, and I hope an optimised R10 app is coming soon.

I had a chance to peek at the new app's code. It already uses many of the optimizations I implemented in the rank 9 app. There is still room for some optimizations (SSE/AVX can certainly be added), but do not hold your breath: the possible speedups will not be as spectacular as for the rank 9 app.

entigy
Joined: 15 Jan 18
Posts: 6
Credit: 1,237,249
RAC: 0
Message 1009 - Posted: 3 Jun 2019, 11:44:36 UTC - in response to Message 1008.  

Anything you can do would be very much appreciated.

Some of the new units I've returned have taken over 2 hours to complete....

Landjunge
Joined: 15 Feb 18
Posts: 1
Credit: 56,270,033
RAC: 102
Message 1010 - Posted: 3 Jun 2019, 12:14:38 UTC

If you have time, ARMv6 and ARMv7 apps would be great. I'm running a bunch of Raspberry Pis here. Thanks a lot for your effort.
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 11 Aug 17
Posts: 624
Credit: 20,551,851
RAC: 7,975
Message 1011 - Posted: 3 Jun 2019, 13:23:51 UTC

Hi folks! The new workunits (for rank 10) contain many more squares per 1% of progress: 10 million versus 2.75 million in the rank 9 workunits. Constructing a rank 10 square also takes more work than constructing a rank 9 square.
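
For a sense of scale, here is the arithmetic behind those two figures as a small sketch. The constant names are made up for the example, and the "nominal" totals simply assume that 100% of progress corresponds to 100 times the per-percent figure.

```python
# Squares per 1% of displayed progress, as quoted in the post above.
SQUARES_PER_PERCENT_RANK9 = 2_750_000
SQUARES_PER_PERCENT_RANK10 = 10_000_000

# How many times more squares a rank 10 percent step covers.
ratio = SQUARES_PER_PERCENT_RANK10 / SQUARES_PER_PERCENT_RANK9

# Nominal totals, assuming 100% = 100 x the per-percent figure.
nominal_rank9 = 100 * SQUARES_PER_PERCENT_RANK9
nominal_rank10 = 100 * SQUARES_PER_PERCENT_RANK10

print(f"rank 10 / rank 9 squares per 1%: {ratio:.2f}")
print(f"nominal squares per WU: rank 9 = {nominal_rank9:,}, rank 10 = {nominal_rank10:,}")
```

So each percent of a rank 10 workunit covers about 3.6 times as many squares as a percent of a rank 9 workunit, before counting the extra work per square.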

mmonnin
Joined: 8 Sep 17
Posts: 22
Credit: 18,374,830
RAC: 11,948
Message 1021 - Posted: 5 Jun 2019, 0:20:26 UTC - in response to Message 1009.  

Anything you can do would be very much appreciated.

Some of the new units I've returned have taken over 2 hours to complete....


Run time is irrelevant for credit.

entigy
Joined: 15 Jan 18
Posts: 6
Credit: 1,237,249
RAC: 0
Message 1023 - Posted: 5 Jun 2019, 7:31:50 UTC - in response to Message 1021.  

I'm not that fussed about credit, it's the spare time that's limited....

mmonnin
Joined: 8 Sep 17
Posts: 22
Credit: 18,374,830
RAC: 11,948
Message 1026 - Posted: 5 Jun 2019, 10:08:26 UTC - in response to Message 1009.  

Anything you can do would be very much appreciated.

Some of the new units I've returned have taken over 2 hours to complete....


Run time is irrelevant for credit.

Millenium
Joined: 27 Jun 18
Posts: 47
Credit: 9,875,775
RAC: 0
Message 1034 - Posted: 7 Jun 2019, 13:07:34 UTC

So, if there are 10 million squares per 1%, then there are 1 billion squares in each WU?
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 11 Aug 17
Posts: 624
Credit: 20,551,851
RAC: 7,975
Message 1036 - Posted: 7 Jun 2019, 21:07:10 UTC - in response to Message 1034.  
Last modified: 7 Jun 2019, 21:09:44 UTC

So, if there are 10 million squares per 1%, then there are 1 billion squares in each WU?

1 billion squares per workunit is a very rough estimate used for the progress display. The actual number of squares in a workunit is between ~70 000 000 and ~1 700 000 000.
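
To show how wide that range is, here is a small sketch comparing the quoted bounds with the 1 billion estimate. The constant names are invented for the example, and the bounds are the approximate figures from this post.

```python
NOMINAL_SQUARES = 1_000_000_000   # rough estimate used for the progress display
MIN_SQUARES = 70_000_000          # approximate lower bound quoted above
MAX_SQUARES = 1_700_000_000       # approximate upper bound quoted above

# Ratio between the largest and smallest workunit.
spread = MAX_SQUARES / MIN_SQUARES

# Size of the extremes relative to the nominal estimate.
smallest_vs_nominal = MIN_SQUARES / NOMINAL_SQUARES
largest_vs_nominal = MAX_SQUARES / NOMINAL_SQUARES

print(f"largest / smallest WU: {spread:.1f}x")
print(f"smallest WU is {smallest_vs_nominal:.0%} of the estimate")
print(f"largest WU is {largest_vs_nominal:.0%} of the estimate")
```

By these figures the largest workunits hold about 24 times as many squares as the smallest ones, which is one reason run times can vary so much between workunits.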

jozef j
Joined: 11 Sep 17
Posts: 51
Credit: 193,012,102
RAC: 1,899
Message 1038 - Posted: 10 Jun 2019, 9:23:03 UTC

Hi, since yesterday I can only get R10 units, so it looks like R9 is done, or the web server stats are just lagging a bit behind the real run.
With Daniel's optimisation we did fantastic work and finished R9 in the first half of 2019. Thanks to all.
After a long run on this project I can say:
R9 with the optimisation was more intensive on the CPU, for example on the VRM phases on the X399 chipset, which hit 120 °C last summer; in winter it also helped heat the rooms (open case, cheap AIO cooling).
R10 is now less intensive on the CPU and much lighter on the VRM phases, which is good for the upcoming summer. Still, we hope for some small optimisation of R10.
I run a 2990WX at 3700 MHz and 1.00 V Vcore, which is the best ratio, because above 3800 MHz the Threadripper becomes a heat- and power-hungry monster (at 24/7 100% load); for normal desktop use 4000 MHz+ is not a problem. On R9 I used Daniel's AVX2 app with Coreprio on Windows 10, which ran really fast and gave my biggest daily RAC.
In the long term I will try to migrate this 2990WX to Linux, for several reasons.

As far as I can see, run times on R10 are not so bad, because R10 is a bit different, as I read in another thread here, but we will see. I'm optimistic.

Millenium
Joined: 27 Jun 18
Posts: 47
Credit: 9,875,775
RAC: 0
Message 1067 - Posted: 20 Jun 2019, 23:03:42 UTC - in response to Message 1036.  

So, if there are 10 million squares per 1%, then there are 1 billion squares in each WU?

1 billion squares per workunit is a very rough estimate used for the progress display. The actual number of squares in a workunit is between ~70 000 000 and ~1 700 000 000.

I just noticed that I missed this answer. Is there any reason why the WUs differ so much?

©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences