Posts by Stephen Uitti

1) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 660)
Posted 8 Dec 2018 by Stephen Uitti
Post:
I recently got a new Pi 3 and couldn't recall how to install the app. Google led me to my own instructions here. I had forgotten the restart step. I now keep a local copy of the instructions. Apparently, I use the NEON version on both the Pi 2 and the Pi 3.

My notes suggest that the NEON version may be about 10% faster than the non-NEON version on the Pi 3. That's within experimental error.

I don't have a comparison on the Pi 2; I only have NEON times. Weird. I also started routinely overclocking the Pi 2 around then.
The overclocked Pi 2 comes out 23% faster than the Pi 3. Also weird.

Stephen.
3) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 330)
Posted 12 Mar 2018 by Stephen Uitti
Post:
Thanks, Daniel. I grepped for sse2 on the Phenom but didn't think to grep for neon on the ARMs.

It turns out that the ARM processors in both the Pi 2 and the Pi 3 support NEON. Both systems have completed NEON units and gotten credit for them.
Pi Zeros don't work with the accelerated apps; they error out right away. (I've turned them off.) One Zero was running Jessie and the other Stretch, so I'm sure it's the processor, not the OS. That fits with the Zero's ARM1176 core being ARMv6, without NEON.
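The flag check amounts to grepping /proc/cpuinfo. The sketch below is one way to do it; the function name simd_flags is made up for illustration, and it assumes the usual Linux layout, where x86 lists CPU extensions on "flags" lines and 32-bit ARM on "Features" lines. (A 64-bit ARM kernel reports NEON as asimd, which this sketch does not look for.)

```shell
#!/bin/sh
# simd_flags FILE  (hypothetical helper)
# Print whether FILE, normally /proc/cpuinfo, advertises each of the
# SIMD extensions the RakeSearch builds discussed here use.
simd_flags() {
    for ext in neon sse2 avx; do
        # Only look at the feature lines: "Features" on 32-bit ARM,
        # "flags" on x86. grep -w matches whole words, so "avx" does
        # not match "avx2".
        if grep -E '^(flags|Features)[[:space:]]*:' "$1" | grep -qw "$ext"; then
            echo "$ext: yes"
        else
            echo "$ext: no"
        fi
    done
}
```

Run it as simd_flags /proc/cpuinfo; on a Pi 2 or Pi 3 under Raspbian it should report neon: yes, and on a Pi Zero neon: no.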

I've verified that the AMD A8 is in fact running the AVX-accelerated app successfully. It's about 20% slower than the Phenom II, which lacks AVX and runs the SSE2 app. Depending on the app or benchmark, it's not unusual for the A8 to run 20% faster or 20% slower than the Phenom II. I might try the SSE2 app on the A8. I time these by pasting the stats for 20 valid units into a spreadsheet and averaging the run times.
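The spreadsheet averaging can also be done from the shell. This is a minimal sketch, assuming the run times for each machine have been pasted into a plain text file with one number per line; the function names and file names are hypothetical.

```shell
#!/bin/sh
# mean_runtime FILE  (hypothetical helper)
# Average the run times in FILE, one number per line; blank lines are skipped.
mean_runtime() {
    awk 'NF { sum += $1; n++ } END { if (n > 0) printf "%.1f\n", sum / n }' "$1"
}

# percent_diff A B  (hypothetical helper)
# How much more time A takes than B, as a percentage of B:
# positive means A is slower, negative means A is faster.
percent_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) * 100 / b }'
}
```

For example, with 20 A8 times in a8.txt and 20 Phenom II times in phenom.txt, percent_diff "$(mean_runtime a8.txt)" "$(mean_runtime phenom.txt)" prints a positive number when the A8 is slower. Since it compares mean run times, "20% slower" here means 20% more time per unit.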

Stephen.
7) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 279)
Posted 8 Jan 2018 by Stephen Uitti
Post:
I downloaded rakesearch_linux_arm_v7l.tgz from GitHub.
On a Pi 3 (not overclocked) running Raspbian Stretch, with BOINC installed:
sudo apt-get install boinc
I ran the BOINC Manager and added RakeSearch (by URL). I ignored the warning "this project may not have units for your CPU" (or whatever it says). I usually run my Pi 3s headless, so I suppose I could have done this with boinccmd, but the BOINC Manager showed me what was going on a bit more quickly.

I then installed the application:
# get a root shell
sudo bash
# extract the binary:
cd /var/lib/boinc-client/projects/rake.boincfast.ru_rakesearch/
tar xvf ~pi/rakesearch_linux_arm_v7l.tgz
# exit the root shell
exit

Stopping and starting the BOINC Manager didn't work, so I restarted the Pi:
sudo shutdown -r
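The extraction step can be wrapped so it fails early when something is missing. This is only a sketch; install_app is a hypothetical name, and it assumes, as the steps above do, that it is run as root and that the project directory already exists because the project was attached first.

```shell
#!/bin/sh
# install_app ARCHIVE PROJECT_DIR  (hypothetical helper)
# Unpack an optimized-app archive into a BOINC project directory.
# Run as root, as in the steps above.
install_app() {
    archive=$1
    project_dir=$2

    # The project directory normally appears once the project is attached,
    # so a missing directory usually means that step was skipped.
    if [ ! -d "$project_dir" ]; then
        echo "install_app: $project_dir not found; attach to the project first" >&2
        return 1
    fi
    if [ ! -f "$archive" ]; then
        echo "install_app: $archive not found" >&2
        return 1
    fi

    # GNU tar detects gzip compression when extracting, so xvf handles .tgz.
    tar -xvf "$archive" -C "$project_dir"
}
```

For the Pi 3 install above, the call in a root shell would be install_app ~pi/rakesearch_linux_arm_v7l.tgz /var/lib/boinc-client/projects/rake.boincfast.ru_rakesearch, followed by the same restart.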

I let it download a couple of units, which ran in 5 to 6 hours each.

I chose the arm_v7l version because it's the one I expected to work on the Pi 3. I don't expect it to work on a Pi 2, but I have a Pi 2 running Jessie and I'll give it a try soon. I also have a Pi Zero W and could give that a shot.

I don't expect the NEON version (rakesearch_linux_arm_v7l_neon.tgz) to work on a Pi 3, but I might try it and see. It might work with a 64-bit OS; it would be nice to know for sure, one way or the other. It might also work on a Banana Pi or a higher-end Android device, but I don't have either of these.
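Going by the results reported in the March and December posts above (the NEON build runs on both the Pi 2 and the Pi 3, and Pi Zeros fail), the choice of archive can be sketched as a small lookup. pick_tarball is a hypothetical helper; it only knows the archive names mentioned in these posts, and it answers "none" for anything the posts don't cover, such as a 64-bit ARM OS.

```shell
#!/bin/sh
# pick_tarball MACHINE HAS_NEON  (hypothetical helper)
# MACHINE is what "uname -m" prints; HAS_NEON is "yes" or "no".
pick_tarball() {
    case "$1" in
        armv7l)
            # Raspbian on a Pi 2 or Pi 3 reports armv7l.
            if [ "$2" = yes ]; then
                echo rakesearch_linux_arm_v7l_neon.tgz
            else
                echo rakesearch_linux_arm_v7l.tgz
            fi
            ;;
        x86_64)
            # The SSE2 build; an AVX build also exists, but its
            # archive name isn't given in these posts.
            echo rakesearch_linux_64_sse2.tgz
            ;;
        *)
            # e.g. armv6l on a Pi Zero, or aarch64 (untested here)
            echo none
            ;;
    esac
}
```

On the machine itself, something like pick_tarball "$(uname -m)" "$(grep -qw neon /proc/cpuinfo && echo yes || echo no)" fills in both arguments.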

The process is more or less the same on x86, where it went smoothly for me.

I'm running the rakesearch_linux_64_sse2.tgz version on an AMD Phenom running Linux Mint 13; it's too old to support AVX. I also have an AMD A8, also on Mint 13, which does have AVX. I haven't tried it on the A8 yet.

I've only looked at ARM optimization a little. It looks complicated and like a ton of work; in particular, keeping the data moving in and out of the processor while it does the work looks difficult to get right. Daniel has clearly gotten it right, so it very likely was a ton of work. Thanks very much.

Stephen.




©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences