Posts by [B@P] Daniel

21) Questions and Answers : Unix/Linux : Tasks failing with code 193 (Message 938)
Posted 1 May 2019 by Profile [B@P] Daniel
Post:
During the last Formula BOINC challenge some workunits were generated incorrectly, and they cause crashes like this one. More details are here: https://rake.boincfast.ru/rakesearch/forum_thread.php?id=165#928
22) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 838)
Posted 8 Apr 2019 by Profile [B@P] Daniel
Post:
have you planned building ARMv8 apps?
because raspberry pi 3 has an ARMv8 processor

This board uses 64-bit CPU, so please use AARCH64 app. It works on Odroid C2 with ARMv8 CPU, so it should work for you too.
23) Message boards : Number crunching : Congratulations, we are over 50%! (Message 833)
Posted 2 Apr 2019 by Profile [B@P] Daniel
Post:
70% percent passed!
Rake search of diagonal Latin squares of rank 9 (%) 70.311
24) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 797)
Posted 4 Mar 2019 by Profile [B@P] Daniel
Post:
Hi all.
I have created apps for ARM CPUs. There are new app versions for ARMv7 (with and without NEON instructions) and for AARCH64. I also created an app for ARMv6, which was requested in the past.

Here are results for the new apps, measured on Odroid XU4 (ARM apps) and Odroid C2 (AARCH64 one):

ARM:           12m49.368s
ARM+NEON:       9m56.425s
AARCH64, NEON: 13m54.945s


For comparison, here are results for previous version:

ARM:           20m35.665s
ARM+NEON:      15m57.060s
AARCH64, NEON: 20m52.180s


App for ARMv6 is a bit slower than ARMv7 one, so make sure you use ARMv7 app on ARMv7 CPU.

During my work I also found a bug in the non-SSE 32-bit v1.1 apps for Windows and Linux. On ARM the app with this bug hangs, but on x86 it seems to work, thanks to undefined behavior of one assembler instruction. If you are using these apps (32-bit non-SSE for Windows or Linux), I strongly advise you to download and install the app again. The old version may hang or produce wrong results. This bug affects only non-SSE apps; SSE and AVX ones are OK.
25) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 776)
Posted 17 Feb 2019 by Profile [B@P] Daniel
Post:
Thanks Daniel,

I will install 64 bit Windows.

And how to make such benchmarks?

You can download the sample data file and the shell script used to start the test from https://github.com/sirzooro/RakeSearch/tree/boinc/RakeDiagSearch/RakeDiagSearch/test. It can be run directly in Linux. On Windows you will need to install Cygwin. You can also try MinGW or MSYS; they should also work, but I did not try them - I prefer Cygwin.
26) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 773)
Posted 17 Feb 2019 by Profile [B@P] Daniel
Post:
I have a computer with 3 gigabytes of memory, an Intel Pentium Dual Core E2220 processor and a new empty hard drive.
I plan to install Windows 7 on it.
Which version of Windows 7 is best to install?
32 or 64 bit?
Will the 32-bit or the 64-bit SSSE3-optimized RakeSearch applications crunch faster?
Have crunching speeds been measured on the same computer under 32-bit and 64-bit Windows?

Please use a 64-bit OS and app. 64-bit software can use more registers and has SSE2 by default, so it is usually faster than 32-bit software.

All my previous benchmarks were done on 64-bit Linux.

Windows does not pin CPU-intensive apps to one CPU core like Linux does; they constantly float between cores. This adds extra overhead because of context switching, so Windows results are usually a few percent worse than Linux ones.
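
For readers who want to experiment with pinning themselves, here is a minimal sketch in Python. It is not part of the RakeSearch app or of BOINC; the function name `pin_to_one_core` is made up for illustration. It uses `os.sched_getaffinity` and `os.sched_setaffinity`, which exist only on some platforms (Linux among them), so the sketch returns `None` elsewhere. Note that this is explicit pinning, which is not necessarily the same thing the Linux scheduler does on its own.

```python
import os


def pin_to_one_core():
    """Pin the current process to one of the cores it may already run on.

    Returns the chosen core number, or None when the platform does not
    offer the sched_*affinity calls (for example Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Pick the lowest-numbered core from the currently allowed set.
    core = min(os.sched_getaffinity(0))
    # 0 means "the calling process".
    os.sched_setaffinity(0, {core})
    return core


if __name__ == "__main__":
    print("pinned to core:", pin_to_one_core())
```

A benchmark started from the same process afterwards (or a child process, which inherits the affinity) would then stay on that single core.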

I did some benchmarking to see how 32-bit apps perform on my Haswell Xeon. This was also done on 64-bit Linux. CPU-intensive apps like this one do not have to perform many syscalls or use system libraries a lot, so results for 32-bit apps should be similar on 32-bit and 64-bit systems.

SSSE3 64-bit:
real    4m2.163s
user    4m0.198s
sys     0m0.018s

SSE2 64-bit:
real    4m8.098s
user    4m6.121s
sys     0m0.032s

SSSE3 32-bit:
real    4m37.972s
user    4m36.001s
sys     0m0.032s

SSE2 32-bit:
real    4m56.755s
user    4m54.779s
sys     0m0.040s

Non-SSE 32-bit:
real    4m55.787s
user    4m53.806s
sys     0m0.044s


As you can see, 32-bit apps are slower than 64-bit ones. The result for the non-SSE app is a bit surprising; I suspect that limitations of 32-bit software combined with various CPU hardware optimizations are responsible for this. It would be interesting to see some benchmark results from old CPUs like yours; unfortunately I do not have such a machine.
27) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 764)
Posted 12 Feb 2019 by Profile [B@P] Daniel
Post:
@ Daniel

Will there be also a "AVX2 NOPEXT" v1.1 for AMD? :thumbsup:

No. New optimized app does not use PEXT instruction, so no need to build separate app version. Please use AVX2 version.
28) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 761)
Posted 11 Feb 2019 by Profile [B@P] Daniel
Post:
I have uploaded fixed SSE2 version, and SSSE3 version (notice triple S here, it is Supplemental SSE3). It turned out that with new compilation options these versions are a bit faster than previous "SSE2" version:

SSE2:
real    4m8.098s
user    4m6.121s
sys     0m0.032s

SSSE3:
real    4m2.163s
user    4m0.198s
sys     0m0.018s

Previous "SSE2":
real    4m14.850s
user    4m12.858s
sys     0m0.047s
29) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 732)
Posted 5 Feb 2019 by Profile [B@P] Daniel
Post:
I have checked this and found a bug in the compilation options. The SSE2 app versions use SSSE3 instructions, which are not supported by your CPU, so they crash with error/signal "Illegal Instruction". I will release fixed app versions later today. Until then please use the previous app version, or the non-SSE one.
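
To avoid this kind of crash it helps to check which instruction sets the CPU reports before picking an app version. The sketch below is only an illustration under stated assumptions: it parses text in the format of the `flags` line of `/proc/cpuinfo` on x86 Linux, and the helper names `cpu_flags` and `pick_app` are made up here. The preference order (AVX2, AVX, SSSE3, SSE2, non-SSE) follows the benchmark results posted in this thread; AVX512 is left out because the exact AVX512 subsets the app needs are not stated.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text (x86 Linux)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_app(flags):
    """Pick the fastest app version whose required flag is present."""
    # Fastest first, according to the timings quoted in this thread.
    candidates = [
        ("AVX2", "avx2"),
        ("AVX", "avx"),
        ("SSSE3", "ssse3"),
        ("SSE2", "sse2"),
    ]
    for app_name, needed_flag in candidates:
        if needed_flag in flags:
            return app_name
    return "non-SSE"


if __name__ == "__main__":
    # Hypothetical sample; on a real machine read /proc/cpuinfo instead.
    sample = "processor\t: 0\nflags\t\t: fpu sse sse2 pni\n"
    print(pick_app(cpu_flags(sample)))
```

A CPU that reports `sse2` but not `ssse3` gets the SSE2 app here, which is exactly the case where the broken build crashed.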
30) Message boards : Number crunching : Congratulations, we are over 50%! (Message 724)
Posted 4 Feb 2019 by Profile [B@P] Daniel
Post:
We are still running rank 9 at this moment, correct (the 50% completion mark)? All my ODSL pairs found are rank 9.

So there is a possibility to run rank 10 search in the future?

Yes, search on space of diagonal latin squares of rank 9 is performed now. Search on space of rank 10 is also possible and interesting, but now it is a far future.

Happy crunching! :)

Not so far, my rough estimate is that with new optimized app released yesterday search for rank 9 will be finished this year, probably sometime between 6 and 9 months from now. Both numbers are based on assumption that during last 8 days (since creation of this thread) project progressed by something between 1% and 1.5%, and new app is 30% faster.
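
The arithmetic behind this estimate can be reproduced roughly as follows. This is a sketch under assumptions that are not all stated in the post: about 50% of the search remains (taken from the thread title), "30% faster" means the new app needs about 70% of the old run time, every host switches to the new app, and a month is counted as 30 days.

```python
REMAINING_PERCENT = 50.0  # assumption: the thread title says "over 50%"
DAYS_OBSERVED = 8         # days since the thread was created
TIME_RATIO = 0.70         # assumption: new app takes ~70% of the old run time
DAYS_PER_MONTH = 30.0     # rough month length


def months_left(progress_percent):
    """Estimate months to finish, given the progress made in DAYS_OBSERVED days."""
    old_rate = progress_percent / DAYS_OBSERVED  # percent per day, old app
    new_rate = old_rate / TIME_RATIO             # percent per day, new app
    return REMAINING_PERCENT / new_rate / DAYS_PER_MONTH


if __name__ == "__main__":
    print(f"optimistic:  {months_left(1.5):.1f} months")
    print(f"pessimistic: {months_left(1.0):.1f} months")
```

With these inputs the result lands at a bit over 6 months for 1.5% progress and a bit over 9 months for 1% progress.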

BTW, do you have example of rank 10 pair? My app is not ready for rank 10 yet. I would like to fix it, and need some test data to make sure it will work properly.
31) Questions and Answers : Web site : 504 Gateway Time-out when posing on forum (Message 723)
Posted 4 Feb 2019 by Profile [B@P] Daniel
Post:
When I add a reply to a post on this forum, I get error 504 Gateway Time-out from nginx. Fortunately the reply is added despite this error. One small inconvenience is that the main forum page is not updated immediately (the last post date is not updated); the update happens a few minutes later.

This happened to me when I posted reply to "Optimized RakeSearch app" thread.
32) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 722)
Posted 4 Feb 2019 by Profile [B@P] Daniel
Post:
Thanks Daniel for the updates! Have the AVX running on a couple older machines and the AVX2 on a Ryzen. Unfortunately Panda AV flagged both exe files on all 3 machines as a virus (and deleted them) and I had to exclude them to get them to run. Never had this happen with the older versions. Is it possible to clue in the Panda people concerning this?

Hmm, interesting. I scanned them using Virus Total, which lets you scan a file using 69 scanners (including Panda), and they are clean. MetaDefender (37 scanners) also confirmed this. Anyway, I sent info to Panda about this false positive.
33) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 715)
Posted 3 Feb 2019 by Profile [B@P] Daniel
Post:
Hi all,
I have (finally!) released a new version of my optimized RakeSearch app, Opti v1.1. It can be downloaded from here: https://github.com/sirzooro/RakeSearch/releases/tag/v1.1. Installation instructions are the same as before, so please refer to the 1st post in this thread for details.

There are 4 versions available as before: SSE2, AVX, AVX2 and AVX512. There are also apps for 32-bit Windows and Linux.

For comparison, here are results for previous (Opti v1.0) version:

SSE2:
real    6m2.431s
user    6m0.451s
sys     0m0.030s

AVX:
real    5m45.740s
user    5m43.759s
sys     0m0.026s

AVX2:
real    5m24.624s
user    5m22.626s
sys     0m0.042s


And this is for new version:

SSE2:
real    4m14.850s
user    4m12.858s
sys     0m0.047s

AVX:
real    3m58.809s
user    3m56.813s
sys     0m0.035s

AVX2:
real    3m51.881s
user    3m49.885s
sys     0m0.040s


As you can see, new app version is about 30% faster than previous one.
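
The "about 30%" figure can be checked directly from the "real" times above. Here is a short sketch; the helper names `to_seconds` and `time_reduction` are made up for illustration, and the parser also accepts a comma as the decimal separator, since some of the results in this thread are printed that way.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '6m2.431s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:[.,]\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds.replace(",", "."))


def time_reduction(old, new):
    """Fraction of the run time saved by the new version."""
    return 1.0 - to_seconds(new) / to_seconds(old)


# "real" times quoted in this post: (Opti v1.0, Opti v1.1)
RESULTS = {
    "SSE2": ("6m2.431s", "4m14.850s"),
    "AVX": ("5m45.740s", "3m58.809s"),
    "AVX2": ("5m24.624s", "3m51.881s"),
}

if __name__ == "__main__":
    for name, (old, new) in RESULTS.items():
        print(f"{name}: {time_reduction(old, new):.1%} less run time")
```

All three versions come out between 28% and 31% less run time, so "30% faster" here means roughly 30% shorter run time per workunit.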

The new AVX2 app does not use the PEXT instruction. This means that this app version will be faster than the AVX one on AMD Ryzen and Threadripper too.

I also changed the AVX512 app version a bit; it now uses new instructions which operate on old (SSE/AVX) registers only. This means that it will not suffer from CPU frequency throttling related to use of the new AVX512 registers. I have some results captured for this app version. Unfortunately this machine had other things running on it, so the results look high in comparison to the ones above. Anyway, you can see that the AVX512 app is the fastest:

SSE2:
real    6m47,701s
user    0m0,000s
sys     0m0,046s

AVX:
real    6m22,165s
user    0m0,015s
sys     0m0,062s

AVX2:
real    6m14,271s
user    0m0,000s
sys     0m0,031s

AVX512:
real    6m12,458s
user    0m0,000s
sys     0m0,062s


I did not create ARM/AARCH64 app versions yet, I am going to release them soon.
34) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 711)
Posted 30 Jan 2019 by Profile [B@P] Daniel
Post:
Hi daniel, how is about new app? cant wait.. ))
i use just AVX ,still best on win 10 with THR 2 cpu.. i experimented with coreprio last month but for boinc is not there improvement.. maybe for gamers or web. ,, i try also cinebench and other tests from hwboot but different was 100-200 points in cine. for exemple..
Thank you much for work.

Things look good, most things are ready now. If everything goes well, I will release the new version this week.
35) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 696)
Posted 15 Jan 2019 by Profile [B@P] Daniel
Post:
I looked at the list of optimized apps, but I couldn't see the one that my computer needs, which is Linux 32-bit with SSE2.
Whereabouts on the page is it?

Can one be built for me please?

Hi,
I am working on new version of my optimized app. I am going to release it before end of January. I will add 32-bit Linux versions together with other ones.
36) Message boards : News : Closing web registration (Message 682)
Posted 3 Jan 2019 by Profile [B@P] Daniel
Post:
This will not stop them. You need to enable invitation code, and post it somewhere - e.g. on front page or in pinned thread here. You can also provide it after request sent via email to you, but other projects usually do not require this extra step.

Edit: I am not sure if invitation code can be entered when registering via client. You may need to disable registration via client if code cannot be entered there.
37) Message boards : News : Happy birthday to RakeSearch project! (Message 558)
Posted 16 Aug 2018 by Profile [B@P] Daniel
Post:
I wonder if it's possible to have the current code to run on GPUs, from what I know it should be doable: the memory requirements of the project are very low and the search is about comparing latin squares, nothing too complex.

Perhaps, but there is no news on this subject yet.

Thank you for participation!

Square checking can be moved to the GPU quite easily, and it is blazing fast there. Unfortunately the square generator running on the CPU is too slow to feed the GPU and keep it loaded - I tested my code on a 1080 and its load was only about 3%. So the generator must also be moved to the GPU, which is more complicated, as you have to run multiple generators in parallel. Here things get complicated - the search space has to be divided somehow into smaller pieces, so that a few hundred of them can run in parallel. Also keep in mind that every generator instance needs some memory (about 370 bytes), plus some memory for an output buffer. And all of this must somehow fit into local memory on the GPU, which is very limited (only 32KB per compute unit is guaranteed).
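
To get a feeling for how tight this memory budget is, here is a back-of-the-envelope calculation. Only the 370 bytes per generator and the 32KB per compute unit come from the post above; the output buffer size is a made-up placeholder, since the real value is not given.

```python
LOCAL_MEMORY_BYTES = 32 * 1024  # guaranteed local memory per compute unit
GENERATOR_STATE_BYTES = 370     # approximate state of one generator instance


def generators_per_compute_unit(output_buffer_bytes):
    """How many generator instances fit into one compute unit's local memory."""
    per_instance = GENERATOR_STATE_BYTES + output_buffer_bytes
    return LOCAL_MEMORY_BYTES // per_instance


if __name__ == "__main__":
    # No output buffer at all: an upper bound.
    print(generators_per_compute_unit(0))
    # Hypothetical 128-byte output buffer per instance.
    print(generators_per_compute_unit(128))
```

Even with no output buffer at all, fewer than 90 instances fit in the local memory of one compute unit, so a few hundred parallel generators have to be spread over several compute units or keep part of their state elsewhere.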
38) Message boards : Science : Source code of the project application (Message 437)
Posted 8 Jun 2018 by Profile [B@P] Daniel
Post:
Hi,
Please use the MinGW compiler shipped with Cygwin. The Makefiles I created use it automatically when you pass the option MinGW=1 when calling make. BTW, you will need to install both 32-bit and 64-bit versions of Cygwin (in separate directories), as the 64-bit version does not have the 32-bit libs needed for linking the final app.

You will also need to recompile the BOINC libs, as you wrote. This is a bit tricky, as you have to use boinc/lib/Makefile.mingw instead of the one generated by the configure script. You can also use mine; you can download them from here: https://bitbucket.org/sirzooro/boinc-stuff/downloads/.
39) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 346)
Posted 26 Mar 2018 by Profile [B@P] Daniel
Post:
Hi Daniel, thanks for working on this! To clarify/summarize this long thread:
What's the current status/version of the optimized app, how much faster is it than the stock version and approximately when is it due to be made default?

Repository for optimized versions: https://github.com/sirzooro/RakeSearch/releases/tag/v1.0

Any thoughts on which version is currently best for most 64bit machines?


Hi,
Most information about this optimized app is provided in my first post in this thread, please check it.

The new app is about 10 times faster than the original version (for the AVX2 version running on an Intel CPU). Other app versions for older CPUs are slower, but still a lot faster than the original one - e.g. the SSE2 version is about 9 times faster.

It turned out that AVX2+BMI2 app on AMD Ryzen/Threadripper is slower than AVX one. I created new AVX2 app without PEXT instruction to address this. I did not get any feedback about its speed on AMD CPUs, so I do not know if it is really faster there (on Intel it is a bit faster than AVX app).

Some time ago the project admins announced that the current optimized app will be released as an official one. They are going to do this after finishing other planned tasks here. The optimized app can be installed manually as "anonymous platform", so this is not the highest priority for them now.

I am still working on new version of optimized app. x86 app version is ready, I still have some work to do for ARM versions.
40) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 332)
Posted 13 Mar 2018 by Profile [B@P] Daniel
Post:
Thanks Daniel. I grep'ed for sse2 on the Phenom, didn't think to grep for neon on the Arms.

It turns out that both the pi 2 and the pi 3 Arm processors support NEON. Both processor systems have completed units. The pi 2 and pi 3 systems have gotten credit for NEON units.
Pi zeros don't work with the accelerated apps. They error out right away. (I've turned them off.) One zero was running Jessie, and the other Stretch, but I'm sure it's the processor, not the OS.

I've verified that the AMD A8 is in fact running the AVX accelerated app, and is successful. It's about 20% slower than the Phenom II, which doesn't have AVX, and is running SSE2. It's not unusual for the A8 to run 20% faster or 20% slower than the Phenom II on different apps or benchmarks. I might try the SSE2 app on the A8. I time these by pasting 20 valid units stats into a spreadsheet, and averaging.

Stephen.

Thanks for the info. The ARM app on the Pi Zero crashed after receiving signal 4 - that is SIGILL, illegal instruction. It looks like there should be a separate app for ARMv6, or the non-NEON one should have some instruction sets disabled. I will look into this when I find some free time.
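
The mapping from signal number 4 to SIGILL can be looked up with Python's standard `signal` module. The sketch below also shows how a return code from `subprocess` would be decoded, where a negative value means the process was killed by that signal. The helper name `describe_exit` is made up, and this is not how the BOINC client itself reports errors.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess-style return code.

    A negative value -N means the process was terminated by signal N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_exit(-4))  # signal 4, as seen on the Pi Zero
    print(describe_exit(0))
```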



©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences