Posts by [B@P] Daniel

41) Questions and Answers : Web site : Cannot display tak list (Message 308)
Posted 28 Jan 2018 by Profile [B@P] Daniel
Post:
This bug is now fixed in BOINC; you can either apply the fix selectively or upgrade everything to the latest version.
42) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 306)
Posted 27 Jan 2018 by Profile [B@P] Daniel
Post:
Is the optimized app now part of the official package?

Not yet, but it is planned.

BTW, I am going to release a new optimized app version soon. Stay tuned!
43) Message boards : Science : Source code of the project application (Message 293)
Posted 14 Jan 2018 by Profile [B@P] Daniel
Post:
Thanks for answers!
Hello!

Hi,
I have a few questions about things that are not clear to me:
1. Is the Generator::keyValue field needed? I checked a few WUs and it was always set to -1 there. I wonder if this field and the code that uses it can be removed;

This class member is required to store the value of cell[keyRowId][keyColumnId] at which the generator of diagonal squares must stop. It is a workunit border. With the current partitioning of squares into workunits, keyValue is always -1.

Do you have any plans to start using non-negative values of keyValue in the future? I wonder if you could use a longer path prefix instead and remove this field completely.

2. Can I assume that every new WU at the beginning will have Generator::cellId set to 0?

Yes, because cellId is the index of the current cell in the path (the Generator::path array). Of course, cellId changes while the work proceeds. You can see this in any checkpoint file.

3. Can I assume that, at the beginning of every new WU, every cell in Generator::path will have all corresponding values (bits) in Generator::cellsHistory set to 1 (i.e. all numbers from 0 to 8 can potentially be used for the cell)?

In cellsHistory[rowId][columnId] the generator marks the values that were used during the current visit to this cell, Matrix[rowId][columnId]. When the generator rolls back to the previous cell, it clears the history of the cell. When the generator goes "into a cell", the cell history is clean, but the real set of values available for insertion into the cell is determined by the flags in the columns[value][columnId] and rows[rowId][value] arrays and, of course, by cellsHistory.

I asked these two questions because I wonder if I can assume a "clean slate" state at the beginning, i.e. only bits corresponding to cells in the constant path prefix are changed, and all others would be changed by the app during WU processing. I would like to use cellsHistory to store cell value candidates in a similar way to MovePairSearch::MoveRows. If cells on the path were partially processed in a new WU, this would complicate things for me.
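To make the "clean slate" idea above concrete, here is a minimal sketch of a per-cell candidate bitmask for rank 9. It is only an illustration of the technique, not the project's code: the names ALL_CANDIDATES, take_next_candidate and available are hypothetical, and the exact bit layout of Generator::cellsHistory is an assumption.

```python
RANK = 9
# Assumed "clean slate": one bit per value 0..8, all set (0b111111111).
ALL_CANDIDATES = (1 << RANK) - 1


def available(history_mask, row_free_mask, column_free_mask):
    """Values still usable in a cell: not yet tried, free in row and column."""
    return history_mask & row_free_mask & column_free_mask


def take_next_candidate(mask):
    """Return (value, remaining_mask) for the lowest set bit, or (None, 0)."""
    if mask == 0:
        return None, 0
    lowest = mask & -mask              # isolate the lowest set bit
    value = lowest.bit_length() - 1    # bit position equals the cell value
    return value, mask & ~lowest       # clear that bit in the remaining mask


if __name__ == "__main__":
    mask = available(ALL_CANDIDATES, 0b000011111, 0b110011011)
    while mask:
        value, mask = take_next_candidate(mask)
        print("try value", value)
```

With this layout a rollback to the previous cell simply resets the cell's mask to ALL_CANDIDATES, which is why a partially processed path in a fresh WU would complicate things.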

4. How do you check whether a result is valid: do you check that the files are binary identical, or do you examine the contents more closely? I wonder what will happen if, for some WU, the app finds two square pairs but reports them in a different order than now. Will the server be able to handle this? Or do I need to sort the pairs in such a case to make the result "canonical"?

By binary equivalence. Applications use an identical Generator::path[] (the one supplied in the workunit file), and computations must produce identical sets of squares in identical order.

OK, so the GPU app would have to take care of this (I have started working on it). It would run a few hundred generators (or maybe thousands?), each of them processing its own part of the search space. They would run in parallel, so the order of results is no longer predetermined.
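One possible way to restore a predetermined order is sketched below. It assumes (this is not confirmed anywhere in the thread) that the search space is cut into slices numbered in the order a single sequential generator would visit them, and that each parallel generator reports its own finds in sequential order within its slice. The function name and the (slice_index, results) layout are hypothetical.

```python
def merge_in_sequential_order(partial_results):
    """Concatenate per-slice results by slice index.

    partial_results: iterable of (slice_index, results_found_in_that_slice),
    arriving in whatever order the parallel generators happened to finish.
    """
    merged = []
    for _, found in sorted(partial_results, key=lambda item: item[0]):
        merged.extend(found)
    return merged


if __name__ == "__main__":
    # Slices finished out of order; labels stand in for square pairs.
    finished = [(2, ["pair C"]), (0, ["pair A1", "pair A2"]), (1, [])]
    print(merge_in_sequential_order(finished))
```

Under those assumptions the merged list matches what the sequential app would write, so the output file could stay binary identical without sorting the squares themselves.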

One more question: do you always generate WUs with all values on the diagonals set? I checked a few WUs and noticed this. I wanted to optimize Generator::Start by processing diagonal elements before non-diagonal ones, to save some cycles consumed by processing the 'primary' and 'secondary' variables in the latter part, but now I wonder if this part of the code could be eliminated completely.

In the workunits of the current search, all diagonals (primary and secondary) are completely filled. Removing the checks if (rowId == columnId) and if (rowId == Rank - 1 - columnId), if I understand you correctly, may be a good idea.

Yes, this is what I meant. Generator is the 2nd most time-consuming function (~40% of total time), so eliminating these checks would speed everything up.
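As a rough illustration of why the two checks become redundant, the sketch below validates once, up front, that no cell on the path lies on either diagonal; after that the per-step checks could be dropped. The (row, column) pair layout of the path and the function names are assumptions made for this example only.

```python
RANK = 9


def on_diagonal(row_id, column_id, rank=RANK):
    """True if the cell lies on the primary or the secondary diagonal."""
    return row_id == column_id or row_id == rank - 1 - column_id


def path_is_diagonal_free(path, rank=RANK):
    """One-time check: the path contains no diagonal cells at all."""
    return not any(on_diagonal(r, c, rank) for r, c in path)


if __name__ == "__main__":
    sample_path = [(1, 2), (3, 0), (5, 7)]  # made-up cells, not a real WU
    print(path_is_diagonal_free(sample_path))
```

If the one-time check ever failed, the app could fall back to the original code path that still tests both conditions.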
44) Message boards : Science : Source code of the project application (Message 290)
Posted 14 Jan 2018 by Profile [B@P] Daniel
Post:
One more question: do you always generate WUs with all values on the diagonals set? I checked a few WUs and noticed this. I wanted to optimize Generator::Start by processing diagonal elements before non-diagonal ones, to save some cycles consumed by processing the 'primary' and 'secondary' variables in the latter part, but now I wonder if this part of the code could be eliminated completely.
45) Message boards : Science : Source code of the project application (Message 288)
Posted 13 Jan 2018 by Profile [B@P] Daniel
Post:
Hi,
I have a few questions about things that are not clear to me:
1. Is the Generator::keyValue field needed? I checked a few WUs and it was always set to -1 there. I wonder if this field and the code that uses it can be removed;
2. Can I assume that every new WU at the beginning will have Generator::cellId set to 0?
3. Can I assume that, at the beginning of every new WU, every cell in Generator::path will have all corresponding values (bits) in Generator::cellsHistory set to 1 (i.e. all numbers from 0 to 8 can potentially be used for the cell)?
4. How do you check whether a result is valid: do you check that the files are binary identical, or do you examine the contents more closely? I wonder what will happen if, for some WU, the app finds two square pairs but reports them in a different order than now. Will the server be able to handle this? Or do I need to sort the pairs in such a case to make the result "canonical"?
46) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 285)
Posted 9 Jan 2018 by Profile [B@P] Daniel
Post:
I ran the tests again several times with the offset set the same for AVX2 and AVX512 (0 = 4300 MHz), and the results are:

AVX2
real 3m32,724s
user 0m0,000s
sys 0m0,015s

AVX512
real 3m25,637s
user 0m0,000s
sys 0m0,015s

AVX512 looks (and is) faster, but when interpolated it is basically the same as last time with the offset set to -3 = 4000 MHz; now you can see it clock for clock.
BOINC and other CPU load intensive processes were suspended.

I am available for another test when needed.
Keep up the good work.

Thanks! These results look reasonable; I was expecting something like this. Real WUs are about 6 times longer, so with AVX512 computations would complete about 40 seconds faster. A PC running 24/7 would be able to complete 5 more WUs per core per day.
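The estimate can be reproduced with a short calculation. This is only a back-of-the-envelope sketch: it takes the two "real" times quoted above and assumes a real WU takes exactly 6 times the benchmark time, which is the approximation from the post and not a measured value.

```python
def to_seconds(minutes, seconds):
    """Convert a 'XmY.ZZZs' style timing into seconds."""
    return minutes * 60 + seconds


AVX2_TEST = to_seconds(3, 32.724)     # benchmark time quoted above
AVX512_TEST = to_seconds(3, 25.637)   # benchmark time quoted above
REAL_WU_FACTOR = 6                    # assumption: "about 6 times longer"
SECONDS_PER_DAY = 24 * 60 * 60


def wus_per_day(test_seconds, factor=REAL_WU_FACTOR):
    """Whole-day throughput of one core running back-to-back WUs."""
    return SECONDS_PER_DAY / (test_seconds * factor)


saved_per_wu = (AVX2_TEST - AVX512_TEST) * REAL_WU_FACTOR
extra_wus_per_day = wus_per_day(AVX512_TEST) - wus_per_day(AVX2_TEST)

if __name__ == "__main__":
    print(f"saved per WU: {saved_per_wu:.1f} s")
    print(f"extra WUs per core per day: {extra_wus_per_day:.2f}")
```

With these inputs the saving per WU lands close to the 40 seconds mentioned above, while the daily gain comes out nearer 2 than 5 WUs per core; the daily figure is sensitive to the real WU length, which is only approximated here.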
47) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 281)
Posted 8 Jan 2018 by Profile [B@P] Daniel
Post:
Sure, results for AVX2 and AVX512 on i9-7920X (offset for AVX2 is set to 4GHz, for AVX512 is set to 3.8GHz) under Windows 10:

AVX2
real 3m32,268s
user 0m0,000s
sys 0m0,000s

AVX512
real 3m40,743s
user 0m0,000s
sys 0m0,000s

Yes, the times are correct. I could set the offset the same for benchmarking later today, if it makes a difference.

Thanks for the results. This is interesting; I thought that the AVX512 version would be a bit faster. I wonder if it is really slower, or if it was just random variation in execution time. If you run the test a few times (e.g. 3 times), you will see that the numbers differ each time. CPU load also influences the results. Could you repeat these tests a few times with BOINC suspended, to confirm whether the AVX512 version is really slower instead of faster?
48) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 280)
Posted 8 Jan 2018 by Profile [B@P] Daniel
Post:
I downloaded rakesearch_linux_arm_v7l.tgz from github.
On a pi 3 (not overclocked), with Raspbian Stretch, with boinc loaded
sudo apt-get install boinc
I ran the boinc manager and added Rakesearch (by URL). I ignored the warning "this project may not have units for your CPU" (or whatever it says). I usually run my pi 3's headless. I suppose i could have done it with boinccmd. The boinc manager showed me what was going on a bit quicker.

I then installed the application:
# get a root shell
sudo bash
# extract the binary:
cd /var/lib/boinc-client/projects/rake.boincfast.ru_rakesearch/
tar xvf ~pi/rakesearch_linux_arm_v7l.tgz
# exit the root shell
exit

Stopping and starting the boinc manager didn't work, so i restarted the pi
sudo shutdown -r

I let it download a couple units, which executed in 5 to 6 hours each.

I chose the arm_v7l version as it is the one that i expected to work on the pi 3. I don't expect it to work on a pi 2. I have a pi 2 that runs Jessie, and i'll give it a try soon. I also have a pi zero w, and could give that a shot.

I don't expect the NEON version (rakesearch_linux_arm_v7l_neon.tgz) to work on a pi 3. I might give it a try and see. It might possibly work with a 64 bit OS. That would be nice to know for sure, one way or the other. It might work on a banana pi or a higher end droid. I don't have either of these.

The above process is more or less the same as on the x86, which was smooth for me.

Good to hear that!

If you want to check whether your RPi supports NEON, please execute the following command. If it prints something, your CPU supports NEON instructions.

grep 'neon\|asimd' /proc/cpuinfo | head -1
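The same check can be scripted. The sketch below is a hypothetical helper, not part of the app: it looks for the neon or asimd flag in the "Features" line of text read from /proc/cpuinfo, so it is slightly stricter than the grep above, which matches the words anywhere in the file.

```python
def supports_neon(cpuinfo_text):
    """Return True if the 'Features' line lists the neon or asimd flag."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "features":
            flags = value.split()
            return "neon" in flags or "asimd" in flags
    return False


if __name__ == "__main__":
    with open("/proc/cpuinfo") as cpuinfo:
        print(supports_neon(cpuinfo.read()))
```

On 32-bit ARM kernels the flag is usually reported as neon, while 64-bit kernels report asimd, which is why both are checked.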


I'm running the rakesearch_linux_64_sse2.tgz version on an AMD Phenom (running Linux Mint 13). It's not young enough to support AVX. I also have an AMD A8 also on Mint 13, which does have AVX. I haven't attempted to run that as yet.

I've only looked at Arm optimization a little bit. It looks complicated, and like a ton of work. In particular, getting the data to move in and out of the processor while the processor does the work looks difficult to get right. Daniel has clearly gotten it right, so it very likely was a ton of work. Thanks, very much.

Stephen.

Well, most of this complicated stuff is done by the compiler :). I had to find the proper intrinsics to do what I need, and this was the most complicated part for me. Besides this, things are similar to SSE/AVX programming :)
49) Message boards : News : RakeSearch project technical update 2018-01-06 (Message 274)
Posted 7 Jan 2018 by Profile [B@P] Daniel
Post:
Minor update at 2018-01-07.

As you might have noticed, the number of tasks ready to send (line "Tasks ready to send" on the Server status page) has become significantly smaller. This is a consequence of the new task generation mechanism. It starts up every 5 minutes and, if the task queue is empty, adds 32000 new tasks to the queue.

Please modify this a bit, so that new tasks are added when there are fewer than 1000 tasks in the queue. That way some tasks should always be in the queue.
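The suggested change amounts to replacing an "is empty" test with a low-water mark. A minimal sketch follows, using the numbers from this thread (1000 and 32000); the function name is hypothetical and the real generator script on the server may look quite different.

```python
REFILL_THRESHOLD = 1000   # refill when fewer tasks than this are queued
BATCH_SIZE = 32000        # tasks added per refill, as in the news post


def tasks_to_add(queue_length, threshold=REFILL_THRESHOLD, batch=BATCH_SIZE):
    """Decide how many tasks a periodic run should add to the queue."""
    return batch if queue_length < threshold else 0


if __name__ == "__main__":
    for queued in (0, 999, 1000, 25000):
        print(queued, "->", tasks_to_add(queued))
```

Setting threshold=1 reproduces the current "only when empty" behaviour, so the change is a one-parameter adjustment.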
50) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 273)
Posted 7 Jan 2018 by Profile [B@P] Daniel
Post:
Hi, I am trying to run the AVX512 app and it is not triggering the AVX512 offset set in BIOS on my i9-7920X CPU; it runs with the offset for AVX2. Is this app really using AVX512?

edit: I have been observing it for a little longer and it occasionally triggers the AVX512 offset, but it is quite rare and only for a very short period of time.

The answer is more complicated. In its most performance-critical place, this app version uses a new AVX512 instruction that works on the old AVX registers. Besides this, there are some places where memory blocks are copied, which uses AVX512 registers. However, these copies are made rarely. This matches what you are observing.

BTW, could you test the performance of the various app versions on your machine? In the post linked below I wrote short instructions on how to do this. I am mainly interested in how the AVX512 version compares with the AVX2 one; I do not have any hardware to run such a benchmark.
http://rake.boincfast.ru/rakesearch/forum_thread.php?id=39&postid=237
51) Message boards : Number crunching : ARM chip support: Raspberry Pi (Linux/Raspbian) or Android (Message 269)
Posted 31 Dec 2017 by Profile [B@P] Daniel
Post:
Hello Brian!

Excuse me for the belated answer. There are no plans at the moment, but of course the ARM platform can supply a large amount of computing power.


I am just curious. Three months have passed since the last reply. Has there been any change regarding ARM chip support? As you stated, the ARM platform can supply a large amount of computing power.

Please check the Optimized RakeSearch app thread here. I have released ARM apps there (for v7l CPUs, with and without NEON, and one for AARCH64). Let me know if they work for you, or if you need one for some older CPU.
52) Questions and Answers : Web site : Cannot display tak list (Message 257)
Posted 21 Dec 2017 by Profile [B@P] Daniel
Post:
This memory allocation error has been happening on quite a few projects for quite a few years.
The advice given is generally just to double the limit for the memory in the config file each time it occurs.

It is to do with the number of results the user has, the code may say only to load 20 but 'something' in the process is loading everything into memory.

Purging older results can help stop it from happening as well as speeding up the time it takes to load the page.

Thanks for the info. I checked the list of open issues in the BOINC repository on GitHub and found that there was no open issue for this, so I logged a new one: https://github.com/BOINC/boinc/issues/2277.
53) Questions and Answers : Web site : Cannot display tak list (Message 253)
Posted 20 Dec 2017 by Profile [B@P] Daniel
Post:
I guess the system needs an automatic process to purge the older WUs from this page ...

All WUs I computed since I started the project (November 30th) are still in this list ... I can't imagine how many entries the page has to handle for Daniel with such a high RAC ... :o

It may be surprising, but it is enough to load and process 20 entries at once, plus one or a few extra to display the various totals there. As far as I could see in the BOINC server code, it should already be working this way. Something strange is going on there.

And yes, periodic removal of old WUs would help to keep the server load low. I read somewhere that the BOINC server comes with a script which can do this, so there is no need to reinvent the wheel.

Edit: BTW, there are people who have higher RAC than me :) http://rake.boincfast.ru/rakesearch/top_users.php
54) Questions and Answers : Web site : Cannot display tak list (Message 249)
Posted 17 Dec 2017 by Profile [B@P] Daniel
Post:
Hi,
I cannot display the task list. I get the following error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 2 bytes) in .../html/inc/db_conn.inc on line 125

It is a bit surprising that 128MB is not enough there. I checked the original code and it loads only 20 results from the DB at once. If you did not make any changes in this area, please log a bug at https://github.com/BOINC/boinc/issues saying that something else consumes lots of memory.
55) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 248)
Posted 17 Dec 2017 by Profile [B@P] Daniel
Post:
On my Odroid-XU4 I get this error:
../../projects/rake.boincfast.ru_rakesearch/rakesearch: error while loading shared libraries: libboinc_api.so.7: cannot open shared object file: No such file or directory
http://rake.boincfast.ru/rakesearch/results.php?hostid=1797

I have rebuilt the ARM apps, and this lib is now linked statically. Please download the new app; it should work now.
56) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 244)
Posted 15 Dec 2017 by Profile [B@P] Daniel
Post:
Hi Daniel, and Anyone Else involved in this Optimized App,

If this application's code works as well as some are reporting, then would this code be helpful for the other "Boinc" projects, or only for this project? If so, can this code be integrated into the Boinc software?

Best regards,
Phil
phd21

This code is specific to this project, so it cannot be integrated directly into other projects or BOINC itself. However, other projects may review all the changes I made, get familiar with the optimization techniques I used and then apply them to their own apps.

The only exception I can think of is the ODLK project, which also works with Latin squares. Maybe it could integrate some of the code directly.
57) Message boards : News : New badges for total credit! (Message 240)
Posted 12 Dec 2017 by Profile [B@P] Daniel
Post:
I am also OK with removing these badges. When you do this, I will have more motivation to get them back and make the app even faster :)
58) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 237)
Posted 11 Dec 2017 by Profile [B@P] Daniel
Post:
I have fixed the avx2nopext app for Windows, please try it again. The Linux version was fine.

I also added a NEON app version for ARM CPUs. It is about 22% faster than the non-NEON one. Before installing it, please check whether your device supports NEON instructions: open the /proc/cpuinfo file and check if there is "neon" in the "Features" line.

ARM:
real    20m37.322s
user    20m35.665s
sys     0m0.155s

ARM+NEON:
real    15m58.774s
user    15m57.060s
sys     0m0.080s


Edit: I have added the test.tgz archive, which contains the files needed to run the benchmark test. To use it, unpack this archive somewhere, copy the rakesearch file to the same dir and run the test.sh script.

It is also possible to test the Windows apps. You need to install Cygwin and then follow the above steps. Please do not rename rakesearch.exe to rakesearch; Cygwin will be able to run it as-is.
Note: for some reason Cygwin now displays 0.000 as the user time, which is incorrect. It used to work properly when I was using Win7; I suspect that Win10 broke this.

Please post your results. I am especially interested in how the AVX512 app compares with the other app versions.
59) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 224)
Posted 9 Dec 2017 by Profile [B@P] Daniel
Post:
The PEXT instruction is very slow on Ryzen, as I wrote above. Please try the avx2nopext app version; it does not use PEXT and should be a bit faster than the AVX one for you.

I tried it a few days ago, but all tasks ended immediately with an error, and then the project started blocking me from downloading new units. It also changed all my remaining tasks in BOINC Manager to error tasks. This was on Ryzen 1700 and 1700X, so I went back to AVX after detaching the project in BOINC Manager. So I don't know, but I will run new tests later :)

Good to know that it does not work :) I have a suspicion about what may be wrong, but today I do not have access to my PC. I will look into it tomorrow.
60) Message boards : Number crunching : Optimized RakeSearch app for rank 9 (computations finished) (Message 223)
Posted 9 Dec 2017 by Profile [B@P] Daniel
Post:
Could someone please explain the process for correctly unpacking the optimized app files in Linux? I have successfully downloaded and extracted the files to my desktop, but when attempting to place them in the rakesearch folder I hit a dead end. I must be going about this the wrong way. I am trying to use the same process as setting up a cc_config file and it is not working.

So far in Linux Mint Xfce 18.2:
(1) Download file
(2) Extract contents to desktop (couldn't figure out how to extract directly to the rakesearch folder as in Win 7)
(3) Tried using gksudo xed /var/lib/boinc-client/projects/rake.boincfast.ru_rakesearch/ to open the destination folder and add contents but no go.

Is the command wrong, or do I need to add /home/skivelitis before /var/lib? Or, as is most likely, am I completely off-base? I have been using Linux for about a year now, but only on dedicated number crunchers, and am definitely a noob.
Thanks in advance.

I do not use a desktop on Linux, only the shell :) Here are the commands to execute. You may have to adjust paths and URLs:
su -
cd /var/lib/boinc/projects/rake.boincfast.ru_rakesearch/
wget https://github.com/sirzooro/RakeSearch/releases/download/v1.0/rakesearch_linux_64_avx.tgz
tar zxvf rakesearch_linux_64_avx.tgz
systemctl restart boinc-client

The above commands are enough to download, unpack and install the AVX app on CentOS 7. You may have to adjust them a bit for your Linux version. You may have BOINC in the /var/lib/boinc-client/... dir, and its service may be called boinc instead of boinc-client. BTW, BOINC prints the path to its dir in the event log when it starts, so you can look for it there.
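If you are not sure which of the two locations your distribution uses, a small helper can probe them. This is just a sketch: the two data directories are the ones mentioned in this thread, the function name is hypothetical, and other distributions may use a different path altogether.

```python
import os

# The two BOINC data dirs mentioned in this thread; others may exist.
CANDIDATE_DATA_DIRS = ["/var/lib/boinc", "/var/lib/boinc-client"]
PROJECT_SUBDIR = os.path.join("projects", "rake.boincfast.ru_rakesearch")


def find_project_dir(data_dirs=CANDIDATE_DATA_DIRS):
    """Return the first existing RakeSearch project dir, or None."""
    for data_dir in data_dirs:
        candidate = os.path.join(data_dir, PROJECT_SUBDIR)
        if os.path.isdir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_project_dir() or "project dir not found, check the event log")
```

If it prints a path, that is the directory to cd into before running the wget and tar commands above.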



©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences