Source code of the project application

Message boards : Science : Source code of the project application
Message board moderation

To post messages, you must log in.

Previous · 1 · 2

AuthorMessage
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 161 - Posted: 12 Nov 2017, 23:28:26 UTC

I have added 3 more commits today. With them in AVX app version MovePairSearch::MoveRows() takes ~61% of CPU time. AVX2 version is even faster, this function uses ~46% of CPU time. In that version Generator::Start() moved to 1st spot, as it takes ~48% of CPU time :) I saw that it also could benefit from bit operations, so I will optimize it too.

I also added some AVX512 code, and I noticed that gcc generates few less instructions when this instruction set is enabled. App on hosts with newest CPUs should be even faster. However I am not sure if anyone here would be able to run it - new CPU is not enough, user also has to use OS which supports AVX512 - at least Linux with kernel 3.15, Windows 10 or Windows Server 2016 (I am not sure about Windows, maybe MS has decided to add support to some earlier versions too). There are few hosts with Xeon Gold on "Top hosts" list, but they use Windows Server 2012, so most probably they will not be able to use these new instructions.
ID: 161 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 162 - Posted: 13 Nov 2017, 20:26:43 UTC

Yesterday I thought that I am done with MovePairSearch::MoveRows(). I was wrong :). Today I applied two more optimizations, and AVX2 version of it is at ~43% of CPU now. One of them also opens way to vectorization with SSE2, I will apply it later.
ID: 162 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Send message
Joined: 11 Aug 17
Posts: 626
Credit: 20,883,338
RAC: 11,313
Message 163 - Posted: 14 Nov 2017, 5:21:44 UTC - in response to Message 162.  

Daniel, I made a small workunit on ~ 38 millions of squares and about 30 minutes of work for unoptimized application. During the processing, a pair of the ODLS must be finded as in sample result.txt.

Thank you for work!
ID: 163 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 164 - Posted: 14 Nov 2017, 19:51:32 UTC - in response to Message 163.  

Daniel, I made a small workunit on ~ 38 millions of squares and about 30 minutes of work for unoptimized application. During the processing, a pair of the ODLS must be finded as in sample result.txt.

Thank you for work!

Thanks! I will try it.
ID: 164 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 168 - Posted: 23 Nov 2017, 21:11:56 UTC

I have problem with this small workunit - MovePairSearch::Read throws exception in first do/while loop.

I also found that Generator::Start at the end has line "if (newSquare.Matrix[keyRowId][keyColumnId] == Square::Empty && cellId < 0)". Looks that it is enough to check if "cellId < 0", otherwise app could read data before buffer. Could you confirm if I can remove 1st part of this condition?
ID: 168 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 169 - Posted: 24 Nov 2017, 21:55:08 UTC
Last modified: 24 Nov 2017, 22:11:25 UTC

I have found why this sample file did not work for me - it had windows line endings. After converting it to unix format it started working.

I did small benchmark on my machine using this sample workunit and got following results. I tried to benchmark original app, but I do not have it. I tried to attach new machine, but server stopped sending new work to me - looks that I have too many tasks which are pending validation (>2k).
SSE2:
real    1m26.704s
user    1m24.704s
sys     0m0.004s

AVX:
real    1m27.987s
user    1m25.985s
sys     0m0.005s

AVX2+BMI2:
real    1m20.868s
user    1m18.872s
sys     0m0.003s

Current app version is probably about 4 times faster than original, and most WUs are completed in less than hour.

How is your code review? Probably it is time to start public beta, as I started having problems with too many pending tasks.

BTW, Some extra optimizations for Generator::Start may be possible, but this may affect data stored in checkpoint file (and potentially in input file too). I am checking this this. So far everything is backward-compatible and looks that it works fine, I do not have any invalid results from my app.
ID: 169 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Send message
Joined: 11 Aug 17
Posts: 626
Credit: 20,883,338
RAC: 11,313
Message 170 - Posted: 25 Nov 2017, 5:55:01 UTC - in response to Message 169.  
Last modified: 25 Nov 2017, 5:55:40 UTC

Daniel, hello!

I reviewed the code. It was very interesting for me and I would like to understand it (and "mechanics" of commands like AVX and SSE) more deeper. And your application works fast, correct and found many ODLS. Could you to share this application and post a new thread in Number crunching (for example) with information about it? After this will to publish the news with a link to you thread.

Great work! Thank you!
ID: 170 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Send message
Joined: 11 Aug 17
Posts: 626
Credit: 20,883,338
RAC: 11,313
Message 171 - Posted: 25 Nov 2017, 7:26:32 UTC - in response to Message 169.  

I tried to attach new machine, but server stopped sending new work to me - looks that I have too many tasks which are pending validation (>2k).

Daniel, could you try to attach a new machine now? I generated a new bunch of tasks, may be the problem was in number of workunits without results on you machines. (For example, if any task in queue has a pair on your hosts).
ID: 171 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 173 - Posted: 25 Nov 2017, 20:55:03 UTC - in response to Message 171.  
Last modified: 25 Nov 2017, 20:57:26 UTC

I tried to attach new machine, but server stopped sending new work to me - looks that I have too many tasks which are pending validation (>2k).

Daniel, could you try to attach a new machine now? I generated a new bunch of tasks, may be the problem was in number of workunits without results on you machines. (For example, if any task in queue has a pair on your hosts).

I tried to downloaded new tasks on already attached ones, and got some. I have run test WU with official app, and got following results. Looks that AVX2 app process this sample WU 10 times faster :)
Original:
real    13m29.530s
user    13m27.579s
sys     0m0.027s


Daniel, hello!

I reviewed the code. It was very interesting for me and I would like to understand it (and "mechanics" of commands like AVX and SSE) more deeper. And your application works fast, correct and found many ODLS. Could you to share this application and post a new thread in Number crunching (for example) with information about it? After this will to publish the news with a link to you thread.

Great work! Thank you!

I have added new post here: http://rake.boincfast.ru/rakesearch/forum_thread.php?id=39.

Please make sure you check all commits on my branch, there are few commits there which are not linked from this thread.

Here is summary of my changes:
- use bits and bitwise operations to track elements of square;
- do not calculate things which are not necessary. Usually this means calculating things once before loop, calculating them as late as possible (this was usual case here), or in place where it make most sense (check exit condition after decrementing variable, instead of every loop iteration);
- simplify algorithm - on "step forward" only set new values, and revert to previous state on "step backward";
- use SSE/AVX to calculate things in parallel;
- use metaprogramming (templates), there was one place where "if" could be moved before whole function.

Regarding SSE/AVX, unfortunately there are no good tutorials on the net, I only saw few simple ones how to sum or multiply elements two arrays. Most of my knowledge comes from studying various examples on stackoverflow and other sites which appeared in Google. And of course Intel Intrinsics Guide, it contains detailed description of every intrinsic: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#.
You can also look on Wikipedia pages about SSE and AVX, they contain summary of things added in every new instruction set.
ID: 173 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 288 - Posted: 13 Jan 2018, 20:38:01 UTC

Hi,
I have few question about things which are not clear for me:
1. Is Generator::keyValue field needed? I checked few WUs and it always was set to -1 there. I wonder if this field and code which uses it can be removed;
2. Can I assume that every new WU at the beginning will have Generator::cellId set to 0?
3. Can I assume that every new WU at the beginning for every cell in Generator::path will have all corresponding values (bits) in Generator::cellsHistory set to 1 (i.e. all numbers from 0 to 8 can be potentially used for cell)?
4. How do you check if result is valid - are you checking if files are binary identical, or examine contents more closely? I wonder what will happen if for some WU app will find two square pairs, but report them in different order than now. Will server be able to handle this? Or do I need to sort pairs in such case to make result "canonical"?
ID: 288 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 290 - Posted: 14 Jan 2018, 16:35:55 UTC

One more question: do you always generate WUs with all values on diagonals set? I checked few WUs and noticed this. I wanted to optimize Generator::Start by processing diagonal elements before non-diagonal ones and save some cycles consumed by processing of 'primary' and 'secondary' variables in latter part, but now I wonder if this part of code could be eliminated completely.
ID: 290 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Send message
Joined: 11 Aug 17
Posts: 626
Credit: 20,883,338
RAC: 11,313
Message 291 - Posted: 14 Jan 2018, 21:29:45 UTC - in response to Message 288.  

Hello!

Hi,
I have few question about things which are not clear for me:
1. Is Generator::keyValue field needed? I checked few WUs and it always was set to -1 there. I wonder if this field and code which uses it can be removed;

This class member is required for store a value of cell[keyRowId][keyColumnId] at which the generator of diagonal squares must stop. It is a workunit border. With current partitioning of squares on workunits, keyValue is always -1.

2. Can I assume that every new WU at the beginning will have Generator::cellId set to 0?

Yes. Because cellId is an index of current cell in path - array Generator::path. But while working cellId changed, of course. You can see this situation in any checkpoint file.

3. Can I assume that every new WU at the beginning for every cell in Generator::path will have all corresponding values (bits) in Generator::cellsHistory set to 1 (i.e. all numbers from 0 to 8 can be potentially used for cell)?

In cellsHistory[rowId][columnId] the generator marks values, that used in current visit on this cell - Matrix[rowId][columnId]. When generator make a rollback to previous cell - it clear the history of cell. When generator go "into cell", cell history is clean, but real set of values, available for inserting into cell determine by flags in columns[value][columnId] and rows[rowId][value] arrays and cellsHistory, of course.

4. How do you check if result is valid - are you checking if files are binary identical, or examine contents more closely? I wonder what will happen if for some WU app will find two square pairs, but report them in different order than now. Will server be able to handle this? Or do I need to sort pairs in such case to make result "canonical"?

By binary equivalency. But applications use an identical Generator::path[] (that supplied in workunit file) and computations must produce the identical squares sets in identical order.

One more question: do you always generate WUs with all values on diagonals set? I checked few WUs and noticed this. I wanted to optimize Generator::Start by processing diagonal elements before non-diagonal ones and save some cycles consumed by processing of 'primary' and 'secondary' variables in latter part, but now I wonder if this part of code could be eliminated completely.

In workunits of current search all diagonals - primary and secondary completely filled. Excluding of if (rowid == columnId) and if (rowId == Rank - 1 - columnId), if I understand you right - may be a good idea.

Thank you for attention!
ID: 291 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 293 - Posted: 14 Jan 2018, 23:34:12 UTC - in response to Message 291.  

Thanks for answers!
Hello!

Hi,
I have few question about things which are not clear for me:
1. Is Generator::keyValue field needed? I checked few WUs and it always was set to -1 there. I wonder if this field and code which uses it can be removed;

This class member is required for store a value of cell[keyRowId][keyColumnId] at which the generator of diagonal squares must stop. It is a workunit border. With current partitioning of squares on workunits, keyValue is always -1.

Do you have any plans to start using non-negative values of keyValue in the future? I wonder if you could use longer path prefix instead, and remove this field completely.

2. Can I assume that every new WU at the beginning will have Generator::cellId set to 0?

Yes. Because cellId is an index of current cell in path - array Generator::path. But while working cellId changed, of course. You can see this situation in any checkpoint file.

3. Can I assume that every new WU at the beginning for every cell in Generator::path will have all corresponding values (bits) in Generator::cellsHistory set to 1 (i.e. all numbers from 0 to 8 can be potentially used for cell)?

In cellsHistory[rowId][columnId] the generator marks values, that used in current visit on this cell - Matrix[rowId][columnId]. When generator make a rollback to previous cell - it clear the history of cell. When generator go "into cell", cell history is clean, but real set of values, available for inserting into cell determine by flags in columns[value][columnId] and rows[rowId][value] arrays and cellsHistory, of course.

I asked these two questions because I wonder if I can assume a "clean slate" state at the beginning, i.e. only bits corresponding to cells in constant path prefix are changed, and all other would be changes by app during WU processing. I would like to use cellsHistory to store cell value candidates in similar way like in MovePairSearch::MoveRows. If in new WU cells on path would be partially processed, this would complicate things for me.

4. How do you check if result is valid - are you checking if files are binary identical, or examine contents more closely? I wonder what will happen if for some WU app will find two square pairs, but report them in different order than now. Will server be able to handle this? Or do I need to sort pairs in such case to make result "canonical"?

By binary equivalency. But applications use an identical Generator::path[] (that supplied in workunit file) and computations must produce the identical squares sets in identical order.

OK, so GPU app would have to take care of this (I started working on it). It would run few hundreds of generators (or maybe thousands?), each of them processing its part of search space. They would run in parallel, so order of results is no longer predetermined.

One more question: do you always generate WUs with all values on diagonals set? I checked few WUs and noticed this. I wanted to optimize Generator::Start by processing diagonal elements before non-diagonal ones and save some cycles consumed by processing of 'primary' and 'secondary' variables in latter part, but now I wonder if this part of code could be eliminated completely.

In workunits of current search all diagonals - primary and secondary completely filled. Excluding of if (rowid == columnId) and if (rowId == Rank - 1 - columnId), if I understand you right - may be a good idea.

Yes, this is what I meant. Generator is 2nd most time-consuming function (~40% of total time), so elimination of these checks would speedup everything.
ID: 293 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Send message
Joined: 11 Aug 17
Posts: 626
Credit: 20,883,338
RAC: 11,313
Message 296 - Posted: 17 Jan 2018, 5:35:29 UTC - in response to Message 293.  

Daniel, hello!

Do you have any plans to start using non-negative values of keyValue in the future? I wonder if you could use longer path prefix instead, and remove this field completely.

In this search,I think keyValue will always be -1, because current partition global tasks on workunits is practical.

[quote]2. Can I assume that every new WU at the beginning will have Generator::cellId set to 0?

Yes. Because cellId is an index of current cell in path - array Generator::path. But while working cellId changed, of course. You can see this situation in any checkpoint file.

I asked these two questions because I wonder if I can assume a "clean slate" state at the beginning, i.e. only bits corresponding to cells in constant path prefix are changed, and all other would be changes by app during WU processing. I would like to use cellsHistory to store cell value candidates in similar way like in MovePairSearch::MoveRows. If in new WU cells on path would be partially processed, this would complicate things for me.

Yes. All cell in the Generator::path must be "clean" at start, because them form a work for Generator. And only these cells modify by Generator during work. And none cells from first line, diagonals and Matrix[1][0] (that not included in path) - cannot be changed by it.

OK, so GPU app would have to take care of this (I started working on it). It would run few hundreds of generators (or maybe thousands?), each of them processing its part of search space. They would run in parallel, so order of results is no longer predetermined.

Very interesting! Maybe set a number for each GPU thread by and place results into order of them? (But will have to abandon from checkpoints).

Yes, this is what I meant. Generator is 2nd most time-consuming function (~40% of total time), so elimination of these checks would speedup everything.

Ok.

Thank you very much Daniel!
ID: 296 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
hoarfrost
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Send message
Joined: 11 Aug 17
Posts: 626
Credit: 20,883,338
RAC: 11,313
Message 433 - Posted: 5 Jun 2018, 2:02:55 UTC

Hello!

Now I test a new application under Linux and want to compile it under Windows, using MinGW, but get an error:
$ make
C:\MinGW\bin\g++.exe -O3 -ftree-vectorize -ID:\GitHub\boinc -ID:\GitHub\boinc/lib -ID:\GitHub\boinc/api -LD:\GitHub\boinc/api -LD:\GitHub\boinc/lib -c main.cpp
In file included from D:\GitHub\boinc/api/boinc_api.h:24:0,
                 from main.cpp:5:
D:\GitHub\boinc/lib/boinc_win.h:164:21: fatal error: winhttp.h: No such file or directory
 #include <winhttp.h>
                     ^
compilation terminated.
make: *** [main.o] Error 1


I use a makefile:
BOINC_DIR = D:\GitHub\boinc
BOINC_API_DIR = $(BOINC_DIR)/api
BOINC_LIB_DIR = $(BOINC_DIR)/lib

COMPILER = C:\MinGW\bin\g++.exe

CXXFLAGS += -O3 -ftree-vectorize \
    -I$(BOINC_DIR) \
    -I$(BOINC_LIB_DIR) \
    -I$(BOINC_API_DIR) \
    -L$(BOINC_API_DIR) \
    -L$(BOINC_LIB_DIR)

PROGRAM = rakesearch

all: $(PROGRAM)

clean:
	rm -f $(PROGRAM) *.o

$(PROGRAM): main.o Square.o Generator.o MovePairSearch.o $(BOINC_LIB_DIR)/libboinc.a $(BOINC_API_DIR)/libboinc_api.a
	$(COMPILER) $(CXXFLAGS) -o $(PROGRAM) main.o Square.o Generator.o MovePairSearch.o -pthread -lboinc_api -lboinc

Square.o: Square.cpp
	$(COMPILER) $(CXXFLAGS) -c Square.cpp

Generator.o: Generator.cpp
	$(COMPILER) $(CXXFLAGS) -c Generator.cpp
	
MovePairSearch.o: MovePairSearch.cpp
	$(COMPILER) $(CXXFLAGS) -c MovePairSearch.cpp 

main.o: main.cpp
	$(COMPILER) $(CXXFLAGS) -c main.cpp


In Visual Studio 2015 the compilation is normal. I must recompile BOINC libraries under MinGW?
ID: 433 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile [B@P] Daniel

Send message
Joined: 8 Sep 17
Posts: 99
Credit: 402,603,726
RAC: 0
Message 437 - Posted: 8 Jun 2018, 7:56:19 UTC - in response to Message 433.  
Last modified: 8 Jun 2018, 8:03:23 UTC

Hi,
Please use MinGW compiler shipped with Cygwin. Makefiles created my me automatically uses it when you pass option MinGW=1 when calling make. BTW, tou will need to install both 32 and 64-bit versions of Cygwin (in separate directories), as 64-bit version does not have 32-bit libs needed for linking final app.

You will also need to recompile BOINC libs as you wrote. This is a bit tricky, as you have to use boinc/lib/Makefile.mingw instead of one generated by configure script. You can also use my ones, you can download them from here: https://bitbucket.org/sirzooro/boinc-stuff/downloads/.
ID: 437 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Previous · 1 · 2

Message boards : Science : Source code of the project application

©2024 The searchers team, Karelian Research Center of the Russian Academy of Sciences