Anybody know how to optimize this code with openCL or SIMD instructions?


Robert Garcia

uint64_t sqrtx, x;

for (uint64_t i = 37; i

September 24, 2016 - 04:42

Nolan Evans

>(((SIMD)))
shiggy

September 24, 2016 - 04:49

Aaron Ward

Explain your reasoning.

September 24, 2016 - 04:52

Asher Ward

bumping for input.

September 24, 2016 - 06:02

Angel Rogers

do your own homework

September 24, 2016 - 06:04

Landon Long

It's not homework.

September 24, 2016 - 06:07

Brody Garcia

what's it for then?

September 24, 2016 - 06:09

Connor Walker

uint64_t sqrtx, x;

for (uint64_t i = 37; i

September 24, 2016 - 06:12

Jaxson Rodriguez

>i + 6
It's not the same code m8

September 24, 2016 - 06:16

Lincoln James

You can condense the if statements into one pattern:
if (!(x % (i + NUM))) return ...
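Here is a minimal sketch of that idea as a macro. It assumes, based on the offsets 0, 4, 6, 10, 12, 16, 22, 24 posted later in the thread, that each pass tests i plus those offsets and then advances i by 30. The original loop header is cut off, so the bound `i <= sqrtx` and the step are assumptions. `next_factor_macro`, `CHECK`, and the "return 0 when nothing is found" convention are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: trial division over a mod-30 wheel (assumed step of 30).
   Assumes x has no prime factors below 37 and sqrtx = floor(sqrt(x)).
   Returns the smallest prime factor <= sqrtx, or 0 if there is none. */
static uint64_t next_factor_macro(uint64_t x, uint64_t sqrtx)
{
/* One divisibility test per wheel offset; NUM is a compile-time constant. */
#define CHECK(NUM) if (!(x % (i + (NUM)))) return i + (NUM);

    for (uint64_t i = 37; i <= sqrtx; i += 30) {
        /* The last pass may test up to 24 past sqrtx; with no factors below 37,
           that can only hit a divisor already reached earlier, so it is harmless. */
        CHECK(0)  CHECK(4)  CHECK(6)  CHECK(10)
        CHECK(12) CHECK(16) CHECK(22) CHECK(24)
    }
#undef CHECK
    return 0;
}
```

For example, `next_factor_macro(10403, 101)` tests 37 through 91 without a hit and then returns 101 on the third pass, since 10403 = 101 × 103.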

September 24, 2016 - 06:16

Jordan Bailey

Class.

September 24, 2016 - 06:18

Gavin Edwards

Actually, since you're dealing with magic numbers (0,4,6,10,12,16,22,24), I'd just put those in an array - then you'd have something like:

int specialNums[8] = {0,4,6,10,12,16,22,24};

for (uint64_t i = 37; i
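A minimal sketch of the array approach, reusing the `specialNums` offsets from the post. Starting at 37, `i + specialNums[k]` walks through the residues 7, 11, 13, 17, 19, 23, 29, 1 mod 30, which are exactly the numbers coprime to 30. This suggests the loop advances by 30, but that step, the bound `i <= sqrtx`, and the function name `next_factor_array` are assumptions because the posted loop header is truncated.

```c
#include <stdint.h>

/* Offsets from the post: i + offset covers every residue coprime to 30. */
static const int specialNums[8] = {0, 4, 6, 10, 12, 16, 22, 24};

/* Hypothetical helper. Assumes x has no prime factors below 37 and
   sqrtx = floor(sqrt(x)). Returns the smallest prime factor <= sqrtx, or 0. */
static uint64_t next_factor_array(uint64_t x, uint64_t sqrtx)
{
    for (uint64_t i = 37; i <= sqrtx; i += 30) {      /* assumed step of 30 */
        for (int k = 0; k < 8; k++) {
            uint64_t d = i + (uint64_t)specialNums[k];
            if (x % d == 0)
                return d;   /* candidates ascend, so the first hit is the smallest */
        }
    }
    return 0;
}
```

This performs the same number of modulo operations as the unrolled version. It is shorter, not inherently faster, and a compiler may unroll the fixed 8-iteration inner loop anyway.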

September 24, 2016 - 06:21

Ethan Anderson

What the fuck is the point of this code?

September 24, 2016 - 06:22

Nathaniel Hill

It's part of a number-factoring function. The selected code takes up most of the CPU time, and I need to speed it up using vector processing or multiprocessing. The problem is that I have little experience with such tasks.

September 24, 2016 - 06:27

Nicholas Hughes

All if checks can be removed. Speed is 20 ns then.

Too bad you never read Hacker's Delight.

Now back to 9gag

September 24, 2016 - 06:30

Brandon Hill

>All if checks can be removed
how?

September 24, 2016 - 06:37

John Price

That'd be pointless. The only way to optimize that is to get rid of the excessive testing.
Branch misprediction is killing your code.
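One way to try the "fewer branches" idea is sketched below. It computes all eight remainders for a pass, folds the "divisible?" results into a bitmask, and branches once per pass. It assumes the same mod-30 wheel as above. `next_factor_fewer_branches` is a hypothetical name. Whether this actually wins depends on the compiler and CPU: the "not divisible" branches are almost always predicted correctly, and the eight 64-bit divisions per pass are the likely bottleneck, so measure before trusting it.

```c
#include <stdint.h>

/* Hypothetical sketch: one data-dependent branch per wheel pass.
   Assumes no prime factors below 37 and sqrtx = floor(sqrt(x)).
   Returns the smallest prime factor <= sqrtx, or 0 if there is none. */
static uint64_t next_factor_fewer_branches(uint64_t x, uint64_t sqrtx)
{
    static const uint64_t off[8] = {0, 4, 6, 10, 12, 16, 22, 24};

    for (uint64_t i = 37; i <= sqrtx; i += 30) {   /* assumed step of 30 */
        unsigned hit = 0;
        /* Eight independent divisions; bit k is set if (i + off[k]) divides x. */
        for (int k = 0; k < 8; k++)
            hit |= (unsigned)(x % (i + off[k]) == 0) << k;

        if (hit) {                        /* rare: taken at most once */
            for (int k = 0; k < 8; k++)   /* lowest set bit = smallest divisor */
                if (hit & (1u << k))
                    return i + off[k];
        }
    }
    return 0;
}
```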

September 24, 2016 - 07:22

Easton Campbell

What is this code supposed to do?
Can you provide some inputs/outputs please?

September 24, 2016 - 07:27

Dylan Campbell

Okay... (I explained what it does in )

You can set it up with the following values (takes about 6 seconds to do 83 million loops on my machine) to get the return value 2502845209:

uint64_t x = 8700000089193112463;

uint64_t sqrtx = 2949576255;

The return value is the next lowest prime factor of the input number.
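The posted values can be cross-checked by hand. The check assumes that i starts at 37 and advances by 30 per pass, which the offsets 0, 4, ..., 24 suggest:

```latex
\begin{aligned}
x &= 8700000089193112463 = 2502845209 \times 3476044007,\\
\lfloor \sqrt{x} \rfloor &= 2949576255,\\
2502845209 &= 37 + 30 \cdot 83428172 + 12.
\end{aligned}
```

So the returned value does divide x and lies below sqrtx. It is hit at offset 12 on pass 83,428,173 (counting from i = 37), which matches the "83 million loops" figure. With up to eight modulo tests per pass, that is roughly 667 million 64-bit `%` operations in those 6 seconds.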

Now, it can easily be sped up with multi-processing (splitting the task across CPUs), but I want to know whether any vector operations would help, or whether OpenCL would be of any use. It looks like there are a few things I can do before resorting to multi-processing, at least. Seems like nobody here has much experience with vectors, GPU processing, or multi-threaded applications.

September 24, 2016 - 08:26

Logan Morgan

Yeah, just rewrite it without a loop

September 24, 2016 - 08:34

Aaron Powell

Not possible, it has to loop over 100 million times in some cases.

September 24, 2016 - 08:38

John Martin

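movapd   xmm0, XMMWORD PTR A     ; xmm0 = {A.re, A.im}
movddup  xmm2, QWORD PTR B       ; xmm2 = {B.re, B.re}
mulpd    xmm2, xmm0              ; xmm2 = {A.re*B.re, A.im*B.re}
movddup  xmm1, QWORD PTR B+8     ; xmm1 = {B.im, B.im}
shufpd   xmm0, xmm0, 1           ; xmm0 = {A.im, A.re}
mulpd    xmm1, xmm0              ; xmm1 = {A.im*B.im, A.re*B.im}
addsubpd xmm2, xmm1              ; xmm2 = {A.re*B.re - A.im*B.im, A.im*B.re + A.re*B.im}
movapd   XMMWORD PTR C, xmm2     ; C = A * B (complex multiply)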
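For anyone wondering what that snippet does: it appears to be the common SSE3 complex-multiply pattern (`movddup`, `shufpd`, `addsubpd`). It computes C = A × B for two double-precision complex numbers, each stored as {re, im}. Below is a plain C equivalent, assuming that memory layout. `cplx` and `cmul` are hypothetical names. Note that this is floating-point complex arithmetic and does not address the 64-bit integer modulo in the factoring loop.

```c
/* Hypothetical layout matching the asm: A, B, C each hold {re, im} as two doubles. */
typedef struct { double re, im; } cplx;

/* Scalar version of the SSE3 sequence:
   the low lane of addsubpd subtracts (real part), the high lane adds (imaginary part). */
static cplx cmul(cplx a, cplx b)
{
    cplx c;
    c.re = a.re * b.re - a.im * b.im;
    c.im = a.im * b.re + a.re * b.im;
    return c;
}
```

For example, (1 + 2i)(3 + 4i) = -5 + 10i.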

September 24, 2016 - 08:42

Jeremiah Foster

Well, how many tasks can be executed simultaneously on your GPU device?

September 24, 2016 - 09:07

Kevin King

One has 16 execution units; the other is supposed to have 384 CUDA cores. My CPU is a quad-core with Hyper-Threading on each core (and SIMD units in each).

The GPU seems too slow because of the overhead, though, unless I change the way things run.

September 24, 2016 - 09:10

Hunter Rodriguez

Can you give me any references as to how to understand this? How do I use it?

September 24, 2016 - 09:28

Jeremiah Perry

I'm a noob and this is a horrible solution, but I got a speedup by spawning pthreads, each handling an interval of i values.
Although the lower intervals will finish first, I secured the priority (lowest interval wins) with a queue structure.
Do the same with OpenCL and you should get a speedup on average.

However, OpenCL/SIMD is a really odd way to optimize this loop.
If you can, switch to a different algorithm for your factoring function.
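Below is a minimal pthreads sketch of that interval-splitting idea. It assumes the mod-30 wheel discussed above and a fixed thread count. `NTHREADS`, `Slice`, `scan_slice`, and `next_factor_threaded` are hypothetical names. Instead of a queue, it keeps one result slot per interval and, after joining, takes the lowest interval that found a divisor, which enforces the same priority rule.

```c
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4   /* hypothetical: one thread per physical core */

/* Hypothetical work item: wheel passes [first, last), pass t tests i = 37 + 30 * t. */
typedef struct {
    uint64_t x;
    uint64_t first, last;
    uint64_t found;   /* first divisor hit in this interval, 0 if none */
} Slice;

static void *scan_slice(void *arg)
{
    static const uint64_t off[8] = {0, 4, 6, 10, 12, 16, 22, 24};
    Slice *s = arg;

    for (uint64_t t = s->first; t < s->last; t++) {
        uint64_t i = 37 + 30 * t;
        for (int k = 0; k < 8; k++) {
            if (s->x % (i + off[k]) == 0) {
                s->found = i + off[k];   /* candidates ascend: smallest in this interval */
                return NULL;
            }
        }
    }
    return NULL;
}

/* Assumes no prime factors below 37 and sqrtx = floor(sqrt(x)).
   Returns the smallest prime factor <= sqrtx, or 0 if there is none. */
static uint64_t next_factor_threaded(uint64_t x, uint64_t sqrtx)
{
    if (sqrtx < 37)
        return 0;

    uint64_t passes = (sqrtx - 37) / 30 + 1;             /* i = 37, 67, ... <= sqrtx */
    uint64_t per = (passes + NTHREADS - 1) / NTHREADS;   /* passes per interval */
    pthread_t tid[NTHREADS];
    int started[NTHREADS];
    Slice sl[NTHREADS];

    for (int t = 0; t < NTHREADS; t++) {
        uint64_t first = per * (uint64_t)t;
        uint64_t last = first + per;
        if (first > passes) first = passes;
        if (last > passes) last = passes;
        sl[t] = (Slice){x, first, last, 0};
        started[t] = pthread_create(&tid[t], NULL, scan_slice, &sl[t]) == 0;
        if (!started[t])
            scan_slice(&sl[t]);   /* fall back to running this interval inline */
    }

    /* Intervals cover ascending ranges of i, so the lowest one with a hit wins. */
    uint64_t best = 0;
    for (int t = 0; t < NTHREADS; t++) {
        if (started[t])
            pthread_join(tid[t], NULL);
        if (best == 0 && sl[t].found != 0)
            best = sl[t].found;
    }
    return best;
}
```

Build with `-pthread`. One design caveat: each thread stops only at its own first hit, so higher intervals keep running after a lower one has already succeeded. A shared "best so far" value that threads check between passes would let them quit early.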

September 24, 2016 - 10:16
