Talk:Matematika/Sinuso Integralas
Popular example of a Fourier integral
Given function
- f(x)=1, when |x|<1,
- f(x)=1/2, when |x|=1,
- f(x)=0, when |x|>1.
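- We need to express this function as a Fourier integral.
- The solution for this example is: $f(x)=\frac{2}{\pi}\int_0^\infty \frac{\sin z\,\cos(zx)}{z}\,dz.$
- In the particular case x=0 (|x|<1) we have $f(0)=1$,
- so putting 0 in place of x gives $1=\frac{2}{\pi}\int_0^\infty \frac{\sin z}{z}\,dz$, and so we have: $\int_0^\infty \frac{\sin z}{z}\,dz=\frac{\pi}{2}.$
- As far as I understand the Fourier integral, this integral means: $\sum_{a=1}^{\infty}\frac{\sin a}{a}=\frac{\pi}{2}.$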
- The problem is that I checked it with a Free Pascal program ("Version 1.0.12 2011/04/23; Compiler Version 2.4.4; Debugger GDB 7.2") using this code:
var a:longint; c:real; begin c:=0; a:=0; for a:=1 to 100000 do c:=c+sin(a)/a; writeln(c); readln(); end.
- and I get the result 1.07080565212341, which is not even close to $\pi/2 \approx 1.5708$.
- Preceding unsigned comment added by Versatranitsonlywaytofly (talk • contribs) 22:52, 22 December 2011 (UTC)
- BTW, you can use it as a benchmark by changing the number in the line "for a:=1 to 100000" to something bigger than 100000. With 1000000000 I got 1.07079632630307, and the CPU needed 52 seconds to compute the result. You can use "a:integer" instead of "a:longint", but then the largest number you can choose is smaller. With 100000000 it took only 5 seconds and the result is 1.07079633477997.
- It turned out to be a simple mistake. I thought the integral simply meant the sum above, but it does not; note also that near a=0 the value of sin(a)/a is not 0 but equal to ~1. An interesting coincidence is that the result equals $\pi/2 - 1/2$, which must have something to do with Fourier series and one of its coefficients. So the real code is:
var b:real; a:longint; begin b:=0; a:=0; for a:=1 to 1000000000 do b:=b+0.00001*sin(a/100000)/(a/100000); writeln(b); readln(); end.
- The result is 1.57088654523321 after 63 seconds.
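To see why the step-1 sum and the fine-step sum give different numbers, here is a minimal sketch in Python (not Free Pascal; chosen only because it is quick to run). The helper name `riemann_sinc` and the smaller iteration counts are my own assumptions for illustration. It uses the same right-endpoint sum as the programs above.

```python
import math

def riemann_sinc(step, n_terms):
    """Right-endpoint Riemann sum of sin(x)/x over (0, step * n_terms]."""
    total = 0.0
    for k in range(1, n_terms + 1):
        x = k * step
        total += step * math.sin(x) / x
    return total

# Step 1: the same sum as "c:=c+sin(a)/a"; it tends to (pi - 1)/2, not pi/2.
unit_step_sum = riemann_sinc(1.0, 100_000)

# Step 0.001 up to x = 1000: a rough approximation of the integral itself.
fine_step_sum = riemann_sinc(0.001, 1_000_000)

print(unit_step_sum, (math.pi - 1) / 2)
print(fine_step_sum, math.pi / 2)
```

The step-1 sum approaches (pi - 1)/2 = pi/2 - 1/2, a known value of this series, while the fine-step sum approaches pi/2 as the step shrinks and the upper limit grows.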
- [Small fix, 2024] The code above looks strange. It gives the result 1.57088654523321 after 63 seconds on a 2.6 GHz CPU. On a 4.16 GHz CPU the same code gives 1.5708865452332146E+000 after about 2 minutes and 25 seconds (about 145 seconds) on the first run (with Free Pascal the first run takes longer than the second, third and any later run), with the CPU fairly heavily loaded by three web browsers. On the second run the CPU working at 4.16 GHz gives the same result after 51 seconds with two fairly heavily loaded browsers open (I closed one browser; nothing was playing in the browsers anyway, just YouTube and other pages were open).
- This Free Pascal code:
var b:real; a:longint; begin b:=0; for a:=1 to 1000000000 do b:=b+0.000000001*sin(a*0.000000001)/(a*0.000000001); writeln(b); readln(); end.
- gives the result 9.4608307028940319E-001 (that is, 9.4608307028940319/10) after 2 minutes and 10-15 seconds (130-135 seconds) on the first run on the CPU working at 4.16 GHz (with two heavily loaded web browsers). So I now understand that the code above (which gives ~pi/2 after 63 seconds on the 2.6 GHz CPU) is correct and this code is wrong: with step 0.000000001 and 10^9 iterations the argument only reaches x=1, so the sum approximates Si(1) =~ 0.946 instead of pi/2. On the second run this code gives 9.4608307028940319E-001 after 40 seconds on the 4.16 GHz CPU.
- This Free Pascal code:
var b:real; a:longint; begin b:=0; for a:=1 to 1000000000 do b:=b+0.0000001*sin(a*0.01)/(a*0.01); writeln(b); readln(); end.
- gives the result 1.5657964177342434E-005 (that is, 1.5657964177342434/100000) after 2 minutes and 19 seconds (139 seconds) on the first run on the 4.16 GHz CPU (with one heavily loaded web browser open, but nothing playing). On the second run this code gives the same result after 46 seconds under the same load.
- This code:
var b:real; a:longint; begin b:=0; for a:=1 to 1000000000 do b:=b+0.000001*sin(a*0.000001)/(a*0.000001); writeln(b); readln(); end.
- gives the result 1.5702326223791097E+000 after 2 minutes and 15 seconds (135 seconds) on the first run on the 4.16 GHz CPU (with one heavily loaded web browser open, but nothing playing). On the second run this code gives the same result after 41 seconds under the same load.
- Each iteration contains 3 multiplications, 1 division, 1 addition and one sine evaluation. There are a billion (10^9) iterations. The calculation took 41*4.16*10^9 = 170,560,000,000 = 1.7056 * 10^11 cycles, that is (1.7056 * 10^11)/(10^9 iterations) = 170.56 cycles per iteration. Suppose multiplication and addition take about 4 cycles each; then each iteration spends 4*3+4*1=16 cycles on them. I estimate that one division needs about 25 cycles on the 4.16 GHz CPU (the division itself is probably faster, because some of those cycles go to one addition and to the loop itself; without the addition it would be not 25 but about 25-4=21 cycles per division). If we subtract 16 cycles for the multiplications and the addition, and 21 cycles for the division, we get:
- 170.56 - 16 - 21 = 133.56 =~ 133 cycles for one sin(x) evaluation. This was done on an AMD FX(tm)-8350 Eight-Core Processor 4.00 GHz (officially it goes up to 4.2 GHz, but the Windows 10 Task Manager shows 4.16 GHz when it is loaded with calculations). BTW, AMD was sued for saying that this CPU has 8 cores, because it can't run tasks designed for 8 cores, only tasks designed for 4 cores. Anyway, I think Free Pascal always uses only one core. This CPU is equipped with 8 GB of DDR3-1600 (800 MHz) dual-channel RAM.
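The cycle estimate above can be replayed as a short Python sketch. All numbers are taken from the text; the per-operation costs (4 cycles for a multiplication or addition, 21 for a division) are my guesses from above, not measured values, and the variable names are only illustrative.

```python
# Figures taken from the text above; the per-operation costs are assumptions.
seconds = 41
clock_hz = 4.16e9
iterations = 1e9

cycles_total = seconds * clock_hz                  # about 1.7056e11 cycles
cycles_per_iteration = cycles_total / iterations   # about 170.56 cycles

mul_add_cycles = 4 * 3 + 4 * 1   # 3 multiplications + 1 addition, 4 cycles each
division_cycles = 25 - 4         # assumed division cost without the addition

sine_cycles = cycles_per_iteration - mul_add_cycles - division_cycles
print(cycles_per_iteration, sine_cycles)
```

Changing the assumed costs shifts the sine estimate by the same amount, so the 133 cycles figure is only as good as those guesses.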
- This Free Pascal code:
var b:real; a:longint; begin b:=0; for a:=1 to 1000000000 do b:=b+0.0001*sin(a*0.0001)/(a*0.0001); writeln(b); readln(); end.
- gives the result 1.5707563204155381E+000 after 2 minutes and 16 seconds (136 seconds) on the first run on the 4.16 GHz CPU (with one heavily loaded web browser open, but nothing playing). On the second and third runs this code gives the same result after 43-44 seconds under the same load. Of all the experiments this is the best code, the one that gives the most accurate result. For example, code with this line:
b:=b+0.001*sin(a*0.001)/(a*0.001);
- gives the result 1.5702953898705860E+000, which is less accurate, because pi/2 =~ 1.5707963267948966.
- Only the very first code (the one run on the 2.6 GHz CPU), with the line
b:=b+0.00001*sin(a/100000)/(a/100000);
- which is equivalent to the line
b:=b+0.00001*sin(a*0.00001)/(a*0.00001);
- gives a result, 1.5708865452332150E+000, that is almost as good as that of the code just mentioned.
Sine Taylor series benchmarking
- The sine function can be written as a Taylor series. Here we have 14 digits after the decimal point (15 in total, plus one last digit used for rounding, which is not shown and exists only in memory). If one decimal digit takes 4 bits, that makes 16*4=64 bits of precision.
- Using the Windows calculator I checked that
- particularly
- or
- or
Some stupid infinite Fourier integral problem (the Fourier integral must be pseudoscience)
var b:real; a:longint; begin b:=0; a:=0; for a:=1 to 1000000000 do b:=b+0.00001*cos(1.1*a/100000)*sin(a/100000)/(a/100000); writeln(b); readln(); end.
- It always gives a different result, no matter how close to infinity you choose the upper limit for a.
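A possible explanation, sketched in Python under my own assumptions (the helper name `truncated_fourier`, the step and the upper limits are all hypothetical): the integrand cos(1.1*z)*sin(z)/z decays only like 1/z, so the sum truncated at a finite upper limit keeps oscillating around its limit value. As far as I can tell the oscillation shrinks as the upper limit grows, so the results differ from run to run but still settle near 0 for x=1.1.

```python
import math

def truncated_fourier(x, upper, step=0.001):
    """Right-endpoint sum of cos(x*z)*sin(z)/z for z in (0, upper]."""
    n_terms = int(round(upper / step))
    total = 0.0
    for k in range(1, n_terms + 1):
        z = k * step
        total += step * math.cos(x * z) * math.sin(z) / z
    return total

# Hypothetical upper limits, chosen only to keep the run short.
for upper in (100.0, 200.0, 400.0):
    print(upper, truncated_fourier(1.1, upper))
```

With x=0 the same helper approaches pi/2, which matches the sin(z)/z programs in the first section.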
Simplest benchmark
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+a; writeln(c); readln(); end.
- The result comes after 6 seconds on a ~3 GHz processor. Notice that in this case it is only twice as fast as on a 1.6 GHz Intel Atom processor, because it depends neither on the number of cores, nor on instruction sets, nor on cache sizes. A 3 GHz Pentium III would finish in the same time.
- The numbers here have 16 decimal digits (the 16th digit is used for rounding at the end and is not shown). If one decimal digit is 4 bits, that is 64 bits of precision in total. There are exactly one billion additions in 6 seconds, which is 166 million additions per second. Counted in bits, this is 166*64=10624 million bits per second, about 10 billion bits per second. That is 10624/8=1328 megabytes per second, or 1.3 GB/s. For now this looks like nothing that RAM with (800MHz*64bits)/8=6400 MB/s could not handle.
- An interesting coincidence: if the task is done in 5 seconds at 3 GHz, then it is done in 15 seconds at 1 GHz, and we have 15 decimal places. So I suggest that in one cycle (takt) the CPU performs one single-digit decimal addition (like 4+7, 8+6 or 4+5). I suggest that in one cycle (a 3 GHz CPU has 3 billion cycles/s) either one addition, one subtraction, one multiplication or one division can be done (maybe subtraction takes 2 cycles and multiplication 3 cycles, but maybe not). From this one could conclude that Bill Gates does not use the MMX (64 bit), SSE (128 bit) or AVX (256 bit) instructions, that they just sit in some kind of BIOS or ROM memory, and that they have little influence in practice, since all old programs like Visual C++ and Windows run on instructions [software code] from before MMX, SSE and AVX were introduced. My point is that the various Intel SSE instructions do not add physical calculation units; they are just a smarter kind of vector-calculation code than you could write yourself, and thus can be faster.
Simplest benchmark 2
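- Strangely, even this harder benchmark gives its result in 3-4 seconds. Note that the for loop has no begin ... end block, so only b:=a; is repeated a billion times; the remaining statements run once after the loop.
var a:longint; c,b,d,e,f,g:real;
begin
for a:=1 to 1000000000 do
b:=a;
d:=a*b; //a^2
e:=d*d; //a^4
f:=e*e; //a^8
g:=f*f; //a^16
// c:=c+sin(a)/a;
c:=c+(a*g);
writeln(c);
readln();
end.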
Something wrong
var a:longint; c,b,d,e,f,g:real;
begin
for a:=1 to 1000000000 do
b:=a;
d:=a*b; //a^2
e:=d*d; //a^4
f:=e*e; //a^8
g:=f*f; //a^16
// c:=c+sin(a)/a;
c:=c+(1-a*d/6+a*e/120-a*d*e/5040+a*f/362880-a*d*f/39916800+a*e*f/6227020800-(g/a)/1307674368000+a*g/355687428096000-a*d*g/121645100408832000)/a;
writeln(c);
readln();
end.
- The result is not what it should be (it is obtained after 3-4 seconds). Here, too, the for loop has no begin ... end block, so only b:=a; is repeated.
Something wrong 2
- This also gives a wrong result:
var a:longint; c,b,d,e,f,g,h:real;
begin
for a:=1 to 1000000000 do
b:=a/100000;
d:=a*b/100000; //a^2
e:=d*d; //a^4
f:=e*e; //a^8
g:=f*f; //a^16
h:=1-b*d/6+b*e/120-b*d*e/5040+b*f/362880-b*d*f/39916800+b*e*f/6227020800-(g/b)/1307674368000+b*g/355687428096000-b*d*g/121645100408832000;
c:=c+0.00001*h/(a/100000);
writeln(c);
readln();
end.
- The result comes after 10 seconds, but it is not the value it should be.
- Update: a silly error (the first term must be b instead of 1),
h:=b-b*d/6+b*e/120-b*d*e/5040+b*f/362880-b*d*f/39916800+b*e*f/6227020800-(g/b)/1307674368000+b*g/355687428096000-b*d*g/121645100408832000;
- but a much longer series is still needed to get something that gives at least the first two decimal places correctly. It seems there is some trick for the last number that makes the series much, much shorter.
Only sin(1) is quite precise with short series
I checked that
- and
- while the precise result is
- Update. For any small number only the first few terms of the series are needed to get 10 decimal places of precision. So the smart choice is to divide the big number by the period, drop the fractional part, multiply the integer part by the period, and subtract the result from the initial big number. What remains is a small number within one period, for which the Taylor series calculation is short and simple.
- Update 2. One needs to divide by the full period; then there should not be any errors. Then drop the fractional part, multiply the integer part by the period, and subtract the result from the big number. This gives about 5 correct decimal places with a limited number of terms, and it is still a much shorter calculation than for very big numbers. For example,
- which is quite close to
simplest benchmark 2.1
var a:longint; c,b,d,e,f,g,h:real; begin for a:=900000000 to 1000000000 do c:=c+a; writeln(c); readln(); end.
- The result comes in ~1 second. These are addition operations.
simplest benchmark 3
var a:longint; c,b,d,e,f,g,h:real; begin for a:=500000000 to 1000000000 do c:=c+a; writeln(c); readln(); end.
- The result comes in 2 or 3 seconds on a ~3 GHz CPU. These are addition operations. Notice that the numbers are 16 decimal digits long (15, plus one for rounding, I think), although I think the CPU does not use the full 15-16 digits of precision while the numbers are small. In binary form, 16 decimal digits correspond to 52 binary digits, so 8 decimal digits of an integer correspond to 26 binary digits. If the CPU added one bit per cycle, the 5*10^8 additions of 26 bits would take 1.3*10^10 cycles, so a 5.2 GHz CPU would be needed to accomplish this task in 2.5 seconds. My suggestion is that the CPU adds 0's faster than 1's. Or the cycle (every CPU does ~3*10^9 cycles/s) was defined by the first CPU designers as the addition of one decimal digit to another decimal digit, so with a precision of 16 decimal digits there are 16 such additions per number. Now you know what a cycle means: a cycle is one addition of one digit (from 0 to 9) to another digit (from 0 to 9).
- So, for example, 2*10^12 double-precision (64-bit) floating-point operations per second (2 TFLOPS) would correspond to 2*10^12/16 operations (additions, for example) per second on numbers of 16 decimal digits (64-bit numbers). For example, AMD claims that the Radeon HD 6970 has 683 GFLOPS of double-precision compute power (and 2.7 TFLOPS of single-precision compute power). So that is 683/16=42.6875 billion additions/s of 16-digit numbers, and 2700/8=337.5 billion additions/s of 7-8 digit numbers (single precision (32 bits) has more like 7 decimal digits). If compute power doubles every two years, then in 2016 the numbers should be 4 times bigger. The recently arrived Radeon HD 7970 has 3.79 TFLOPS of single-precision and 947 GFLOPS of double-precision compute power. At the time of writing this AMD "Radeon HD 7970" is the most powerful single-chip graphics card. So in 2016 the most powerful card should reach 3.79*4=15.16 TFLOPS single precision and 0.947*4=3.788 TFLOPS double precision (on AMD Radeon HD cards the double-precision FLOPS figure is 4 times smaller than the single-precision one).
- And in 2030 the most powerful GPU card should reach about 1000 PFLOPS, that is ~1 ExaFLOPS (10^18 FLOPS), in single precision.
- Another theory is that the CPU does all the work and only single-core CPU power increases, through higher frequency: in 2016 a 3.5 GHz CPU should be as inexpensive as a 3 GHz one is now, and in 2030 it should be 10 GHz or 30 GHz (or the same 3 GHz in the worst case). But game programming will be so professional that you will not be able to tell whether something is done on a 3 GHz single core or on a 10^18 FLOPS GPU. Anyway, a 10 GHz CPU will fool you with no problem using raytracing (a reflection within a reflection in a small area and you are fooled) and approximated radiosity.
Natural logarithm benchmarking
var a:longint; c,b,d,e,f,g,h:real;
begin
for a:=500000000 to 1000000000 do
c:=c+ln(a); //real
// b:=a/100000;
// d:=a*b/100000; //a^2
// e:=d*d; //a^4
// f:=e*e; //a^8
// g:=f*f; //a^16
// c:=c+sin(a)/a;
// h:=b-b*d/6+b*e/120-b*d*e/5040+b*f/362880-b*d*f/39916800+b*e*f/6227020800-(g/b)/1307674368000+b*g/355687428096000-b*d*g/121645100408832000;
// c:=c+0.00001*h/b;
writeln(c);
readln();
end.
- This benchmark gives the result 1.02082065291082*10^10 after 25 seconds on a ~3 GHz CPU.
Natural logarithm benchmarking 2
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+ln(a); writeln(c); readln(); end.
- This benchmark gives its result after 50 seconds on a ~3 GHz CPU. Notice that the sine calculation done in exactly the same manner ("c:=c+sin(a);") also took ~50 seconds (47 seconds). This makes me think that the sine or natural logarithm is taken from some kind of big lookup table rather than calculated. But I checked a few times, and there really is a 3-4 second difference between the sine function and the natural logarithm (the natural logarithm takes 3 seconds longer).
Natural logarithm benchmarking 3
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+123456789012345*ln(a); writeln(c); readln(); end.
- This benchmark gives its result after 52 seconds on a ~3 GHz CPU.
- If there is only one multiplier unit, then either the calculations are done in lower precision or there is more than one multiplier unit. To multiply every digit of one number by every digit of the other takes 15^2=225 or 16^2=256 single-digit multiplications and 225 or 256 additions for two 15 (or 16) digit numbers. So in total multiplying, for example, 123456789012345 by 123456789012345 needs 15^2+225=450 operations. If one operation (like an addition) is done in one cycle, then finishing this benchmark in 52 seconds (not counting the calculation of the natural logarithm) would need not a ~3 GHz CPU but one doing 450*10^9/52 =~ 8.7*10^9 cycles per second. So I am fairly convinced that a 15 or 16 digit number is multiplied by a one-digit number in 1 or at most 3 cycles (certainly not more than 4-10 cycles).
- Update: "Free Pascal" can transform this code into the following (the result will be the same, only slightly different in the last digits):
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+ln(a); writeln(123456789012345*c); readln(); end.
- So multiplication must be hard, because I could not even use the Free Pascal function "sqr()" (squaring) repeatedly inside a loop (like "for a:=1 to 1000000000 do"). I could use this function ("sqr()") only once, so I had to work around it with a combination of the "exp()" and "ln()" functions to raise to the power of 2 (like this: exp(2*ln(a))=a*a). So if all programming languages really have problems with multiplication, then there is hope that the GPU is not a fake. BTW, I read that RISC processors also have no multiplication or cannot multiply, something like that, but can divide.
Natural logarithm benchmarking 4
var a:longint; c:real;
begin
for a:=1 to 1000000000 do
c:=c+exp(2*ln(a)); //a^2=exp(2*ln(a))
writeln(c);
readln();
end.
- This benchmark gives its result after 96 seconds on a ~3 GHz CPU.
Natural logarithm benchmarking 5
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+exp(ln(a)); writeln(c); readln(); end.
- This benchmark gives its result after 92 seconds on a ~3 GHz CPU.
Power function benchmarking
var a:longint; c:real;
begin
for a:=1 to 1000000000 do
c:=c+a*exp(ln(a)); // a^2=a*exp(ln(a))
writeln(c);
readln();
end.
- This benchmark gives its result after 93 seconds on a ~3 GHz CPU.
Simple benchmark
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+1; writeln(c); readln(); end.
- This benchmark gives its result after 4 seconds on a ~3 GHz CPU.
Simple benchmark 2
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+2.7; writeln(c); readln(); end.
- This benchmark gives its result after 5 seconds on a ~3 GHz CPU.
Simple benchmark 3
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+2.789123456789012; writeln(c); readln(); end.
- This benchmark gives its result after 5 seconds on a ~3 GHz CPU. "Free Pascal" can transform such code into "c:=c+1;=>result*2.789123456789012", so it does not prove anything. Like this:
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+1; writeln(c*2.789123456789012); readln(); end.
- This time the result comes after 4-5 seconds on a ~3 GHz CPU.
- Or, if you use this benchmark, the multiplication will be rounded:
var a:longint; b,c:real; begin for a:=1 to 1000000000 do c:=c+1; b:=c*2.789123456789012; writeln(b); readln(); end.
- And the result comes after 4-5 seconds on a ~3 GHz CPU. If you get the result faster, note that "~3 GHz" here means 2.6 GHz.
Square root benchmarking
var a:longint; b,c:real;
begin
for a:=1 to 1000000000 do
c:=c+sqrt(a); // a^(1/2)=sqrt(a)
writeln(c);
readln();
end.
- This benchmark gives its result after 12 seconds on a ~3 GHz CPU (if the line "c:=c+sqrt(a);" is replaced with the line "c:=c+a*sqrt(a);", it still gives the result after 12 seconds on the same CPU). The calculations are done with 16 decimal digits of precision. So about two cycles are spent per decimal digit in this square root calculation, because (12*2.6*10^9)/(16*10^9)=(31.2*10^9)/(16*10^9)=1.95, or about 2. That is 2*16=32 cycles for the square root of one double-precision (64 bits = 16 decimal digits) number.
- Here is an example of how a square root is calculated.
- To get $\sqrt{x}$:
- Step 1: Guess G = 1;
- Step 2: New Guess = (G + x/G)/2;
- Repeat Step 2 an arbitrary number of times to get an arbitrarily precise result.
- For example, for x = 3:
- G = 1;
- G = (1+3/1)/2 = 4/2 = 2;
- G = (2 + 3/2)/2 = 7/4 = 1.75;
- G = (7/4 + 12/7)/2 = ((49+48)/28)/2 = (97/28)/2 = 97/56 = 1.732142857;
- G = (97/56 + 168/97)/2 = ((97^2 + 168*56)/(97*56))/2 = ((9409 + 9408)/5432)/2 = 18817/10864 = 1.732050810.
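The steps above can be replayed exactly with rational arithmetic. This is a minimal Python sketch; the function name `newton_sqrt_steps` is my own, and nothing here says how the CPU or Free Pascal actually computes sqrt().

```python
from fractions import Fraction

def newton_sqrt_steps(x, steps):
    """Return the successive guesses G -> (G + x/G)/2, starting from G = 1."""
    guess = Fraction(1)
    guesses = [guess]
    for _ in range(steps):
        guess = (guess + Fraction(x) / guess) / 2
        guesses.append(guess)
    return guesses

# Four refinements for x = 3, printed as a fraction and as a decimal.
for g in newton_sqrt_steps(3, 4):
    print(g, float(g))
```

Each step roughly doubles the number of correct digits, which is why four steps already give 1.7320508.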
Too fast multiplication (multiplication benchmark)
var a:longint; b,c:real; begin b:=0; c:=123456789012345; for a:=1 to 1000000000 do b:=b+c*a; writeln(b); readln(); end.
- This benchmark gives its result after 5 seconds on a ~3 GHz CPU.
- But this benchmark
var a:longint; b,c:real; begin b:=0; c:=123456789012345; for a:=1 to 1000000000 do b:=b+a; writeln(c*b); readln(); end.
- also gives its result after 5 seconds on a ~3 GHz CPU.
- And this benchmark
var a:longint; b,d,c:real; begin b:=0; c:=123456789012345; for a:=1 to 1000000000 do b:=b+a; d:=c*b; writeln(d); readln(); end.
- also gives its result after 5 seconds on a ~3 GHz CPU.
- Even this benchmark
var a:longint; b,c:real; begin b:=0; for a:=1 to 1000000000 do b:=b+a; writeln(b); readln(); end.
- gives its result after 5 seconds on a ~3 GHz CPU.
Free Pascal basics for building a sine calculation
Uses math;
begin
Writeln(Ceil(-3.7)); // should be -3
Writeln(floor(-3.7)); // should be -4
Writeln(frac(3.7)); // should be 0.7
Writeln(floor(3.7)); // should be 3
Writeln(ceil(3.7)); // should be 4
Writeln(Ceil(-4.0)); // should be -4
Readln;
End.
- this is the complete code for Free Pascal (Compiler version 2.6.0); it already works (press "Run" or "Ctrl+F9" on the keyboard) and shows a column of numbers on the screen.
- Any number x is reduced to radians between 0 and $2\pi$ by applying the formula: $x' = 2\pi\cdot\operatorname{frac}\left(\frac{x}{2\pi}\right)$.
- For example, if x=10, then:
- sin(10)=-0.54402111088936981340474766185138;
- and $\frac{10}{2\pi}=1.5915494309189533576888376337251$,
- frac(1.5915494309189533576888376337251)=0.5915494309189533576888376337251,
- $2\pi\cdot 0.5915494309189533576888376337251=3.716814692820413523074713233441$,
- sin(3.716814692820413523074713233441)=-0.54402111088936981340474766185138.
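The x=10 example can be checked with a few lines of Python. This is only a sketch of the reduction described above; the name `reduce_angle` is hypothetical, and `turns - math.floor(turns)` plays the role of frac() for non-negative numbers.

```python
import math

TWO_PI = 2 * math.pi

def reduce_angle(x):
    """Map x into [0, 2*pi): drop the whole turns, keep the fractional turn."""
    turns = x / TWO_PI
    return TWO_PI * (turns - math.floor(turns))

reduced = reduce_angle(10.0)
print(reduced, math.sin(reduced), math.sin(10.0))
```

For very large x the division x/(2*pi) loses digits in double precision, so the reduced angle becomes less exact as x grows.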
Free Pascal code for the sine Taylor series
var a:longint; c:real;
begin
for a:=0 to 3 do
c:=c+(a-0.16666666666666667*a*a*a+0.0083333333333333333*a*sqr(sqr(a*1.0))-
0.00019841269841269841*a*sqr(a*1.0)*sqr(sqr(a*1.0))+
0.0000027557319223985891*a*sqr(sqr(sqr(a*1.0)))-
0.000000025052108385441718775*a*sqr(a*1.0)*sqr(sqr(sqr(a*1.0)))+
0.000000000160590438368216146*a*sqr(sqr(a*1.0))*sqr(sqr(sqr(a*1.0)))-
0.00000000000076471637318198164759*a*sqr(a)*sqr(sqr(a))*sqr(sqr(sqr(a*1.0)))+
0.000000000000002811457254345520763*a*sqr(sqr(sqr(sqr(a*1.0)))));
writeln(c);
writeln(sin(1)+sin(2)+sin(3));
Readln;
End.
- gives the results:
- 1.89188842905109;
- 1.8918884196934454.
Free Pascal sine code for arbitrary numbers
Uses math;
var a:longint; c:real;
begin
for a:=0 to 100000000 do
c:=c+6.283185307179586477*frac(a*0.159154943091895336)-
0.16666666666666667*sqr(6.283185307179586477*frac(a*0.159154943091895336))*6.283185307179586477*frac(a*0.159154943091895336)+
0.0083333333333333333*6.283185307179586477*frac(a*0.159154943091895336)*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))-
0.00019841269841269841*6.283185307179586477*frac(a*0.159154943091895336)*sqr(6.283185307179586477*frac(a*0.159154943091895336))*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))+
0.0000027557319223985891*6.283185307179586477*frac(a*0.159154943091895336)*sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))-
0.000000025052108385441718775*6.283185307179586477*frac(a*0.159154943091895336)*sqr(6.283185307179586477*frac(a*0.159154943091895336))*sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))+
0.000000000160590438368216146*6.283185307179586477*frac(a*0.159154943091895336)*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))*sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))-
0.00000000000076471637318198164759*6.283185307179586477*frac(a*0.159154943091895336)*
sqr(6.283185307179586477*frac(a*0.159154943091895336))*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))*
sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))+
0.000000000000002811457254345520763*6.283185307179586477*frac(a*0.159154943091895336)*sqr(sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))); // 1/(2*3.14)=0.159
writeln(c);
Readln;
End.
- which gives the answer 55365.3072928836 after 71 seconds on a 2.6 GHz processor. This code gives the same sin(x) as the original Free Pascal sine function only when 0<x<1.09; for larger values, 1.09<x<6.283185307179586477, the answer gets less and less accurate. This is probably because Free Pascal computes the sine economically, only up to 45 degrees, rather than using a very long Taylor series.
- A slightly optimized variant of this code:
Uses math;
var a:longint; c:real;
begin
for a:=1 to 100000000 do
c:=c+frac(a*0.159154943091895336)-
0.16666666666666667*sqr(6.283185307179586477*frac(a*0.159154943091895336))*frac(a*0.159154943091895336)+
0.0083333333333333333*frac(a*0.159154943091895336)*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))-
0.00019841269841269841*frac(a*0.159154943091895336)*sqr(6.283185307179586477*frac(a*0.159154943091895336))*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))+
0.0000027557319223985891*frac(a*0.159154943091895336)*sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))-
0.000000025052108385441718775*frac(a*0.159154943091895336)*sqr(6.283185307179586477*frac(a*0.159154943091895336))*sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))+
0.000000000160590438368216146*frac(a*0.159154943091895336)*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))*sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))-
0.00000000000076471637318198164759*frac(a*0.159154943091895336)*
sqr(6.283185307179586477*frac(a*0.159154943091895336))*sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336)))*
sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))+
0.000000000000002811457254345520763*frac(a*0.159154943091895336)*sqr(sqr(sqr(sqr(6.283185307179586477*frac(a*0.159154943091895336))))); // 1/(2*3.14)=0.159
writeln(6.283185307179586477*c);
Readln;
End.
- still gives the answer 55365.307292856515 after 71 seconds on the 2.6 GHz processor.
Sine benchmark
- This sine code:
Uses math; var a:longint; c:real; begin for a:=0 to 100000000 do c:=c+sin(a); writeln(c); Readln; End.
- is 71/5=14 times faster than the home-made one, since it gives the answer 1.71364934657128 after 5 seconds on a 2.6 GHz processor.
Sine benchmark 2
- This sine code:
Uses math; var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+sin(a); writeln(c); Readln; End.
- gives the answer 0.421294486750096 after 47 seconds on a 2.6 GHz processor. So Free Pascal really does compute frac() only once, and most likely only over a short range, so that the Taylor series is as short as possible, adjusting minus signs and the like (maybe it also reuses the squared values instead of recomputing them), because it is 71/4.7=15.1 times faster.
Benchmark of the Free Pascal frac() function
Uses math;
var a:longint; c:real;
begin
for a:=0 to 1000000000 do
c:=c+frac(a*0.15915494309189533576888); // 1/(2*3.14)=0.159
writeln(c);
Readln;
End.
- gives the result 499999986.434272 after 25 seconds on a 2.6 GHz processor.
Theoretical sine benchmark
- This Free Pascal code computes all digits correctly only for numbers from 0 to 1.09; for numbers larger than 1.09 the accuracy decreases, and for very large numbers it is lost altogether. The code is as follows (it tests the theoretical speed of the sine for small numbers):
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+(a-0.16666666666666667*a*a*a+0.0083333333333333333*a*sqr(sqr(a*1.0))- 0.00019841269841269841*a*sqr(a*1.0)*sqr(sqr(a*1.0))+ 0.0000027557319223985891*a*sqr(sqr(sqr(a*1.0)))- 0.000000025052108385441718775*a*sqr(a*1.0)*sqr(sqr(sqr(a*1.0)))+ 0.000000000160590438368216146*a*sqr(sqr(a*1.0))*sqr(sqr(sqr(a*1.0)))- 0.00000000000076471637318198164759*a*sqr(a)*sqr(sqr(a))*sqr(sqr(sqr(a*1.0)))+ 0.000000000000002811457254345520763*a*sqr(sqr(sqr(sqr(a*1.0))))); writeln(c); Readln; End.
- which gives its result after 43 seconds on a 2.6 GHz processor.
- This Free Pascal code:
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+(a-0.16666666666666667*a*a*a+0.0083333333333333333*a*sqr(sqr(a))- 0.00019841269841269841*a*sqr(a)*sqr(sqr(a))+ 0.0000027557319223985891*a*sqr(sqr(sqr(a)))- 0.000000025052108385441718775*a*sqr(a)*sqr(sqr(sqr(a)))+ 0.000000000160590438368216146*a*sqr(sqr(a))*sqr(sqr(sqr(a)))- 0.00000000000076471637318198164759*a*sqr(a)*sqr(sqr(a))*sqr(sqr(sqr(a)))+ 0.000000000000002811457254345520763*a*sqr(sqr(sqr(sqr(a*1.0))))); writeln(c); Readln; End.
- gives its result after 54 seconds on a 2.6 GHz processor.
- This Free Pascal code:
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+a*(1-0.16666666666666667*sqr(a*1.0)+0.0083333333333333333*sqr(sqr(a*1.0))- 0.00019841269841269841*sqr(a*1.0)*sqr(sqr(a*1.0))+ 0.0000027557319223985891*sqr(sqr(sqr(a*1.0)))- 0.000000025052108385441718775*sqr(a*1.0)*sqr(sqr(sqr(a*1.0)))+ 0.000000000160590438368216146*sqr(sqr(a*1.0))*sqr(sqr(sqr(a*1.0)))- 0.00000000000076471637318198164759*sqr(a*1.0)*sqr(sqr(a*1.0))*sqr(sqr(sqr(a*1.0)))+ 0.000000000000002811457254345520763*sqr(sqr(sqr(sqr(a*1.0))))); writeln(c); Readln; End.
- gives its result after 41 seconds on a 2.6 GHz processor.
- This Free Pascal code:
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+a*(1-sqr(a*1.0)*(0.16666666666666667+0.0083333333333333333*sqr(a*1.0)- 0.00019841269841269841*sqr(sqr(a*1.0)))+ 0.0000027557319223985891*sqr(sqr(sqr(a*1.0)))- 0.000000025052108385441718775*sqr(a*1.0)*sqr(sqr(sqr(a*1.0)))+ 0.000000000160590438368216146*sqr(sqr(a*1.0))*sqr(sqr(sqr(a*1.0)))- 0.00000000000076471637318198164759*sqr(a*1.0)*sqr(sqr(a*1.0))*sqr(sqr(sqr(a*1.0)))+ 0.000000000000002811457254345520763*sqr(sqr(sqr(sqr(a*1.0))))); writeln(c); Readln; End.
- gives its result after 39 seconds on a 2.6 GHz processor.
- This Free Pascal code:
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+a*(1- sqr(a*1.0)*(0.16666666666666667+ sqr(a*1.0)*(0.0083333333333333333- sqr(a*1.0)*(0.00019841269841269841+ sqr(a*1.0)*(0.0000027557319223985891- sqr(a*1.0)*(0.000000025052108385441718775+ sqr(a*1.0)*(0.000000000160590438368216146- sqr(a*1.0)*(0.00000000000076471637318198164759+ sqr(a*1.0)*0.000000000000002811457254345520763)))))))); writeln(c); Readln; End.
- gives its result after 27 seconds on a 2.6 GHz processor.
- This correct Free Pascal code:
var a:longint; c:real;
begin
//for a:=0 to 1000000000 do
a:=2;
c:=c+a*(1+
sqr(a*1.0)*(-0.16666666666666667+
sqr(a*1.0)*(0.0083333333333333333+
sqr(a*1.0)*(-0.00019841269841269841+
sqr(a*1.0)*(0.0000027557319223985891+
sqr(a*1.0)*(-0.000000025052108385441718775+
sqr(a*1.0)*(0.000000000160590438368216146+
sqr(a*1.0)*(-0.00000000000076471637318198164759+
sqr(a*1.0)*(0.000000000000002811457254345520763-
sqr(a*1.0)*0.000000000000000008220635246624329717)))))))));
writeln(c);
writeln(sin(2));
Readln;
End.
- gives the result (only the last two digits are wrong):
- 0.909297426825641 and
- 0.90929742682568170.
- This Free Pascal code:
var a:longint; c:real; begin //for a:=0 to 1000000000 do a:=6; c:=c+a*(1+ sqr(a*1.0)*(-0.16666666666666667+ sqr(a*1.0)*(0.0083333333333333333+ sqr(a*1.0)*(-0.00019841269841269841+ sqr(a*1.0)*(0.0000027557319223985891+ sqr(a*1.0)*(-0.000000025052108385441718775+ sqr(a*1.0)*(0.000000000160590438368216146+ sqr(a*1.0)*(-0.00000000000076471637318198164759+ sqr(a*1.0)*(0.000000000000002811457254345520763+ sqr(a*1.0)*(-0.000000000000000008220635246624329717+ sqr(a*1.0)*(0.00000000000000000001957294106339126123- sqr(a*1.0)*0.000000000000000000000038681701706306840377))))))))))); writeln(c); writeln(sin(6)); Readln; End.
- duoda rezultatą:
- -0.279417241102534 ir
- -0.27941549819892587.
- Toks Free Pascal kodas (iki dalint iš 29 faktoriale):
var a:longint; c:real; begin //for a:=0 to 1000000000 do a:=6; c:=c+a*(1+ sqr(a*1.0)*(-0.16666666666666667+ sqr(a*1.0)*(0.0083333333333333333+ sqr(a*1.0)*(-0.00019841269841269841+ sqr(a*1.0)*(0.0000027557319223985891+ sqr(a*1.0)*(-0.000000025052108385441718775+ sqr(a*1.0)*(0.000000000160590438368216146+ sqr(a*1.0)*(-0.00000000000076471637318198164759+ sqr(a*1.0)*(0.000000000000002811457254345520763+ sqr(a*1.0)*(-0.000000000000000008220635246624329717+ sqr(a*1.0)*(0.00000000000000000001957294106339126123+ sqr(a*1.0)*(-0.000000000000000000000038681701706306840377+ sqr(a*1.0)*(0.00000000000000000000000006446950284384473396+ sqr(a*1.0)*(-0.000000000000000000000000000091836898637955461484+ sqr(a*1.0)*0.0000000000000000000000000000001130996288644771693)))))))))))))); writeln(c); writeln(sin(6));
- gives the result:
- -0.279415498042951 and
- -0.27941549819892587.
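The three comparisons above can be reproduced with a short sketch. It is written in Python rather than Free Pascal, and the function name taylor_sin and its parameter max_power are made up for this illustration. It evaluates the same truncated Taylor series in the same nested (Horner) form, but computes the coefficients from factorials instead of typing them as decimal literals, so the last digits may differ slightly from the Free Pascal output.

```python
import math

def taylor_sin(x, max_power):
    """Truncated Taylor series x - x^3/3! + x^5/5! - ... up to x^max_power,
    evaluated in nested (Horner) form in powers of x*x."""
    x2 = x * x
    acc = 0.0
    # Walk the odd powers from the highest one down to 3.
    for k in range(max_power, 1, -2):
        sign = -1.0 if (k // 2) % 2 == 1 else 1.0  # 3 -> -, 5 -> +, 7 -> -, ...
        acc = sign / math.factorial(k) + x2 * acc
    return x * (1.0 + x2 * acc)

# Same cases as in the text: x=2 up to 19!, x=6 up to 23!, x=6 up to 29!.
for x, max_power in ((2, 19), (6, 23), (6, 29)):
    approx = taylor_sin(x, max_power)
    print(x, max_power, approx, math.sin(x), abs(approx - math.sin(x)))
```

The printed differences show why x=6 needs more terms than x=2: the first omitted term is x^(n+2)/(n+2)!, which is still about 1.8*10^(-6) for x=6 after the 23! term.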
- This Free Pascal code:
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+a*(1+ sqr(a*1.0)*(-0.16666666666666667+ sqr(a*1.0)*(0.0083333333333333333+ sqr(a*1.0)*(-0.00019841269841269841+ sqr(a*1.0)*(0.0000027557319223985891+ sqr(a*1.0)*(-0.000000025052108385441718775+ sqr(a*1.0)*(0.000000000160590438368216146+ sqr(a*1.0)*(-0.00000000000076471637318198164759+ sqr(a*1.0)*(0.000000000000002811457254345520763+ sqr(a*1.0)*(-0.000000000000000008220635246624329717+ sqr(a*1.0)*(0.00000000000000000001957294106339126123+ sqr(a*1.0)*(-0.000000000000000000000038681701706306840377+ sqr(a*1.0)*(0.00000000000000000000000006446950284384473396+ sqr(a*1.0)*(-0.000000000000000000000000000091836898637955461484+ sqr(a*1.0)*0.0000000000000000000000000000001130996288644771693)))))))))))))); writeln(c); Readln; End.
- gives the result after 47 seconds on a 2.6 GHz CPU, that is 47*2.6=122.2 cycles per iteration. If we count 14 multiplications per iteration, that is 47*2.6/14=8.73 cycles per multiplication. If we count 28 multiplications, it is 47*2.6/28=4.36 cycles per multiplication. If we also count 14 additions, then together with the 28 multiplications it is 47*2.6/42=2.9 cycles per operation. If we count 14*3=42 multiplications and 14 additions, that is 56 operations in total, and each operation takes 47*2.6/56=2.18 CPU cycles.
- To compute the sine without iterations it would be enough to square a only once, so the following Free Pascal code is tested:
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+a*(1+ sqr(a*1.0)*(-0.16666666666666667+ a*(0.0083333333333333333+ a*(-0.00019841269841269841+ a*(0.0000027557319223985891+ a*(-0.000000025052108385441718775+ a*(0.000000000160590438368216146+ a*(-0.00000000000076471637318198164759+ a*(0.000000000000002811457254345520763+ a*(-0.000000000000000008220635246624329717+ a*(0.00000000000000000001957294106339126123+ a*(-0.000000000000000000000038681701706306840377+ a*(0.00000000000000000000000006446950284384473396+ a*(-0.000000000000000000000000000091836898637955461484+ a*0.0000000000000000000000000000001130996288644771693)))))))))))))); writeln(c); Readln; End.
- which gives the result after 41 seconds on a 2.6 GHz CPU.
- This Free Pascal code (which has nothing to do with the sine):
var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+a*(1+ a*(0.16666666666666667+ a*(0.0083333333333333333+ a*(0.00019841269841269841+ a*(0.0000027557319223985891+ a*(0.000000025052108385441718775+ a*(0.000000000160590438368216146+ a*(0.00000000000076471637318198164759+ a*(0.000000000000002811457254345520763+ a*(0.000000000000000008220635246624329717+ a*(0.00000000000000000001957294106339126123+ a*(0.000000000000000000000038681701706306840377+ a*(0.00000000000000000000000006446950284384473396+ a*(0.000000000000000000000000000091836898637955461484+ a*0.0000000000000000000000000000001130996288644771693)))))))))))))); writeln(c); Readln; End.
- gives the result after exactly 40 seconds on a 2.6 GHz CPU. In total 15 multiplications and 15 additions are done per iteration, so 30 operations per iteration. One operation needs 40*2.6/30=3.4(6) cycles, approximately 3.5 cycles per operation.
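The cycle estimates above all follow one formula: seconds times clock frequency, divided by the number of operations. A small sketch of that arithmetic (in Python, not Free Pascal; the helper name cycles_per_operation is made up here, and the loop is treated as exactly 10^9 iterations although it runs from 0 to 10^9):

```python
def cycles_per_operation(seconds, clock_ghz, ops_per_iteration,
                         iterations=1_000_000_000):
    """Average CPU cycles per counted operation, assuming the CPU runs
    at clock_ghz for the whole measured time."""
    total_cycles = seconds * clock_ghz * 1e9
    return total_cycles / (iterations * ops_per_iteration)

# 47-second run: different ways of counting the operations per iteration.
for ops in (14, 28, 42, 56):
    print(ops, round(cycles_per_operation(47, 2.6, ops), 2))

# 40-second run: 15 multiplications + 15 additions per iteration.
print(30, round(cycles_per_operation(40, 2.6, 30), 2))
```

These are averages over the whole loop, so loop overhead and the integer-to-real conversion of a are included in each figure.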
NOTE
- In the sections Something wrong and Too fast multiplication (multiplication benchmark) the code is partly wrong.
- For example, the section Too fast multiplication (multiplication benchmark) contains this text:
- "And this benchmark
var a:longint; b,d,c:real; begin b:=0; c:=123456789012345; for a:=1 to 1000000000 do b:=b+a; d:=c*b; writeln(d); readln(); end.
- gives result also after 5 seconds on ~3GHz CPU."
- This code is partly wrong, because Free Pascal does not repeat the second line after the word do (only the first line is looped; the second line is computed [once], after the billion iterations of the first line). In Pascal the body of a for loop is the single statement after do, unless several statements are wrapped in begin ... end. So the code just given is equivalent to this code:
var a:longint; b,d,c:real; begin b:=0; c:=123456789012345; for a:=1 to 1000000000 do b:=b+a; writeln(c*b); readln(); end.
- (in which there is no line "d:=c*b;").
- That is why the computer runs these programs so quickly.
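The difference between the two readings of the quoted benchmark can be sketched as follows. This is Python, not Free Pascal, and both function names are made up for the illustration; each function also returns how many multiplications it performed. A small n is used so that it runs instantly.

```python
def multiply_after_loop(n, c):
    """What the quoted program actually does: only b:=b+a is looped,
    and d:=c*b runs once after the loop."""
    b = 0.0
    for a in range(1, n + 1):
        b += a
    d = c * b
    return d, 1  # one multiplication in total

def multiply_inside_loop(n, c):
    """What the program would do if both statements were wrapped in
    begin ... end after 'do': d:=c*b runs on every iteration."""
    b = 0.0
    d = 0.0
    for a in range(1, n + 1):
        b += a
        d = c * b
    return d, n  # n multiplications in total

print(multiply_after_loop(1000, 123456789012345.0))
print(multiply_inside_loop(1000, 123456789012345.0))
```

Both versions print the same final d, so the output alone does not reveal how many multiplications were timed; only the second version would measure a billion of them.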
How many CPU cycles need for sin(x) function operation?
- In popular example of Fourier integral I showed that one sin(x) operation needs about 133 cycles on a 4.16 GHz CPU (with one heavily loaded internet browser open, but nothing playing in the browser or anywhere else).
- In Sinuso benchmark'as 2 it is written:
- "This sine code:
Uses math; var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+sin(a); writeln(c); Readln; End.
- gives the answer 0.421294486750096 after 47 seconds on a 2.6 GHz CPU. ..."
- So on a 2.6 GHz CPU with nothing else loaded, just the Free Pascal code, the result "0.421294486750096" was obtained after 1 billion iterations in 47 seconds (not on the first run, but on the second and subsequent runs). So one sin(x) operation needs 47*2.6=122.2 cycles on a 2.6 GHz dual-core AMD CPU, about 122 cycles.
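Independently of the timing, the printed sum itself can be sanity-checked without a billion iterations. The sketch below (Python, not part of the original benchmark; both function names are made up) uses the standard identity sin(0)+sin(1)+...+sin(N) = sin(N/2)*sin((N+1)/2)/sin(1/2) and compares it with direct summation for small N.

```python
import math

def sine_sum_direct(n):
    """sin(0) + sin(1) + ... + sin(n), summed term by term."""
    return math.fsum(math.sin(a) for a in range(n + 1))

def sine_sum_closed_form(n):
    """The same sum from the identity
    sum_{a=0}^{n} sin(a) = sin(n/2) * sin((n+1)/2) / sin(1/2)."""
    return math.sin(n / 2) * math.sin((n + 1) / 2) / math.sin(0.5)

for n in (10, 1000, 100000):
    print(n, sine_sum_direct(n), sine_sum_closed_form(n))
```

Calling sine_sum_closed_form(1000000000) gives a value to compare with the benchmark output; small differences in the last digits are to be expected, because the benchmark accumulates rounding errors over a billion additions.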
- This Free Pascal code:
Uses math; var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+sin(a); writeln(c); Readln; End.
- gives the result [4.2129448675010567E-001, which means 4.2129448675010567/10] after about 1 minute and 56 seconds (~116 seconds) when run for the first time on a 4.16 GHz CPU (with one heavily loaded internet browser open, but nothing playing in it). On the second and third runs the result [4.2129448675010567E-001] was obtained after 34-35 seconds on the 4.16 GHz CPU (same browser load). So one sin(x) operation needs about 34*4.16=141.44 cycles on this 4.16 GHz CPU, and 141 cycles for sin(x) is more than the 122 cycles on the 2.6 GHz CPU. Of course you could blame the heavily loaded internet browser, but by the official theory the FPU is almost not needed by Windows and ordinary software, so without the browser loaded it should still take about the same 141 cycles per sin(x) on the 4.16 GHz CPU. And as far as I remember, Free Pascal calculations ran at almost the same speed whether or not something was loaded in Windows. On the other hand, here I found that the parabola length calculation needs the same number of cycles on the old 2.6 GHz CPU and on the newer 4.16 GHz CPU. But everywhere else the 2.6 GHz CPU needed fewer cycles in Free Pascal calculations than the 4.16 GHz CPU. As is known, RAM has a CAS (column address strobe) latency and a RAS (row address strobe) latency [row selection takes almost the same time (latency) for all types of memory, while column selection for some reason gets faster with newer DDRn (DDR2, DDR3, DDR4, DDR5, ...) models]: CAS latencies decrease rapidly with newer DDR generations, while RAS latencies hardly decrease at all. 
So if a jump to a more distant address in RAM is needed, another row (RAS) must be selected, and this is rather slow; and, I think, if no jump to a distant RAM address is needed (staying in the same selected row), then DDR RAM is faster in newer generations (because CAS, or column, selection is very fast). So, depending on the code, RAM speed can behave differently. Who knows, maybe some memory allocation for FPU calculations takes some time, and with RAS speed not improving with newer DDR generations maybe this becomes a bottleneck... Officially, the FPU is given its own place in RAM and does not communicate with the CPU directly; instead both fetch/pass code through some memory location... But the cache should solve many things, so it is hard to say why the FPU would need RAM [in calculations], unless the cache is shared between the CPU and the FPU...
- I made a small mistake about CAS latency. I know for sure that RAS (row selection) is slow and takes much longer than CAS (column selection). But actually CAS latency also hardly improves over time with newer generations of DDR RAM.
- From here:
- https://en.wikipedia.org/wiki/CAS_latency
"The CAS latency is the delay between the time at which the column address and the column address strobe signal are presented to the memory module and the time at which the corresponding data is made available by the memory module. The desired row must already be active; if it is not, additional time is required."
- The CAS latency of 100 MHz SDRAM is 2 cycles, or 20 ns (1/(20*10^{-9})=1/0.00000002=50,000,000 Hz = 50 MHz). Here https://en.wikipedia.org/wiki/CAS_latency#Memory_timing_examples
- it is written that 100 MHz SDRAM needs 20 ns (nanoseconds) for the first word (a word is a 16-bit integer or piece of data), that 50 ns will have passed after the fourth word (16 bits of data) is transferred, and 90 ns after the eighth word. So for 100 MHz SDRAM the first word (that is, when the column is selected) takes 20 ns, and each subsequent word takes 10 ns (20+10+10+10=50 ns, then 50+10+10+10+10=90 ns).
- The CAS latency of DDR2-800 (400 MHz, as with the 2.6 GHz CPU) is 6 cycles, and the first word is transferred in 15 ns (1/(15*10^{-9})=1/0.000000015=66,666,666 Hz = 66 MHz; a slight improvement in the transfer time of the first 16 bits of data, or first word, over 100 MHz SDRAM). The table says that for DDR2-800 the transfer time of any word except the first is 1.25 ns (for 100 MHz SDRAM this time was 10 ns). So transferring 4 words takes 15+3*1.25=18.75 ns, and transferring 8 words takes 15+3*1.25+4*1.25=23.75 ns. This 1.25 ns corresponds to 1/(1.25*10^{-9})=1/0.00000000125=800,000,000 Hz = 800 MHz. So according to this, if the column (CAS) is selected and each next RAM address is larger by one, data is transferred at 800 MHz. But if a jump is needed to some other address, more than 1 away, then the first data transfer takes 15 ns, as if the transfer ran at 66 MHz.
- The CAS latency of DDR3-1600 (800 MHz, as with the 4.16 GHz CPU) is 11 cycles, and the first word is transferred in 13.75 ns ( 1/(13.75*10^{-9})=1/0.00000001375=72,727,272 Hz = 72 MHz; BTW on my computer with the 4.16 GHz CPU and DDR3-1600 RAM the program CPUID CPU-Z also shows that the memory CAS# Latency (CL) is 11 clocks [also RAS# to CAS# Delay (tRCD)=11 clocks; RAS# Precharge (tRP)=11 clocks; Cycle Time (tRAS)=30 clocks; Bank Cycle Time (tRC)=39 clocks; so the timings of my DDR3-1600 RAM are 11-11-11-30-39] ). Any word except the first is transferred in 0.625 ns, that is at 1600 MHz ( 1/(1600*10^6)=0.000000000625 s = 0.625 ns ). So transferring 4 words takes 13.75+3*0.625=15.625 ns, and transferring 8 words takes 13.75+3*0.625+4*0.625=18.125 ns.
- The first-word transfer time is obtained like this:
- 1/(800*10^6) *11=0.00000001375= 1.375*10^(-8)=13.75*10^(-9)= 13.75 ns.
- The CAS latency of DDR4-4800 (2400 MHz) is 19 cycles, and the first word is transferred in 7.92 ns ( 1/(7.92*10^{-9})=1/0.00000000792=126,262,626 Hz = 126 MHz ). The number 7.92 ns comes from 1/(4800*10^6) *19=2.08333*10^{-10} *19 = 3.958333*10^{-9}. But because of the double data rate the clock is 2400 MHz, so one cycle lasts twice as long and the latency is 3.958333*2= 7.91666 ns. And 1/(4800*10^6)=2.08333*10^{-10} s=0.2083 ns. Transferring 4 words takes 7.91666+3*0.2083=8.54156 ns (in the Wikipedia table at the given link this result is 8.54 ns). Transferring 8 words takes 7.91666+3*0.2083+4*0.2083=9.37476 ns (in the Wikipedia table this time is 9.38 ns).
- The CAS latency of DDR5-6600 (3300 MHz) is 34 cycles (the table has another example: DDR5-6400 (3200 MHz) with a CAS latency of 32 cycles), and the first word is transferred in 10.30 ns ( 1/(10.3*10^{-9})=1/0.0000000103=97,087,378 Hz = 97 MHz ). The number 10.30 ns comes from 1/(3300*10^6) *34=3.030303*10^{-10} *34 = 1.0303*10^{-8} =10.303*10^{-9} s=10.303 ns. Each word except the first is transferred after 1/(6600*10^6)=1.51515*10^{-10} s=0.1515*10^{-9} s=0.1515 ns. So 4 words are transferred after 10.303 + 3*0.1515 = 10.7575 ns (in Wikipedia 10.76 ns), and 8 words after 10.303 + 3*0.1515 + 4*0.1515 = 11.3635 ns (in the Wikipedia table this is 11.36 ns).
- So we can see that DDR4-4800 with a CAS latency of 19 cycles transfers data faster than DDR5-6600 with a CAS latency of 34 cycles (7.92 vs 10.30 ns; 8.54 vs 10.76 ns; 9.38 vs 11.36 ns).
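The timing arithmetic above follows one pattern: the first word arrives after (CAS cycles)/(clock frequency), and every following word adds 1/(transfer rate). A minimal sketch of that calculation (Python; the function name word_times_ns and the data_rate parameter are made up here, with data_rate=1 for single-data-rate SDRAM and 2 for DDR memory):

```python
def word_times_ns(transfer_rate_mts, cas_cycles, data_rate=2):
    """Return the time in ns until the 1st, 4th and 8th word has arrived,
    assuming the row is already open.
    transfer_rate_mts: transfers per microsecond (the '800' in DDR2-800).
    data_rate: transfers per clock cycle (1 for SDRAM, 2 for DDR)."""
    clock_mhz = transfer_rate_mts / data_rate
    first = cas_cycles / clock_mhz * 1000.0   # CAS latency in ns
    per_word = 1000.0 / transfer_rate_mts     # ns per further word
    return first, first + 3 * per_word, first + 7 * per_word

modules = [
    ("SDRAM 100 MHz", 100, 2, 1),
    ("DDR2-800", 800, 6, 2),
    ("DDR3-1600", 1600, 11, 2),
    ("DDR4-4800", 4800, 19, 2),
    ("DDR5-6600", 6600, 34, 2),
]
for name, rate, cl, dr in modules:
    first, fourth, eighth = word_times_ns(rate, cl, dr)
    print(f"{name:14s} {first:7.3f} {fourth:7.3f} {eighth:7.3f}")
```

Only the CAS latency and the burst rate enter this sketch; row activation (tRCD, tRP) is left out, exactly as in the quoted Wikipedia definition, which assumes the desired row is already active.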
- It is possible that in the Wikipedia article about CAS latency the word "word" means a one-bit transfer. And since there are 8 banks in each memory (RAM) chip, and there are 8 memory chips on a RAM module (which you insert into the motherboard), 8*8=64 bits of data can be transferred from one RAM module, or 128 bits from two RAM modules in a dual-channel configuration.
- In this case it is also possible that after the column (CAS) is selected you can very quickly take 64 bits of data from one and the same row, but from different columns. But more realistically, you can take 64 bits of data either only from the next column each time, or even only from the SAME column (in that case memory is always very slow for all DDR generations and gives fast transfers only in a small number of situations [when transferring from/to one and the same memory address]).
- Two other explanations are also possible.
- The first explanation is that a word means 16 bits of data, but if needed you can also transfer 64 bits of data from one RAM module at the same speed as 16 bits of data (one word).
- The second explanation is that the dual-channel RAM configuration is popular now. Dual channel has two 64-bit data buses, so a 128-bit data bus in total. And in CPU operations 16 bits of data are often enough for some instructions/calculations... So if there are 8 banks in each memory chip and there are 2 modules with 8 chips each, then there are 2*8*8=128 bits to select/address in those two memory modules. So it is possible that the RAM engineers/designers decided that, once a row (RAS) is selected, a signal is sent to all banks in all chips (of those two RAM modules) to activate the column (CAS), of course at slow speed. But each bank holds a different bit value (the two RAM modules have 128 banks in total, and each bank has its own independent bit value: say the first bank at this address has 1, the second 0, the third 0, the 4th 1, the 5th 0). So if the CPU needs only 16 bits of data for some instruction, it gets these 16 bits from 16 banks of the first DDR RAM module. All RAS and CAS signals are already active, so only those 16 banks need to be activated to take those precious 16 bits. When these 16 bits have been used in some instruction/operation, then, with RAS and CAS still active for the given address, only another 16 banks need to be activated, and this time the 16 bits arrive much faster than the first time (because the first time the RAS and CAS signals had to be activated for all 128 banks). The third time the RAM address is again already selected with active RAS and CAS signals, so only another 16 bits (16 banks) out of the 128 possible bits (banks) need to be selected. So 128/16=8 words. After 8 words have been transferred and used for some instructions or operations, another address in RAM must be selected again. So there is no big gain in CPU speed. For FPU calculations there is no gain at all [in single-channel configuration], because the FPU loads 64 bits of data each time when calculating at high precision. 
Dual channel can give the FPU some speed-up, but less than 2x...
- Some powerful server processors have more than 2 RAM channels, often 4, 8 or 12, and can also have about 100 cores. Maybe on such servers you can get a whopping speed-up in reading/writing from/to RAM if CAS latency works as described in the second explanation. But still the speed-up would not be too significant, and not that big in all calculations/operations...
- Another explanation, the most logical and realistic one, is that besides rows and columns RAM has something for fast access, similar to what was said about banks in the second explanation. Then the CPU selects a row in RAM with a big latency, then selects a column with a smaller but still rather big latency, and then the CPU has, say, 1 kilobyte of some banks. In this 1 KB of space the CPU can write/read anything to/from RAM very fast, with no limit on the number of reads and writes and not necessarily in sequential order. It is as if the CPU had 1 kilobyte of very fast memory for as long as no data from other memory locations is needed.
- [Update after a few days. Today, when I tried to wake the computer from sleep by pressing the "Enter" key a few times, the computer with this 4.16 GHz CPU restarted (and it had been lagging for a day or a few days before that). This is not the first time that, after long use (with a rather heavily loaded Opera browser), the computer with this CPU stops working (usually it stops responding, it hangs) and has to be restarted with the restart button on the case. So the 4.16 GHz computer restarted itself and I waited until Windows 10 had fully loaded... When Windows 10 was loaded I started the Free Pascal sine benchmark:
Uses math; var a:longint; c:real; begin for a:=0 to 1000000000 do c:=c+sin(a); writeln(c); Readln; End.
- and it gave the result [4.2129448675010567E-001] after 2 minutes and 6 seconds (126 seconds) on the 4.16 GHz CPU (with nothing else loaded in Windows 10, just the Free Pascal program). Then I ran this code a second time and it gave the result after 35 seconds.
- Then I closed the Free Pascal program and started it again. I ran the sine benchmark code again and it gave the result [4.2129448675010567E-001] after 1 minute and 57 seconds (117 seconds) on the 4.16 GHz CPU (with nothing else loaded in Windows 10). Then I ran the code a second time and got the result [4.2129448675010567E-001] after 35 seconds (on the third run the result also came after 35 s).
- Then I started the Opera internet browser. Usually, when Windows 10 stops responding after Opera has been loaded for a long time (it can be a few weeks or months), Opera after the restart tries to load the pages that were not closed, and it will keep loading these pages forever. The trick is to close Opera and start it again; then it loads all the pages from before the Windows 10 crash fairly quickly. After seeing that Opera was working, I closed it. Then I started Free Pascal again, and this time the first run of the sine benchmark code gave the result [4.2129448675010567E-001] after 1 minute and 54 seconds (114 s) on the 4.16 GHz CPU (with nothing else loaded in Windows 10). The second run gave the result [4.2129448675010567E-001] after 34 seconds.
- And you know what? The results are identical with heavily loaded internet browser(s) and with none loaded (116 s vs 114 s; 34-35 s vs 34-35 s).
- Also, after all these sine benchmarks, I started Free Pascal and opened the division benchmark:
var a:longint; c:real; begin for a:=1 to 1000000000 do c:=c+1/a; writeln(c); Readln; End.
- which gave the result [2.1300481502506980E+001] the first time after 1 minute and 40 seconds (100 seconds) on the 4.16 GHz CPU (with nothing else loaded in Windows 10). Then I ran this division benchmark a second time and got the result [2.1300481502506980E+001] after 6 seconds.
- Then I closed Free Pascal, started it again, ran this division benchmark for the first time and got the result [2.1300481502506980E+001] after 1 minute and 41 seconds (101 seconds) on the 4.16 GHz CPU (with nothing else loaded in Windows 10). The second run gave the result after 6 s.
- Then, while writing here, I decided to make sure about this "1 minute and 40 s", so I started Free Pascal, and on the first run of this code (by pressing "Run") I was surprised, because I got the result [2.1300481502506980E+001] after 1 minute and 5 seconds (65 seconds) on the 4.16 GHz CPU (with a rather heavily loaded Opera internet browser). On the second run of this division code I got the result [2.1300481502506980E+001] after 6 seconds on the 4.16 GHz CPU (with a rather heavily loaded Opera internet browser).
- Here is the result of the division benchmark with loaded internet browser(s) on the 4.16 GHz CPU. There the first-run result came after about 103 seconds and the second-run result after 6 seconds. So the second-run result is identical to the results obtained now (6 seconds and "2.1300481502506980E+001"). And the first-run time is better now (100-101 s vs 103 s OR 65 s vs 103 s)...]