SATOCONOR.COM
Anonymous ‘Notes On Last Digit Distribution of the Prime Numbers’ 6.4. (2007)
Communication to the editor
Notes on Last Digit Distribution of the Prime Numbers
Email correspondence with a professional mathematician who, for his own reasons, wants to remain anonymous
Communication with Johan Gerard van der Galiën
For comments: johan.van.der.galien@satoconor.com
Version 1.0 November 29, 2007
----- Original Message -----
From: "Anonymous"
To: <johan.van.der.galien@satoconor.com>
Subject: A comment on "Last Digit Distribution of the Prime Numbers", "Are the Prime Numbers Randomly Distributed? Part 2", SATO 5.3. (2006)
Just as an experiment I used Mathematica to select 50000 random numbers between 10^6 and 10^7-1, collected only those that were prime, and did a Chi Squared calculation on the distribution of the last digit. I repeated this whole process 100 times, and sorted the resulting values. As an example, I got:
0.113208, 0.200373, 0.227615, 0.278162, 0.306059,
0.354415, 0.357572, 0.357677, 0.449953, 0.504005,
0.512089, 0.767157, 0.770541, 0.779788, 0.941967,
1.00554, 1.08613, 1.10349, 1.11877, 1.13043,
1.18732, 1.24862, 1.249, 1.26835, 1.26924,
1.27138, 1.27453, 1.30117, 1.31329, 1.35723,
1.37658, 1.3942, 1.45258, 1.53429, 1.55221,
1.61485, 1.70891, 1.80764, 1.82649, 1.85961,
1.92429, 1.93912, 2.02925, 2.03448, 2.18291,
2.21013, 2.32982, 2.39387, 2.39969, 2.42981,
2.4572, 2.48232, 2.65632, 2.67554, 2.70417,
2.72791, 2.7325, 2.75107, 2.75417, 2.7898,
2.79956, 2.92413, 2.94878, 2.9511, 3.26214,
3.29618, 3.30631, 3.3789, 3.38745, 3.40358,
3.7597, 3.96487, 3.99733, 4.00215, 4.22697,
4.34551, 4.43908, 4.44759, 4.49261, 4.87123,
4.87198, 5.08945, 5.17533, 5.26189, 5.28086,
5.66526, 5.75077, 6.31547, 6.54233, 6.54706,
6.66218, 7.61201, 7.67855, 8.39802, 8.52527,
8.92465, 9.48768, 10.1931, 11.8016, 12.2043
And I find about 5 below the 5% level and 5 above the 95% level.
If I repeat this whole process I tend to get roughly the same results, but not exactly each time, as I would expect.
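The 5% and 95% levels referred to here can be checked directly. The following is only a sketch, assuming the statistic is compared against a Chi Squared distribution with 3 degrees of freedom (4 last-digit classes minus 1). The names lower, upper and tailCounts are illustrative and are not part of the original session; the ten test values are the five smallest and five largest from the list above.

```mathematica
(* Assumption: 3 degrees of freedom, i.e. 4 last-digit classes minus 1 *)
{lower, upper} = Quantile[ChiSquareDistribution[3], {0.05, 0.95}];

(* tailCounts is a hypothetical helper: how many values fall below the
   5% level and how many fall above the 95% level *)
tailCounts[values_List] :=
  {Count[values, v_ /; v < lower], Count[values, v_ /; v > upper]}

(* The five smallest and five largest values from the list above *)
tailCounts[{0.113208, 0.200373, 0.227615, 0.278162, 0.306059,
            8.92465, 9.48768, 10.1931, 11.8016, 12.2043}]
```

With 100 runs and a statistic that really follows this distribution, about 5 values would be expected in each tail, which is the comparison being made above.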
Hopefully I haven't made a mistake in doing this. But if not, this seems to give different results from the author's tests.
I am puzzled why we get different results and wonder if there is an explanation that could resolve this.
Thank you
----- Original Message -----
From: "Anonymous"
To: Johan van der Galiën
Subject: A comment on "Last Digit Distribution of the Prime Numbers", "Are the Prime Numbers Randomly Distributed? Part 2", SATO 5.3. (2006)
Johan van der Galiën wrote:
>Dear Sir,
>I think the difference comes from the fact that I used consecutive primes and you used primes picked at random! That this makes a difference is also clearly stated in the 'Last Digit Distribution of the Prime Numbers' article.
>The 167 "ideal" sample size is of course a very rough estimation based on only one (consecutive primes) measurement. From you I know now that 50000 on a 10^6 to 10^7-1 interval also works very well!
I seem to remember that there were other conditions for the Chi Squared test to be applied and not give misleading results, but it has been a long time since I have worked on that and I do not remember the details.
For my own reasons I would remain anonymous. I do not want any credit for anything that I do.
On that condition I will show you a few minutes work on this:
--------------------------------------------------------------------------
Chi Squared calculation for a list of 4 items, with expected equal numbers for all items in the list
chiSquared[x_]:=Module[{expected=Apply[Plus,x]/4},
(x[[1]]-expected)^2/expected+(x[[2]]-expected)^2/expected+
(x[[3]]-expected)^2/expected+(x[[4]]-expected)^2/expected
]
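The same statistic can be written without spelling out the four terms. This is only a sketch: chiSquaredGeneral is a hypothetical name, not part of the original session, and it assumes (as the definition above does) that every class has the same expected count.

```mathematica
(* chiSquaredGeneral is an illustrative name. It assumes equal expected
   counts for every class, like the four-term definition above. *)
chiSquaredGeneral[counts_List] :=
  With[{expected = Total[counts]/Length[counts]},
    Total[(counts - expected)^2/expected]]

(* Exact arithmetic: the counts {5,6,5,5} give 1/7 *)
chiSquaredGeneral[{5, 6, 5, 5}]
```

Because the arithmetic is exact for integer counts, wrapping the call in N[...] gives the decimal form used in the rest of this message.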
Table of Chi Squared test on number of primes ending in 1,3,7,9 in 50000 random integers between n and 10n-1
f[n_]:=
TableForm[(*Print in nice row format*)
 Partition[(*Divide up into groups of 5, to neatly see 5% and 95% rows*)
  Sort[(*Sort the Chi Squared values in increasing order*)
   Table[(*Build a table of 100 runs of the experiment*)
    N[chiSquared[(*Calculate the Chi Squared value*)
      Map[Length,(*Find the number in each trailing digit group*)
       Split[(*Group identical trailing digits together*)
        Sort[(*Sort trailing digits into increasing order*)
         Map[Mod[#,10]&,(*Extract trailing digit of each prime*)
          Select[(*Pick out just the prime integers*)
           Table[(*Build a list of 50000 random integers in range*)
            Random[Integer,{n,10n-1}],{50000}
           ],
           PrimeQ
          ]
         ]
        ]
       ]
      ]
     ]
    ],{100}
   ]
  ],5
 ]
]
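For readers on a current Mathematica version, one run of the experiment can be sketched in smaller pieces. Everything named here (chiSq, digitCounts, sortedRuns) is hypothetical and not from the original session. The sketch uses RandomInteger in place of the older Random[Integer, ...] form, and counts each of the digits 1, 3, 7, 9 explicitly so that a digit which happens not to occur still contributes a zero rather than shortening the list.

```mathematica
(* Illustrative helper: Chi Squared against equal expected counts *)
chiSq[counts_List] := With[{e = Total[counts]/Length[counts]},
  Total[(counts - e)^2/e]]

(* One run: sample integers in n..10n-1, keep the primes,
   count how many end in 1, 3, 7 and 9. Assumes n >= 10. *)
digitCounts[n_Integer, samples_Integer : 50000] := Module[{primes},
  primes = Select[RandomInteger[{n, 10 n - 1}, samples], PrimeQ];
  Table[Count[Mod[primes, 10], d], {d, {1, 3, 7, 9}}]]

(* Repeat the run and sort the resulting statistics *)
sortedRuns[n_Integer, runs_Integer : 100] :=
  Sort[Table[N[chiSq[digitCounts[n]]], {runs}]]
```

A call such as TableForm[Partition[sortedRuns[10^3], 5]] would then print 20 rows of 5 values in the same layout as the tables in this message; the numbers differ from run to run.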
Let us begin with 100 experiments of sampling 50000 random integers in 10..99, extracting the primes and calculating the 100 Chi Squared values
f[10]
{ {42.397, 51.352, 53.195, 54.268, 57.743},
{58.466, 59.688, 60.394, 62.401, 63.658},
{64.850, 64.877, 65.032, 65.123, 66.498},
{66.616, 67.089, 67.181, 67.251, 67.641},
{68.391, 69.777, 70.264, 70.585, 72.152},
{72.235, 73.181, 74.645, 74.752, 75.027},
{75.115, 75.306, 75.886, 76.193, 76.402},
{76.582, 76.771, 76.910, 77.356, 77.990},
{78.386, 78.639, 78.689, 79.260, 80.194},
{80.392, 80.919, 81.071, 81.142, 81.856},
{82.369, 83.081, 83.098, 83.998, 84.040},
{84.530, 84.920, 85.401, 85.851, 86.223},
{86.257, 86.555, 86.628, 87.429, 87.717},
{88.316, 88.764, 89.496, 89.944, 90.349},
{90.822, 90.838, 91.840, 92.709, 93.413},
{94.036, 94.437, 94.681, 96.770, 97.307},
{98.494, 100.139, 100.662, 103.817, 106.110},
{106.228, 106.498, 106.714, 107.355, 107.479},
{109.474, 111.990, 114.357, 114.853, 115.270},
{119.929, 124.612, 130.716, 132.142, 136.796} }
Chi Squared is very convinced this would not arise by sampling if 1,3,7,9 appeared equally frequently.
The large number of samples makes this test very sensitive. Look at how many of each trailing digit appears:
Split[Sort[Map[Mod[#,10]&,Select[Range[10,99],PrimeQ]]]]
{{1,1,1,1,1},{3,3,3,3,3,3},{7,7,7,7,7},{9,9,9,9,9}}
Ah, so 3 appears 6 times while the others appear only 5 times.
And so the sample could not have been from equal frequency digits.
What is the Chi Squared value for these counts taken as a sample?
N[chiSquared[{5,6,5,5}]]
0.142857
That statistic would occur more than 1% of the time and thus is far less surprising than my extreme table values indicate, because of sample size.
How about larger numbers?
f[10^2]
{ {16.251, 18.350, 19.150, 21.582, 22.723},
{24.428, 26.840, 26.938, 27.040, 28.566},
{32.105, 32.395, 32.748, 33.449, 34.539},
{34.684, 35.310, 35.609, 35.628, 36.131},
{36.235, 36.496, 36.749, 36.809, 37.067},
{37.556, 37.862, 37.866, 37.932, 38.366},
{38.908, 39.038, 39.365, 39.394, 39.538},
{39.712, 39.755, 40.278, 40.989, 41.022},
{41.304, 41.378, 42.290, 42.404, 42.611},
{42.735, 43.004, 43.511, 43.686, 44.273},
{44.516, 45.098, 45.159, 45.507, 45.585},
{45.646, 46.459, 46.919, 46.964, 47.516},
{47.546, 47.895, 48.163, 48.271, 48.296},
{48.943, 49.580, 49.721, 50.049, 50.698},
{51.557, 52.394, 52.409, 53.555, 54.283},
{54.763, 54.935, 55.122, 55.182, 55.615},
{55.696, 55.958, 56.604, 56.649, 58.084},
{60.420, 60.823, 62.450, 62.586, 65.002},
{65.022, 65.351, 67.807, 70.084, 73.685},
{75.689, 76.203, 76.490, 85.047, 90.606} }
Again, why? Just show the counts of trailing digits
Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[100,999],PrimeQ]]]]]
{35,35,40,33}
And that certainly is not equally occurring frequencies of trailing digits
N[chiSquared[{35,35,40,33}]]
0.748252
But that lies neatly between the lower 5% and upper 95% levels.
This again shows the difference in power when using many samples versus one sample
f[10^3]
{ {0.080, 0.198, 0.269, 0.274, 0.289},
{0.337, 0.366, 0.375, 0.486, 0.580},
{0.604, 0.851, 0.926, 0.929, 0.939},
{1.002, 1.132, 1.132, 1.173, 1.318},
{1.331, 1.336, 1.385, 1.479, 1.497},
{1.511, 1.540, 1.609, 1.790, 1.812},
{1.837, 1.838, 1.881, 1.971, 1.974},
{1.992, 2.026, 2.087, 2.136, 2.187},
{2.230, 2.268, 2.291, 2.383, 2.471},
{2.487, 2.508, 2.524, 2.618, 2.720},
{2.769, 2.771, 2.823, 2.857, 2.871},
{2.874, 2.897, 2.902, 2.955, 3.040},
{3.154, 3.188, 3.310, 3.383, 3.395},
{3.397, 3.770, 3.777, 3.841, 3.914},
{3.924, 3.982, 3.982, 4.191, 4.250},
{4.416, 4.418, 4.475, 4.490, 4.674},
{4.724, 4.923, 4.989, 5.155, 5.236},
{5.471, 5.673, 5.720, 5.744, 6.066},
{6.295, 6.361, 6.922, 7.058, 7.763},
{7.806, 7.960, 8.415, 14.458, 16.201} }
This is very close to the expected 5% and 95% values for 3 degrees of freedom, 0.352 and 7.815. What are the counts?
Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[1000,9999],PrimeQ]]]]]
{266,268,262,265}
N[chiSquared[{266,268,262,265}]]
0.070688
And the single sample shows a very different answer from large numbers of samples
f[10^4]
{ {0.088, 0.165, 0.225, 0.279, 0.313},
{0.347, 0.356, 0.374, 0.408, 0.499},
{0.504, 0.537, 0.547, 0.582, 0.621},
{0.725, 0.750, 0.754, 0.761, 0.818},
{0.859, 1.025, 1.119, 1.131, 1.184},
{1.216, 1.244, 1.326, 1.397, 1.448},
{1.486, 1.542, 1.555, 1.668, 1.739},
{1.794, 1.798, 1.868, 1.940, 1.968},
{1.981, 2.056, 2.066, 2.132, 2.219},
{2.296, 2.458, 2.522, 2.600, 2.625},
{2.644, 2.699, 2.796, 2.880, 2.893},
{2.919, 2.983, 2.992, 3.066, 3.092},
{3.161, 3.219, 3.221, 3.255, 3.276},
{3.401, 3.451, 3.523, 3.623, 3.688},
{3.796, 3.812, 3.928, 4.174, 4.306},
{4.595, 4.626, 4.694, 5.049, 5.060},
{5.102, 5.290, 5.797, 5.820, 5.826},
{5.983, 6.085, 6.248, 6.321, 6.528},
{6.534, 6.750, 6.860, 6.942, 6.957},
{7.191, 7.359, 7.662, 12.378, 13.506} }
Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[10000,99999],PrimeQ]]]]]
{2081,2092,2103,2087}
N[chiSquared[{2081,2092,2103,2087}]]
0.124716
f[10^5]
{ {0.019, 0.073, 0.185, 0.210, 0.214},
{0.221, 0.310, 0.328, 0.358, 0.369},
{0.391, 0.486, 0.525, 0.529, 0.562},
{0.610, 0.660, 0.688, 0.691, 0.768},
{0.778, 0.814, 0.910, 0.979, 1.007},
{1.120, 1.289, 1.302, 1.350, 1.486},
{1.498, 1.565, 1.720, 1.814, 1.817},
{1.849, 1.875, 2.006, 2.006, 2.047},
{2.137, 2.148, 2.207, 2.233, 2.234},
{2.239, 2.397, 2.416, 2.443, 2.468},
{2.474, 2.521, 2.616, 2.616, 2.636},
{2.665, 2.685, 2.685, 2.691, 2.730},
{2.907, 3.016, 3.081, 3.146, 3.207},
{3.272, 3.274, 3.299, 3.307, 3.334},
{3.374, 3.379, 3.443, 3.469, 3.517},
{3.623, 3.927, 4.150, 4.154, 4.193},
{4.475, 4.904, 4.940, 5.138, 5.188},
{5.245, 5.255, 5.329, 5.442, 5.522},
{5.990, 6.145, 6.466, 7.464, 7.732},
{7.762, 9.606, 9.913, 10.952, 11.956} }
Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[100000,999999],PrimeQ]]]]]
{17230,17263,17210,17203}
N[chiSquared[{17230,17263,17210,17203}]]
0.125911
f[10^6]
{ {0.093, 0.138, 0.206, 0.246, 0.319},
{0.346, 0.375, 0.451, 0.480, 0.502},
{0.541, 0.626, 0.648, 0.807, 0.832},
{0.837, 0.844, 0.899, 0.932, 0.937},
{0.976, 1.028, 1.037, 1.069, 1.147},
{1.183, 1.315, 1.358, 1.373, 1.381},
{1.460, 1.496, 1.565, 1.576, 1.583},
{1.639, 1.715, 1.769, 1.775, 1.822},
{1.826, 1.892, 1.905, 1.922, 1.950},
{1.989, 2.185, 2.202, 2.293, 2.572},
{2.589, 2.623, 2.623, 2.659, 2.700},
{2.930, 2.938, 3.094, 3.139, 3.190},
{3.239, 3.397, 3.527, 3.567, 3.579},
{3.609, 3.622, 3.662, 3.807, 3.906},
{3.947, 4.015, 4.031, 4.061, 4.090},
{4.195, 4.574, 4.677, 4.685, 5.098},
{5.107, 5.218, 5.408, 5.536, 5.571},
{5.616, 5.654, 5.905, 5.962, 6.078},
{6.168, 6.171, 7.179, 7.225, 7.344},
{8.118, 8.355, 8.836, 9.114, 9.348} }
Map[Length,Split[Sort[Map[Mod[#,10]&,Select[Range[1000000,9999999],PrimeQ]]]]]
{146487,146565,146590,146439}
N[chiSquared[{146487,146565,146590,146439}]]
0.0994726
f[10^7]
{ {0.118, 0.125, 0.138, 0.224, 0.282},
{0.297, 0.380, 0.382, 0.415, 0.471},
{0.507, 0.546, 0.568, 0.585, 0.610},
{0.681, 0.682, 0.685, 0.750, 0.750},
{0.829, 0.831, 0.860, 0.860, 0.909},
{0.911, 0.967, 0.991, 1.010, 1.039},
{1.111, 1.143, 1.180, 1.303, 1.441},
{1.486, 1.533, 1.573, 1.617, 1.657},
{1.682, 1.763, 1.769, 1.786, 1.818},
{1.832, 1.908, 1.928, 1.973, 1.981},
{2.069, 2.113, 2.115, 2.170, 2.213},
{2.282, 2.305, 2.318, 2.357, 2.520},
{2.524, 2.608, 2.612, 2.722, 2.723},
{2.816, 2.844, 2.952, 3.023, 3.034},
{3.141, 3.196, 3.230, 3.391, 3.431},
{3.456, 3.710, 3.728, 3.889, 4.105},
{4.226, 4.242, 4.268, 4.349, 4.587},
{4.630, 4.657, 4.715, 4.926, 5.255},
{5.446, 5.637, 5.803, 6.147, 6.800},
{7.489, 7.651, 8.191, 10.400, 10.503} }
f[10^8]
{ {0.040, 0.159, 0.204, 0.229, 0.243},
{0.298, 0.365, 0.383, 0.400, 0.453},
{0.528, 0.531, 0.531, 0.586, 0.605},
{0.677, 0.686, 0.740, 0.764, 0.785},
{0.841, 0.877, 0.907, 0.923, 0.926},
{1.007, 1.036, 1.066, 1.073, 1.084},
{1.117, 1.123, 1.145, 1.152, 1.161},
{1.202, 1.260, 1.275, 1.410, 1.472},
{1.531, 1.661, 1.672, 1.753, 1.758},
{1.759, 1.761, 1.784, 1.837, 1.879},
{2.073, 2.097, 2.188, 2.214, 2.258},
{2.266, 2.328, 2.378, 2.513, 2.514},
{2.807, 2.840, 2.853, 2.861, 2.910},
{3.065, 3.133, 3.193, 3.483, 3.549},
{3.552, 3.650, 3.692, 3.714, 3.744},
{3.810, 3.958, 4.077, 4.320, 4.348},
{4.505, 4.668, 4.868, 4.916, 5.159},
{5.255, 5.458, 5.720, 5.798, 5.856},
{5.964, 5.973, 6.218, 6.223, 7.058},
{7.187, 8.450, 8.901, 14.907, 15.151} }
f[10^9]
{ {0.020, 0.129, 0.130, 0.448, 0.457},
{0.475, 0.547, 0.582, 0.624, 0.661},
{0.684, 0.719, 0.758, 0.780, 0.783},
{0.874, 0.924, 1.005, 1.035, 1.074},
{1.086, 1.284, 1.313, 1.581, 1.64},
{1.641, 1.649, 1.650, 1.653, 1.785},
{1.851, 1.872, 1.904, 1.972, 1.995},
{2.000, 2.018, 2.024, 2.124, 2.129},
{2.136, 2.142, 2.147, 2.154, 2.240},
{2.241, 2.317, 2.370, 2.375, 2.424},
{2.478, 2.523, 2.529, 2.646, 2.677},
{2.684, 2.749, 2.914, 2.923, 3.071},
{3.115, 3.133, 3.245, 3.257, 3.526},
{3.543, 3.575, 3.647, 3.743, 3.748},
{3.843, 3.864, 3.910, 4.100, 4.182},
{4.256, 4.312, 4.364, 4.392, 4.728},
{4.730, 4.764, 4.887, 4.991, 5.026},
{5.686, 5.786, 6.269, 6.433, 6.671},
{7.646, 7.748, 7.808, 7.823, 8.080},
{9.934, 10.171, 10.799, 10.915, 12.310} }
Thus it seems that with small numbers of samples it is possible to get values that are very different from large numbers of samples. I gave my statistics books away to students long, long ago and cannot remember the subject now. But it seems like there was some rule about when this could be applied.
> For example the Random Function of Mathematica seems to pass this test.
>
> Kind regards,
I hope something in this might be of use to you. Do with it what you wish, as long as I get no credit.
Thank you
>Johan van der Galiën.
>SATOCONOR.COM Chief-Editor and Webmaster