The fundamentals of Randomics are compression by state-of-the-art compressors, several randomness tests, and successive XOR extractions to obtain a high-entropy file.
SATOCONOR.COM
Journal of RANDOMICS
Fundamentals of Randomics
Supporting paper for ‘Randomics: The Study of
Patterns and Randomness: Are the prime numbers randomly distributed? Part
By Johan G. van der Galiën M.Sc.
Version 1.0 September 1, 2006
Abstract:
In this paper the basic tools and concepts of Randomics are explained and applied to six different source files and their extracts. The tools or concepts are:
· True entropy estimations of any file or source by means of state-of-the-art compression.
· Extracting entropy from any file or source by means of successive bitwise XOR operations.
· E(x XOR y) bias models from Davies and Hisakado et al.
· Third-party ENT randomness test suite. A quick way of getting an idea of the randomness of a file.
· RANTESTS randomness test suite. 7 tests programmed according to Knuth.
· RABENZIX randomness test suite. 6 tests based on the correlation of unorthodox Reals of 1 bit Significant, 8 bits Exponent and 8 bits Mantissa (1x8x8) with the laws of Zipf and Newcomb-Benford. These tests were developed by myself and are experimental (BETA).
· Third-party DIEHARD randomness test suite. Calculates up to 269 p-values for 16 different tests.
· Third-party NIST randomness test suite. Calculates around 195 p-values and 195 proportions for 16 different tests.
1. Introduction
The name Randomics, for the field that studies patterns and randomness in data streams, comes from Martin Winer. Many thanks for that good idea! It originated from our collaboration on the randomness of the prime number distribution.1 The present paper is a supporting paper for our work, also published on SATOCONOR.COM as ‘Randomics: the Study of Patterns and Randomness. Are the prime numbers randomly distributed? Part
1.1. True entropy estimations
The basic idea is that the best multi-purpose compressors available2 produce, as a matter of fact, archive files that pass all known randomness tests. This can only mean that they achieve archive files with (or very near) maximal information at (or very near) the
1.2. Extracting entropy with XOR
Some people call the removal of bias with a single XOR operation step distillation of entropy. I would rather call it extraction of entropy, because I studied Chemistry and because the people who named it distillation all know that both are separation processes. It is just a matter of semantics, and I like the word extraction more than distillation.
If the file is still not random after a first XOR step, you can continue with further steps until the stream fulfills the bias requirements and passes the standard randomness tests. The number of XOR steps (n) required gives a rough estimation of the percentage of
1.3. E(x XOR y) bias models from Davies and Hisakado et al.
This model is simple4 and can be simplified even further in a program: with (1.2.) you do not need to read the data from the source file twice (once for calculating E(x) and E(y) and once for calculating (1.1.)). The algorithm for calculating (1.1.) is: subtract E(x) and E(y) from each bit x and y respectively, multiply the two results, add the products cumulatively, and divide at the end by the number of bit pairs. I just combined the bias formula of Davies (1.4.) with the covariance formula from Hisakado et al. (1.2.).5 For all of my XOR extractions, (1.4.) gave the same results with both covariance models. This comes down to:
Davies: Covariance(x,y) = E[{x-E(x)}{y-E(y)}] (1.1.)
Hisakado et al.: Covariance(x,y) = E(xy) – E(x)E(y) (1.2.)
Davies and Hisakado et al.: Correlation(x,y) = Covariance(x,y)/SQRT(E(x)(1-E(x))E(y)(1-E(y))) (1.3.)
Davies: Bias = E(x XOR y) = 0.5-2(E(x)-0.5)(E(y)-0.5)-2Covariance(x,y) (1.4.)
E(x) is, for instance, all bits summed and divided by the number of bits; one can also call this number the bias or the Expectation of x = E(x). E(xy) means multiplying the bits of each pair, adding the results cumulatively, and dividing by the number of bit pairs. When the bits are adjacent bits from the same stream, these quantities are actually called auto-covariance and auto-correlation.
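The formulas (1.1.) to (1.4.) can be sketched in a few lines of Python. This is only an illustration, not the program used for this paper: the function name `xor_bias_models`, the dictionary keys and the choice of non-overlapping adjacent pairs are assumptions. Note that expanding (1.4.) with (1.2.) gives E(x) + E(y) - 2E(xy), which is exactly E(x XOR y) for bits, so any difference between observed and model values can only come from rounding.

```python
import math
import random


def xor_bias_models(bits):
    """Compare the observed E(x XOR y) with model (1.4.) for adjacent bit pairs.

    `bits` is a list of 0/1 integers; x and y are taken as non-overlapping
    adjacent bits of the same stream (an assumption of this sketch).
    """
    xs = bits[0::2]
    ys = bits[1::2]
    n = min(len(xs), len(ys))
    if n == 0:
        raise ValueError("need at least one bit pair")

    # Single pass: everything needed for (1.2.) and the observed bias.
    sum_x = sum_y = sum_xy = sum_xor = 0
    for x, y in zip(xs, ys):
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_xor += x ^ y
    ex, ey, exy = sum_x / n, sum_y / n, sum_xy / n

    cov_hisakado = exy - ex * ey  # (1.2.)
    # Second pass, only to show that (1.1.) gives the same covariance.
    cov_davies = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n  # (1.1.)

    spread = ex * (1 - ex) * ey * (1 - ey)
    correlation = cov_hisakado / math.sqrt(spread) if spread > 0 else float("nan")  # (1.3.)
    model = 0.5 - 2 * (ex - 0.5) * (ey - 0.5) - 2 * cov_hisakado  # (1.4.)

    return {
        "observed": sum_xor / n,
        "model": model,
        "cov_davies": cov_davies,
        "cov_hisakado": cov_hisakado,
        "correlation": correlation,
    }


# Hypothetical test stream: independent bits with P(1) = 0.3.
rng = random.Random(1)
demo_bits = [1 if rng.random() < 0.3 else 0 for _ in range(10000)]
demo = xor_bias_models(demo_bits)
print(demo["observed"], demo["model"])
```

The single-pass variant with (1.2.) is the one that avoids reading the source file twice.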
2. Materials and Methods
· True entropy estimations of any file or source by means of state-of-the-art compression were done with PASQDA.exe, option –6e.6 In the case of the 638 Mb F2RAW.iso file, compression this way would take a very long time, so the mean of the results of the faster Windows XP folder zip and GZIP.exe7 was regarded as a reasonable estimation. The compression ratio can only be called a true entropy estimation if the produced archive files pass standard randomness tests.
· Extracting entropy from any file or source by means of successive bitwise XOR operations. XOR, or eXclusive-OR, for bits is: 0 XOR 0 = 0, 1 XOR 0 = 1, 0 XOR 1 = 1 and 1 XOR 1 = 0. The operation is (x XOR y), where x and y are adjacent bits. This is also called binary addition modulo 2.
· E(x XOR y) bias models from Davies and Hisakado et al.
· ENT randomness test suite.8 It is indeed a quick executable, but its problem is that for most tests in the suite no hard criteria are given for randomness or non-randomness. It cannot handle files of >= 638 Mb in bit mode (ENT –b).
· RANTESTS randomness test suite.9 All hard criteria. Can handle files of all sizes.
· RABENZIX randomness test suite based on the laws of Zipf and Newcomb-Benford.10 Hard and experimental criteria. Recommended maximal file size 125 Mb.
· DIEHARD randomness test suite.11 All hard criteria. Can handle files of all sizes.
· NIST randomness test suite.12 All very hard criteria. This is the best test suite known to me. Can handle files of all sizes, but not for all tests.
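One XOR extraction step as described in the list above can be sketched as follows. This is a minimal illustration, not the program used for this paper: the function name `xor_extract`, the most-significant-bit-first ordering and the dropping of an odd trailing byte are assumptions. The sketch pairs non-overlapping adjacent bits, so each step halves the file, which agrees with the file sizes reported in this paper (ERAWNN3.dat is 48,640 Kb and its sixth extract is 778,240 bytes = 48,640 Kb / 2^6).

```python
# Lookup table: for every byte value, the 4 output bits obtained by XOR-ing
# its 4 non-overlapping adjacent bit pairs (most significant pair first).
_PAIR_XOR = [0] * 256
for _b in range(256):
    _out = 0
    for _shift in (6, 4, 2, 0):
        _pair = (_b >> _shift) & 0b11
        _out = (_out << 1) | ((_pair >> 1) ^ (_pair & 1))
    _PAIR_XOR[_b] = _out


def xor_extract(data: bytes) -> bytes:
    """One XOR step: every two input bytes yield one output byte."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        out.append((_PAIR_XOR[data[i]] << 4) | _PAIR_XOR[data[i + 1]])
    return bytes(out)


def successive_extracts(data: bytes, steps: int):
    """Return the list of extracts after 1, 2, ..., `steps` XOR steps."""
    extracts = []
    for _ in range(steps):
        data = xor_extract(data)
        extracts.append(data)
    return extracts
```

For example, `successive_extracts(open("ERAWNN3.dat", "rb").read(), 6)[-1]` would give a sixth extract under these assumptions; whether it matches the extract analysed here bit for bit depends on the bit ordering actually used.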
The following source files were used:
· F2RAW.ISO = FC3-i386-disc2.iso = Linux Fedora installation file (on Disc 2)13
· ERAWNN3.dat = all primes (1 bit), all composites (0 bit)
· INBOX.dbx = My Outlook Express 6 inbox email archive
· CR8F8RW.dat = Collatz nodes with an 800-digit seed and 50 relative levels up the tree.17
· 11MBHXCH.dat = Hexadecimal digits of PI, 2 digits stored in a byte. This is an already random file, used as a reference to see what the true entropy estimation by compression gives for such a file and whether successive XOR operations degenerate an already unbiased and random file. In my opinion the hexadecimal digits of PI are the best source of (pseudo) randomness. This file was made with: C:\>APTEST 11534336 x 16 > 11MBHXCH.TXT (x = 0: Chudnovsky bin split). The output must be converted from ASCII to binary.14
· RAWFILE1.dat = Odd primes (1) and odd composites (0)
3. Results
3.1. (True) entropy of the starting material
| Sample | File size (Kb) | First order entropy (bits per byte) | First order entropy (bits per bit) | True entropy estimation (bits per bit) |
|---|---|---|---|---|
| F2RAW.iso | 652,852 | 7.992170 | 0.999854 | 0.98* |
| ERAWNN3.dat | 48,640 | 1.950989 | 0.300468 | 0.21 |
| INBOX.dbx | 31,013 | 5.997796 | 0.987479 | 0.25 |
| CR8F8RW.dat | 20,479 | 7.985174 | 0.999683 | 0.0074 |
| 11MBHXCH.dat | 11,264 | 7.999986 | 1.000000 | 1.00 |
| RAWFILE1.dat | 4,096 | 4.135185 | 0.523445 | 0.45 |
Table 1: The different kinds of entropy of the starting material files.
* File too large for PASQDA -6e. Measurement done with the XP Folder Zip wizard and GZIP, mean value taken (source file size 668,520,488 bytes; compressed file XP 652,801,942 bytes and GZIP 653,317,253 bytes). I believe that this .iso file is a collection of already (partly) compressed archives of some kind, and that this is why the true entropy estimation is so high.
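The quantities in Table 1 can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and Python's standard `lzma` module is used only as a freely available stand-in for PASQDA -6e, so it will generally give a higher (worse) estimate than the values in the table. The last lines redo the arithmetic of the footnote for F2RAW.iso.

```python
import lzma
import math
from collections import Counter


def first_order_entropy_bytes(data: bytes) -> float:
    """Shannon entropy of the byte frequencies, in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def first_order_entropy_bits(data: bytes) -> float:
    """Shannon entropy of the 0/1 frequencies, in bits per bit."""
    n = 8 * len(data)
    p = sum(bin(b).count("1") for b in data) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def compression_entropy_estimate(data: bytes) -> float:
    """Compressed size / original size, capped at 1 (bits per bit).

    lzma is a stand-in here; the paper used PASQDA -6e.
    """
    return min(1.0, len(lzma.compress(data, preset=9)) / len(data))


# Footnote of Table 1: mean of the XP folder zip and GZIP archive sizes.
f2raw_estimate = (652_801_942 + 653_317_253) / 2 / 668_520_488
print(round(f2raw_estimate, 2))
```

As the Methods section states, such a ratio only counts as a true entropy estimation if the archive itself passes standard randomness tests.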
| Archive (compressed file) from | KS value | Number of p = 0 or 1 out of total number of p’s | Conclusion |
|---|---|---|---|
| F2RAW.iso* | 0.000000 | 103 out of 229 | Not passed |
| ERAWNN3.dat | 0.972926 | 0 out of 229 | Passed |
| INBOX.dbx | 0.886757 | 0 out of 145 | Passed |
| CR8F8RW.dat** | ---------- | ---------------- | File too small, passes ENT and RANTESTS |
| 11MBHXCH.dat | 0.382593 | 0 out of 229 | Passed |
| RAWFILE1.dat | 0.866783 | 0 out of 79 | Passed |
Table 2: The results of the DIEHARD test on the archives of the starting materials, to verify whether the true entropy estimations from Table 1 are reasonable. To be reasonable, the archive files must pass this test suite, or other standard randomness tests if the file is too small.
* This file was too large for PASQDA –6e; compression would take forever, even with the –1e option. Instead, the faster but worse compression from GZIP.exe and Windows XP folder zip was used to get a rough indication of the true entropy.
** DIEHARD is not possible with this file because it is too small. But it passes all the tests of ENT and RANTESTS, so the true entropy estimation from Table 1 is reasonable.
3.2. Extracting entropy with XOR from the starting material:
| Sample | True entropy estimation (bits per bit) | XOR extracts that are random based on ENT |
|---|---|---|
| F2RAW.iso | 0.98 | 14-18 |
| ERAWNN3.dat | 0.21 | 6-14 |
| INBOX.dbx | 0.25 | 12-14 |
| CR8F8RW.dat | 0.0074 | 10-14 |
| 11MBHXCH.dat | 1.00 | 0-12 |
| RAWFILE1.dat | 0.45 | 5-12 |
Table 3: The (last) random XOR extracts from the starting material files. The successive XOR extractions were continued until the extract file size was around 10–100 Kb. Continuing even further would not be wise, because the files would become too small for randomness tests to give significant outcomes. The random extracts are always these last extracts! It was never encountered that a random file degenerated because of XOR operations. This is illustrated by the fact that an already perfectly random file like 11MBHXCH.dat gave 12 perfectly random successive XOR extract files. The true entropy estimation is also shown because, counterintuitively, there is no correlation between the two columns.
3.3. E(x XOR y) bias models from Davies and Hisakado et al.:
F2RAW.iso
| XOR step | Chi-Bits ENT (p5%=0.00 and p95%=3.84) | Bias observed | Bias models | Observed – model |
|---|---|---|---|---|
| 1 | | 0.4924555071 | 0.4924555071 | 0 |
| 2 | | 0.4944358299 | 0.4944358299 | 0 |
| 3 | | 0.4962850037 | 0.4962850037 | 0 |
| 4 | | 0.4966832518 | 0.4966832518 | 0 |
| 5 | | 0.4973980571 | 0.4973980571 | 0 |
| 6 | | 0.4966144102 | 0.4966144102 | 0 |
| 7 | | 0.4964132137 | 0.4964132137 | 0 |
| 8 | | 0.4967385889 | 0.4967385889 | 0 |
| 9 | | 0.4969523146 | 0.4969523146 | 0 |
| 10 | 156.81 | 0.4972602902 | 0.4972602902 | 0 |
| 11 | 104.43 | 0.4968381042 | 0.4968381042 | 0 |
| 12 | 17.16 | 0.4981902372 | 0.4981871848 | 0.0000030524 |
| 13 | 7.03 | 0.4983594956 | 0.4983594956 | 0 |
| 14 | 0.94 | 0.4991636439 | 0.4991514104 | 0.0000122335 |
| 15 | 0.45 | 0.4991973039 | 0.4991728347 | 0.0000244692 |
| 16 | 0.42 | 0.5011397059 | 0.5011397059 | 0 |
| 17 | 0.74 | 0.4978676471 | 0.4978676471 | 0 |
| 18 | 0.40 | 0.4977941176 | 0.4977941176 | 0 |
Table 4: A representative comparison of the Bias = E(x XOR y) observed versus the value from the Davies and Hisakado et al. models. The random zone (marked in red in the original) consists of XOR steps 14-18, where the Chi-Bits value lies between 0.00 and 3.84. All other starting material files show more or less this picture: occasionally there is a deviation, but observed and model values still agree to 4 significant digits; for the rest, the values from the Davies and Hisakado et al. models are exact. The deviations do not always occur in and around the random zone, as is the case here. For example, in the case of ERAWNN3.dat there is never any deviation between the observed value and the model.

Graph 1: The number of XOR steps required to make a file random as function of
the correlation(x,y) (1.3.).
3.4. Randomness tests on the XOR extracts:
I used the ENT randomness test suite for fast screening of the XOR extract files. When a file was random according to this test, the RANTESTS test suite was also applied. The files that were random according to ENT also passed RANTESTS, with of course an occasional Chi-square value outside the 5%-95% probability range. That is a must for truly random files: on average 1 out of 20 measurements should fall above 95%! Since prime number randomness is the focus of my research and needs the support of this article, I give all collected randomness test data for the sixth XOR extract of ERAWNN3.dat. For this file the RABENZIX, DIEHARD and NIST test suites were also used.
Entropy = 1.000000 bits per bit.
Optimum compression would reduce the size of this 6225920 bit file by 0 percent.
Chi square distribution for 6225920 samples is 1.75, and randomly would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bits is 0.4997 (0.5 = random).
Serial correlation coefficient is 0.000149 (totally uncorrelated = 0.0).

Entropy = 7.999754 bits per byte.
Optimum compression would reduce the size of this 778240 byte file by 0 percent.
Chi square distribution for 778240 samples is 265.69, and randomly would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.4756 (127.5 = random).
Serial correlation coefficient is -0.000062 (totally uncorrelated = 0.0).
Log 1: The ENT test results for the sixth XOR extract of ERAWNN3.dat.
------------------------------
Size of file ERWN3E6.DAT = 778240 byte
------------------------------
Entropy = 9.99999797340934E-0001 bits per bit
------------------------------
CHI-Bits = 1.74913908306007E+0000
CHI 1 degree of freedom 5% = 0.00
CHI 1 degree of freedom 95% = 3.84
------------------------------
CHI-Hexadecimal= 1.09181949013145E+0001
CHI 15 degrees of freedom 5% = 7.261
CHI 15 degrees of freedom 95% = 25.00
------------------------------
KS-analysis
KnPlusMax = 7.48643578786869E-0001
KnMinusMax = 7.47842033199959E-0001
Kn/Probability Distribution at 1% = 7.07554703919868E-0002
Kn/Probability Distribution at 5% = 1.60012757680079E-0001
Kn/Probability Distribution at 25% = 3.79130859764700E-0001
Kn/Probability Distribution at 50% = 5.88572062802086E-0001
Kn/Probability Distribution at 75% = 8.32421662701563E-0001
Kn/Probability Distribution at 95% = 1.22374046688492E+0000
Kn/Probability Distribution at 99% = 1.51729418092873E+0000
------------------------------
CHI-Serial = 2.65688815789297E+0002
CHI 255 degrees of freedom 5% = 219.0
CHI 255 degrees of freedom 95% = 293.3
------------------------------
CHI-Differential = 2.32465554022929E+0001
CHI 30 degrees of freedom 5% = 18.49
CHI 30 degrees of freedom 95% = 43.77
------------------------------
CHI-Gap = 4.96455858484260E+0001
CHI 50 degrees of freedom 5% = 34.8
CHI 50 degrees of freedom 95% = 67.5
------------------------------
Log 2: The RANTESTS test results for the sixth XOR extract of ERAWNN3.dat.
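The bit-level figures in Logs 1 and 2 can be cross-checked with a short sketch. The function names are hypothetical, and the counts of ones and zeros are not given in the logs: they are back-calculated here from the 6225920 bits and the logged CHI-Bits value, so they are an assumption. The probability uses the standard identity for one degree of freedom, P(Chi-square > x) = erfc(sqrt(x/2)).

```python
import math


def chi_square_bits(ones: int, zeros: int):
    """Chi-square statistic (1 degree of freedom) for the 0/1 counts,
    plus the probability that a random stream exceeds it."""
    expected = (ones + zeros) / 2
    chi2 = ((ones - expected) ** 2 + (zeros - expected) ** 2) / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def entropy_per_bit(ones: int, zeros: int) -> float:
    """First order entropy of the 0/1 frequencies, in bits per bit."""
    p = ones / (ones + zeros)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Assumed counts: 6225920 bits with 3300 more zeros than ones reproduces
# the logged CHI-Bits and entropy values.
ONES, ZEROS = 3_111_310, 3_114_610
chi2, p_exceed = chi_square_bits(ONES, ZEROS)
print(chi2, p_exceed, ONES / (ONES + ZEROS), entropy_per_bit(ONES, ZEROS))
```

With these counts the mean of the data bits rounds to 0.4997, as in Log 1, and the statistic lies between the 5% and 95% points (0.00 and 3.84) quoted in Log 2.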
FinalAnalysisReport.txt:
----------------------------
RABENZIX VERSION 3.0 BETA SOFTWARE COPYRIGHT (c) 2004, 2005, 2006
ALL RIGHTS RESERVED
JOHAN GERARD VAN DER GALIEN johan.van.der.galien@satoconor.com
----------------------------
Newcomb-Benford and Zipf randomness tests with 8 bits
exponent and 8 bits mantissa reals first two digits for ERWN3E6.DAT
-------------8x8 REALS---------------
Total amount 16 bits blocks (8x8 Reals) read = 389118
Total amount of Real numbers between 1.0E-38 and 1.0E+38 = 383769
Total amount of Reals tested (no zeroes) = 383769
----------------------------
Number of samples = 6
Size of one sample (bytes) = 129706
----------------------------
----------------------------
Test NEWCOMB-BENFORD (8x8 Real SAMPLED) Fd=log10(1+1/d)
First two digits 10 up to 99. So 89 degrees of freedom.
Approximate proportion 95% p's > 0.05 for 6 samples >= 0.6831
Proportion 95% observed = 0.8333
Approximate proportion 99% p's > 0.01 for 6 samples >= 0.8681
Proportion 99% observed = 0.8333
----------------------------
----------------------------
I REMIND YOU THAT THE CONFIDENCE INTERVALS (CI) OF R TO PASS THE TEST!
Test ZIPF (8x8 Real UNSAMPLED) Fd=(10^b)*d^a --> log10(Fd)=a*log10(d)+b
Slope observed (a) = -9.8399E-01
CRITERION a total number space = -9.8384E-01
Intersection observed (b) = -3.9446E-01
CRITERION b total number space = -3.9461E-01
Correlation observed (R) = -9.9944E-01
CI 95% -9.9963E-01 <= R <= -9.9915E-01
CI 99% -9.9968E-01 <= R <= -9.9903E-01
p-value difference = 0.0000
CRITERION 95% p >= 0.0500
CRITERION 99% p >= 0.0100
----------------------------
Benford\Stats.txt:
Thu Nov 23 13:26:03 2006
Chi-sq sample 1 from ERWN3E6.DAT = 84.29 p-value = 0.6215
Chi-sq sample 2 from ERWN3E6.DAT = 93.59 p-value = 0.3491
Chi-sq sample 3 from ERWN3E6.DAT = 125.03 p-value = 0.0071
Chi-sq sample 4 from ERWN3E6.DAT = 75.65 p-value = 0.8425
Chi-sq sample 5 from ERWN3E6.DAT = 104.21 p-value = 0.1292
Chi-sq sample 6 from ERWN3E6.DAT = 97.07 p-value = 0.2619
Log 3: The RABENZIX v3.0 BETA test results of the sixth XOR extract of ERAWNN3.dat.
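The Newcomb-Benford part of this log rests on the expected first-two-digit frequencies Fd = log10(1+1/d) for d = 10 up to 99, which gives 90 classes and hence 89 degrees of freedom. The sketch below shows that expectation and the corresponding Chi-square statistic. It is not the RABENZIX code: the function names are hypothetical, and the decoding of the 8x8 Reals from the file is not reproduced here.

```python
import math


def benford_two_digit_probs():
    """Expected frequency of each leading digit pair d = 10..99."""
    return {d: math.log10(1 + 1 / d) for d in range(10, 100)}


def first_two_digits(x: float) -> int:
    """Leading two significant digits of a non-zero real, e.g. 0.00987 -> 98."""
    mantissa = f"{abs(x):.16e}"  # for example '9.8700000000000000e-03'
    return int(mantissa[0] + mantissa[2])


def benford_chi_square(counts: dict) -> float:
    """Chi-square of observed digit-pair counts against Newcomb-Benford."""
    total = sum(counts.values())
    chi2 = 0.0
    for d, p in benford_two_digit_probs().items():
        expected = total * p
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2
```

The 90 expected frequencies sum to log10(100/10) = 1, so no renormalisation is needed.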
All p-values:
0.4093,0.6137,0.0282,0.8625,0.7261,0.7405,0.5319,0.5984,0.0401,0.2792,0.7502,
Overall p-value after applying KStest on 11 p-values = 0.677238
Log 6: The DIEHARD test results of the sixth XOR extract of ERAWNN3.dat. This is actually only the Minimum Distance test of the suite. The file is too small for all the other 16 tests.
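A Kolmogorov-Smirnov check of a list of p-values against the uniform distribution can be sketched with the K+ and K- statistics in the form given by Knuth, which is also the form reported as KnPlusMax and KnMinusMax in Log 2. The function name is hypothetical, and this sketch is not DIEHARD's own KS routine, so it is not expected to reproduce the overall p-value of 0.677238; it only shows what such a test measures.

```python
import math


def ks_statistics(p_values):
    """Knuth's K+ and K- for a sample tested against Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    k_plus = math.sqrt(n) * max((j + 1) / n - x for j, x in enumerate(xs))
    k_minus = math.sqrt(n) * max(x - j / n for j, x in enumerate(xs))
    return k_plus, k_minus


# The 11 p-values of Log 6.
diehard_p = [0.4093, 0.6137, 0.0282, 0.8625, 0.7261, 0.7405,
             0.5319, 0.5984, 0.0401, 0.2792, 0.7502]
k_plus, k_minus = ks_statistics(diehard_p)
print(k_plus, k_minus)
```

Large values of either statistic would indicate p-values that cluster instead of spreading evenly over the interval from 0 to 1.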
------------------------------------------------------------------------------
RESULTS FOR THE UNIFORMITY OF P-VALUES AND THE PROPORTION OF PASSING SEQUENCES
------------------------------------------------------------------------------
------------------------------------------------------------------------------
 C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST
------------------------------------------------------------------------------
 10   7  15  13  11   9   9  13   6   6  0.307407   1.0000    frequency
  5  12  13  10   5   4  14   6   9  21  0.000648   1.0000    block-frequency M=100
  7   9  11  17  10   8   9   9   9  10  0.500934   1.0000    cumulative-sums
  6  18  13   9  11   7  11   7   9   8  0.134686   1.0000    cumulative-sums
 12   8  12   8  10   9  14  10   7   9  0.772760   0.9899    runs
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
The minimum pass rate for each statistical test with the exception of the random
excursion (variant) test is approximately = 0.960000 for a sample size = 99 binary sequences.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
------------------------------------------------------------------------------
 C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST
------------------------------------------------------------------------------
  7   9  16  13  12  11   8  13   6   4  0.097224   0.9697    longest-run
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
The minimum pass rate for each statistical test with the exception of the random
excursion (variant) test is approximately = 0.960000 for a sample size = 99 binary sequences.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
------------------------------------------------------------------------------
 C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST
------------------------------------------------------------------------------
  0   2   2   4   2   2   3   0   3   2  0.637119   1.0000    rank
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
The minimum pass rate for each statistical test with the exception of the random
excursion (variant) test is approximately = 0.923254 for a sample size = 20 binary sequences.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
------------------------------------------------------------------------------
 C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST
------------------------------------------------------------------------------
 13  11   9  13   8  10   9  10  10   6  0.793973   0.9697    fft
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
The minimum pass rate for each statistical test with the exception of the random
excursion (variant) test is approximately = 0.960000 for a sample size = 99 binary sequences.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
sample size = 1
NONPERIODIC TEMPLATES TEST
--------------------------------------------------------------------------------
COMPUTATIONAL INFORMATION
--------------------------------------------------------------------------------
LAMBDA = 759.991211 M = 778240 N = 8 m = 10 n = 6225920
--------------------------------------------------------------------------------
F R E Q U E N C Y
Template W_1 W_2 W_3 W_4 W_5 W_6 W_7 W_8 Chi^2 P_value Assignment Index
--------------------------------------------------------------------------------
0000000001 767 772 800 784 747 743 729 775 5.380962 0.716190 SUCCESS 0
0000000011 756 786 782 761 739 706 768 770 6.298163 0.613872 SUCCESS 1
0000000101 719 793 798 812 733 734 794 741 13.193182 0.105373 SUCCESS 2
0000000111 773 777 719 762 760 756 725 758 4.540996 0.805318 SUCCESS 3
0000001001 789 779 770 787 713 772 725 737 8.228815 0.411444 SUCCESS 4
0000001011 770 778 746 794 761 763 804 781 5.583947 0.693723 SUCCESS 5
0000001101 748 791 767 746 728 715 780 755 6.466191 0.595160 SUCCESS 6
0000001111 778 784 755 761 764 778 748 744 2.234303 0.972973 SUCCESS 7
0000010001 756 798 803 734 802 698 804 749 15.620257 0.048149 SUCCESS 8
0000010011 831 790 771 779 756 784 760 766 9.456730 0.305242 SUCCESS 9
0000010101 747 742 786 773 730 730 742 764 4.661279 0.793089 SUCCESS 10
0000010111 732 750 730 764 779 744 772 767 3.498147 0.899333 SUCCESS 11
0000011001 730 815 776 731 791 746 728 786 10.563643 0.227670 SUCCESS 12
0000011011 778 737 790 773 771 688 806 761 12.527741 0.129165 SUCCESS 13
0000011101 752 743 750 737 793 717 753 776 5.662920 0.684931 SUCCESS 14
0000011111 802 749 733 784 769 752 722 722 8.341852 0.400809 SUCCESS 15
0000100011 772 795 786 740 774 733 790 744 6.069124 0.639489 SUCCESS 16
0000100101 819 817 758 743 774 788 703 720 17.231218 0.027789 SUCCESS 17
0000100111 757 777 760 801 714 779 739 798 8.502219 0.386009 SUCCESS 18
0000101001 750 784 788 757 777 765 731 742 3.952611 0.861373 SUCCESS 19
0000101011 757 739 794 818 759 742 747 760 7.326243 0.501877 SUCCESS 20
0000101101 727 769 762 815 758 750 778 712 9.291939 0.318270 SUCCESS 21
0000101111 730 752 774 795 757 741 770 744 4.170409 0.841430 SUCCESS 22
0000110001 710 757 762 791 749 771 745 802 7.648660 0.468520 SUCCESS 23
0000110011 715 817 752 737 801 776 737 780 11.708854 0.164674 SUCCESS 24
0000110101 728 775 713 731 778 746 753 760 6.524096 0.588736 SUCCESS 25
0000110111 772 705 776 777 794 702 788 770 12.224279 0.141474 SUCCESS 26
0000111001 730 808 776 773 799 776 780 768 7.872769 0.445996 SUCCESS 27
0000111011 757 770 753 720 791 732 736 760 5.467136 0.706678 SUCCESS 28
0000111101 740 729 759 743 762 789 733 785 5.160619 0.740279 SUCCESS 29
0000111111 796 751 735 754 783 747 726 749 5.379171 0.716387 SUCCESS 30
0001000011 777 781 792 739 786 802 791 776 7.849438 0.448314 SUCCESS 31
0001000101 724 786 773 747 723 715 781 748 8.429488 0.392679 SUCCESS 32
0001000111 755 831 803 751 781 769 803 707 16.326767 0.037935 SUCCESS 33
0001001001 795 711 728 775 782 803 788 761 10.717422 0.218230 SUCCESS 34
0001001011 804 820 758 743 770 781 750 715 11.390388 0.180545 SUCCESS 35
0001001101 787 754 750 741 758 756 813 715 8.151103 0.418851 SUCCESS 36
0001001111 752 776 769 807 745 768 748 728 5.452792 0.708264 SUCCESS 37
0001010011 814 777 779 761 731 781 779 782 7.636670 0.469741 SUCCESS 38
0001010101 743 756 761 771 736 722 738 739 4.518037 0.807626 SUCCESS 39
0001010111 730 746 760 796 745 734 724 722 8.085335 0.425179 SUCCESS 40
0001011001 758 777 813 774 762 790 786 774 6.806165 0.557683 SUCCESS 41
0001011011 729 756 780 806 779 737 762 719 8.134874 0.420407 SUCCESS 42
0001011101 759 780 713 729 731 758 745 796 7.957903 0.437593 SUCCESS 43
0001011111 738 740 759 788 733 727 776 760 5.016720 0.755788 SUCCESS 44
0001100101 784 751 762 750 790 767 751 823 7.724541 0.460830 SUCCESS 45
0001100111 741 792 762 719 786 789 749 759 6.313593 0.612150 SUCCESS 46
0001101001 733 788 791 752 785 785 754 729 6.415955 0.600744 SUCCESS 47
0001101011 696 762 750 750 720 777 779 749 8.941292 0.347273 SUCCESS 48
0001101101 796 765 769 749 788 773 763 757 3.345505 0.910844 SUCCESS 49
0001101111 780 693 775 754 756 748 775 752 7.505277 0.483222 SUCCESS 50
0001110011 751 811 738 757 746 817 808 752 12.052255 0.148888 SUCCESS 51
0001110101 734 783 765 730 739 733 777 732 5.860679 0.662834 SUCCESS 52
0001110111 748 739 768 756 760 770 793 827 8.505748 0.385687 SUCCESS 53
0001111001 772 765 755 752 756 815 770 769 4.667259 0.792475 SUCCESS 54
0001111011 750 700 778 710 743 762 776 815 13.536953 0.094662 SUCCESS 55
0001111101 785 772 775 767 789 757 782 778 3.624083 0.889349 SUCCESS 56
0001111111 757 771 727 763 784 721 762 725 6.103740 0.635613 SUCCESS 57
0010000011 748 746 738 736 802 721 764 817 10.658100 0.221834 SUCCESS 58
0010000101 751 780 756 773 767 736 739 734 3.227285 0.919295 SUCCESS 59
0010000111 756 789 764 755 775 787 765 801 4.772727 0.781568 SUCCESS 60
0010001011 746 705 763 778 728 769 788 764 7.317772 0.502767 SUCCESS 61
0010001101 795 754 785 739 803 765 770 752 5.853960 0.663587 SUCCESS 62
0010001111 725 793 803 771 748 748 783 699 11.827113 0.159086 SUCCESS 63
0010010011 787 721 741 753 772 798 773 764 5.943853 0.653521 SUCCESS 64
0010010101 848 770 717 716 775 795 754 732 18.634541 0.016941 SUCCESS 65
0010010111 787 781 811 768 755 756 708 799 10.862732 0.209599 SUCCESS 66
0010011011 786 742 740 728 753 733 828 741 10.975343 0.203100 SUCCESS 67
0010011101 801 765 725 767 724 756 724 844 16.951983 0.030613 SUCCESS 68
0010011111 760 796 761 818 771 795 758 720 10.206136 0.250855 SUCCESS 69
0010100011 724 761 754 760 700 785 784 725 9.863905 0.274708 SUCCESS 70
0010100111 807 812 770 769 703 743 777 738 12.609875 0.125996 SUCCESS 71
0010101011 790 760 738 744 763 739 750 755 2.968608 0.936312 SUCCESS 72
0010101101 762 755 789 754 702 731 743 777 7.625466 0.470883 SUCCESS 73
0010101111 699 735 769 751 785 792 749 783 9.125508 0.331819 SUCCESS 74
0010110011 763 792 787 790 769 753 772 740 4.474486 0.811980 SUCCESS 75
0010110101 753 713 782 779 790 768 759 749 5.616353 0.690118 SUCCESS 76
0010110111 751 725 744 743 791 735 734 733 6.488579 0.592675 SUCCESS 77
0010111011 797 775 730 740 727 752 744 783 6.477340 0.593922 SUCCESS 78
0010111101 747 752 763 790 739 721 782 740 5.345489 0.720092 SUCCESS 79
0010111111 735 735 761 786 747 785 814 760 7.558343 0.477755 SUCCESS 80
0011000101 729 736 792 788 765 757 758 773 4.762467 0.782636 SUCCESS 81
0011000111 696 733 800 765 751 716 767 788 12.466678 0.131565 SUCCESS 82
0011001011 795 748 790 759 777 755 763 755 3.511323 0.898309 SUCCESS 83
0011001101 761 784 755 747 748 722 728 758 4.539019 0.805517 SUCCESS 84
0011001111 746 766 755 714 756 771 739 762 3.959993 0.860715 SUCCESS 85
0011010101 745 730 726 738 749 750 766 742 4.482708 0.811161 SUCCESS 86
0011010111 715 779 775 771 757 809 802 759 9.262025 0.320676 SUCCESS 87
0011011011 747 793 774 755 786 729 820 742 9.439838 0.306560 SUCCESS 88
0011011101 746 753 774 792 773 754 723 787 5.052145 0.751989 SUCCESS 89
0011011111 790 726 788 781 710 719 774 745 10.567356 0.227438 SUCCESS 90
0011100101 739 791 771 809 781 767 773 773 6.373764 0.605441 SUCCESS 91
0011101011 791 747 774 730 722 749 754 786 6.036379 0.643157 SUCCESS 92
0011101101 813 797 746 743 729 737 702 732 13.808283 0.086901 SUCCESS 93
0011101111 737 748 785 755 761 727 780 765 3.804290 0.874335 SUCCESS 94
0011110101 734 719 800 718 767 761 713 776 11.039550 0.199468 SUCCESS 95
0011110111 767 733 706 783 813 743 741 810 13.651030 0.091327 SUCCESS 96
0011111011 722 774 783 761 775 806 740 757 6.596969 0.580673 SUCCESS 97
0011111101 771 750 811 759 771 803 734 721 9.372193 0.311877 SUCCESS 98
0011111111 775 755 734 755 778 719 752 747 4.273850 0.831609 SUCCESS 99
0100000011 763 752 755 753 767 780 749 748 1.154007 0.997076 SUCCESS 100
0100000111 775 755 717 723 780 711 747 772 8.821902 0.357542 SUCCESS 101
0100001011 736 766 738 767 751 757 773 738 2.529904 0.960322 SUCCESS 102
0100001111 783 772 775 722 788 829 761 791 11.866869 0.157243 SUCCESS 103
0100010011 762 749 723 791 793 730 743 765 6.378320 0.604934 SUCCESS 104
0100010111 722 719 706 793 749 725 763 789 12.500357 0.130236 SUCCESS 105
0100011011 799 739 789 784 781 754 779 726 7.205068 0.514678 SUCCESS 106
0100011111 740 796 764 783 762 751 780 708 7.279907 0.506756 SUCCESS 107
0100100011 766 763 764 778 780 773 781 735 2.709558 0.951242 SUCCESS 108
0100100111 760 721 744 780 760 823 711 766 11.506576 0.174614 SUCCESS 109
0100101011 754 753 724 720 798 788 772 768 7.262313 0.508614 SUCCESS 110
0100101111 756 793 770 768 741 756 776 761 2.552253 0.959250 SUCCESS 111
0100110011 801 714 782 717 737 744 768 741 9.838692 0.276531 SUCCESS 112
0100110111 755 717 744 753 764 770 767 788 4.193083 0.839296 SUCCESS 113
0100111011 820 810 717 785 725 692 752 776 19.765356 0.011261 SUCCESS 114
0100111111 779 792 775 795 732 786 733 777 7.125052 0.523203 SUCCESS 115
0101000011 716 745 746 715 809 748 761 772 9.479582 0.303466 SUCCESS 116
0101000111 802 764 683 761 714 813 759 784 17.712868 0.023485 SUCCESS 117
0101001011 783 779 748 741 754 759 794 767 3.536395 0.896347 SUCCESS 118
0101001111 775 795 742 729 739 759 703 753 8.678804 0.370112 SUCCESS 119
0101010011 747 760 793 755 743 797 736 791 6.004489 0.646729 SUCCESS 120
0101010111 772 788 751 729 789 722 761 761 5.707047 0.680009 SUCCESS 121
0101011011 797 783 748 726 768 751 732 766 5.580978 0.694053 SUCCESS 122
0101011111 723 802 751 717 778 755 755 792 8.661873 0.371618 SUCCESS 123
0101100011 772 802 768 790 796 783 812 797 11.763290 0.162082 SUCCESS 124
0101100111 751 741 801 772 728 735 774 767 5.578249 0.694356 SUCCESS 125
0101101111 764 771 759 707 751 755 715 725 8.447079 0.391059 SUCCESS 126
0101110011 789 793 787 814 750 759 792 774 9.249422 0.321694 SUCCESS 127
0101110111 772 750 744 739 706 742 770 786 6.644004 0.575483 SUCCESS 128
0101111011 718 758 759 792 772 714 782 759 7.423991 0.491652 SUCCESS 129
0101111111 750 760 775 729 728 782 819 764 8.434761 0.392193 SUCCESS 130
0110000111 750 746 749 767 788 747 773 798 4.065793 0.851139 SUCCESS 131
0110001111 748 744 826 740 783 750 769 780 8.402022 0.395216 SUCCESS 132
0110010111 757 768 758 740 755 787 772 761 1.845180 0.985396 SUCCESS 133
0110011111 759 779 789 718 734 784 761 773 5.884581 0.660159 SUCCESS 134
0110100111 754 739 762 735 734 797 760 754 4.271640 0.831821 SUCCESS 135
0110101111 736 785 769 800 789 803 777 731 8.987731 0.343332 SUCCESS 136
0110110111 732 765 756 773 755 734 807 723 7.068503 0.529260 SUCCESS 137
0110111111 765 739 752 762 709 758 782 751 4.964363 0.761378 SUCCESS 138
0111001111 763 754 774 771 814 752 787 760 5.460087 0.707458 SUCCESS 139
0111011111 746 725 790 719 767 723 752 797 9.186141 0.326838 SUCCESS 140
0111111111 751 763 736 737 723 753 738 738 4.797600 0.778974 SUCCESS 141
1000000000 767 772 800 784 747 743 729 775 5.380962 0.716190 SUCCESS 142
1000100000 715 795 790 796 783 753 782 793 10.188026 0.252076 SUCCESS 143
1000110000 732 741 725 752 761 753 786 745 4.536149 0.805806 SUCCESS 144
1001000000 747 837 787 769 715 710 758 764 15.354803 0.052603 SUCCESS 145
1001001000 773 807 757 770 754 767 779 719 6.186914 0.626304 SUCCESS 146
1001010000 766 787 753 755 744 746 748 732 2.973782 0.935992 SUCCESS 147
sample size = 1
OVERLAPPING TEMPLATE OF ALL ONES TEST
-----------------------------------------------
COMPUTATIONAL INFORMATION:
-----------------------------------------------
(a) n (sequence_length) = 6225920
(b) m (block length of 1s) = 10
(c) M (length of substring) = 1032
(d) N (number of substrings) = 6032
(e) lambda [(M-m+1)/2^m] = 0.999023
(f) eta = 0.499512
-----------------------------------------------
F R E Q U E N C Y
0 1 2 3 4 >=5 Chi^2 P-value Assignment
-----------------------------------------------
3678 894 602 310 236 312 9.510428 0.090357 SUCCESS
------------------------------------------------------------------------------
C1  C2  C3  C4  C5  C6  C7  C8  C9 C10   P-VALUE  PROPORTION  STATISTICAL TEST
------------------------------------------------------------------------------
 2   0   3   3   3   0   2   0   1   2  0.035174      1.0000  universal
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
The minimum pass rate for each statistical test with the exception of the random
excursion (variant) test is approximately = 0.915376 for a sample size = 16
binary sequences.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
------------------------------------------------------------------------------
C1  C2  C3  C4  C5  C6  C7  C8  C9 C10   P-VALUE  PROPORTION  STATISTICAL TEST
------------------------------------------------------------------------------
17   8  10  16   4   8   9   6  11  10  0.045348    0.9596 *  apen M=10
17   8   7   9  13   7   6  12   8  12  0.162606    0.9899    serial M=10
 8  12   7  12  13   9   9   9   9  11  0.853234    0.9899    serial
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
The minimum pass rate for each statistical test with the exception of the random
excursion (variant) test is approximately = 0.960000 for a sample size = 99
binary sequences.
- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
sample size = 6; for the first three there was an insufficient number of cycles.
RANDOM EXCURSIONS TEST
--------------------------------------------
COMPUTATIONAL INFORMATION:
--------------------------------------------
(a) Number Of Cycles (J) = 0567
(b) Sequence Length (n) = 1000000
(c) Rejection Constraint = 500.000000
-------------------------------------------
SUCCESS x = -4 chi^2 = 2.146930 p_value = 0.828464
SUCCESS x = -3 chi^2 = 6.459386 p_value = 0.264048
SUCCESS x = -2 chi^2 = 2.688767 p_value = 0.747836
SUCCESS x = -1 chi^2 = 1.631393 p_value = 0.897427
SUCCESS x =  1 chi^2 = 2.703704 p_value = 0.745552
SUCCESS x =  2 chi^2 = 2.658110 p_value = 0.752518
SUCCESS x =  3 chi^2 = 8.232694 p_value = 0.143869
SUCCESS x =  4 chi^2 = 7.628949 p_value = 0.177905
RANDOM EXCURSIONS TEST
--------------------------------------------
COMPUTATIONAL INFORMATION:
--------------------------------------------
(a) Number Of Cycles (J) = 1369
(b) Sequence Length (n) = 1000000
(c) Rejection Constraint = 500.000000
-------------------------------------------
SUCCESS x = -4 chi^2 = 2.269473 p_value = 0.810740
SUCCESS x = -3 chi^2 = 1.283278 p_value = 0.936643
SUCCESS x = -2 chi^2 = 4.029245 p_value = 0.545213
SUCCESS x = -1 chi^2 = 2.551497 p_value = 0.768720
SUCCESS x =  1 chi^2 = 2.346969 p_value = 0.799344
SUCCESS x =  2 chi^2 = 3.751923 p_value = 0.585656
SUCCESS x =  3 chi^2 = 6.638199 p_value = 0.248968
SUCCESS x =  4 chi^2 = 6.917782 p_value = 0.226827
RANDOM EXCURSIONS TEST
--------------------------------------------
COMPUTATIONAL INFORMATION:
--------------------------------------------
(a) Number Of Cycles (J) = 0516
(b) Sequence Length (n) = 1000000
(c) Rejection Constraint = 500.000000
-------------------------------------------
SUCCESS x = -4 chi^2 = 3.693564 p_value = 0.594322
SUCCESS x = -3 chi^2 = 4.355367 p_value = 0.499465
SUCCESS x = -2 chi^2 = 4.207867 p_value = 0.519893
SUCCESS x = -1 chi^2 = 5.220930 p_value = 0.389517
SUCCESS x =  1 chi^2 = 6.046512 p_value = 0.301719
SUCCESS x =  2 chi^2 = 6.223753 p_value = 0.285052
SUCCESS x =  3 chi^2 = 5.639926 p_value = 0.342846
SUCCESS x =  4 chi^2 = 1.900900 p_value = 0.862680
sample size = 6; for the first three there was an insufficient number of cycles.
RANDOM EXCURSIONS VARIANT TEST
--------------------------------------------
COMPUTATIONAL INFORMATION:
--------------------------------------------
(a) Number Of Cycles (J) = 567
(b) Sequence Length (n) = 1000000
--------------------------------------------
SUCCESS (x = -9) Total visits = 588; p-value = 0.879780
SUCCESS (x = -8) Total visits = 593; p-value = 0.841987
SUCCESS (x = -7) Total visits = 557; p-value = 0.934360
SUCCESS (x = -6) Total visits = 511; p-value = 0.616089
SUCCESS (x = -5) Total visits = 498; p-value = 0.494606
SUCCESS (x = -4) Total visits = 497; p-value = 0.432058
SUCCESS (x = -3) Total visits = 499; p-value = 0.366493
SUCCESS (x = -2) Total visits = 513; p-value = 0.354539
SUCCESS (x = -1) Total visits = 537; p-value = 0.372998
SUCCESS (x = 1) Total visits = 584; p-value = 0.613680
SUCCESS (x = 2) Total visits = 612; p-value = 0.440401
SUCCESS (x = 3) Total visits = 642; p-value = 0.319239
SUCCESS (x = 4) Total visits = 643; p-value = 0.393649
SUCCESS (x = 5) Total visits = 623; p-value = 0.579360
SUCCESS (x = 6) Total visits = 608; p-value = 0.713547
SUCCESS (x = 7) Total visits = 601; p-value = 0.779456
SUCCESS (x = 8) Total visits = 587; p-value = 0.878124
SUCCESS (x = 9) Total visits = 577; p-value = 0.942584
RANDOM EXCURSIONS VARIANT TEST
--------------------------------------------
COMPUTATIONAL INFORMATION:
--------------------------------------------
(a) Number Of Cycles (J) = 1369
(b) Sequence Length (n) = 1000000
--------------------------------------------
SUCCESS (x = -9) Total visits = 984; p-value = 0.074340
SUCCESS (x = -8) Total visits = 1046; p-value = 0.110976
SUCCESS (x = -7) Total visits = 1043; p-value = 0.083999
SUCCESS (x = -6) Total visits = 1031; p-value = 0.051461
SUCCESS (x = -5) Total visits = 1077; p-value = 0.062866
SUCCESS (x = -4) Total visits = 1185; p-value = 0.183821
SUCCESS (x = -3) Total visits = 1270; p-value = 0.397484
SUCCESS (x = -2) Total visits = 1295; p-value = 0.414216
SUCCESS (x = -1) Total visits = 1336; p-value = 0.528261
SUCCESS (x = 1) Total visits = 1352; p-value = 0.745267
SUCCESS (x = 2) Total visits = 1305; p-value = 0.480089
SUCCESS (x = 3) Total visits = 1319; p-value = 0.669135
SUCCESS (x = 4) Total visits = 1355; p-value = 0.919451
SUCCESS (x = 5) Total visits = 1330; p-value = 0.803792
SUCCESS (x = 6) Total visits = 1342; p-value = 0.876365
SUCCESS (x = 7) Total visits = 1340; p-value = 0.877836
SUCCESS (x = 8) Total visits = 1267; p-value = 0.614744
SUCCESS (x = 9) Total visits = 1276; p-value = 0.666422
RANDOM EXCURSIONS VARIANT TEST
--------------------------------------------
COMPUTATIONAL INFORMATION:
--------------------------------------------
(a) Number Of Cycles (J) = 516
(b) Sequence Length (n) = 1000000
--------------------------------------------
SUCCESS (x = -9) Total visits = 603; p-value = 0.511288
SUCCESS (x = -8) Total visits = 671; p-value = 0.212840
SUCCESS (x = -7) Total visits = 676; p-value = 0.167167
SUCCESS (x = -6) Total visits = 651; p-value = 0.205133
SUCCESS (x = -5) Total visits = 654; p-value = 0.152167
SUCCESS (x = -4) Total visits = 624; p-value = 0.203844
SUCCESS (x = -3) Total visits = 560; p-value = 0.540187
SUCCESS (x = -2) Total visits = 579; p-value = 0.257532
SUCCESS (x = -1) Total visits = 586; p-value = 0.029331
SUCCESS (x = 1) Total visits = 456; p-value = 0.061801
SUCCESS (x = 2) Total visits = 430; p-value = 0.122200
SUCCESS (x = 3) Total visits = 411; p-value = 0.143818
SUCCESS (x = 4) Total visits = 405; p-value = 0.191562
SUCCESS (x = 5) Total visits = 449; p-value = 0.486926
SUCCESS (x = 6) Total visits = 507; p-value = 0.932682
SUCCESS (x = 7) Total visits = 533; p-value = 0.883314
SUCCESS (x = 8) Total visits = 523; p-value = 0.955133
SUCCESS (x = 9) Total visits = 465; p-value = 0.700208
sample size = 1
-----------------------------------------------------
L I N E A R C O M P L E X I T Y
-----------------------------------------------------
M (substring length) = 500
N (number of substrings) = 12451
-----------------------------------------------------
F R E Q U E N C Y
-----------------------------------------------------
 C0   C1    C2    C3    C4   C5   C6      CHI2   P-value
-----------------------------------------------------
Note: 420 bits were discarded!
122  369  1593  6191  3151  752  273  4.692328  0.583835
sample size = 6
LEMPEL-ZIV COMPRESSION TEST
-----------------------------------------
COMPUTATIONAL INFORMATION:
-----------------------------------------
(a) W (# of words) = 69591
-----------------------------------------
SUCCESS p_value = 0.628152
LEMPEL-ZIV COMPRESSION TEST
-----------------------------------------
COMPUTATIONAL INFORMATION:
-----------------------------------------
(a) W (# of words) = 69579
-----------------------------------------
SUCCESS p_value = 0.141130
LEMPEL-ZIV COMPRESSION TEST
-----------------------------------------
COMPUTATIONAL INFORMATION:
-----------------------------------------
(a) W (# of words) = 69587
-----------------------------------------
SUCCESS p_value = 0.444155
LEMPEL-ZIV COMPRESSION TEST
-----------------------------------------
COMPUTATIONAL INFORMATION:
-----------------------------------------
(a) W (# of words) = 69590
-----------------------------------------
SUCCESS p_value = 0.583209
LEMPEL-ZIV COMPRESSION TEST
-----------------------------------------
COMPUTATIONAL INFORMATION:
-----------------------------------------
(a) W (# of words) = 69577
-----------------------------------------
SUCCESS p_value = 0.095274
LEMPEL-ZIV COMPRESSION TEST
-----------------------------------------
COMPUTATIONAL INFORMATION:
-----------------------------------------
(a) W (# of words) = 69584
-----------------------------------------
SUCCESS p_value = 0.311714
Log 7: The NIST test results of the sixth XOR extract of ERAWNN3.dat.
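Several derived parameters in Log 7 can be cross-checked against the sequence length with plain arithmetic. The following is a minimal Python sketch, not part of the NIST suite; the variable names are hypothetical and the only inputs are numbers printed in the log. It assumes that the number of substrings is the whole number of blocks that fit in the sequence and that eta is half of lambda, which is what the logged values suggest.

```python
n = 6_225_920                  # sequence length in bits (overlapping template test)

# Overlapping template test: block length m, substring length M
m, M = 10, 1032
N_overlap = n // M             # whole substrings that fit in the sequence
lam = (M - m + 1) / 2 ** m     # lambda as defined in the log
eta = lam / 2                  # assumption: eta is half of lambda

# Linear complexity test: substrings of 500 bits, the remainder is discarded
N_linear, discarded = divmod(n, 500)

size_kb = n // 8 // 1024       # size of the extract file in units of 1024 bytes

print(N_overlap, f"{lam:.6f}", f"{eta:.6f}")
print(N_linear, discarded)
print(size_kb)
```

If the assumptions hold, the printed values should agree with the 6032 substrings, lambda = 0.999023, eta = 0.499512, 12451 substrings and 420 discarded bits in the log, and with a file size of 760 units of 1024 bytes.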
4. Discussion
The true entropy estimates of the
starting material files span a wide range, from 0.0074 to 1.00 bits per byte
(Table 1). Since all archive files except one pass standard randomness tests,
the numbers in this range are good estimates (Table 2). I remind you that a
PRNG really fails the DIEHARD test suite badly if there are p=1 or
There is no simple correlation
between the true entropy estimates and the number of successive XOR steps after
which the files start to pass the ENT randomness test suite (Table 3). To me
this is counterintuitive: XOR extracts entropy, so the number of steps should
depend on the true entropy of the source. Nevertheless, the bias development of
the extract files follows the Davies and Hisakado et al. models closely
(Table 4). You can even calculate the expected Chi-Bits from the bias that the
models predict for the next step:
·
F2RAW.iso = 652,852 Kb * 1024 = 668,520,448 byte
·
14th successive XOR extract = 2^-14 * 668,520,448 = 40,803.25 byte * 8 = 326,426 bits
·
predicted E(x XOR y) bias by the models = fraction 1 bits (p1) = 0.4991514104
·
Number of 1 bits = p1 * 326,426 = 162,936 bits
·
Number of 0 bits = (1 - p1) * 326,426 = 163,490 bits
·
Expected number of 1 or 0 bits = 326,426 / 2 = 163,213
·
Chi-square on the bits = [(162,936 - 163,213)^2 / 163,213] + [(163,490 - 163,213)^2 / 163,213] = 0.94
This is exactly the value found by ENT
in Table 4, additional evidence of how good the Davies and Hisakado et al.
models are. My intention was to discover a formula from which one can calculate
the number of successive XOR steps needed to make a file random. But randomness
has many aspects, and the models only give the bias. As I demonstrated, the
bias can be recalculated as Chi-Bits, but that is only one aspect of
randomness, the simplest one and yet a fundamental one. E(x XOR y) gives you
the fraction of one bits in the extract, but not the E(x) and E(y) of its
adjacent bits. Those have to be measured by a computer program. So you can
only calculate Chi-Bits, after the measurement, for the next step and not for a
series of successive steps.
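The Chi-Bits calculation for the 14th XOR extract of F2RAW.iso can be written out as a short Python sketch. The function name and variable names are hypothetical; the inputs are the file size and the predicted fraction of 1 bits quoted in the text, and the sketch assumes, as the size calculation does, that each XOR step halves the file.

```python
def chi_square_bits(n_bits, p_one):
    """Chi-square of the 1 and 0 bit counts against a fair 50/50 split."""
    expected = n_bits / 2
    ones = p_one * n_bits
    zeros = (1 - p_one) * n_bits
    return (ones - expected) ** 2 / expected + (zeros - expected) ** 2 / expected


source_bytes = 652_852 * 1024                      # F2RAW.iso
extract_bits = round(source_bytes * 2 ** -14 * 8)  # after 14 halving steps
p1 = 0.4991514104                                  # predicted fraction of 1 bits

chi_bits = chi_square_bits(extract_bits, p1)
print(extract_bits, round(chi_bits, 2))
```

Because the two terms are equal, the same number also follows from the compact form n * (2 * p1 - 1)^2, with n the number of bits in the extract.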
From Graph 1 it is clear that a file
with negative correlation requires fewer XOR steps than one with the
corresponding positive correlation. This can be understood as follows: when the
(x,y) bits are completely identical the correlation is +1; when they are
statistically independent the correlation is 0; when they are completely
different the correlation is -1. It is just a matter of positive and negative
correlation. When the adjacent bits differ more, more information is already
present, and the amount of entropy extracted per XOR step is higher than when
the bits are more alike.
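The correlation values +1 and -1 described above can be illustrated with a small Python sketch. It is not the program used for this paper: the function names are hypothetical, and it assumes that one extraction step XORs non-overlapping pairs of adjacent bits, so that the output is half as long as the input. It uses only the elementary identity x XOR y = x + y - 2xy for single bits, not the Davies or Hisakado et al. models themselves.

```python
def xor_step(bits):
    """One extraction step: XOR each non-overlapping pair of adjacent bits."""
    return [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]


def pair_stats(bits):
    """Return E(x), E(y), E(x XOR y) and the correlation of the (x, y) pairs."""
    xs, ys = bits[0:len(bits) - 1:2], bits[1::2]
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    exy = sum(x & y for x, y in zip(xs, ys)) / n
    e_xor = ex + ey - 2 * exy          # x XOR y = x + y - 2xy for single bits
    var = ex * (1 - ex) * ey * (1 - ey)
    rho = (exy - ex * ey) / var ** 0.5 if var > 0 else float("nan")
    return ex, ey, e_xor, rho


identical = [0, 0, 1, 1] * 4   # every pair equal: correlation +1
opposite = [0, 1, 1, 0] * 4    # every pair different: correlation -1

for name, bits in (("identical", identical), ("opposite", opposite)):
    ex, ey, e_xor, rho = pair_stats(bits)
    extract = xor_step(bits)
    print(name, rho, e_xor, sum(extract) / len(extract))
```

In both toy inputs the fraction of 1 bits is one half, yet the XOR extract is all zeros in the first case and all ones in the second, which shows why the bias of the extract depends on the measured correlation of adjacent bits and not only on E(x) and E(y).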
The randomness tests of the sixth
XOR extract of ERAWNN3.dat all look very good. I have only a few comments:
·
As I said in 2. Materials
and Methods, for some of the ENT tests there are no hard criteria. Judging these
comes down to common sense and experience. Also for instance the
·
RABENZIX is, as I said
earlier, an experimental test suite in the BETA phase. I fully believe that the
Chi-based Newcomb-Benford correlation tests are sound and that the hard
statistical criteria can be applied correctly. The number of samples (six) in
Log 3 is too small to draw conclusions. The number of samples for the full test
should actually be >= 50; then RABENZIX also calculates the P-of-the-p’s for
Newcomb-Benford. Now the proportions are unreliable and consequently only the
95% criterion is passed. For accurate measurements there should be a reasonable
chance that at least one sample fails the α criterion (p > 0.01). So
around 100 samples is the minimum for the proportions tests. The sixth XOR
extract file from ERAWNN3.DAT is also too small (760 Kb) for the Zipf
correlation test in its current form, because I know for a fact that good
random files must be around 5 Mb to pass this test. This comes from the fact
that the relative frequencies of the digit pairs become more accurate with
increasing cumulative frequencies if the file is truly random. In other words
the confidence intervals and the p-value of the difference with
·
The most critical test
suite (NIST) has only one asterisk (*), marking a particular part of the test
that is not passed: the proportion for 99 samples of the Approximate Entropy
(apen) test. I can explain that: the minimum proportion of 0.9600 of samples
that have to be above the p = 0.01 criterion (p > 0.01 is also called the
significance of the tests) is an approximate value, and apen scores 0.9596,
very close to the border. In dubio pro reo. Considering this, the
sixth XOR extract of ERAWNN3.dat is innocent until proven guilty. The other
tests did not find the file guilty. So I rest my case.
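The two minimum pass rates printed in Log 7 (0.915376 for 16 sequences and 0.960000 for 99 sequences) are consistent with the usual three-sigma lower bound on a proportion, p - 3 * sqrt(p * (1 - p) / m) with p = 1 - α and m the number of sequences. The Python sketch below assumes that this is the formula behind the logged values; the function name is hypothetical.

```python
import math


def min_pass_rate(samples, alpha=0.01):
    """Three-sigma lower bound on the proportion of sequences with p > alpha."""
    p = 1 - alpha
    return p - 3 * math.sqrt(p * (1 - p) / samples)


print(f"{min_pass_rate(16):.6f}")   # universal test, 16 sequences
print(f"{min_pass_rate(99):.6f}")   # apen and serial tests, 99 sequences
print(f"{95 / 99:.4f}")             # proportion if 95 of 99 sequences pass
```

Under this assumption the apen proportion of 0.9596 corresponds to 95 passing sequences out of 99, which falls short of the approximate 0.9600 bound by less than the weight of a single sequence.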
4.1.
Reactions in the sci.math newsgroup to the successive XOR entropy extraction
concept:
On Monday August 7, 2006, Mensanator
posted a thread called ‘Prime randomness’ in the sci.math newsgroup of
USENET.15 Mensanator is an acquaintance of mine and I asked him
to peer review the ‘Randomics: Study of Patterns and Randomness; Are the prime
numbers randomly distributed? Part
5. Conclusions
·
True entropy estimations by
compression come from archive files that pass standard randomness tests.
(Tables 1 and 2)
·
There is no obvious
correlation between the true entropy estimates and the number of successive XOR
steps required to make a biased file pass standard randomness tests. No formula
involving the true entropy estimates was found that correlates with this
number of steps. (Table 3)
·
The bias of the XOR
extracts can be fully explained by the Davies and Hisakado et al. models. You
can even verify the Chi-bits found by ENT by calculating it from the bias of
the models. (Table 4)
·
Negative correlation of
adjacent bits requires fewer successive XOR steps for a file to become random
than a corresponding positive correlation. (Graph 1)
·
XOR extract files can pass
standard randomness tests.
·
XOR operations will never
corrupt a random file.
-o0o-
Notes & References:
1) Winer M., van der Galiën J.G. ‘Randomics: Are the prime numbers randomly distributed? (Part 3)’ SATOCONOR.COM 5.6. (2006)
2) Anonymous ‘Help File Compression Test’ (2006) http://www.maximumcompression.com/data/hlp.php
3) Van der Galiën J.G. ‘State-of-the-art compressors as tools for true entropy estimations’ SATOCONOR.COM 4.4. (2005) http://home.versatel.nl/galien8
4) Davies R.B. ‘Exclusive OR (XOR) and hardware random number generators’ http://www.robertnz.net/pdf/xor2.pdf
5) Hisakado M., Kitsukawa K., Mori S. ‘Correlated binomial models and correlation structures’ arXiv.org Physics (2006) http://arxiv.org/abs/physics/0605189
6) Mahoney M. ‘The paq data compression programs’ http://cs.fit.edu/~mmahoney/compression/
7) Gailly J-L. ‘The gzip homepage’
8) Walker J. ‘Ent: A pseudo random sequence test program’ http://www.fourmilab.ch/random/
9) Knuth D.E. ‘The art of computer programming, Volume 2 / Seminumerical algorithms’
10) Van der Galiën J.G. ‘Rabenzix v3.0 beta’ Scientia Araneae Totius Orbis 5.4. (2006)
11) Anonymous ‘DIEHARD battery of tests of randomness v0.2 beta’ http://www.cs.hku.hk/~diehard/
12) NIST ‘Random Number Generation and Testing’
13) Anonymous ‘Fresh rpms: Red hat linux mirror sites’ http://freshrpms.net/mirrors/redhat/7.3.html
14) Tommila M. ‘ApFloat Home Page’ (I downloaded APTESTC5.ZIP)
15) Google Groups sci.math ‘Prime randomness’ (2006)
16) Anonymous ‘Mersenne twister: A study on random number generators’ http://student.vub.ac.be/~nkaraogl/mt/mt.html
17) Van der Galiën J.G. ‘Collatz Randomics’ SATOCONOR.COM 5.7. (2006)