Fundamentals of RANDOMICS

The Fundamentals of RANDOMICS are compression by state-of-the-art compressors, several randomness tests, and successive XOR extractions to obtain a high-entropy file.

 


J.G. van der Galiën ‘Fundamentals of Randomics’ 5.5. (2006)

Full research paper

SATOCONOR.COM Journal of RANDOMICS

 

 


Supporting paper for ‘Randomics: The Study of Patterns and Randomness: Are the prime numbers randomly distributed? Part 3’

By Johan G. van der Galiën M.Sc.

Version 1.0 September 1, 2006

 


 

Abstract:

In this paper the basic tools and concepts of Randomics are explained and applied to six different source files and their extracts. The tools or concepts are:

·         True entropy estimations of any file or source by means of state-of-the-art compression.

·         Extracting entropy from any file or source by means of successive XOR bitwise manipulator operations.

·         E(x XOR y) bias models from Davies and Hisakado et al.

·         Third party ENT randomness test suite. A quick way of getting an idea of the randomness of a file.

·         RANTESTS randomness test suite. 7 tests programmed according to Knuth.

·         RABENZIX randomness test suite. 6 tests based on the correlation of unorthodox 1 bits Significant, 8 bits Exponent and 8 bits Mantissa (1x8x8) Reals with the laws of Zipf and Newcomb-Benford. These tests are developed by myself and are experimental (BETA).

·         Third party DIEHARD randomness test suite. Calculates up to 269 p-values for 16 different tests.

·         Third party NIST randomness test suite. Calculates around 195 p-values and 195 proportions for 16 different tests.

 

1. Introduction

The name Randomics, for the field that studies patterns and randomness in data streams, comes from Martin Winer. Many thanks for that good idea! It originated from our collaboration on the randomness of the prime number distribution.1 The present paper supports our work, also published on SATOCONOR.COM, ‘Randomics: the Study of Patterns and Randomness. Are the prime numbers randomly distributed? Part 3’.1 A supporting paper was needed because reactions in the newsgroups, which originated from a post by the peer reviewer Mensanator, expressed skepticism about whether one of the Randomics tools, successive XOR extraction, really could extract entropy from the prime number distribution. I return to these reactions in section 4.1, Reactions in the sci.math newsgroup to the successive XOR entropy extraction.

 

1.1. True entropy estimations

The basic idea is that the best multi-purpose compressors available2 produce, as a matter of fact, archive files that pass all known randomness tests. This can only mean that they achieve archive files with (or very near) maximal information at (or very near) the Shannon limit of compression (maximal compression).3 A random file with a true entropy of 1 bit per bit carries maximal information, and maximal information will always be random and pass all tests. The compression ratio (size of compressed file / size of source file) is the true entropy in bits per bit. It is only an estimate, good to 2 significant digits, because the compression table is also stored in the archive file, which raises the compression ratio and hence the true entropy estimate. Formally, one can only speak of true entropy for an infinite source, because it is the limit of the entropy formula as the order goes to infinity. Finite files can be samples of an infinite source. For example, 11 Mb of the first hexadecimal digits of PI is a sample from the infinite hexadecimal expansion of PI. So my true entropy is an estimate twice over: because the samples are finite, and because it is accurate to only 2 significant digits.
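The compression-ratio estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard-library `lzma` module stands in for the PASQDA compressor used in this paper, and the function name `entropy_estimate` is made up for the example. A weaker compressor gives a higher ratio, so the value is best read as an upper bound on what a state-of-the-art compressor would report.

```python
import lzma
import os


def entropy_estimate(data: bytes) -> float:
    """Return compressed size / source size, read as bits of entropy per bit.

    Assumption: lzma at its strongest preset replaces PASQDA, which is
    not available as a Python module.
    """
    if not data:
        raise ValueError("cannot estimate the entropy of an empty input")
    compressed = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
    return len(compressed) / len(data)


if __name__ == "__main__":
    structured = b"0123456789" * 10_000  # highly repetitive sample
    noisy = os.urandom(100_000)          # operating-system randomness
    print(f"repetitive: {entropy_estimate(structured):.4f}")
    print(f"os.urandom: {entropy_estimate(noisy):.4f}")
```

For already random input the ratio can come out slightly above 1, because the archive container adds a few bytes of overhead. This is the same effect as the stored compression table mentioned above.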

 

1.2. Extracting entropy with XOR

Some people call the removal of bias with a single XOR operation step distillation of entropy. I prefer to call it extraction of entropy, because I studied Chemistry and because the people who named it distillation all know that both are separation processes. It is just a matter of semantics, and I like the word extraction more than distillation.

If the file is still not random after a first XOR step, you can continue with further steps until the stream fulfills the bias requirements and passes standard randomness tests. The number of XOR steps (n) required gives a rough estimate of the percentage of Shannon information content of the source file (2^-n × 100%). It is a rough estimate because the extract file becomes exactly half the size of its source file in each XOR extraction step.
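A minimal sketch of the successive XOR extraction, assuming that each step XORs non-overlapping pairs of adjacent bits (which is what halves the file size) and that bits are read most significant bit first within each byte. The function names are illustrative and are not those of the programs used for this paper.

```python
from typing import List


def bytes_to_bits(data: bytes) -> List[int]:
    """Unpack bytes into 0/1 values, most significant bit first (assumed order)."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def xor_extract(bits: List[int]) -> List[int]:
    """One extraction step: XOR each non-overlapping pair of adjacent bits."""
    return [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]


def successive_xor(bits: List[int], steps: int) -> List[int]:
    """Apply the extraction step `steps` times; the length halves every time."""
    for _ in range(steps):
        bits = xor_extract(bits)
    return bits


def information_percent(steps: int) -> float:
    """Rough Shannon information content of the source: 2^-n * 100%."""
    return 2.0 ** (-steps) * 100.0


if __name__ == "__main__":
    bits = bytes_to_bits(bytes(range(256)))
    extract = successive_xor(bits, 3)
    print(len(bits), "->", len(extract), "bits")
    print(f"n = 6 gives {information_percent(6)} %")
```

The halving agrees with the file sizes reported in this paper: ERAWNN3.dat is 48,640 Kb, and six halvings give 760 Kb, which is the 778,240 bytes listed for ERWN3E6.DAT in Log 2.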

 

1.3. E(x XOR y) bias models from Davies and Hisakado et al.

This model is simple4 and can be simplified even further in a program: you do not need to read the data from the source file twice (once for calculating E(x) and E(y) and once for calculating 1.1.). The algorithm for calculating 1.1. is: subtract E(x) and E(y) from each bit x and y respectively, multiply the results, add cumulatively, and divide at the end by the number of bit pairs. I simply combined the bias formula of Davies (1.4.) with the covariance formula from Hisakado et al. (1.2.).5 For all of my XOR extractions, both covariance formulas gave the same results in 1.4. This comes down to:

 

Davies Covariance(x,y) = E[{x-E(x)}{y-E(y)}] (1.1.)

 

Hisakado et al. Covariance(x,y) = E(xy) – E(x)E(y) (1.2.)

 

Davies and Hisakado et al. Correlation(x,y) = Covariance(x,y)/SQRT(E(x)(1-E(x))E(y)(1-E(y))) (1.3.)

 

Davies Bias = E(x XOR y) = 0.5-2(E(x)-0.5)(E(y)-0.5)-2Covariance(x,y) (1.4.)

 

E(x) is, for instance, all bits summed and divided by the number of bits; this number can also be called the bias or the expectation of x. E(xy) means multiplying the paired bits, adding the results cumulatively, and dividing by the number of bit pairs. When the bits are adjacent bits from the same stream, these quantities are actually called auto-covariance and auto-correlation.
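Formulas 1.1. to 1.4. can be checked numerically with a short Python sketch. It assumes that x and y are the first and second bit of each non-overlapping adjacent pair; the function name `bias_model` and the dictionary keys are made up for the illustration. Because E(x XOR y) = E(x) + E(y) - 2E(xy) for bits, the value from 1.4. should agree with the observed bias up to floating-point rounding.

```python
import math
import random
from typing import Dict, List


def bias_model(bits: List[int]) -> Dict[str, float]:
    """Evaluate formulas 1.1.-1.4. on non-overlapping adjacent bit pairs."""
    n = len(bits) // 2
    if n == 0:
        raise ValueError("need at least one bit pair")
    xs = bits[0:2 * n:2]  # first bit of every pair
    ys = bits[1:2 * n:2]  # second bit of every pair
    ex = sum(xs) / n
    ey = sum(ys) / n
    exy = sum(x & y for x, y in zip(xs, ys)) / n
    cov_davies = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n  # (1.1.)
    cov_hisakado = exy - ex * ey                                        # (1.2.)
    denom = math.sqrt(ex * (1 - ex) * ey * (1 - ey))
    corr = cov_hisakado / denom if denom > 0 else float("nan")        # (1.3.)
    model = 0.5 - 2 * (ex - 0.5) * (ey - 0.5) - 2 * cov_hisakado        # (1.4.)
    observed = sum(x ^ y for x, y in zip(xs, ys)) / n
    return {
        "cov_davies": cov_davies,
        "cov_hisakado": cov_hisakado,
        "correlation": corr,
        "bias_model": model,
        "bias_observed": observed,
    }


if __name__ == "__main__":
    rng = random.Random(2006)
    sample = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
    for name, value in bias_model(sample).items():
        print(f"{name:14s} {value:.10f}")
```

Formula 1.2. needs only the running sums of x, y and xy, so it can be computed in a single pass over the file, whereas 1.1. needs E(x) and E(y) before the products can be formed.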

 

2. Materials and Methods

·        True entropy estimations of any file or source by means of state-of-the-art compression were done with PASQDA.exe option –6e.6 For the 638 Mb F2RAW.iso file, compression this way would take a very long time, so the mean of the results from the faster Windows XP folder zip and GZIP.exe7 was regarded as a reasonable estimate. The produced archive files must pass standard randomness tests; only then can the compression ratio be called a true entropy estimation.

·        Extracting entropy from any file or source by means of successive XOR bitwise manipulator operations. XOR or eXclusive-OR for bits is: 0 XOR 0 = 0, 1 XOR 0 = 1, 0 XOR 1 = 1 and 1 XOR 1 = 0. (x XOR y) where x and y are adjacent bits. This is also called binary addition modulo 2.

·        E(x XOR y) bias models from Davies and Hisakado et al.

·        ENT randomness test suite.8 It is indeed a quick executable, but its problem is that for most tests in the suite no hard criteria are given for randomness or non-randomness. It cannot handle files of >= 638 Mb in bit mode (ENT –b).

·        RANTESTS randomness tests suite.9 All hard criteria. Can handle files of all sizes.

·        RABENZIX randomness tests suite based on the laws of Zipf and Newcomb-Benford.10 Hard and experimental criteria. Recommended maximal file size 125 Mb.

·        DIEHARD randomness tests suite.11 All hard criteria. Can handle files of all sizes.

·        NIST randomness tests suite.12 All very hard criteria. This is the best test suite known to me. Can handle files of all sizes but not for all tests.

 

The following source files were used

·        F2RAW.ISO = FC3-i386-disc2.iso = Linux Fedora installation file (on Disc 2)13

·        ERAWNN3.dat = all primes (1 bit) all composites (0 bit)

·        INBOX.dbx = My Outlook Express 6 inbox email archive

·        CR8F8RW.dat = Collatz nodes with an 800 digits seed and 50 relative levels upwards the tree.17

·        11MBHXCH.dat = Hexadecimal digits of PI, 2 digits stored in a byte. This file is already random and is used as a reference, to see what the true entropy estimation by compression gives and whether successive XOR operations will degenerate an already unbiased and random file. In my opinion the hexadecimal digits of PI are the best source of (pseudo) randomness. This file was made with: C:\>APTEST 11534336 x 16 > 11MBHXCH.TXT (x = 0: Chudnovsky bin split). The output must be converted from ASCII to binary.14

·        RAWFILE1.dat = Odd primes (1) and odd composites (0)

 

3. Results

3.1. (True) entropy of the starting material

 

| Sample | File size (Kb) | First order entropy (bits per byte) | First order entropy (bits per bit) | True entropy estimation (bits per bit) |
|---|---|---|---|---|
| F2RAW.iso | 652,852 | 7.992170 | 0.999854 | 0.98* |
| ERAWNN3.dat | 48,640 | 1.950989 | 0.300468 | 0.21 |
| INBOX.dbx | 31,013 | 5.997796 | 0.987479 | 0.25 |
| CR8F8RW.dat | 20,479 | 7.985174 | 0.999683 | 0.0074 |
| 11MBHXCH.dat | 11,264 | 7.999986 | 1.000000 | 1.00 |
| RAWFILE1.dat | 4,096 | 4.135185 | 0.523445 | 0.45 |

 

Table 1: The different kinds of entropy of the starting material files.

* File too large for PASQDA -6e. Measurement done with XP Folder Zip wizard and GZIP, mean value taken (Source file size 668,520,488 bytes. Compressed file XP 652,801,942 bytes and GZIP 653,317,253 bytes). I believe that this .iso file is a collection of already (partly) compressed archives of some kind and that is why the true entropy estimation is so high.
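The first order entropy columns of Table 1 can be illustrated with a short Python sketch. It assumes that first order entropy means the usual Shannon entropy of the byte (or bit) frequency distribution, which is what ENT reports; the function names are made up for the example.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """First order entropy in bits per byte: -sum(p * log2(p)) over byte values."""
    if not data:
        raise ValueError("empty input")
    n = len(data)
    h = 0.0
    for count in Counter(data).values():
        p = count / n
        h -= p * math.log2(p)
    return h


def bit_entropy(data: bytes) -> float:
    """First order entropy in bits per bit, from the counts of 0 and 1 bits."""
    if not data:
        raise ValueError("empty input")
    total = 8 * len(data)
    ones = sum(bin(byte).count("1") for byte in data)
    h = 0.0
    for count in (ones, total - ones):
        if count:  # a count of zero contributes nothing
            p = count / total
            h -= p * math.log2(p)
    return h


if __name__ == "__main__":
    sample = bytes(range(256)) * 4
    print(f"{byte_entropy(sample):.6f} bits per byte")
    print(f"{bit_entropy(sample):.6f} bits per bit")
```

First order entropy looks only at symbol frequencies and ignores their order. That is why INBOX.dbx and CR8F8RW.dat can score close to 1 bit per bit in Table 1 and still compress to 0.25 and 0.0074.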

 

| Archive (compressed file) from | KS value | Number of p = 0 or 1 out of total number of p’s | Conclusion |
|---|---|---|---|
| F2RAW.iso* | 0.000000 | 103 out of 229 | Not passed |
| ERAWNN3.dat | 0.972926 | 0 out of 229 | Passed |
| INBOX.dbx | 0.886757 | 0 out of 145 | Passed |
| CR8F8RW.dat** | - | - | File too small, passes ENT and RANTESTS |
| 11MBHXCH.dat | 0.382593 | 0 out of 229 | Passed |
| RAWFILE1.dat | 0.866783 | 0 out of 79 | Passed |

 

Table 2: The results of the DIEHARD test on the starting materials to verify if the true entropy estimations from Table 1 are reasonable. To be reasonable the archive files must pass this test suite or other standard randomness tests if the file is too small.

* This file was too large for PASQDA –6e, compression would take forever, even with –1e option. Instead the faster but worse compression from GZIP.exe and Windows XP folder zip was used to get a rough indication of the true entropy.

** DIEHARD is not possible with this too small file. But it passes all the tests of ENT and RANTESTS. So the true entropy estimation from Table 1 is reasonable.

 

3.2. Extracting entropy with XOR from the starting material:

 

| Sample | True entropy estimation (bits per bit) | XOR extracts that are random based on ENT |
|---|---|---|
| F2RAW.iso | 0.98 | 14-18 |
| ERAWNN3.dat | 0.21 | 6-14 |
| INBOX.dbx | 0.25 | 12-14 |
| CR8F8RW.dat | 0.0074 | 10-14 |
| 11MBHXCH.dat | 1.00 | 0-12 |
| RAWFILE1.dat | 0.45 | 5-12 |

 

Table 3: The (last) random XOR extracts from the starting material files. The successive XOR extractions were continued until the extract file size was around 10–100 Kb. Continuing further would not be wise, because the files would become too small for randomness tests to give significant outcomes. The random extracts are always these last extracts! A random file was never found to degenerate because of XOR operations. This is illustrated by the fact that an already perfectly random file like 11MBHXCH.dat gave 12 perfectly random successive XOR extract files. The true entropy estimation is also shown because, counterintuitively, it does not correlate with the number of XOR steps required.

 

3.3. E(x XOR y) bias models from Davies and Hisakado et al.:

 

F2RAW.iso

| XOR step | Chi-Bits ENT (p5%=0.00 and p95%=3.84) | Bias observed | Bias models | Observed – model |
|---|---|---|---|---|
| 1 | | 0.4924555071 | 0.4924555071 | 0 |
| 2 | | 0.4944358299 | 0.4944358299 | 0 |
| 3 | | 0.4962850037 | 0.4962850037 | 0 |
| 4 | | 0.4966832518 | 0.4966832518 | 0 |
| 5 | | 0.4973980571 | 0.4973980571 | 0 |
| 6 | | 0.4966144102 | 0.4966144102 | 0 |
| 7 | | 0.4964132137 | 0.4964132137 | 0 |
| 8 | | 0.4967385889 | 0.4967385889 | 0 |
| 9 | | 0.4969523146 | 0.4969523146 | 0 |
| 10 | 156.81 | 0.4972602902 | 0.4972602902 | 0 |
| 11 | 104.43 | 0.4968381042 | 0.4968381042 | 0 |
| 12 | 17.16 | 0.4981902372 | 0.4981871848 | 0.0000030524 |
| 13 | 7.03 | 0.4983594956 | 0.4983594956 | 0 |
| 14 | 0.94 | 0.4991636439 | 0.4991514104 | 0.0000122335 |
| 15 | 0.45 | 0.4991973039 | 0.4991728347 | 0.0000244692 |
| 16 | 0.42 | 0.5011397059 | 0.5011397059 | 0 |
| 17 | 0.74 | 0.4978676471 | 0.4978676471 | 0 |
| 18 | 0.40 | 0.4977941176 | 0.4977941176 | 0 |

 

Table 4: RED: Random zone. A representative comparison of the observed Bias = E(x XOR y) with the value from the Davies and Hisakado et al. models. All other starting material files show more or less this picture: occasionally there is a deviation of up to only 4 significant digits; for the rest, the values from the Davies and Hisakado et al. models are exact. The deviations are not always in and around the random zone, as is the case here. For example, in the case of ERAWNN3.dat there is never any deviation between observed and model.

 

 

Graph 1: The number of XOR steps required to make a file random as function of the correlation(x,y) (1.3.).

 

3.4. Randomness tests on the XOR extracts:

I used the ENT randomness test suite for fast screening of the XOR extract files. When a file was random according to this test, the RANTESTS test suite was also applied. The files that were random according to ENT also passed RANTESTS, of course with an occasional Chi-square value outside the 5%-95% probability range. That is a must for truly random files: on average 1 out of 20 measurements must fall above 95%! Since prime number randomness is the focus of my research interest and is what this article needs to support, I give all collected randomness test data for the sixth XOR extract of ERAWNN3.dat. For this file the RABENZIX, DIEHARD and NIST test suites were also used.

 

Entropy = 1.000000 bits per bit.

 

Optimum compression would reduce the size

of this 6225920 bit file by 0 percent.

 

Chi square distribution for 6225920 samples is 1.75, and randomly

would exceed this value 25.00 percent of the times.

 

Arithmetic mean value of data bits is 0.4997 (0.5 = random).

Monte Carlo value for Pi is 3.139777651 (error 0.06 percent).

Serial correlation coefficient is 0.000149 (totally uncorrelated = 0.0).

 

Entropy = 7.999754 bits per byte.

 

Optimum compression would reduce the size

of this 778240 byte file by 0 percent.

 

Chi square distribution for 778240 samples is 265.69, and randomly

would exceed this value 50.00 percent of the times.

 

Arithmetic mean value of data bytes is 127.4756 (127.5 = random).

Monte Carlo value for Pi is 3.139777651 (error 0.06 percent).

Serial correlation coefficient is -0.000062 (totally uncorrelated = 0.0).

 

Log 1: The ENT test results for the sixth XOR extract of ERAWNN3.dat.
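The bit-level Chi-square value used for screening can be reproduced with a small Python sketch. It assumes the standard one-degree-of-freedom statistic on the counts of 0 and 1 bits, each expected to be half of the total; the 3.84 limit is the 95% point quoted in this paper, and the function names are illustrative only.

```python
CHI2_1DF_95 = 3.84  # 95% point for 1 degree of freedom, as quoted in this paper


def chi_square_bits(data: bytes) -> float:
    """Chi-square of the 0/1 bit counts against an expected 50/50 split."""
    if not data:
        raise ValueError("empty input")
    total = 8 * len(data)
    ones = sum(bin(byte).count("1") for byte in data)
    zeros = total - ones
    expected = total / 2
    return ((ones - expected) ** 2 + (zeros - expected) ** 2) / expected


def passes_bit_chi_square(data: bytes) -> bool:
    """True when the statistic does not exceed the 95% point."""
    return chi_square_bits(data) <= CHI2_1DF_95


if __name__ == "__main__":
    balanced = b"\xf0" * 1000  # four 1 bits and four 0 bits per byte
    skewed = b"\xfe" * 1000    # seven 1 bits per byte
    print(chi_square_bits(balanced), passes_bit_chi_square(balanced))
    print(chi_square_bits(skewed), passes_bit_chi_square(skewed))
```

The statistic simplifies to (ones - zeros)^2 / total, so it measures bit bias only. A file such as the balanced example passes this test although it is clearly not random, which is why the other tests in the suites remain necessary.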

 

------------------------------

Size of file ERWN3E6.DAT = 778240 byte

------------------------------

Entropy =  9.99999797340934E-0001 bits per bit

------------------------------

CHI-Bits =  1.74913908306007E+0000

CHI 1 degree of freedom 5% = 0.00

CHI 1 degree of freedom 95% = 3.84

------------------------------

CHI-Hexadecimal= 1.09181949013145E+0001

CHI 15 degrees of freedom 5% = 7.261

CHI 15 degrees of freedom 95% = 25.00

------------------------------

KS-analysis

KnPlusMax =  7.48643578786869E-0001

KnMinusMax =  7.47842033199959E-0001

Kn/Probability Distribution at 1% =  7.07554703919868E-0002

Kn/Probability Distribution at 5% =  1.60012757680079E-0001

Kn/Probability Distribution at 25% =  3.79130859764700E-0001

Kn/Probability Distribution at 50% =  5.88572062802086E-0001

Kn/Probability Distribution at 75% =  8.32421662701563E-0001

Kn/Probability Distribution at 95% =  1.22374046688492E+0000

Kn/Probability Distribution at 99% =  1.51729418092873E+0000

------------------------------

CHI-Serial =  2.65688815789297E+0002

CHI 255 degrees of freedom 5% = 219.0

CHI 255 degrees of freedom 95% = 293.3

----------------------------

CHI-Differential =  2.32465554022929E+0001

CHI 30 degrees of freedom 5% = 18.49

CHI 30 degrees of freedom 95% = 43.77

------------------------------

CHI-Gap =  4.96455858484260E+0001

CHI 50 degrees of freedom 5% = 34.8

CHI 50 degrees of freedom 95% = 67.5

------------------------------

 

Log 2: The RANTESTS test results for the sixth XOR extract of ERAWNN3.dat.

 

FinalAnalysisReport.txt:

 

 

----------------------------

RABENZIX VERSION 3.0 BETA SOFTWARE COPYRIGHT (c) 2004, 2005, 2006

ALL RIGHTS RESERVED

JOHAN GERARD VAN DER GALIEN johan.van.der.galien@satoconor.com

----------------------------

Newcomb-Benford and Zipf randomness tests with 8 bits

exponent and 8 bits mantissa reals first two digits for ERWN3E6.DAT

-------------8x8 REALS---------------

Total amount 16 bits blocks (8x8 Reals) read = 389118

Total amount of Real numbers between 1.0E-38 and

1.0E+38 = 383769

Total amount of Reals tested (no zeroes) = 383769

----------------------------

Number of samples  = 6

Size of one sample (bytes) = 129706

----------------------------

----------------------------

Test NEWCOMB-BENFORD (8x8 Real SAMPLED) Fd=log10(1+1/d)

First two digits 10 up to 99. So 89 degrees of freedom.

Approximate proportion 95% p's > 0.05 for 6 samples >= 0.6831

                                                            Proportion 95% observed = 0.8333

Approximate proportion 99% p's > 0.01 for 6 samples >= 0.8681

                                                            Proportion 99% observed = 0.8333

----------------------------

----------------------------

I REMIND YOU THAT THE RHO MUST FALL IN THE

CONFIDENCE INTERVALS (CI) OF R TO PASS THE TEST!

Test ZIPF (8x8 Real UNSAMPLED) Fd=(10^b)*d^a --> log10(Fd)=a*log10(d)+b

                        Slope observed (a) = -9.8399E-01

CRITERION a total number space = -9.8384E-01

          Intersection observed (b) = -3.9446E-01

CRITERION b total number space = -3.9461E-01

Correlation observed (R) = -9.9944E-01

RHO of total population  = -9.9988E-01

CI 95% -9.9963E-01 <= R <= -9.9915E-01

CI 99% -9.9968E-01 <= R <= -9.9903E-01

p-value difference = 0.0000

    CRITERION 95% p >= 0.0500

    CRITERION 99% p >= 0.0100

----------------------------

 

Benford\Stats.txt:

 

Thu Nov 23 13:26:03 2006

 

 

Chi-sq sample 1 from ERWN3E6.DAT = 84.29

p-value = 0.6215

 

Chi-sq sample 2 from ERWN3E6.DAT = 93.59

p-value = 0.3491

 

Chi-sq sample 3 from ERWN3E6.DAT = 125.03

p-value = 0.0071

 

Chi-sq sample 4 from ERWN3E6.DAT = 75.65

p-value = 0.8425

 

Chi-sq sample 5 from ERWN3E6.DAT = 104.21

p-value = 0.1292

 

Chi-sq sample 6 from ERWN3E6.DAT = 97.07

p-value = 0.2619

 

Log 3: The RABENZIX v3.0 BETA test results of the sixth XOR extract of ERAWNN3.dat.
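The Newcomb-Benford part of the log above uses Fd = log10(1+1/d) for the first two digits 10 up to 99. The following Python sketch shows how the expected proportions and a Chi-square value with 89 degrees of freedom can be computed for a list of real numbers. It is not the RABENZIX code: how RABENZIX builds its 8x8 Reals from the file is not reproduced here, and the function names are assumptions made for the illustration.

```python
import math
from collections import Counter
from typing import Iterable


def benford_expected(d: int) -> float:
    """Expected proportion of numbers whose first two digits are d (10..99)."""
    return math.log10(1.0 + 1.0 / d)


def first_two_digits(x: float) -> int:
    """First two significant decimal digits of a nonzero finite number."""
    mantissa = f"{abs(x):.14e}"  # e.g. '1.23000000000000e+02'
    return int(mantissa[0] + mantissa[2])


def benford_chi_square(values: Iterable[float]) -> float:
    """Chi-square over the 90 two-digit classes (89 degrees of freedom)."""
    digits = [first_two_digits(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero values to test")
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(10, 100):
        expected = n * benford_expected(d)
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # Powers of 2 are a classic sequence that follows the law closely.
    powers = [2.0 ** k for k in range(1, 1000)]
    print(f"Chi-square = {benford_chi_square(powers):.2f}")
```

The 90 expected proportions sum to log10(100/10) = 1, which makes them a proper distribution over the classes 10 to 99.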

 

All p-values:

0.4093,0.6137,0.0282,0.8625,0.7261,0.7405,0.5319,0.5984,0.0401,0.2792,

0.7502,

 

Overall p-value after applying KStest on 11 p-values = 0.677238

 

Log 6: The DIEHARD test results of the sixth XOR extract of ERAWNN3.dat. This is actually only the Minimum Distance test of the suite. The file is too small for all the other 16 tests.

 

------------------------------------------------------------------------------

RESULTS FOR THE UNIFORMITY OF P-VALUES AND THE PROPORTION OF PASSING SEQUENCES

------------------------------------------------------------------------------

------------------------------------------------------------------------------

  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST

------------------------------------------------------------------------------

  10   7  15  13  11   9   9  13   6   6  0.307407   1.0000    frequency

    5  12  13  10   5   4  14   6   9  21  0.000648   1.0000    block-frequency M=100

    7   9  11  17  10   8   9   9   9  10   0.500934   1.0000    cumulative-sums

    6  18  13   9  11   7  11   7   9   8  0.134686   1.0000    cumulative-sums

  12   8  12   8  10   9  14  10   7   9  0.772760   0.9899    runs

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

The minimum pass rate for each statistical test with the exception of the random

excursion (variant) test is approximately = 0.960000 for a sample size = 99

binary sequences.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

 

------------------------------------------------------------------------------

  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST

------------------------------------------------------------------------------

    7   9  16  13  12  11   8  13   6   4  0.097224   0.9697    longest-run

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

The minimum pass rate for each statistical test with the exception of the random

excursion (variant) test is approximately = 0.960000 for a sample size = 99

binary sequences.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

 

------------------------------------------------------------------------------

  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST

------------------------------------------------------------------------------

    0   2   2   4   2   2   3   0   3   2  0.637119   1.0000    rank

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

The minimum pass rate for each statistical test with the exception of the random

excursion (variant) test is approximately = 0.923254 for a sample size = 20

binary sequences.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

 

------------------------------------------------------------------------------

  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST

------------------------------------------------------------------------------

  13  11   9  13   8  10   9  10  10   6  0.793973   0.9697    fft

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

The minimum pass rate for each statistical test with the exception of the random

excursion (variant) test is approximately = 0.960000 for a sample size = 99

binary sequences.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -
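Two quantities in this NIST output can be re-derived by hand, and the Python sketch below does so. It assumes the formulas of the NIST SP 800-22 documentation: a minimum pass rate of p - 3*SQRT(p(1-p)/samples) with p = 0.99, and for the nonperiodic templates test a mean of (M-m+1)/2^m and a variance of M(1/2^m - (2m-1)/2^(2m)). M, m and N are the symbols from the COMPUTATIONAL INFORMATION line of the log; the function names are made up for the example.

```python
import math
from typing import Sequence


def min_pass_rate(sample_size: int, alpha: float = 0.01) -> float:
    """Lower bound on the proportion of sequences that should pass a test."""
    p_hat = 1.0 - alpha
    return p_hat - 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / sample_size)


def template_chi_square(counts: Sequence[int], block_len: int,
                        template_len: int) -> float:
    """Chi-square of the template counts W_1..W_N for blocks of block_len bits."""
    mu = (block_len - template_len + 1) / 2 ** template_len
    var = block_len * (1 / 2 ** template_len
                       - (2 * template_len - 1) / 2 ** (2 * template_len))
    return sum((w - mu) ** 2 for w in counts) / var


def chi_square_p_value_even_df(chi2: float, df: int) -> float:
    """Upper-tail p-value; this closed form is valid only for an even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    x = chi2 / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= x / k
        total += term
    return math.exp(-x) * total


if __name__ == "__main__":
    print(f"{min_pass_rate(99):.6f}  {min_pass_rate(20):.6f}")
    w = [767, 772, 800, 784, 747, 743, 729, 775]  # template 0000000001
    chi2 = template_chi_square(w, block_len=778240, template_len=10)
    print(f"{chi2:.6f}  {chi_square_p_value_even_df(chi2, df=8):.6f}")
```

With these formulas the pass rates for 99 and 20 sequences come out at about 0.960000 and 0.923254, and the first template row gives a Chi^2 of about 5.380962 with a P_value of about 0.716190, in line with the values printed in the log.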

 

sample size = 1

 

                            NONPERIODIC TEMPLATES TEST

--------------------------------------------------------------------------------

                            COMPUTATIONAL INFORMATION

--------------------------------------------------------------------------------

            LAMBDA = 759.991211     M = 778240  N = 8 m = 10      n = 6225920

--------------------------------------------------------------------------------

                        F R E Q U E N C Y

Template   W_1  W_2  W_3  W_4  W_5  W_6  W_7  W_8    Chi^2   P_value Assignment Index

--------------------------------------------------------------------------------

0000000001 767  772  800  784  747  743  729  775   5.380962 0.716190 SUCCESS   0

0000000011 756  786  782  761  739  706  768  770   6.298163 0.613872 SUCCESS   1

0000000101 719  793  798  812  733  734  794  741  13.193182 0.105373 SUCCESS   2

0000000111 773  777  719  762  760  756  725  758   4.540996 0.805318 SUCCESS   3

0000001001 789  779  770  787  713  772  725  737   8.228815 0.411444 SUCCESS   4

0000001011 770  778  746  794  761  763  804  781   5.583947 0.693723 SUCCESS   5

0000001101 748  791  767  746  728  715  780  755   6.466191 0.595160 SUCCESS   6

0000001111 778  784  755  761  764  778  748  744   2.234303 0.972973 SUCCESS   7

0000010001 756  798  803  734  802  698  804  749  15.620257 0.048149 SUCCESS   8

0000010011 831  790  771  779  756  784  760  766   9.456730 0.305242 SUCCESS   9

0000010101 747  742  786  773  730  730  742  764   4.661279 0.793089 SUCCESS  10

0000010111 732  750  730  764  779  744  772  767   3.498147 0.899333 SUCCESS  11

0000011001 730  815  776  731  791  746  728  786  10.563643 0.227670 SUCCESS  12

0000011011 778  737  790  773  771  688  806  761  12.527741 0.129165 SUCCESS  13

0000011101 752  743  750  737  793  717  753  776   5.662920 0.684931 SUCCESS  14

0000011111 802  749  733  784  769  752  722  722   8.341852 0.400809 SUCCESS  15

0000100011 772  795  786  740  774  733  790  744   6.069124 0.639489 SUCCESS  16

0000100101 819  817  758  743  774  788  703  720  17.231218 0.027789 SUCCESS  17

0000100111 757  777  760  801  714  779  739  798   8.502219 0.386009 SUCCESS  18

0000101001 750  784   788  757  777  765  731  742   3.952611 0.861373 SUCCESS  19

0000101011 757  739  794  818  759  742  747  760   7.326243 0.501877 SUCCESS  20

0000101101 727  769  762  815  758  750  778  712   9.291939 0.318270 SUCCESS  21

0000101111 730  752  774  795   757  741  770  744   4.170409 0.841430 SUCCESS  22

0000110001 710  757  762  791  749  771  745  802   7.648660 0.468520 SUCCESS  23

0000110011 715  817  752  737  801  776  737  780  11.708854 0.164674 SUCCESS  24

0000110101 728  775  713  731  778  746   753  760   6.524096 0.588736 SUCCESS  25

0000110111 772  705  776  777  794  702  788  770  12.224279 0.141474 SUCCESS  26

0000111001 730  808  776  773  799  776  780  768   7.872769 0.445996 SUCCESS  27

0000111011 757  770  753  720  791  732  736  760     5.467136 0.706678 SUCCESS  28

0000111101 740  729  759  743  762  789  733  785   5.160619 0.740279 SUCCESS  29

0000111111 796  751  735  754  783  747  726  749   5.379171 0.716387 SUCCESS  30

0001000011 777  781  792  739  786  802  791  776   7.849438 0.448314 SUCCESS  31

0001000101 724  786  773  747  723  715  781  748   8.429488 0.392679 SUCCESS  32

0001000111 755  831  803  751  781  769  803  707  16.326767 0.037935 SUCCESS  33

0001001001 795  711  728  775  782  803  788  761  10.717422 0.218230 SUCCESS  34

0001001011 804  820  758  743  770  781  750  715  11.390388 0.180545 SUCCESS  35

0001001101 787  754  750  741  758  756  813  715   8.151103 0.418851 SUCCESS  36

0001001111 752  776  769  807  745  768  748  728   5.452792 0.708264 SUCCESS  37

0001010011 814  777  779  761  731  781  779  782   7.636670 0.469741 SUCCESS  38

0001010101 743  756  761  771  736  722  738  739   4.518037 0.807626 SUCCESS  39

0001010111 730  746  760  796  745  734  724  722   8.085335 0.425179 SUCCESS  40

0001011001 758  777  813  774  762  790  786  774   6.806165 0.557683 SUCCESS  41

0001011011 729  756  780  806  779  737  762  719   8.134874 0.420407 SUCCESS  42

0001011101 759  780  713  729  731  758  745  796   7.957903 0.437593 SUCCESS  43

0001011111 738  740  759  788  733  727  776  760   5.016720 0.755788 SUCCESS  44

0001100101 784  751  762  750  790  767  751  823   7.724541 0.460830 SUCCESS  45

0001100111 741  792  762  719  786  789  749  759   6.313593 0.612150 SUCCESS  46

0001101001 733  788   791  752  785  785  754  729   6.415955 0.600744 SUCCESS  47

0001101011 696  762  750  750  720  777  779  749   8.941292 0.347273 SUCCESS  48

0001101101 796  765  769  749  788  773  763  757   3.345505 0.910844 SUCCESS  49

0001101111 780  693  775  754   756  748  775  752   7.505277 0.483222 SUCCESS  50

0001110011 751  811  738  757  746  817  808  752  12.052255 0.148888 SUCCESS  51

0001110101 734  783  765  730  739  733  777  732   5.860679 0.662834 SUCCESS  52

0001110111 748  739  768  756  760  770   793  827   8.505748 0.385687 SUCCESS  53

0001111001 772  765  755  752  756  815  770  769   4.667259 0.792475 SUCCESS  54

0001111011 750  700  778  710  743  762  776  815  13.536953 0.094662 SUCCESS  55

0001111101 785  772  775  767  789  757  782  778     3.624083 0.889349 SUCCESS  56

0001111111 757  771  727  763  784  721  762  725   6.103740 0.635613 SUCCESS  57

0010000011 748  746  738  736  802  721  764  817  10.658100 0.221834 SUCCESS  58

0010000101 751  780  756  773  767  736  739  734   3.227285 0.919295 SUCCESS  59

0010000111 756  789  764  755  775  787  765  801   4.772727 0.781568 SUCCESS  60

0010001011 746  705  763  778  728  769  788  764   7.317772 0.502767 SUCCESS  61

0010001101 795  754  785  739  803  765  770  752   5.853960 0.663587 SUCCESS  62

0010001111 725  793  803  771  748  748  783  699  11.827113 0.159086 SUCCESS  63

0010010011 787  721  741  753  772  798  773  764   5.943853 0.653521 SUCCESS  64

0010010101 848  770  717  716  775  795  754  732  18.634541 0.016941 SUCCESS  65

0010010111 787  781  811  768  755  756  708  799  10.862732 0.209599 SUCCESS  66

0010011011 786  742  740  728  753  733  828  741  10.975343 0.203100 SUCCESS  67

0010011101 801  765  725  767  724  756  724  844  16.951983 0.030613 SUCCESS  68

0010011111 760  796  761  818  771  795  758  720  10.206136 0.250855 SUCCESS  69

0010100011 724  761  754  760  700  785  784  725   9.863905 0.274708 SUCCESS  70

0010100111 807  812  770  769  703  743  777  738  12.609875 0.125996 SUCCESS  71

0010101011 790  760  738  744  763  739  750  755   2.968608 0.936312 SUCCESS  72

0010101101 762  755  789  754  702  731  743  777   7.625466 0.470883 SUCCESS  73

0010101111 699  735  769  751  785  792  749  783   9.125508 0.331819 SUCCESS  74

0010110011 763  792  787  790  769  753  772  740   4.474486 0.811980 SUCCESS  75

0010110101 753  713  782  779  790  768  759  749   5.616353 0.690118 SUCCESS  76

0010110111 751  725  744  743  791  735  734  733   6.488579 0.592675 SUCCESS  77

0010111011 797  775   730  740  727  752  744  783   6.477340 0.593922 SUCCESS  78

0010111101 747  752  763  790  739  721  782  740   5.345489 0.720092 SUCCESS  79

0010111111 735  735  761  786  747  785  814  760   7.558343 0.477755 SUCCESS  80

0011000101 729  736  792  788   765  757  758  773   4.762467 0.782636 SUCCESS  81

0011000111 696  733  800  765  751  716  767  788  12.466678 0.131565 SUCCESS  82

0011001011 795  748  790  759  777  755  763  755   3.511323 0.898309 SUCCESS  83

0011001101 761  784  755  747  748  722   728  758   4.539019 0.805517 SUCCESS  84

0011001111 746  766  755  714  756  771  739  762   3.959993 0.860715 SUCCESS  85

0011010101 745  730  726  738  749  750  766  742   4.482708 0.811161 SUCCESS  86

0011010111 715  779  775  771  757  809  802  759     9.262025 0.320676 SUCCESS  87

0011011011 747  793  774  755  786  729  820  742   9.439838 0.306560 SUCCESS  88

0011011101 746  753  774  792  773  754  723  787   5.052145 0.751989 SUCCESS  89

0011011111 790  726  788  781  710  719  774  745  10.567356 0.227438 SUCCESS  90

0011100101 739  791  771  809  781  767  773  773   6.373764 0.605441 SUCCESS  91

0011101011 791  747  774  730  722  749  754  786   6.036379 0.643157 SUCCESS  92

0011101101 813  797  746  743  729  737  702  732  13.808283 0.086901 SUCCESS  93

0011101111 737  748  785  755  761  727  780  765   3.804290 0.874335 SUCCESS  94

0011110101 734  719  800  718  767  761  713  776  11.039550 0.199468 SUCCESS  95

0011110111 767  733  706  783  813  743  741  810  13.651030 0.091327 SUCCESS  96

0011111011 722  774  783  761  775  806  740  757   6.596969 0.580673 SUCCESS  97

0011111101 771  750  811  759  771  803  734  721   9.372193 0.311877 SUCCESS  98

0011111111 775  755  734  755  778  719  752  747   4.273850 0.831609 SUCCESS  99

0100000011 763  752  755  753  767  780  749  748   1.154007 0.997076 SUCCESS 100

0100000111 775  755  717  723  780  711  747  772   8.821902 0.357542 SUCCESS 101

0100001011 736  766  738  767  751  757  773  738   2.529904 0.960322 SUCCESS 102

0100001111 783  772  775  722  788  829  761  791  11.866869 0.157243 SUCCESS 103

0100010011 762  749  723  791  793  730  743  765   6.378320 0.604934 SUCCESS 104

0100010111 722  719  706  793  749  725  763  789  12.500357 0.130236 SUCCESS 105

0100011011 799  739  789  784  781  754  779  726   7.205068 0.514678 SUCCESS 106

0100011111 740  796  764  783  762  751  780  708   7.279907 0.506756 SUCCESS 107

0100100011 766  763  764  778  780  773  781  735   2.709558 0.951242 SUCCESS 108

0100100111 760  721  744  780  760  823  711  766  11.506576 0.174614 SUCCESS 109

0100101011 754  753  724  720  798  788  772  768   7.262313 0.508614 SUCCESS 110

0100101111 756  793  770  768  741  756  776  761   2.552253 0.959250 SUCCESS 111

0100110011 801  714  782  717  737  744  768  741   9.838692 0.276531 SUCCESS 112

0100110111 755  717  744  753  764  770  767  788   4.193083 0.839296 SUCCESS 113

0100111011 820  810  717  785  725  692  752  776  19.765356 0.011261 SUCCESS 114

0100111111 779  792  775  795  732  786  733  777   7.125052 0.523203 SUCCESS 115

0101000011 716  745  746  715  809  748  761  772   9.479582 0.303466 SUCCESS 116

0101000111 802  764  683  761  714  813  759  784  17.712868 0.023485 SUCCESS 117

0101001011 783  779  748  741  754  759  794  767   3.536395 0.896347 SUCCESS 118

0101001111 775  795  742  729  739  759  703  753   8.678804 0.370112 SUCCESS 119

0101010011 747  760  793  755  743  797  736  791   6.004489 0.646729 SUCCESS 120

0101010111 772  788  751  729  789  722  761  761   5.707047 0.680009 SUCCESS 121

0101011011 797  783  748  726  768  751  732  766   5.580978 0.694053 SUCCESS 122

0101011111 723  802  751  717  778  755  755  792   8.661873 0.371618 SUCCESS 123

0101100011 772  802  768  790  796  783  812  797  11.763290 0.162082 SUCCESS 124

0101100111 751  741  801  772  728  735  774  767   5.578249 0.694356 SUCCESS 125

0101101111 764  771  759  707  751  755  715  725   8.447079 0.391059 SUCCESS 126

0101110011 789  793  787  814  750  759  792  774   9.249422 0.321694 SUCCESS 127

0101110111 772  750  744  739  706  742  770  786   6.644004 0.575483 SUCCESS 128

0101111011 718  758  759  792  772  714  782  759   7.423991 0.491652 SUCCESS 129

0101111111 750  760  775  729  728  782  819  764   8.434761 0.392193 SUCCESS 130

0110000111 750  746  749  767  788  747  773  798   4.065793 0.851139 SUCCESS 131

0110001111 748  744  826  740  783  750  769  780   8.402022 0.395216 SUCCESS 132

0110010111 757  768  758  740  755  787  772  761   1.845180 0.985396 SUCCESS 133

0110011111 759  779  789  718  734  784  761  773   5.884581 0.660159 SUCCESS 134

0110100111 754  739  762  735  734  797  760  754   4.271640 0.831821 SUCCESS 135

0110101111 736  785  769  800  789  803  777  731   8.987731 0.343332 SUCCESS 136

0110110111 732  765  756  773  755  734  807  723   7.068503 0.529260 SUCCESS 137

0110111111 765  739  752  762  709  758  782  751   4.964363 0.761378 SUCCESS 138

0111001111 763  754  774  771  814  752  787  760   5.460087 0.707458 SUCCESS 139

0111011111 746  725  790  719  767  723  752  797   9.186141 0.326838 SUCCESS 140

0111111111 751  763  736  737  723  753  738  738   4.797600 0.778974 SUCCESS 141

1000000000 767  772  800  784  747  743  729  775   5.380962 0.716190 SUCCESS 142

1000100000 715  795  790  796  783  753  782  793  10.188026 0.252076 SUCCESS 143

1000110000 732  741  725  752  761  753  786  745   4.536149 0.805806 SUCCESS 144

1001000000 747  837  787  769  715  710  758  764  15.354803 0.052603 SUCCESS 145

1001001000 773  807  757  770  754  767  779  719   6.186914 0.626304 SUCCESS 146

1001010000 766  787  753  755  744  746  748  732   2.973782 0.935992 SUCCESS 147

 

sample size = 1

 

                                OVERLAPPING TEMPLATE OF ALL ONES TEST

                        -----------------------------------------------

                        COMPUTATIONAL INFORMATION:

                        -----------------------------------------------

                        (a) n (sequence_length)      = 6225920

                        (b) m (block length of 1s)   = 10

                        (c) M (length of substring)  = 1032

                        (d) N (number of substrings) = 6032

                        (e) lambda [(M-m+1)/2^m]     = 0.999023

                        (f) eta                      = 0.499512

                        -----------------------------------------------

                             F R E Q U E N C Y

                           0   1   2   3   4 >=5   Chi^2   P-value  Assignment

                        -----------------------------------------------

                        3678 894 602 310 236 312  9.510428 0.090357 SUCCESS

 

------------------------------------------------------------------------------

  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST

------------------------------------------------------------------------------

    2   0   3   3   3   0   2   0   1   2  0.035174   1.0000    universal

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

The minimum pass rate for each statistical test with the exception of the random

excursion (variant) test is approximately = 0.915376 for a sample size = 16

binary sequences.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

 

------------------------------------------------------------------------------

  C1  C2  C3  C4  C5  C6  C7  C8  C9 C10  P-VALUE  PROPORTION  STATISTICAL TEST

------------------------------------------------------------------------------

  17   8  10  16   4   8   9   6  11  10  0.045348   0.9596 *  apen   M=10

  17   8   7   9  13   7   6  12   8  12  0.162606   0.9899    serial M=10

    8  12   7  12  13   9   9   9   9  11  0.853234   0.9899    serial

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

The minimum pass rate for each statistical test with the exception of the random

excursion (variant) test is approximately = 0.960000 for a sample size = 99

binary sequences.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -- - - - - - - - - - - - -

 

sample size = 6; for the first three there was an insufficient number of cycles.

 

                                        RANDOM EXCURSIONS TEST

                        --------------------------------------------

                        COMPUTATIONAL INFORMATION:

                        --------------------------------------------

                        (a) Number Of Cycles (J) = 0567

                        (b) Sequence Length (n)  = 1000000

                        (c) Rejection Constraint = 500.000000

                        -------------------------------------------

SUCCESS           x = -4 chi^2 =  2.146930 p_value = 0.828464

SUCCESS           x = -3 chi^2 =  6.459386 p_value = 0.264048

SUCCESS           x = -2 chi^2 =  2.688767 p_value = 0.747836

SUCCESS           x = -1 chi^2 =  1.631393 p_value = 0.897427

SUCCESS           x =  1 chi^2 =  2.703704 p_value = 0.745552

SUCCESS           x =  2 chi^2 =  2.658110 p_value = 0.752518

SUCCESS           x =  3 chi^2 =  8.232694 p_value = 0.143869

SUCCESS           x =  4 chi^2 =  7.628949 p_value = 0.177905

 

                                        RANDOM EXCURSIONS TEST

                        --------------------------------------------

                        COMPUTATIONAL INFORMATION:

                        --------------------------------------------

                        (a) Number Of Cycles (J) = 1369

                        (b) Sequence Length (n)  = 1000000

                        (c) Rejection Constraint = 500.000000

                        -------------------------------------------

SUCCESS           x = -4 chi^2 =  2.269473 p_value = 0.810740

SUCCESS           x = -3 chi^2 =  1.283278 p_value = 0.936643

SUCCESS           x = -2 chi^2 =  4.029245 p_value = 0.545213

SUCCESS           x = -1 chi^2 =  2.551497 p_value = 0.768720

SUCCESS           x =  1 chi^2 =  2.346969 p_value = 0.799344

SUCCESS           x =  2 chi^2 =  3.751923 p_value = 0.585656

SUCCESS           x =  3 chi^2 =  6.638199 p_value = 0.248968

SUCCESS           x =  4 chi^2 =  6.917782 p_value = 0.226827

 

                                        RANDOM EXCURSIONS TEST

                        --------------------------------------------

                        COMPUTATIONAL INFORMATION:

                        --------------------------------------------

                        (a) Number Of Cycles (J) = 0516

                        (b) Sequence Length (n)  = 1000000

                        (c) Rejection Constraint = 500.000000

                        -------------------------------------------

SUCCESS           x = -4 chi^2 =  3.693564 p_value = 0.594322

SUCCESS           x = -3 chi^2 =  4.355367 p_value = 0.499465

SUCCESS           x = -2 chi^2 =  4.207867 p_value = 0.519893

SUCCESS           x = -1 chi^2 =  5.220930 p_value = 0.389517

SUCCESS           x =  1 chi^2 =  6.046512 p_value = 0.301719

SUCCESS           x =  2 chi^2 =  6.223753 p_value = 0.285052

SUCCESS           x =  3 chi^2 =  5.639926 p_value = 0.342846

SUCCESS           x =  4 chi^2 =  1.900900 p_value = 0.862680

 

sample size = 6; for the first three there was an insufficient number of cycles.

 

                                    RANDOM EXCURSIONS VARIANT TEST

                        --------------------------------------------

                        COMPUTATIONAL INFORMATION:

                        --------------------------------------------

                        (a) Number Of Cycles (J) = 567

                        (b) Sequence Length (n)  = 1000000

                        --------------------------------------------

SUCCESS           (x = -9) Total visits =  588; p-value = 0.879780

SUCCESS           (x = -8) Total visits =  593; p-value = 0.841987

SUCCESS           (x = -7) Total visits =  557; p-value = 0.934360

SUCCESS           (x = -6) Total visits =  511; p-value = 0.616089

SUCCESS           (x = -5) Total visits =  498; p-value = 0.494606

SUCCESS           (x = -4) Total visits =  497; p-value = 0.432058

SUCCESS           (x = -3) Total visits =  499; p-value = 0.366493

SUCCESS           (x = -2) Total visits =  513; p-value = 0.354539

SUCCESS           (x = -1) Total visits =  537; p-value = 0.372998

SUCCESS           (x =  1) Total visits =  584; p-value = 0.613680

SUCCESS           (x =  2) Total visits =  612; p-value = 0.440401

SUCCESS           (x =  3) Total visits =  642; p-value = 0.319239

SUCCESS           (x =  4) Total visits =  643; p-value = 0.393649

SUCCESS           (x =  5) Total visits =  623; p-value = 0.579360

SUCCESS           (x =  6) Total visits =  608; p-value = 0.713547

SUCCESS           (x =  7) Total visits =  601; p-value = 0.779456

SUCCESS           (x =  8) Total visits =  587; p-value = 0.878124

SUCCESS           (x =  9) Total visits =  577; p-value = 0.942584

 

                                    RANDOM EXCURSIONS VARIANT TEST

                        --------------------------------------------

                        COMPUTATIONAL INFORMATION:

                        --------------------------------------------

                        (a) Number Of Cycles (J) = 1369

                        (b) Sequence Length (n)  = 1000000

                        --------------------------------------------

SUCCESS           (x = -9) Total visits =  984; p-value = 0.074340

SUCCESS           (x = -8) Total visits = 1046; p-value = 0.110976

SUCCESS           (x = -7) Total visits = 1043; p-value = 0.083999

SUCCESS           (x = -6) Total visits = 1031; p-value = 0.051461

SUCCESS           (x = -5) Total visits = 1077; p-value = 0.062866

SUCCESS           (x = -4) Total visits = 1185; p-value = 0.183821

SUCCESS           (x = -3) Total visits = 1270; p-value = 0.397484

SUCCESS           (x = -2) Total visits = 1295; p-value = 0.414216

SUCCESS           (x = -1) Total visits = 1336; p-value = 0.528261

SUCCESS           (x =  1) Total visits = 1352; p-value = 0.745267

SUCCESS           (x =  2) Total visits = 1305; p-value = 0.480089

SUCCESS           (x =  3) Total visits = 1319; p-value = 0.669135

SUCCESS           (x =  4) Total visits = 1355; p-value = 0.919451

SUCCESS           (x =  5) Total visits = 1330; p-value = 0.803792

SUCCESS           (x =  6) Total visits = 1342; p-value = 0.876365

SUCCESS           (x =  7) Total visits = 1340; p-value = 0.877836

SUCCESS           (x =  8) Total visits = 1267; p-value = 0.614744

SUCCESS           (x =  9) Total visits = 1276; p-value = 0.666422

 

                                    RANDOM EXCURSIONS VARIANT TEST

                        --------------------------------------------

                        COMPUTATIONAL INFORMATION:

                        --------------------------------------------

                        (a) Number Of Cycles (J) = 516

                        (b) Sequence Length (n)  = 1000000

                        --------------------------------------------

SUCCESS           (x = -9) Total visits =  603; p-value = 0.511288

SUCCESS           (x = -8) Total visits =  671; p-value = 0.212840

SUCCESS           (x = -7) Total visits =  676; p-value = 0.167167

SUCCESS           (x = -6) Total visits =  651; p-value = 0.205133

SUCCESS           (x = -5) Total visits =  654; p-value = 0.152167

SUCCESS           (x = -4) Total visits =  624; p-value = 0.203844

SUCCESS           (x = -3) Total visits =  560; p-value = 0.540187

SUCCESS           (x = -2) Total visits =  579; p-value = 0.257532

SUCCESS           (x = -1) Total visits =  586; p-value = 0.029331

SUCCESS           (x =  1) Total visits =  456; p-value = 0.061801

SUCCESS           (x =  2) Total visits =  430; p-value = 0.122200

SUCCESS           (x =  3) Total visits =  411; p-value = 0.143818

SUCCESS           (x =  4) Total visits =  405; p-value = 0.191562

SUCCESS           (x =  5) Total visits =  449; p-value = 0.486926

SUCCESS           (x =  6) Total visits =  507; p-value = 0.932682

SUCCESS           (x =  7) Total visits =  533; p-value = 0.883314

SUCCESS           (x =  8) Total visits =  523; p-value = 0.955133

SUCCESS           (x =  9) Total visits =  465; p-value = 0.700208

 

sample size = 1

 

-----------------------------------------------------

            L I N E A R  C O M P L E X I T Y

-----------------------------------------------------

            M (substring length)     = 500

            N (number of substrings) = 12451

-----------------------------------------------------

                F R E Q U E N C Y                           

-----------------------------------------------------

    C0   C1   C2   C3   C4   C5   C6    CHI2    P-value

-----------------------------------------------------

            Note: 420 bits were discarded!

  122  369 1593 6191 3151  752  273  4.692328 0.583835

 

sample size = 6

 

                                        LEMPEL-ZIV COMPRESSION TEST

                        -----------------------------------------

                        COMPUTATIONAL INFORMATION:

                        -----------------------------------------

                        (a) W (# of words) = 69591

                        -----------------------------------------

SUCCESS           p_value = 0.628152

 

                                        LEMPEL-ZIV COMPRESSION TEST

                        -----------------------------------------

                        COMPUTATIONAL INFORMATION:

                        -----------------------------------------

                        (a) W (# of words) = 69579

                        -----------------------------------------

SUCCESS           p_value = 0.141130

 

                                        LEMPEL-ZIV COMPRESSION TEST

                        -----------------------------------------

                        COMPUTATIONAL INFORMATION:

                        -----------------------------------------

                        (a) W (# of words) = 69587

                        -----------------------------------------

SUCCESS           p_value = 0.444155

 

                                        LEMPEL-ZIV COMPRESSION TEST

                        -----------------------------------------

                        COMPUTATIONAL INFORMATION:

                        -----------------------------------------

                        (a) W (# of words) = 69590

                        -----------------------------------------

SUCCESS           p_value = 0.583209

 

                                        LEMPEL-ZIV COMPRESSION TEST

                        -----------------------------------------

                        COMPUTATIONAL INFORMATION:

                        -----------------------------------------

                        (a) W (# of words) = 69577

                        -----------------------------------------

SUCCESS           p_value = 0.095274

 

                                        LEMPEL-ZIV COMPRESSION TEST

                        -----------------------------------------

                        COMPUTATIONAL INFORMATION:

                        -----------------------------------------

                        (a) W (# of words) = 69584

                        -----------------------------------------

SUCCESS           p_value = 0.311714

 

Log 7: The NIST test results of the sixth XOR extract of ERAWNN3.dat.

 

4. Discussion

The true entropy estimates of the starting material files span a wide range, from 0.0074 to 1.00 bits per byte (Table 1). Since all archive files except one pass standard randomness tests, the numbers in this range are good estimates (Table 2). Recall that a PRNG fails the DIEHARD test suite badly if p = 1 or 0 occurs in more than six places. Only F2RAW.iso compressed with the Windows XP folder zip, which is not a state-of-the-art compressor, has 103 p-values of 1 or 0. The others have none of these values, and the KS-values of all p-values for these files are within 0.05-0.95. So these files pass the DIEHARD test suite and should give a good idea of how well successive XOR operations can extract entropy in all cases.

There is no simple correlation between the true entropy estimates and the number of successive XOR steps after which the files start to pass the ENT randomness test suite (Table 3). To me this is counterintuitive: XOR extracts entropy, so the number of steps should depend on the true entropy of the source. Nevertheless, the bias development of the extract files follows the Davies and Hisakado et al. models almost exactly (Table 4). You can even calculate the expected Chi-Bits for the next step from the bias predicted by the models:

·        F2RAW.iso = 652,852 Kb * 1024 = 668,520,448 bytes

·        14th successive XOR extract = 2^-14 * 668,520,448 = 40,803.25 bytes * 8 = 326,426 bits

·        Predicted E(x XOR y) bias by the models = fraction of 1 bits (p1) = 0.4991514104

·        Number of 1 bits = p1 * 326,426 = 162,936 bits

·        Number of 0 bits = (1 - p1) * 326,426 = 163,490 bits

·        Expected number of 1 or 0 bits = 326,426 / 2 = 163,213

·        Chi-square on the bits = [(162,936 – 163,213)^2 / 163,213] + [(163,490 – 163,213)^2 / 163,213] = 0.94

This is exactly the value found by ENT in Table 4, which is additional evidence of how good the Davies and Hisakado et al. models are. My intention was to discover a formula from which one can calculate the number of successive XOR steps needed to make a file random. But randomness of course has many aspects, and with the models one can only calculate the bias. As demonstrated, the bias can be recalculated as Chi-Bits, but that is only one aspect of randomness, the simplest yet a fundamental one. With E(x XOR y) you get the fraction of 1 bits but not the E(x) and E(y) of the adjacent bits. You have to measure them with a computer program. So you can only calculate, after the measurement, Chi-Bits for the next step and not for a series of successive steps.
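The arithmetic in the list above can be replayed in a few lines. The following is a minimal Python sketch, not the program used for this paper; the function name chi_square_bits is illustrative, and the only inputs are the file size and the predicted p1 quoted in the text.

```python
def chi_square_bits(n_bits, p1):
    """Chi-square of the 1/0 bit counts against a fair 50/50 split."""
    ones = p1 * n_bits
    zeros = (1.0 - p1) * n_bits
    expected = n_bits / 2.0
    return (ones - expected) ** 2 / expected + (zeros - expected) ** 2 / expected


file_bytes = 652_852 * 1024              # F2RAW.iso, size taken from the text
extract_bits = file_bytes * 8 // 2 ** 14  # 14 steps, each halving the file
p1 = 0.4991514104                         # predicted fraction of 1 bits (from the text)

chi2 = chi_square_bits(extract_bits, p1)
print(extract_bits, round(chi2, 2))
```

Because the two deviations are equal and opposite, the same value also follows from 4 * n * (p1 - 0.5)^2, which makes it easy to see how quickly Chi-Bits shrinks as the bias approaches zero.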

From Graph 1 it is clear that a file with negative correlation requires fewer XOR steps than one with the corresponding positive correlation. This can be understood as follows: when the (x,y) bits are completely identical the correlation is +1, when they are statistically independent the correlation is 0, and when they are completely different the correlation is –1. The difference is just a matter of positive versus negative correlation. When the adjacent bits are more different there is more information already present, and the amount of entropy extracted per XOR step is higher than when the bits are more alike.
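To make the quantities E(x), E(y), the correlation of adjacent bits and E(x XOR y) concrete, here is a minimal Python sketch. It rests on assumptions that are not taken from the paper's own programs: one extraction step is taken to XOR non-overlapping adjacent bit pairs (consistent with the file halving at each step), and toy_source is a hypothetical stand-in for a real file in which each bit repeats its predecessor with probability stay. For single bits x XOR y = x + y - 2xy, so the measured fraction of 1 bits after a step must equal E(x) + E(y) - 2E(xy).

```python
import math
import random


def xor_step(bits):
    """One extraction step: XOR each non-overlapping adjacent pair (x, y)."""
    return [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]


def pair_stats(bits):
    """Return E(x), E(y) and E(x*y) over the non-overlapping adjacent pairs."""
    xs = bits[0:len(bits) - 1:2]
    ys = bits[1:len(bits):2]
    n = len(xs)
    exy = sum(x & y for x, y in zip(xs, ys)) / n
    return sum(xs) / n, sum(ys) / n, exy


def toy_source(n, stay, seed=1):
    """Hypothetical source: each bit repeats the previous one with probability stay."""
    rng = random.Random(seed)
    bits = [rng.getrandbits(1)]
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(prev if rng.random() < stay else prev ^ 1)
    return bits


bits = toy_source(200_000, stay=0.7)  # stay > 0.5 gives positively correlated neighbours
ex, ey, exy = pair_stats(bits)
rho = (exy - ex * ey) / math.sqrt(ex * (1 - ex) * ey * (1 - ey))
predicted_p1 = ex + ey - 2 * exy      # identity for single bits
extract = xor_step(bits)
measured_p1 = sum(extract) / len(extract)
print(f"rho={rho:+.3f} predicted={predicted_p1:.6f} measured={measured_p1:.6f}")

# Repeat the step and print the fraction of 1 bits after each further extraction.
for step in range(2, 6):
    extract = xor_step(extract)
    print(step, len(extract), round(sum(extract) / len(extract), 4))
```

Setting stay below 0.5 gives negatively correlated neighbours. This toy source is only meant to show how the measurements are made; it is not expected to reproduce Graph 1.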

The randomness tests of the sixth XOR extract of ERAWNN3.dat all look very good. I have only a few comments:

·        As I said in 2. Materials and Methods, for some of the ENT tests there are no hard criteria. Judging these comes down to common sense and experience. For instance, the Monte Carlo value of PI becomes better when the random files get larger. There is a good site that compares the value of PI from good and bad PRNGs for a fixed number of 15,000 Monte Carlo experiments.16 The 15 best ones typically have errors in the 0.01-0.66% range. ENT does one experiment for each six successive bytes: it calculates 24 bit X and Y coordinates from them and does one dart throwing experiment. For my XOR extract this comes down to 129,706 experiments. Since I do not know the relationship between file size and PI error for the same source, I regard my 0.06% from Log 1, with caution, as very good.

·        RABENZIX is, as I said earlier, an experimental test suite in the BETA phase. I fully believe that the Chi-based Newcomb-Benford correlation tests are sound and that the hard statistical criteria can be correctly applied. The number of samples (six) in Log 3 is too small to draw conclusions. The number of samples for the full test should actually be >= 50; then RABENZIX also calculates the P-of-the-p’s for Newcomb-Benford. Now the proportions are unreliable and consequently only the 95% criterion is passed. For accurate measurements there should be a reasonable chance that at least one sample fails the α criterion (p > 0.01). So around 100 samples is the minimum for the proportions tests. The sixth XOR extract file from ERAWNN3.DAT is also too small (760 Kb) for the Zipf correlation test in its present form, because I know for a fact that good random files must be around 5 Mb to pass this test. This comes from the fact that the relative frequencies of the digit pairs become more accurate with increasing cumulative frequencies if the file is truly random. In other words, the confidence intervals and the p-value of the difference with Rho are based on a fixed 90 (x,y) pairs, but the accuracy of y (= relative frequency of a digit pair) depends on the size of the “random” file. I have not yet figured out how to adapt the confidence intervals to the sample size. This will be resolved soon; keep following future versions of this article and of RABENZIX.10

·        The most critical test suite (NIST) has only one asterisk (*), indicating a particular part of the test that is not passed: the Approximate Entropy (apen) test for 99 samples. I can explain that. The minimum proportion of 0.9600 of the samples that has to satisfy the p > 0.01 criterion (0.01 is also called the significance level of the tests) is an approximate value, and apen scores 0.9596, very close to the border. In dubio pro reo. Considering this, the sixth XOR extract of ERAWNN3.dat is innocent until proven guilty. The other tests did not find the file guilty. So I rest my case.
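Two of the numbers in the comments above can be checked with a short Python sketch. It is illustrative only and is neither ENT's nor NIST's source code. monte_carlo_pi follows the description given above (six bytes per experiment, read as 24 bit X and Y coordinates); the byte order and the use of os.urandom as stand-in data are assumptions. nist_min_pass_rate uses the confidence bound p - 3 * sqrt(p * (1 - p) / m) with p = 1 - α, which reproduces both minimum pass rates printed in Log 7 (0.915376 for 16 samples and 0.960000 for 99 samples).

```python
import math
import os


def monte_carlo_pi(data):
    """Estimate PI from six-byte groups read as 24 bit X and Y coordinates.

    Needs at least six bytes of input.
    """
    radius_sq = (2 ** 24 - 1) ** 2
    tries = hits = 0
    for i in range(0, len(data) - 5, 6):
        x = int.from_bytes(data[i:i + 3], "big")      # byte order is an assumption
        y = int.from_bytes(data[i + 3:i + 6], "big")
        tries += 1
        if x * x + y * y <= radius_sq:                # dart lands inside the quarter circle
            hits += 1
    return 4.0 * hits / tries, tries


def nist_min_pass_rate(samples, alpha=0.01):
    """Lower bound on the proportion of sequences that must have p >= alpha."""
    p = 1.0 - alpha
    return p - 3.0 * math.sqrt(p * alpha / samples)


data = os.urandom(760 * 1024)          # stand-in for a 760 Kb extract file
pi_estimate, tries = monte_carlo_pi(data)
error_pct = abs(pi_estimate - math.pi) / math.pi * 100
print(tries, round(pi_estimate, 6), f"{error_pct:.2f}%")

print(round(nist_min_pass_rate(16), 6), round(nist_min_pass_rate(99), 6))
print(round(95 / 99, 4))               # 95 of 99 sequences passing gives 0.9596
```

With 99 samples the proportion can only move in steps of 1/99, so 0.9596 (95 passing sequences) is the largest attainable value below the 0.96 bound; one more passing sequence would have given 0.9697.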

 

4.1. Reactions in the sci.math newsgroup to the successive XOR entropy extraction concept:

On Monday, August 7, 2006, Mensanator posted a thread called ‘Prime randomness’ in the sci.math newsgroup on USENET.15 Mensanator is an acquaintance of mine and I asked him to peer review the ‘Randomics: Study of Patterns and Randomness; Are the prime numbers randomly distributed? Part 3’ (draft) paper. He had problems with my successive XOR extraction of entropy, from files where odd primes are coded with a 1 bit and odd composites with a 0 bit, until the extract passes randomness tests. So he consulted the sci.math newsgroup. To be fair, I called this extraction of randomness, and maybe extraction of entropy or of Shannon information would be better wording. But I do not think that is the point; he simply did not believe the concept. My comments on the newsgroup reactions needed more research, and I posted them a few months later in the same sci.math thread. Please take a look at it (reference 15), because all the skepticism can be refuted.

 

5. Conclusions

·        True entropy estimations by compression come from archive files that pass standard randomness tests. (Tables 1 and 2)

·        There is no obvious correlation between true entropy estimations and the number of successive XOR steps required to make a biased file pass standard randomness tests. No formula was found that relates the true entropy estimate to this number of steps. (Table 3)

·        The bias of the XOR extracts can be fully explained by the Davies and Hisakado et al. models. You can even verify the Chi-bits found by ENT by calculating it from the bias of the models. (Table 4)

·        Negative correlation of adjacent bits requires fewer successive XOR steps for a file to become random than the corresponding positive correlation. (Graph 1)

·        XOR extract files can pass standard randomness tests.

·        XOR operations will never corrupt a random file.

 

-o0o-

 

Notes & References:

1) Winer M., van der Galiën J.G. ‘Randomics: Are the prime numbers randomly distributed? (Part 3)’ SATOCONOR.COM 5.6. (2006)

http://www.satoconor.com

2) Anonymous ‘Help File Compression Test’ (2006)

http://www.maximumcompression.com/data/hlp.php

3) Van der Galiën J.G. "State-of-the-art compressors as tools for true entropy estimations," SATOCONOR.COM 4.4. (2005)

http://home.versatel.nl/galien8

4) Davies R.B. ‘Exclusive OR (XOR) and hardware random number generators’

http://www.robertnz.net/pdf/xor2.pdf

5) Hisakado M., Kitsukawa K., Mori S. ‘Correlated binomial models and correlation structures’ arXiv.org Physics (2006)

http://arxiv.org/abs/physics/0605189

6) Mahoney M. ‘The paq data compression programs’

http://cs.fit.edu/~mmahoney/compression/

7) Gailly J-L. ‘The gzip homepage’

http://www.gzip.org/

8) Walker J. ‘Ent: A pseudo random sequence test program’

http://www.fourmilab.ch/random/

9) Knuth D.E. ‘The art of computer programming, Volume 2 / Seminumerical algorithms’ Reading MA: Addison-Wesley (1969)

10) Van der Galiën J.G. ‘Rabenzix v3.0 beta’ Scientia Araneae Totius Orbis 5.4. (2006)

http://www.satoconor.com

11) Anonymous ‘DIEHARD battery of tests of randomness v0.2 beta’ http://www.cs.hku.hk/~diehard/

12) NIST 'Random Number Generation and Testing'

http://csrc.nist.gov/rng/

13) Anonymous ‘Fresh rpms: Red hat linux mirror sites’

http://freshrpms.net/mirrors/redhat/7.3.html

14) Tommila M. ‘ApFloat Home Page’ (I downloaded APTESTC5.ZIP)

http://www.apfloat.org/

15) Google Groups sci.math ‘Prime randomness’ (2006)

http://groups.google.nl/group/sci.math/browse_thread/thread/d4ccb781b83b211d/1b4aa7184a8f2c46?lnk=st&q=&rnum=1&hl=nl#1b4aa7184a8f2c46

16) Anonymous ‘Mersenne twister: A study on random number generators’

http://student.vub.ac.be/~nkaraogl/mt/mt.html

17) Van der Galiën J.G. ‘Collatz Randomics’ SATOCONOR.COM 5.7. (2006)

http://www.satoconor.com