-.- |-

   FlashHash & BackHash for all-efficient multiple recycled hashes
   e.g. for dispersed virtual hashing in Bloom filters (tutorial + math)

        CopyRight (C) 2002 + 2004 , Jan Hajek , NL

NO part of this document may be published, implemented, programmed, copied
   or communicated by any means without an explicit & FULL REFerence to this
author together with the FULL title and the website  www.humintel.com/hajek
for the freshest version, or www.matheory.info , plus the CopyRight note in
texts and in ALL references to this. An implicit, incomplete, indirect,
disconnected or unlinked reference (in your text and/or on www) does NOT
suffice. All based on 1st-hand experience.  ALL rights reserved.

Version 1.23a , 2004-9-9 , lines of < 79+CrLf chars in plain ASCII for
easy view, unlikely to be updated soon, submitted to the webmaster of
    http://www.humintel.com/hajek   ( = the freshest version ).
This epaper may read better (more left margin, PgDn/Up) outside your email.
This epaper has new facilities for fast finding and browsing. Please save the
last copy and use a file differencer to see only where the versions differ :
download Visual Compare VC154.ZIP and run  VCOMP vers1 vers2 /k /i  which is
the best & the brightest colorful line-by-line comparer for plain .txt files.
Your comments are welcome, preferably interlaced into this .txt lines-file.

New terms: FlashHash , BackHash , recycled randomness , rotated/recycled hash
           combinatorial implosion , combinatorially imploded , hashcrosstalk
paired hashes , hashpair , dispersed hasher , dispersed virtual hashing ,
distributed hashing , distributed virtual hasher , monolithic Bloom filter ,
Bloom filter of the 1st 2nd 3rd kind , virtually stored , bitsharing factor ,
cumulated false positives , reproducible acyclic pseudo-entropy ,
imported/injected pseudo-entropy , hashclash , clashhash ,
and ALL inflections of ALL these new terms are all also subject to my
CopyRight conditions above.  ALL based on experience.  ALL rights reserved.

Browsers may like to search here for the following markers :
 !!! !! ! ?? ? { comments }  [ refs ]
-.- separates sections  |- ------- framed stuff

+Contents:

+Executive summary = the beef at fiesta entropicana
+Mottos
+Basic & new terms, abbreviations & symbols
+Bloom filters (BFs) = a tutorial + a quantitative design study
+A comedy of errors = compact virtual hashing CH or HC
  vs. distributed / dispersed virtual hashing DH or BF
+The proof of the pudding = Tab. 3
+FlashHashing & BackHashing = (m)eat the beef again
+More explanations and hints for programmers
+Bitwise rotations aka circular shifts aka barrelshifts
+On hashing functions
+Summary of flashhashing
+From mission impossible via combinatorial implosion
   to mission accomplished
+References


-.- +Executive summary = the beef at fiesta entropicana :

FlashHash can be easily understood in intuitive terms thus:  An item, state,
or a string-of-bits S is laboriously and irreversibly reduced by a HashFun(S)
to a number called hash  h0  viewed as a string of 0's and 1's 0011...01011
(or call them pearls and beads :-).  By repeatedly closing the string  h0
into a necklace, rotating it by a few bits, and reopening or cutting it,  k
different open-string designs are easily produced. If the original  h0  is
remembered i.e. saved/stacked in a store, then all  k  strings can be almost
effortlessly reproduced or regained from  h0  by BackHash.  The net effect
is that  k  sufficiently stochastically independent hashes are obtained at
almost no expense from a single call of a HashFun(S).  This metaphor was
inspired by recalling the poems by Hermann Hesse (1877-1962) near the end
of his 1946 Nobel Prize winning novel Das Glasperlenspiel, or The Glass
Bead Game, aka Magister Ludi (= The Master of the Game), which some folks
see as a precursor of computing culture, algorYthmics, and www .
But I designed also non-rotated versions based on "injected pseudo-entropy".
Doubting Thomases may now like to find 22 occurrences of "independ" here.

This new class of FlashHash algorithms is all-efficient i.e. extremely fast,
space saving (all [re]generations are done by blitzy incremental bitshifting
fewliners), short and simple, hence easy to do for both (wo)men and
(wo)machines when generating  k > 1  hashes necessary e.g. for well designed
& well working Bloom filters.  FlashHash was created especially for, but
not only for, CPU-bound apps like e.g. Gerard Holzmann's (Bell Labs) SPIN's
bitstate model checking.  Dispersed virtual hashers ( DH ) like Bloom filters
( BF ) are explained first and compared with compact virtual hashers CH, all
without a full collision resolution which would require full storage of all
distinct  S's  during hasher's life-cycle. The main purpose of a virtual
hasher is to avoid storage of full  S's.  Both dynamic and static design
formulas for (non)optimized Bloom filters are given (find also "IT & TI"):

  f(n) = (1 - (1 - 1/M)^(k*n))^k  is the instantaneous probability of a
                  false positive AFTER the  n-th item was "stored" in a
      BF[0..M-1] of bits;  f(n) may be called a false+rate or a failrate.
  f(n) is the "general dynamics" formula for BFs.

  fo = 1/2^k  and  k*n = M*ln(2)  or  k*n = M*2/3   are for a BF filled
          optimally wrt memory utilization (useful in quasistatic apps
      when the final  n  can be estimated or fixed beforehand). At such
      a load,  f  has dropped exponentially with  k  increased linearly.
  fo  is the "optimal statics" formula for BFs.

  Fn =      Sum_i=1..n( f(i) ) is the cumulated number of false positives
                              (for large  n use compiled code on a fast PC);
     = Integral_i=1..n[ f(i) ] for fast numerically evaluated approximation,
                             which is very good for very large  n .

  Fn < n*f(n) ;      display Fn as a float, as Fn < 1 or Fn > 1 may result.

  Fn is directly verifiable or falsifiable empirically (see below) in easy
        experiments based on simple I/O through a BF and counting the I-O's:

  Fn = (number of unique items in) - (number of unique items out)
     =  number of unique items lost.    (unique = distinct = no duplicate).

  Pnoloss =     Prod_i=1..n( 1 - f(i))  < 1  probability of NO loss at all;
                                         if too near to 1 then better reads:
    Ploss = 1 - Prod_i=1..n( 1 - f(i))  < 1
          = 1 - probability of NO loss of any item  or state
          =     probability of ONE OR MORE    items or states lost
          =     probability of AT LEAST ONE   item  or state  lost
          = 1 - exp( Integral_i=1..n[ ln(1 - f(i)) ] ) for fast numerically
                evaluated approximation which is very good for very large n.
          (all AS IF during insertions of  n  unique i.e. distinct items)

  Ploss is a computable theoretical number which (unlike  Fn ) is neither
        directly verifiable nor falsifiable in the spirit of Karl Popper's
        philosophy of science. Hence  Ploss  is a quasi-Platonic quantity.
  Ploss is introduced only for comparisons with other work (find Pnc , Pno ).
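
A minimal C99 sketch (mine, not part of the original design) evaluates
f(n), Fn and Ploss by straight summation exactly as defined above; the
constants  M, k, n  are example values only, and all names are mine :

  /* fn.c : compile e.g.  cc -O2 fn.c -lm ; the incremental qi avoids
     calling pow() twice per step;  qi = (1 - 1/M)^(k*i) after step i. */
  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
    const double M = 3200e6;        /* bits in BF[0..M-1]; an example  */
    const int    k = 10;            /* hashes per item;    an example  */
    const long   n = 80000000L;     /* items "stored";     an example  */
    const double q = pow(1.0 - 1.0/M, (double)k);
    double qi = 1.0, f = 0.0, Fn = 0.0, Pnoloss = 1.0;
    for (long i = 1; i <= n; i++) {
      qi *= q;
      f = pow(1.0 - qi, (double)k); /* instantaneous false+rate  f(i)  */
      Fn += f;                      /* cumulated false positives  Fn   */
      Pnoloss *= 1.0 - f;           /* probability of NO loss at all   */
    }
    printf("f(n) = %g   Fn = %g   Ploss = %g\n", f, Fn, 1.0 - Pnoloss);
    return 0;
  }

For the example values it prints  f(n) =~ 2.8e-7 , Fn =~ 2.29  and
Ploss =~ 0.899 , matching the  k = 10  column of Tab. 1 further below.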


Let's start with a 2-liner version in edu-C of FlashHash for a monolithic BF1
with hashcrosstalk, which is the simplest version. Put overflow checking off
especially if you have to use signed integer  h0  in 2-complement arithmetic
which has asymmetric subranges of absolute values for integers < 0 and > 0 .
Choose your moduli  M1, M  well; they should either be prime numbers, or if
M1, M are powers of 2, use  M-1, M1-1 as masks i.e. bitwise AND-ed with  h0.
When trying out FlashHash, do not optimize prematurely; first try with the
% M operator with M prime, then with  & Mmin1 [i.e.  & (M-1) ], compare and
check. Start with a simple version. Don't be pound foolish and penny wise :-)

1st version of FlashHash for a monolithic BF1 employs rotation of bits in h0:

  d = w - c;  h0 = HashFun(S);  h = h0 % M;    /* use  h  or h[1] = h;  */
  for j = 2 to k do begin h0 = (h0 << c) | (h0 >> d);  h = h0 % M;  end;
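
For concreteness, a hedged C99 rendering of this 1st version (a sketch of
mine, assuming an unsigned 32-bit  h0  i.e.  w = 32 , so that  >>  shifts
in zeros and no  abs()  or masking is needed; the output array  h[]  and
all names are my assumptions) :

  #include <stdint.h>

  /* k rotated hashes from one 32-bit h0 ; needs  0 < c < 32          */
  void flashhash1(uint32_t h0, int k, int c, uint32_t M, uint32_t h[])
  {
    const int d = 32 - c;           /* d = w - c  with  w = 32        */
    h[0] = h0 % M;                  /* the 0-th i.e. unrotated hash   */
    for (int j = 1; j < k; j++) {
      h0 = (h0 << c) | (h0 >> d);   /* close, rotate, reopen necklace */
      h[j] = h0 % M;
    }
  }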


2nd version of FlashHash for a monolithic BF1.  Unlike the 1st version, the
    if j > 1 then ...  allows concentration at one place of an application
    dependent blitzy additional in-line code following each  h = h0 % M;
    typically the in-line code will test/check or set bits in a BF[0..M-1].
    Like the 1st version, this 2nd one also saves precious bits from being
    skipped and wasted, although a  c  prime wrt  w  would finally visit all
    w  bits in  h0 . Wasting means that the  c  distance between bits will
    have to be smaller for the given  w  and  k , if k <= 1 + (w-1) div c
    is wanted.  Smaller distance  c  makes hashes less independent.

  d = w - c;  h0 = HashFun(S);
  for j = 1 to k do
  begin
    if j > 1 then /* may take this line out, but will waste the 1st c bits */
        h0 = (h0 << c) | (h0 >> d); /* bits rotated by c bits up ie to msb */
    h = h0 % M;        /*  Here test/set h-th bit BF1[h], or do h[j] = h;  */
  end;

3rd version of FlashHash for monolithic BF1 (I call it Fiesta enTropicana):

  /* Initially fill an array  ENTROPIC[1..kmax] of integers  with arbitrary,
     preferably huge integers typed e.g. by Eddington's monkeys. These huge
     integers must remain CONSTant during the whole lifecycle of a BF as an
    "entropic database" ( which, btw, cannot be read by any 3-letter word
     agency like, say, KGA [ find KGA ] ). In BPascal I just declared &
     banged in these "pseudo-entropic numbers" :
                     here kmax = 11
     Const ENTROPIC : array [ 1..11 ] of LongInt = ( 194758204,
                    293745367, 389486698, 406837398, 585769263, 669402120,
                    734068134, 836141173, 928486384, 035590389, 107805073 );
     Note that these are NOT numbers from a random generator, because
          it is desirable to obtain entropic bitstrings by means of blitzy
      xor which flips bits over the whole width of a LongInt h0 , hence
           we need huge integers within the range of  LongInt h0 . Quiz: why
           do my entropic numbers have their first digits 1, 2, 3, .. etc ?
  */
  h0 = HashFun(S); /* Use this version if  c too low, e.g. if  k > w/3  */
  for j = 1 to k do                        /*   use  h  or do h[j] = h; */
  begin
    if j > 1 then /* ..PIC[1] above allows to take THIS one line out    */
      h0 = h0 xor ENTROPIC[j];   h = h0 % M;    /* put inline use of  h */
  end;


4th version of FlashHash for a monolithic BF1 is just another implementation
    of the 3rd version (also with IMPORTED/INJECTED pseudo-ENTROPY ):

  h0 = HashFun(S);
  for j = 1 to k do  /* Use this version if  k > w/3  ie if  c too low */
  begin         /* if j > 1 then... is not needed anymore, but allowed */
                /*  xor with  k  distinct, arbitrary, huge numbers but */
    case j of   /*       all  k  nrs within the range of integer  h0   */
    begin       /*      These nrs may be typed by Eddington's monkeys  */
         1: h0 = h0 xor 132950380;
         2: h0 = h0 xor 293745367;    3: h0 = h0 xor 389486698;
         4: h0 = h0 xor 406815298;    5: h0 = h0 xor 585021763;
         6: h0 = h0 xor 669402120;    7: h0 = h0 xor 734481234;
         8: h0 = h0 xor 836141173;  /* ... etc up to  kmax ever needed */
       else: h0 = h0 xor 7805073;  /* Correctness insurance (0-prefixed */
                                   /*  literals would be octal in C)    */
    end case;
    h = h0 % M;  /*  or:  h = offset + (h0 % M1) for BF3 or BF2        */
                 /*  Use  h  or do h[j] = h;   i.e. your in-line code  */
  end for;


5th version of FlashHash for a monolithic BF1 combines the 2nd & 3rd version
    so that higher number of hashes  k  are easily obtained without  c  too
    low and without too many entropic numbers typed by Eddington's monkeys:
  /*    c  prime is recommended as it will visit all  w  bits wrap around  */
  c  = 13;   d = w - c;  /* may now be constants eg 13 and 19 = 32 - 13    */
  h0 = HashFun(S);  i = 0;
  for j = 1 to k do  /* no if j > 1 then.., as we have enough entropy now  */
  begin         /* declare and fill ENTROPIC[1..kmax]; may recycle % Emax  */
    if (j & 1) = 0 then h0 = h0 xor ENTROPIC[++i]  /* or: PIC[(++i) % Emax] */
                  else h0 = (h0 << c) | (h0 >> d);
    h = h0 % M;   /* Here inline test/set h-th bit BF1[h], or do h[j] = h; */
  end;

The 5 versions above are readily adapted to my favorite BF3 versions below.
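
For concreteness, a hedged C99 rendering of the 5th (hybrid) version; the
ENTROPIC values are the example constants from the 3rd version above, and
EMAX plus all names are my illustrative assumptions :

  #include <stdint.h>

  enum { EMAX = 6 };                     /* size of the entropic table */
  static const uint32_t ENTROPIC[EMAX] = /* example constants above    */
    { 194758204u, 293745367u, 389486698u,
      406837398u, 585769263u, 669402120u };

  /* alternate injected entropy (xor) with rotation; w = 32, c prime   */
  void flashhash5(uint32_t h0, int k, uint32_t M, uint32_t h[])
  {
    const int c = 13, d = 32 - c;
    int i = 0;
    for (int j = 1; j <= k; j++) {
      if ((j & 1) == 0) h0 ^= ENTROPIC[++i % EMAX]; /* recycle % EMAX  */
      else              h0 = (h0 << c) | (h0 >> d); /* rotate by c     */
      h[j-1] = h0 % M;
    }
  }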

  |--- No overkill & no overhead with blitzy FlashHash & BackHash :
  !
  !  - No more than one call by FlashHash of a HashFun(S) per item, or
  !       string, or state S is needed. The received wisdom was  k  calls.
  !  - No other call on any other function is needed, not even on a slooow
  !       pseudorandom generator; nor is one desirable wrt the XOR above.
  !  - No slooow floating point arithmetic needed.
  !  - No slooow multiplication needed, not even in integer arithmetic!
  !  - No division or the mod-op  % is needed iff M or M1 is a power of 2.
  !
  !  Making  k  calls of a HashFun(S) looks now hopelessly inefficient.
  !  Even    k  calls of a Random( h0 ) would be  hopelessly inefficient,
  !             simply because  k  is a small number, at most a few dozen
  !  (because we want to use our bits well), hence we do not need a computa-
  !  tionally expensive pseudorandom sequence of numbers with a long period.
  !  Non-rotational versions of FH use a small set of entropic numbers.
  !  Non-rotational and rotational versions were combined into a HYBRID
  !  version nr. 5 which will RECYCLE (via % Emax) the INJECTED ENTROPY if
  !      h0 = h0 xor ENTROPIC[ (++i) % Emax]  where Emax < k  is allowed.
  !
  !  By now even entrenched fundamentalists will have to admit that their
  !  traditional computations of  k  separate HashFuns per S are about as
  !  efficient as shooting at flies (or bits :-) with a nuclear howitzer,
  !  at least in the context of BFs and other non-Manichean (find below)
  !  apps i.e. where no enemies or adversaries are expected (find KGA ).
  !
  |------------

  The latter versions (re)produce REPRODUCIBLE ACYCLIC PSEUDO-ENTROPY,
  which may become useful when we want a larger  k , so that  c  would
  become too small and bit-patterns less independent when  c < 3 or 4.
  In fact any reproducible set or sequence of
  remappings of  h0  will do, provided they will be sufficiently pseudo-
  random, pseudo-chaotic, deterministically chaotic i.e. pseudo-entropic
  (ideally MaxEntropic) w.r.t. the Bloom filter BF[0..M-1]. Perfectionists'
  unattainable Platonic ideal of  k  stochastically independent hashing
  functions laboriously recomputing from  S  uniformly distributed hash-
  values is now not unlike the monuments of the Ideal Society Leaders (read
  Plato's Republic). Where are those Ideal Ideologists and their monuments
  now ? On graveyards together with their victims: 100000000 = 100 MegaDead
  murdered by the Kommies!  That's the result of an ideology applied to
  people. An ideology in computer science does not waste lives, just cycles.
  But waiting in a queue for food at a store in a Kommie kantry was also
  wasting life-cycles of people who were there buried alive behind the
  barbed wire, watch towers manned by the subservient opportunists trying
  hard to obtain favors from their Ideal Leaders. Now with FlashHash you do
  not have to waste your life-cycles anymore when waiting for a result of,
  say, a model checker like SPIN, and you do not have to rerun it and wait
  again and again, as you can now get a 1000 times higher quality result in
  a single run at no extra cost, instead of boringly repeated sequential
  re-running, re-waiting, re-staring at the crt i.e. re-wasting your
  limited biocycles.


FlashHash in edu-C for monolithic BF1 with hashcrosstalk :
/* 2nd version with comments : the 2nd version allows concentration of the
                               in-line code following the  h = h0 % M;   */
/*     M = k*M1;    for comparisons between BF1 and BF3 or BF2         */
/* Mmin1 = M - 1;   for faster  %  iff  M  is a power of 2 ; see below */
   d = w - c;  /* w = nr bits in h0; use max c < w/2 for k <= 1 +(w-1)/c */
  h0 = HashFun(S);    /* Store THIS h0 now iff needed                  */
  for j = 1 to k do   /* generates  k  hashes per item or state S :    */
  begin /* max odd or prime c <= (max c allowed) will be the best  c   */
          if j > 1 then  h0 = (h0 << c) | (h0 >> d); /* rotate bits    */
          h = h0 % M ;      /* put HERE inline code using  h           */
   /* or: h = h0 & Mmin1;    this is faster; use it iff  M  power of 2 */
  end; /* h = might be h[j] = ... ; use  h  before and in the loop     */


FlashHash in edu-C for BFs without hashcrosstalk i.e. for BF3 and BF2 :
  /*  FlashHashing is likely to work better on partitioned / segmented
      BFs. FlashHashing can work almost as well as  k  independently
      computed  HashFun(S). "Works" means "achieves low Fn", see Tab. 3.
      Huge "addressing bitspace"  M = k*M1 bits may need larger  w  in
      bits, which also allows larger distance  c  bits i.e. independence.
  */
  M = k*M1;  M1min1 = M1 - 1;     /* or: M1 = M div k  i.e. int/int */
  d = w - c; /* w = nr bits/word; use max c < w/2 for k <= 1 +(w-1)/c  */
  offset = -M1;   h0 = HashFun(S);  /* Store THIS h0 now iff needed */
  for j = 1 to k do   /* generates  k  hashes per item or state S : */
  begin /* max odd or prime c <= (max c allowed) may be the best c  */
          if j > 1 then  h0 = (h0 << c) | (h0 >> d); /* rotate bits */
          offset += M1;             /*     For j = 1 is offset = 0  */
          h = offset + (h0 % M1);   /* Put HERE inline code using h */
   /* or: h = offset + (h0 & M1min1), iff M1 power of 2 , is faster */
  end; /* h = might be h[j] = ... ; Use  h  within loop or store it */

BackHash is always like FlashHash without  h0 = HashFun(S)  but with
        h0 = FetchFromStore; (e.g. from a stack) in its place. For easy
maintenance BackHash can be integrated with FlashHash by means of a
parametrized condition :

   if wantFH then h0 = HashFun(S) else h0 = FetchFromStore;

BackHash regenerates from one hash  h0  by bitshifts all  k  hashes per S.
As long as the  h0's are stored & available, e.g. on a stack (only one  h0
per  S ), no  HashFun(S) needs to be recomputed for the  S's done so far.

HashFun(S) may be your favorite, or this Bytewise 1-liner (or better):

  h0 = 1;  for ix = loB to upB do  h0 = (h0 << 8) + h0 + S[ ix ];
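
A hedged C99 sketch of the whole FlashHash/BackHash round trip, using the
bytewise 1-liner above as the HashFun; the fixed-size store, the wantFH
flag and all names are my illustrative assumptions :

  #include <stdint.h>

  uint32_t HashFun(const uint8_t *S, int len)  /* the 1-liner above   */
  {
    uint32_t h0 = 1;
    for (int ix = 0; ix < len; ix++) h0 = (h0 << 8) + h0 + S[ix];
    return h0;
  }

  enum { STOREMAX = 1 << 20 };       /* illustrative store capacity   */
  static uint32_t store[STOREMAX];
  static int top = 0;

  /* wantFH != 0 : FlashHash a fresh S and stack its h0 ;
     wantFH == 0 : BackHash i.e. regain the k hashes of the most
                   recently stacked S without any call of HashFun(S). */
  void hash_k(int wantFH, const uint8_t *S, int len,
              int k, int c, uint32_t M, uint32_t h[])
  {
    uint32_t h0 = wantFH ? (store[top++] = HashFun(S, len))
                         : store[--top];    /* = FetchFromStore       */
    const int d = 32 - c;                   /* needs  0 < c < 32      */
    h[0] = h0 % M;
    for (int j = 1; j < k; j++) {
      h0 = (h0 << c) | (h0 >> d);
      h[j] = h0 % M;
    }
  }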

The logically partitioned BF3[0..M-1] of bits, with M = k*M1, is my favorite
  for the following reasons :
- The declaration & access is as simple as for a monolithic BF1[0..M-1], yet:
- There is no direct hashcrosstalk between pairs of distinct states S1, S2.
  Alas, there is an indirect hashcrosstalk among 3-tuples of states.
  Combinatorial implosion works via the (k! - 1) nonmatching permutations as
  at least an insurance or last ditch defense against direct hashcrosstalk.
- Smaller modulus  M1  will distribute more uniformly than  M = k*M1 .
- The   h = h0 % M1  uses fewer bits from  h0  than the  h = h0 % M  does.
- If we wish to use a prime modulus  M1 , then there is no need for finding
  primes  M =~ k*M1 for a BF1. With a single prime M1 we get k partitions,
  each with a prime number of bits. This means easier and less error prone
  changes during tests and maintenance.
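
To make the test/set step concrete, a hedged C99 helper for a BF3[0..M-1]
of bits packed into 32-bit words;  K, M1  and all names are example
assumptions of mine, not prescriptions :

  #include <stdint.h>
  #include <string.h>

  enum { K = 8, M1 = 116663 };           /* example sizes; M = K*M1   */
  static uint32_t bf3[(K*M1 + 31) / 32]; /* BF3[0..M-1] of bits       */

  void bf_clear(void) { memset(bf3, 0, sizeof bf3); }     /* Note0    */

  /* test-and-set the h-th bit; returns the OLD bit (1 = was set)     */
  int bf_testset(uint32_t h)
  {
    uint32_t *word = &bf3[h >> 5], mask = (uint32_t)1 << (h & 31);
    int old = (*word & mask) != 0;
    *word |= mask;
    return old;
  }

  /* "store" one S given its k finalized hashes h[j] = offset + h0%M1 ;
     returns 1 iff S looked already present (a duplicate or a false+) */
  int bf_insert(const uint32_t h[], int k)
  {
    int all = 1;
    for (int j = 0; j < k; j++) all &= bf_testset(h[j]);
    return all;
  }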

Many technical details are pointed out, quantitative evaluations and
comparisons are tabulated, and ready to use algorithmic auxiliaries are
worked out including "hashpairs" i.e. paired hashes as if from a 64 bit
machine, probably better than a single 64 bit hash. Clearly, the bigger
the  M , the more bits  w >> c  a hash should have for a lower  Fn .
This I call the "reversed/inverted stochastic pigeonhole principle".
Hasty cognoscenti or unbelieving connoisseurs may now jump to my term
"hashcrosstalk" and to the "k!"  permutations argumentation for logically
segmented "BF3" of the 3rd kind. For quality-performance numbers find
"Tab.", "Fn" and "w = 64" without quotes. The proof of the pudding is in
"Tab. 3". The cardinals of fundamental computing science are invited to
look through the telescope (translation: run FlashHash before you judge).


-.- +Mottos :

Simple, few parts, easy to maintain.
       [ Gen. Chuck Yeager, USAF, tested Bell X-1;  The Right Stuff ]

"The idea is crazy, but not crazy enough."
    [ Niels Bohr (1885-1962), 1922 Nobel Prize for physics ]

"When a distinguished but elderly scientist (JH: over 30 in physics :-)
 states that something is possible, he is probably right.
 When he states that something is impossible, he is very probably wrong."
"The only way to discover the limits of the possible
 is to go beyond them to the impossible."
"Any sufficiently advanced technology is indistinguishable from magic."
  [ 3 laws of Arthur C. Clarke; proposed geostationary satellites in 1945 ]

- Expect nothing short of a miracle, as only where miracles are expected,
         miracles are common.
- Never say never.
- Don't say it with flowers. Do IT with Bloom filters!
- Bloom Bit said: Just turn me On.


-.- +Basic & new terms, abbreviations & symbols (henceforth in edu-C+BP/Ada) :

  bag    a multiset (often implemented with hopefully small counters).
  item   a chunk of data, e.g. a data vector, record, state i.e. a string S .
         S  may be a state vector of a finite state machine,  or a state  S
         in a model checker or a protocol verifier using a combinatorially
         explosive reachability analysis to systematically generate and check
         as many reachable states  S  as feasible (ideally all S's) like e.g.
         started by [Hajek 1977, 1978a, 1978b, 1979], [Holzmann 1988, 1990].
  hashfun = a short word for a hashing function  HashFun(S) which reduces an
         item i.e. a string-of-bytes-ie-of-bits  S  to a number called hash.
         Hashing is an irreversible reduction stronger (= produces less bits)
         than a reversible compression. The original  S  cannot be recovered
         from its hash by an inverse operation like a decompression would be.
         An ideal, perfect hashfun would provide one-to-one mapping of unique
         S's into unique i.e. distinct numbers called hashes.  Alas ...... :
  hashclash = clashhash = my new words for a hashing collision when two or
         more hashes clash i.e. become hashing synonyms i.e. become equal
         while they ideally should differ. Hashclashing is an undesirable but
         generally unavoidable many-to-one mapping. Only for particular,
         small sets of fixed i.e. a priori known items (typically the
         reserved words in a programming language) a perfect hashing i.e.
         one-to-one mapping may be possible.
  false positive = an error of the Fn-kind:  let after  n  trials or inserts
         Fn  be called "cumulated false positives". Then (# means number of):
         Fn = # wanted (to get or to know) misidentified as unwanted
            = # forecasts which did not come out { who wants non-forecasts? }
            = # irrelevant documents retrieved    { who wants irrelevantia? }
            = # healthy persons diagnosed as ill    { Docs see ill as +     }
            = # foes misidentified as friends { if you want to id foes }, or
            = # friends misjudged as foes     { if you want to id friends   }
            = # new i.e. unseen states or strings  S  misidentified as old
                i.e. as if already present i.e. as if "stored" in a BF, so
                that consequently they will be lost because discarded as if
                (pseudo)duplicates.

  Bloom filter (BF) = a dispersed virtual hasher for probabilistic
         set membership testing i.e. for checking if some  S  almost
         certainly is/was present or seen before (deja vu). The items or
         states  S  are not actually stored in a BF;  a new  S  sets
       k bits in a BF i.e. it bitmarks a BF-bitmap to look AS IF stored,
         hence the items are "stored" only VIRTUALLY. A new  S  sets
       k bits, but some of them may be set already, hence insertion of
       n distinct  S's into a BF sets less than  k*n  distinct bits,
         so that bits become shared i.e. "overloaded". Filtering refers
         to BF's discrimination capability to filter out duplicates.
         Unlike the monolithic BF1, slotted i.e. segmented BF2 or BF3 are
         partitioned either "physically" or logically and therefore do
         not suffer from hashcrosstalk like BF1 does.
         Unlike compact virtual hashers which keep their bits together in
         adjacent positions, dispersed virtual hashers distribute their bits
         over the whole monolithic BF1[0..M-1] or over its segments of M1
         bits in logically partitioned BF3[0..M-1] or in BF2.
   Note0: a  BF[0..M-1] of bits  must be initialized to all bits = 0.
         Of course we could also initialize all bits to 1 and do resetting
         of bits to  0  instead of setting bits to  1 .
   Note1: ideally the  k  bitindexes per item, string, or state  S  should
         be all distinct, but it is not a must. If they are distinct,
         less gamble will take place at the cost of some programming and
         runtime overhead spent on (usually superfluous) checking that
         they indeed are distinct  k  bitindexes per S. By distinct per S
          I do not mean all  k*n  bitindexes being distinct (which would
          be the unattainable Platonic ideal of a perfect Ideal Hasher
          playing in the same league as Maxwell's demon, Laplace's
          omniscient demon, Ideal Observer, or Carnap & Bar-Hillel's
          Ideal Receiver [Kahre 2002]). Moreover even an Ideal Being cannot
         violate the pigeonhole principle (from combinatorics) in its
         elementary form, i.e. (s)he cannot put 1001 distinct pigeons into
         M = 1000 pigeonholes without at least one pair of pigeons sharing
         a hole. The same applies to setting  M+1  bits in a BF[0..M-1], but:
  Note2: Due to bitsharing,  k  hashfuns applied to  n  distinct states will
         always set less than  k*n  bits in a well filled BF.
         In fact on one hand bitsharing saves precious memory, but on the
         other hand increases the chances of faults of the false positive
         type only which will lead to some losses (omissions, collisions).
         See the "optimal statics" formula for the  fo . Hence :
       - pessimists see negatively sounding clashes :-(   , while
       - optimists  see positively sounding bitsavings due to bitsharing :-)
   Note3: not even an Ideal Being can individually address (i.e. separately
          each of all) more than, say, 2^32 bits with a single hash  h0
          which is an unsigned integer of 32 bits. Therefore BFs with huge
       M  may need two hashes h0 and h00 forming a hashpair of a width
       w = 64 bits (find "w = 64").
         This is my "reversed/inverted pigeonhole principle", as trivial
         as the pigeonhole principle (from combinatorics) in its elementary
         form.

  hashcrosstalk = my new term for the nowhere even mentioned hashclashes of
      the 2nd/3rd kind.  There are 3 kinds of clashes between S1, S2 in BFs :
      Explained in shorthand (here S2 and S1 are distinct items or states)  :
       If HashFunA(S1) =  HashFunA(S2) then hashclash of the 1st kind;
       If HashFunA(S1) =  HashFunB(S2) then hashclash of the 2nd kind in BF1;
       If HashFunA(S1) =  HashFunB(S1) then hashclash of the 3rd kind in BF1;
       If HashFunA(S1) <> HashFunA(S1) then irreproducible nondeterminism due
                                   to a bug (e.g. an uninitialized variable).
       There are no other possibilities to be considered when only a single
       pair of  S1, S2  is considered in splendid isolation. Alas, as Dr.
       Dragan Bosnacki has pointed out, a 3rd state  S3  may be lost due to a
       hashcrosstalk via S1, S2 over the (logical) boundaries of the segments
       of partitioned BFs i.e. BF2 and BF3.  Quiz: how ? Hint: create a case.
       Hence "BFs without hashcrosstalk" holds only for pairs of states in
       splendid isolation from the third state  S3 . Therefore a partitioning
       of BFs has more limited, yet non-negligible practical consequences
       (find "favorite" here).

      Clashes between splendidly isolated  S1, S2  explained in longhand :
       - Clashes of the 1st kind (in BF1, BF2, BF3) :  distinct states
          S1, S2 to which the exactly same hashfun is applied may produce
                 the same hash-address (e.g. due to the % M1 operation);
       - Clashes of the 2nd kind (in BF1 only) :  different hashfuns on
          S1, S2 may produce the same hash-address. This cannot happen in
                BF2 or BF3 which are partitioned, hence different hashfuns
                 work only their own strictly separate parts of BF[0..M-1].
        - Clashes of the 3rd kind (in BF1 only) :  k  different hashfuns on
           S1 only may produce fewer than  k  different hashes. This can
              happen only in BF1 with a poor hashfun, or too small  M .
       Note: by hashcrosstalk I will mean that of the 2nd or 3rd kind in BF1,
             unless specified otherwise.

  FlashHash is a multihashing function employing a single HashFun(S) to
            produce a single hash  h0  from which all  k  hashes are obtained
             i.e. derived (e.g. for a BF by stepwise rotations of hashbits
            or by injecting entropy ).
  BackHash  is like FlashHash, but without any hashfun.  BH recovers hashes
            by bitwise rotations from the stored 0-th flashhash h0.
  
  msb       the most significant bit (in a number called hash here).
  {   }     embrace comments like  /*  */  in the C programming language.
    ^       raises to a power here; it does so neither in C nor in BPascal.
   2^32 = 4294967296;  2^24 = 16777216;  2^20 = 1048576;  2^16 = 65536;
   2^31 = 2147483648 = MaxLongInt + 1 in BPascal  ( = a Mersenne prime + 1 )
                 = abs(MinLongInt) in 2-complement arithmetic will overflow
  ln(2) = 0.6931 is e-based Napier's logarithmus naturalis e.g. log in C
  :=        assignment operator in ADA, BP; i.e.   =  in C  mixes with math
  =         equality operator as in math    i.e.  ==  in C
  <>        not equal relational operator   i.e.  !=  in C
  %         modulus operator                i.e.  mod in BP aka BPascal
  div       integer division operator       i.e.  /   in C  is "overloaded"
  |         bitwise inclusive OR operator   i.e.  OR  in BP is "overloaded"
  &         bitwise          AND operator   i.e. AND  in BP is "overloaded"
  xor       bitwise          XOR operator   i.e.  ^   in C  mixes with math
  <<        bitwise leftshift  operator     i.e. shL in BPascal on integers
  >>        bitwise rightshift operator     i.e. shR in BPascal on integers
  Note: the % and << are also used a couple of times each in their usual
        mathematical meanings of "percent" and "much smaller" respectively.


-.- +Bloom filters (BFs) = a tutorial + a quantitative design study :

BFs typically work as approximate set/bag membership testers/checkers.
Membership testing means checking the current/past presence/occurrence,
or its opposite i.e. absence/nonoccurrence by testing/checking  k  bits.

BFs employ probabilistic bitmapping technique of multiple hashes per
item (e.g. a state S) for marking/setting and testing/checking multiple
bits scattered (with semantic overload or overlap) over a bitmap i.e.
in one or more bitarrays.  The three versions of BFs are:

Bloom filter of the 1st kind (BF1) is monolithic:
BF1 is a single bitmap  BF1[0..M-1] of bits, where all  k  hashfuns
    per each state or string  S  operate i.e. pseudorandomly set and
    test/check  k  bits per   S over the whole range of a bitmap:

        h2   h1  - - -  hk   h3    all k bits set by one S
        v    v          v    v
    01111001010001....011110010001100100001
      ^    ^            ^    ^
      h3   h1    - - -  h2   hk    all k bits set by another S

Bloom filter of the 2nd kind (BF2) is partitioned / segmented "physically":
BF2 consists of  k  smaller bitmaps, each BF2[0..M1-1] of bits, so that
    k*M1 = M bits,  each reserved for only one j-th hashfun:

    0110010100101   M1 bits used by hashfun nr j = 1 only
    0010010000100   M1 bits used by hashfun nr j = 2 only
    - - - - - - - - - - - -
    0100010100100   M1 bits used by hashfun nr j = k only

    BF2 is handy when e.g. a compiler does not allow one to declare,
    or the OS to allocate, a single huge hashtable as big as needed.

Bloom filter of the 3rd kind (BF3) is partitioned / segmented logically:
BF3 is BF2 in a single bitarray  BF3[0..M-1] of bits where each hashfun
    works its own reserved subrange of  M1  bits so that  k*M1 = M bits.

    |.........|.........|..........|.........|

    For BF3 the j-th hash is then finalized thus :

    Offset is the absolute bitindex of the 0-th bit in a segment of BF3
           as-if BF2[0..M1-1]; the 1st segment (for j=1) has offset = 0;
           2nd offset = M1, etc, i.e. :
    offset = M1*(j - 1);    { in loops reduce * to repeated + M1 }
    h  = offset + ( h % M1 );            /* in C      */
    or faster iff M1 is a power of 2, and M1min1 = M1 - 1 :
    h  = offset + (    h   &  M1min1);   /* in C      */
    h := offset + (abs(h) AND M1min1);   {  in BPascal }

    offset := M1*LongInt(j - 1);     { safely typecast in BPascal }
    h := offset + (abs(h) mod M1);   { safe for any M1 in BPascal }

    The abs(h) must be used and overflow checking must be off if a
    signed integer  h  is in 2-complement arithmetic (with asymmetric
    subranges). An overflow of the offset  M1*(j-1)  must be avoided
    even if the overflow checking, if any, is off. This bad bug did
    hit me when my  M  was decreased!  Quiz: How come that smaller
    numbers could cause an overflow, but larger did not (in BPascal)?
    Although I do not make many bugs anymore, or see them instantly,
    it took me hours to kill this badbug.  Hint: M1 was a constant, j
    was a byte. Despite such badbugs, BP was love at the 1st byte.

In all three BF-schemes the same bit can be shared i.e. set/tested by
hashfuns generated from different items or states. However unlike in BF1,
there is no "hashcrosstalk" in slotted i.e. partitioned i.e. k-segmented
Bloom filters like  BF2 and BF3, where a j-th  hashfun cannot share a bit
with an  i-th  hashfun if i <> j.  Hence BF2 and BF3 are much less
semantically bit-mixed than a monolithic BF1. This will turn out to be the
conditio sine qua non for quality flashhashing.

BFs are resilient (wrt external errors) due to the dispersion of bits
(not unlike holograms) belonging to each particular item. The bits' overlap
i.e. bitsharing i.e. semantic overloading of individual bits shared by
different hashfuns (originated from different hashed items S) makes BFs
memory-efficient but also (internally or inherently) error-prone like any
imperfect hashing scheme without a collision resolution backup. However
the false positive rate is calculable hence under math control at any
moment, and can be updated and reported in-flight.

BFs are dispersed bithashers without collision resolution, hence with
calculable risks of errors of the false positive kind only. Approximate
set membership testing and multiset/bag membership checking is an
operation classically used by spelling checkers without a fully stored
dictionary, or by model checking programs like SPIN without explicitly
storing whole generated states of, say, computational communicating
processes [Holzmann 1988, 1990, Bell Labs], who calls a Bloom filter
bitstate hashing. Nowadays model checking means and does what automated
verification or validation of protocols (e.g. for networks, communication,
security, passwords, etc) used to mean and to do earlier [Hajek 1977, 1978a,
1978b, 1979].

BFs fall into the category of VIRTUAL HASHing algorithms [Morris 1968],
[Murray 1970, p. 180-183], [Bloom 1970, p. 423, Methods 1 & 2] which form
a subclass of hashing algorithms. Virtual hashing without any collision
resolution was invented and loosely described at the end of the classical
paper by [Morris 1968, Bell Labs], where on pp. 42-43 he used the terms
virtual scatter, and also virtual hash 3 times on p. 43. Virtual scatter
or hashing has been mathematically analyzed for the first time by a
computer scientist from Cornell University [Murray 1970].

BFs clearly fall, in my view, under the general paradigm of HINTing
algorithms. Note that this insight is not a commonly received wisdom
as it is hard to find, but see [Lampson & Sproul 1979, pp. 100-102, 3.6].

BFs can be viewed as algorithmic versions of ancient notched-edge cards
manual filing systems i.e. needle-card based manual documentation &
retrieval systems where card columns were at a premium, so the ancients
had to be extremely inventive. Read on the method of SUPERIMPOSED CODING
[Knuth vol.3], or on SUPERIMPOSED CODES [Roberts 1979, Bell Labs],
[McIlroy 1982, p. 95, Bell Labs], also called Zatocoding or Peekaboo,
no joke!

BFs operate optimally (w.r.t. the memory use under the given load i.e.
at their capacity to store items with the required false positive rate)
under the maximum entropy condition (for BFs simply the equal chances
that a bit is On or Off on average), hence can be understood in terms
of (non)classical information theories like e.g. Shannon's classical
Mathematical Theory of Communication [Shannon 1948], and Jan Kahre's
unifying neoclassical Mathematical Theory of Information [Kahre 2002].

BFs being virtual hashers, behave virtually i.e. almost AS IF they would
be really encompassing much larger "name-space" of a set/bag with much
larger cardinality i.e. number of possible names, states, strings or items.
In this respect, like all virtualities, BFs have their philosophical patron
in the little known German philosopher of ALS OB (= AS IF) Hans Vaihinger
[Vaihinger 1923], [Kahre 2002, see his Indexes].


BFs are predictable:

In this section all explanations are derived for the BF1 scheme, but apply
also to the BF2 and BF3 schemes (unless stated otherwise) which behave
virtually w.r.t. BF1 i.e. almost AS IF they were BF1. All formulas are
derived under the assumption of stochastic independence, as usual.
After  n  items were stored in BF1 by setting  k  bits (some may have been
1 already and will be shared) per item, the false positive rate is:

  f(n) = (1 - (1 - 1/M)^(k*n))^k  is the instantaneous probability of a
                  false positive AFTER the  n-th item was "stored" in a
      BF[0..M-1] of bits;  f(n) may be called a false+rate or a failrate.
  f(n) is the "general dynamics" formula for BFs.

For large enough  M bits is  f(n)  very well approximated by

  f(n) = (1 - exp(-k*n/M))^k ,  k*n < M;  optimal k*n = M*ln(2) or M*2/3

From the latter formula for f(n) one obtains

  fo = 1/2^k  and  k*n = M*ln(2)  or  k*n = M*2/3   for a BF filled
            optimally wrt memory utilization (useful in quasistatic apps
       when the final  n  can be estimated or fixed beforehand). At such
       a load,  f  has dropped exponentially with  k  increased linearly.
  fo is the "optimal statics" formula for BFs.

From the instantaneous  f(n)  one obtains the cumulated number of false
positives  Fn  so far (not explained anywhere and misunderstood everywhere):

  Fn =      Sum_i=1..n( f(i) ) is the cumulated number of false positives,
     = Integral_i=1..n[ f(i) ] for fast numerically evaluated approximation
                               which is very good for very large  n .

  Fn < n*f(n) ;       display Fn as a float, as Fn < 1 or Fn > 1 may result.

  Fn is directly verifiable or falsifiable empirically (see below) in easy
        experiments based on simple I/O through a BF and counting the I-O's:

  Fn = (number of distinct items In) - (number of distinct items Out)
     =  number of distinct items lost.   (distinct = unique = no duplicates)

  Ploss = 1 - Prod_i=1..n( 1 - f(i) )  <  1 ,  check it in programs;
        = 1 - probability of NO loss of any item or state
        =     probability of  a loss of ONE OR MORE  items or states
        =     probability of  a loss of AT LEAST ONE item  or state
        = 1 - exp( Integral_i=1..n[ ln(1 - f(i)) ] ) for fast numerically
              evaluated approximation which is very good for very large  n .
        (all AS IF during insertions of  n  unique i.e. distinct items)

  Ploss is a computable theoretical number which (unlike Fn) is neither
        directly verifiable nor falsifiable in the spirit of Karl Popper's
        philosophy of science. Hence Ploss is a quasi-Platonic quantity.
  Ploss is introduced only for comparisons with earlier work by others.

Fn  and  Ploss  are as exact as only reasonably possible and cannot be
approximated by nice analytical formulas without making large errors, but
can be easily computed numerically. Fn may be approximated as an integral
and also simply counted in practice, so that this theoretical model  Fn
can be easily verified by empirical "measurement" thus:  Input & "store"
n  unique (i.e. known to be nonduplicate) items (e.g. strings) into a well
filled BF with not too large  M  and count how many items have NOT passed
thru this BFilter. Uniqueness is necessary for a simple & fast yet reliable
measurement, otherwise one might be losing all but the 1st of just a few
bad-luck "muplicates", each reinserted many times. This method was used
for creating a reliable Tab. 3. Be aware of the fact that the differences
between the theoretical  Fn  and the empirical  Fn  are due to the total
systemic properties of the "Grey Box = HashFun(S) + items S input", i.e.:

1. The quality of the hashes used, which interacts with:
2. the quality of the unique items (their differences, size and redundancy);
3. the number  of unique items (the more items the lower the variance of Fn).
Note that from nonunique items any  Fn  or  Ploss  may result :-)
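
A hedged C99 sketch of this Fn "measurement", reusing the sketches given
earlier ( HashFun, flashhash1, bf_clear, bf_insert ; their prototypes are
repeated below) with all names being my illustrative assumptions; here  M
must not exceed the bitmap size declared in the bf_... helpers :

  #include <stdint.h>
  #include <string.h>

  uint32_t HashFun(const uint8_t *S, int len);
  void     flashhash1(uint32_t h0, int k, int c, uint32_t M, uint32_t h[]);
  void     bf_clear(void);
  int      bf_insert(const uint32_t h[], int k);

  /* push n UNIQUE strings through the filter; a unique string judged
     as "old" by the filter is a false positive i.e. a lost item      */
  long measure_Fn(char **items, long n, int k, int c, uint32_t M)
  {
    long lost = 0;
    uint32_t h[32];                       /* room for k <= 32 hashes  */
    bf_clear();
    for (long i = 0; i < n; i++) {
      uint32_t h0 = HashFun((const uint8_t *)items[i],
                            (int)strlen(items[i]));
      flashhash1(h0, k, c, M, h);
      if (bf_insert(h, k)) lost++;        /* the counted empirical Fn */
    }
    return lost;
  }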

The criterion of optimality for BFs is usually memory utilization when
the tolerable  f  is given i.e. specified beforehand. The approximate
f-formula (without the optimality condition) is briefly mentioned
in [Knuth, vol.3, any ed., just see Bloom]. In the next e-paper I will
derive these design formulas in several ways, including my derivation
from the very 1st principles. Here it suffices to say that an up to
4-way trade-off is involved among f, M, k, n :

   f = false positive rate (AS IF present while absent) i.e. false+rate
       (errors, omissions, collisions are not specific hence ambiguous).
   M = memory space in bits available in a single BF[0..M-1] of bits.
   n = number of items or states (virtually AS-IF-)"stored" in BF.
   k = number of hashes and bits (0 or 1) set per item;
       proportionally increases CPU-time spent on hashing, and in some
       applications like e.g. model checking with SPIN, larger  k  may
       need extra memory for additional data structures. It is this
       k-dependent CPU-time (and probably an additional space overhead)
       which FlashHash & BackHash can drastically reduce to almost zero.

A quantitative example:

     M = 400 MBytes * 8 bits/Byte = 3200 Megabits are available;
     f = 0.001  is acceptable i.e. false positive rate of a BF
                AFTER  n  items were stored is one lost item per
                          1000 "stored" (the AFTER is important!)

Q: How should we design our BFs ?  asks a programmer.
   k = ?  i.e. how to choose the number of hashfuns  k ?
   n = ?  i.e. how many unique items S can we store for the  k  chosen ?

A: You may repeatedly solve one of the above formulas numerically
   for different pairs of  k  and  n.  Or, if you believe information
   theoreticians, you just use the formula for optimally utilized BF
   i.e. operating at its "optimally filled" (though not full) capacity:

   fo = (0.5)^k = 1/2^k   where 0.5 is the probability of an average
                          bit being On (this is the true meaning) or Off
                    (because 0.5 happens to mean equal chances of 1 vs. 0).

Hence  fo  is calculable without a calc and even without a piece of paper
if you know your powers of 2:  1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024,
so that for our task at hand we have

   fo = 1/1024 = 1/2^10  < 0.001 required, hence our optimized  k  is
   k  = 10  hashes per item or state S "stored" in a BF.

The capacity  n  (in items) of our optimally filled (yet far from full)
BF may be obtained by solving for  n  one of the general f-formulas far
above, which yields

  k = ln(2)*M/n = 0.6931*M/n  > 0.5*M/n ;   hence
  n = ln(2)*M/k = 0.6931*M/k    hence in our case:
  n = 0.6931*3200 Megabits /(10 bits/item stored)
    = 222 MegaItems  >  0.5*3200/10 = 160 MegaItems (or Mstates in SPIN).

The > is due to the semantic overload or overlap i.e. bitsharing.
Note that  222/160 = 1.39, and  160/222 = 0.72 . This means that
setting 100 bits actually sets i.e. turns On only 72 bits on average.

  k*n = M*2/3  is my engineering rule of thumb with a small extra
               safety margin. This together with  fo = 1/2^k  allows
               us to design near optimally operating Bloom filters by heart.
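
The whole by-heart design procedure fits in a hedged C99 fewliner (mine;
the example  M  and  f  are those of the task above) :

  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
    const double M = 3200e6, f = 0.001;   /* given bits and false+rate */
    int    k = (int)ceil(log(1.0/f) / log(2.0)); /* from fo = 2^-k <= f */
    double n = log(2.0) * M / k;                 /* optimal load  k*n   */
    printf("k = %d hashes/item, capacity n =~ %.0f items\n", k, n);
    return 0;
  }

which prints  k = 10  and  n =~ 222 MegaItems, as computed by hand above.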

The expected cumulated number of false positives is easily computed
numerically, e.g. for  M = 3200 Mega as above, and  n = 80 Mega, we get:

  f    = (1 - (1 - 1/M)^(k*n))^k = 0.0000002804 if nr. hashfuns k = 10;
  Fn   =  2.29 states missed out of 80 MegaStates "stored" with k = 10; this
  Fn  << 22.43 = f*n = 0.0000002804*80000000 iff 80M random membership tests
                would be done AFTER 80 MegaStates were "stored" already, the
        latter being the correct interpretation of what would be an improper
        calculation of  Fn .

  Tab. 1:  M = 400 MBytes,   n = 80 MegaStates;   cumulated losses Fn :
  k     =     2    3    4   5  6  7  8     9    10    11    12    13    14
  Fn    = 64372 7743 1363 314 89 30 11 4.847 2.288 1.176 0.652 0.386 0.248
  Ploss =     1    1    1   1  1  1  1 0.992 0.899 0.692 0.479 0.320 0.215

  This table says a lot:
  - It desperately cries for  k > 2 for BFs (e.g. bitstate hashing in SPIN ).
  - It shows that  Fn  provides much more information than  Ploss  does,
    especially when more than 1 item or state is expected to be lost. To
    tell that bitstate hashing in SPIN will lose 64372 states is about 60k
    times more informative than to say that at least one state will be lost.
  - In fact  Ploss approximates  Fn, albeit a bit roughly when losses exceed
    1 state :-) Then Ploss tells that 1 or more states will be lost, while
    Fn  quantitatively informs us how many states or items will be lost.


-.- +A comedy of errors = compact virtual hashing CH or HC
      vs. distributed / dispersed virtual hashing DH or BF :

Medical warning! This section is for healthy hearts & strong nerves only!
Let's now compare a BF with the classical virtual hasher (which does not
disperse individual bits over a bitmap) invented by [Morris 1968], analyzed
by [Murray 1970], and reinvented by [Wolper & Leroy 1993] who have called
it inventively HASHCOMPACT, not to be confused with a similar though not
identical COMPACT HASH by [Cleary 1984].

Caution!  The curves in fig. 2 in [Wolper & Leroy 1993] are "approximate"
          by the factor  17.32  i.e. 1632 % off the mark. Let's look at one
clear point there: for  n = 10^7 , k = 10 hashfuns and  M = 1 Gigabits my
numerical integrations as well as the straight summation both yield Fn ,
and straight numerical computations yield the product Ploss :

Fn    = 0.0000577302 =     expected cumulated number of lost items or states
Ploss = 0.0000577285 =     probability of ONE OR MORE  losses i.e.
                     =     probability of AT LEAST ONE loss
                     = 1 - probability of NO loss (omission, failure, fault)
                     during insertions of  n  unique i.e. distinct items.

Ploss is semantically equivalent to  1 - Pnc = 0.001 = 10^-3  proclaimed for
    the fig. 2 in [Wolper & Leroy 1993].  But here they are not equal, since

(1 - Pnc)/Ploss = 0.001/0.0000577285 = 17.32  is the factor by which their

equations and fig.2 are "approximate" wrt The Truth of Ploss & Fn.

That Ploss will approximate my new  Fn  was pointed out to me by Dr. Ulrich
Stern (ex-Stanford University), when he noticed my misinterpretation of  Pnc
and a small typo in the previous version of this paper, all after I kept
insisting for weeks on the incorrectness of the graphs in fig. 2 in
[Wolper & Leroy 1993] which has been accepted as ok for almost 10 years.
Due to the continuities and orderings of the curves in their fig. 2, the
single point here identified as 1632 %  in error makes their fig. 2 invalid.

However, Ploss  cannot approximate  Fn  in general because it always holds
         Ploss <= 1 , while  Fn > 1  is quite a common number of lost items.
Quiz: Why then is Ploss = 0.0000577 so close to Fn = 0.0000577 above ?
Ploss (always <= 1) can be computed theoretically but not easily falsified
as the Popperian philosophy of science requires, nor directly counted like
Fn > 1 which can always be enforced by decreasing  M  or by increasing  k*n.
Moreover  Ploss  contains much less information than  Fn  does (see Tab. 1).

For the same point  k = 10 ,  n = 10^7 ,  M = 10^9 bits I computed three
values which must be semantically equivalent (all under the assumption of
stochastic independence) :

    Pnc = probability of NO collision i.e. NOT a single state S lost;
    Pno = probability of NO omission  i.e. NOT a single state S lost;
  Ploss = probability of AT LEAST ONE state lost i.e. of ONE OR MORE items
          lost, which is the complement of  Pnc  as well as of  Pno .
          Since the numbers too close to 1 do not read well (we are used to
          ignore chains of 00000's but not of 99999's), complements are
          preferably computed and displayed on screen and here :

  Ploss = 0.0000577 =~  6*10^-5 introduced here above is the True Value,
                             unlikely to be improved upon, hence the claimed
          0.001     =  10^-3 is far from the True Value for the given k,n,M;
1 - Pnc = 0.0009995 =~ 10^-3 from the equation (4) in [Wolper & Leroy, 1993]
                             is erroneous but consistent with their fig. 2;
1 - Pno = 0.0000909 =~ 10^-4 from the e-paper [Wolper, Stern, Leroy, Dill ]
                             is 11 times smaller than in their fig. 8 which
                         is identical with fig. 2 in [Wolper & Leroy 1993].
Their equation has improved since 1993 but the graphs have not :-) because:

(1 - Pnc)/Ploss = 17.32;   (1 - Pnc)/(1 - Pno) = 11
(1 - Pno)/Ploss =  1.58

Another clearly identifiable point in their identical figs. 2 and 8 is for
  k = 5 ,  n = 10^4 ,  M = 10^6 bits  which yields:

     Fn = 0.000468157 by numerical integration
          0.000468293 by numerical summation      which is the True Value
  Ploss = 0.000468047 by numerical integration of the logarithmicized formula
          0.000468183 by numerical multiplication which is the True Value
1 - Pnc = 0.00312     by the approx-approx equation  (4) [Wolper, Leroy 1993]
                         which is 312 % of their proclaimed 0.001 for fig. 2.
1 - Pno = 0.00052     by the approx-approx equation after their equation (5)
                         in [Wolper, Stern, Leroy, Dill ],
                         which is  52 % of their proclaimed 0.001 for fig. 8.
The value 0.001 is too far from the True Value for the  k, n, M  given above.

Their equations (3) for Pnc and their identical (5) for Pno have built in a
1st order approximation of  i*k bits being On after  i  items or states were
inserted i.e. virtually "stored" in a BF or bitstate hasher. This fundamental
weakness cannot be removed by improving their 2nd order approximations.
Moreover I could not get a meaningful result from a straight numerical
evaluation of their equations (3) or (5), despite my careful logarithmization
with overflow checking On, the 10-byte extended floating point type of
BPascal, and using relatively small numbers like  k = 5, n = 10^4  and
M = 1 Megabit, which still form a clear point in their fig. 2.

Note that here I tear down the formulas and fig. 2 in [Wolper & Leroy 1993]
and fig. 8 in [Wolper, Stern, Leroy, Dill ] concerning Holzmann's "bitstate
hashing" [Holzmann 1988,1990] aka Bloom filtering [Bloom 1970], [Knuth 1973],
aka "probabilistic hashing" [Sincovec & Wiener 1986, p.396] aka "multiple
hash functions" [Reingold & Hansen 1986, p.410-413]. I did not check yet
Wolperian theories behind hashcompact HC [Wolper & Leroy 1993] aka compact
hashing CH [Cleary 1984] aka virtual hashing since [Morris 1968, Bell Labs],
virtual scatter [Murray 1970, Cornell Univ.] and virtual hash [Wiener 1986].

Since one comparison assumed that 400 MBytes of fast memory are available
and accessible, saving it makes no sense, and since hashcompact cannot take
more than 80 MegaStates, only one nontrivial comparison makes sense :
Q: Which fault rates f are achievable for M = 400 MB and n = 80 MegaStates?
A: BF cannot win over HC for the given  M, n because for k = 28 BF's lowest
   here achievable losses are  Fn = 0.017 =~ Ploss = 0.017 > 0.0013 = 0.13 %
   claimed for HC in [Wolper, Stern, Leroy, Dill ].
On the other hand, at the time when HC with k = 40 bits/state has filled all
the available bitspace of  M = 3200 Megabits,  a BF with  k = 28 ( k <> 28
would worsen  Fn  and  Ploss ) has used only about  M/2 = 1600 Megabits.

The optimal bitsharing factor  Obsf = 0.5*M/(k*n) = 0.5*3200/(28*80) = 0.71

i.e. from the  k*n  bits ever set  29 % were already On hence were set again
so that 29 % of the On-bits are bitshared and thus semantically overloaded.

When HC has used (almost) all  M bits, HC will start to lose each & every
new state, simply because not a single bit is left to store it in, while a
BF can go on quite comfortably much further, until it starts losing much
more often than acceptable. Note the minimal Fn = 0.017 =~ 0.017 = Ploss
for k=28:

Tab. 2:    Fn = the smaller the better;     M = 3200 Megabits
    k   Ploss =~ Fn for Fn < 1, but here BF cannot achieve Fn < 0.017
HC: 40  Ploss = BEST 0.0013  n < 80 MegaS = HC's abs.limit, HC used ALL bits
BF: 40     Fn =      0.036   n = 80 MegaS,  BF far from full (bitsharing)
BF: 28     Fn = best 0.017   n = 80 MegaS,  BF used only  M/2 =  50% of bits
BF: 20     Fn =      0.038   n = 80 MegaS,  BF far from full (bitsharing)
BF: 10     Fn =      2.26    n = 80 MegaS,  BF far from full (bitsharing)
BF:  2     Fn =  64372 lost; n = 80 MegaS,  BF far from full (bitsharing)

Here BF is the loser in general, and for k = 2 a Huge Loser in particular.
But again, when HC uses all  M  bits available,   HC with  k = 40
will turn into a Henceforth-All-Loser long before BF with  k = 28 (or less)
will start to lose unacceptably. So there is no clear cut criterion or rule
of thumb for situations when  n  is not known beforehand.

One clear advantage of classical virtual   COMPACT HASHers ( CH like HC i.e.
hashcompact )  over  neoclassical virtual DISPERSED HASHers ( DH like BF )
with  k > a few  has until now been their speed, but this difference is
virtually annihilated by the new algorithm called FlashHash, described for
the first time here above and in more detail below.


-.- +The proof of the pudding = Tab. 3 :

Here are some numbers for a BF (operating with my 1-liner hashfun) on
  n  =  63609 real-world strings (all unique!, many similar, ave.length 19)
  M1 = 116663 bits/segment in a partitioned BF3, and M = k*M1 in BF1
hence  (k*n)/M = (k*n)/(k*M1) = n/M1 = 63609/116663 = 0.545, well filled.
As said above,  Fn = number of lost strings during 1 to n  distinct
insertions (i.e. no duplicates [re]inserted).

  Tab. 3:  BF1 is monolithic;  BF3 is partitioned;  cumulated losses Fn:

         FlashHash Yes/No ;  k =    2     3    4    5    6   7   8   9  10
  Theor. num. sum BF1: No   Fn =  4205  1353  460  162  59  22   8   3   1
  Theor. integral BF1: No   Fn =  4272  1385  474  169  61  23   9   3   1
  Really counted  BF1: No   Fn =  4259  1390  434  151  64  17   6   2   1
  Really counted  BF1: Yes  Fn =  4316  1411  460  159  64  33  25   7   7
   rotated by c < w=32 bits  c =    13    13    7    7   5   5   4   3   3
  Really counted  BF1: Yes  Fn =  4360  1387  503  160  59  33  11   8   3
                               =  version with  h0 := h0 xor EnTroPic[j];

  Further for BF3 all really counted cumulated losses Fn:
  - k hashfuns/S    :
      NO FlashHash     No   Fn =  4331  1395  482  178  64  24   8   4   0
  - 1 hashfun/S only:  Yes  Fn =  4259  1366  465  161  56  24  12   5   3
                               =  version with  h0 := h0 xor EnTroPic[j];
  - 1 hashfun/S only:
    FlashHash w = 32   Yes  Fn =  4312  1399  488  175  70  30  13  11   5
     rotated by c bits       c =    13    13    7    7   5   5   4   3   3
  - 2 hashfuns/S only:
    FlashHash w = 64   Yes  Fn =  4354  1354  465  164  70  20  10   2   1
     rotated c <= 13         c =    13    13   13   13  12  10   9   7   7
  - 2 hashfuns/S only:
    FlashHash w = 64   Yes  Fn =  4311  1390  503  174  65  20  10   2   1
     rotated c <= 11         c =    11    11   11   11  11  10   9   7   7
  Number of hashes:          k =     2     3    4    5   6   7   8   9  10

The table empirically validates that a k-flashhashed single h0 := HashFun(S)
can compete with  k  separately (re)computed HashFun(S). The theoretical
k!-based reason follows further below (find "k!" in this text).
FlashHash with 1 hashfun was competitive up to  k = 5  and only then
the decreasing  c  began to take its toll when  h0, h  had w = 32 bits/word.
Of course the word width  w > 32 bits would help, and so did "hash pairing"
of two 32 bit results  h0, h00  of two hashfuns into one final "hashpair"
of w = 64 bits.  The results were close to those of  k  separate hashfuns
and also close to the theoretical cumulated losses  Fn  for BF1.
Just as it is impossible to address large spaces with too few bits, a
larger  M  may require a larger word width  w > 32 bits for the hashes
(find "w = 64").

The IT & TI engineering rule of thumb for  Fn  in a flashhashed BF3 or BF2:
For your  M, n and k  compute my new theoretical summing formula for BF1

    Fn = Sum_i=1..n( f(i) )  with  f(i) = (1 - (1 - 1/M)^(k*i))^k and M = k*M1

(do not forget to initialize Sum := 0 :-) and multiply this  Fn  by the
following corrective ratio from the Tab. 3 :

   ( flashhashed Fn[k] ) / ( theoretical numerical sum Fn[k] ) .
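
For concreteness, here is a minimal sketch in plain C of the summing
formula above; the function name  cumulated_losses_Fn  and the sample
parameters in  main  are my illustrative assumptions, not part of the rule:

  #include <math.h>
  #include <stdio.h>

  /* Theoretical cumulated losses Fn of a BF1 with M bits, k hashes and
     n distinct insertions: Fn = Sum_i=1..n (1 - (1 - 1/M)^(k*i))^k     */
  double cumulated_losses_Fn(double M, double n, double k)
  {
      double Fn = 0.0;                /* do not forget: Sum := 0 :-)    */
      double q  = log1p(-1.0 / M);    /* = ln(1 - 1/M), computed stably */
      for (double i = 1.0; i <= n; i += 1.0)
          Fn += pow(-expm1(k * i * q), k); /* = (1 - (1-1/M)^(k*i))^k   */
      return Fn;
  }

  int main(void)   /* e.g. the Tab. 3 setting: M = k*M1 with k = 5 */
  {
      printf("Fn = %.1f\n", cumulated_losses_Fn(5.0*116663.0, 63609.0, 5.0));
      return 0;
  }

The printed value should land near the theoretical sum row of Tab. 3 for
the corresponding  k .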

The safety margin built into this engineering rule is that the Tab. 3 is
based on my simple (good enough for my apps) 1-liner HashFun(S) :

     h0 := (h0 << 5) + h0 + (S[ix] & 31);  for subset-of-ASCII strings.

Common technical sense tells us that the better the HashFun(S), and the
larger the distance  c  and the word width  w  in bits of the 0-th hash  h0
(as if concatenated and rotated with  h00 ), the lower the actual  Fn  will be.

Competent readers using better hashfuns should achieve better results,
especially if their  S  is densely packed i.e. without too many redundant
bits contributing no information (hence my << 5, & 31 ; find these below).
Competent users of FlashHash will obtain their custom designed corrective
ratios based on their versions of my Tab. 3 obtained for their own hashfuns
working on their typical sets of states or items.

Tab. 3 above shows that there is no need to compute  k  HashFun(S)
    iff k > 1  hashes are obtained by means of FlashHash :
1.  Using k > 2  greatly decreases BF's losses  Fn  and the failrate  f ;
2.  for k < 6  only 1 HashFun(S) will suffice iff FlashHash is employed;
3.  for k > 6  only 1 HashFun(S) may  suffice iff FlashHash is employed,
                but 2 HashFun(S) will further decrease  Fn  and  f .

Inevitably, as with all algorithms, there surely will be some range(s)
of parameters for which a compact hash CH or hashcompact HC will be worse
or better than a dispersed hash a la BF. To determine these ranges is left
as an exercise for the spinners (= kind, thoughtful readers in general and
the reinventors of compact hash or hashcompact in particular). This paper
gives them proper formulas for the cumulated losses  Fn.


-.- +FlashHashing & BackHashing = (m)eat the beef again :

The basic idea of flashhashing is to compute only one hash  h0  (as good as
possible or available; here good means that its bits are quite independent)
and then to derive all  k  hashes from it by means of one stepwise rotation
of  h0's bits per each hash. Since the above analysis was done under the
assumption of  k  independent & uniform hashing functions, I am now
trespassing into the holy ground of those theoreticians who would claim
that my flying machine cannot fly, as Professor Simon Newcomb, then
the US "astronomer-royal" residing at Harvard, and Professor Lord Kelvin in
the UK have claimed [Newcomb 1903]. To take their fear of flying away, I can
say more than "Sirs, with all due respect, I have flown my FlashHash". Let's
switch from BF1 to BF3 (or BF2), to really appreciate why BF3 is our favorite
scheme. As said, BF3 or BF2 do not suffer from hashcrosstalk i.e. no bits are
shared by the j-th and i-th hashfuns if i <> j. This observation makes
FlashHash and BackHash workable. As all  k  hashes are generated in FIXED
ORDER, each works only within its own j-th segment of BF3. It is possible
that states S1 and S2 will generate the same  k  hashes, but the higher the
k, the more unlikely it is that the hashes will be generated in the same
order; after all, the number of their permutations is  k!  (this is a
factorial, not an exclamation! :-) and only one of the  k!  permutations
will match. The same in still other words:
When two different states  Si, Sj  produce two different 0-th hashes h0i, h0j
and FlashHash rotated  h0i by Zi steps and  h0j by Zj steps, then both hashes
may become identical, and they both would match the same bits in a monolithic
BF1, but not in a partitioned / segmented BF3 or BF2 where each hash has its
own segment or partition. Against such mismatches, which in BF1 result from
the rotated  h0, works the COMBINATORIALLY IMPLOSIVE nonmatching logic of k! .
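
To get a quick feel for this combinatorial implosion: for  k = 5  there are
already  5! = 120  possible orders of 5 given hashes, and for  k = 10  there
are  10! = 3628800 , of which the FIXED ORDER matching in a segmented BF3 or
BF2 accepts only one. Hence the chances of an unwanted pseudo-match shrink
hyperexponentially with  k .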

The flashhashing fewliner is presented in edu-C+BP, the lingua franca of
ancient algorithmicians, defined above as a mixture of BP-Sanskrit and
C-Latin:

Unsigned integers :
  If you have to use signed integers e.g. in 2's-complement arithmetic
  i.e. if abs(MinInt) <> abs(MaxInt) then you must put overflow checking
  (if any) off, and for speed put it off anyway, since flying as fast as
  possible without a parachute is the right fun, isn't it ?

  M  = nr. of bits in a BF[0..M-1]; for BF1 it is the modulus: h := h0 % M
  M1 = nr. of bits in each segment of a BF3 or BF2,  where  h := h0 % M1
  j  = the order number of the j-th hash produced per one call of HashFun(S)
  h0 = the 0-th hash produced by HashFun(S) called only once per S
  h  = any hash subsequently (re)generated from h0 ;  h could be h[j]
  w = number of bits in  h0 and h  i.e. hash-word width in bits;
      often  w = 32 bits, or  w = 64 bits for  h0  hashpaired with  h00.
  k = total number of hashes (not hashfuns' calls) to be used per item S ;
  c = number of bits shifted;  0 < c < w , the larger the  c  the better
      because mutually more distant bits will be more stochastically
      independent. Pre-check to make sure that:

  Use  c  such that  k <= 1 + ((w-1) div c)  and  odd c < (w div 2) ;
       c  prime w.r.t.  w  (i.e. coprime to  w ) would guarantee that all
          w  bits will be visited even in the case when  c*k > w - 1.

An item S is a chunk of data to be hashed, e.g. a string of variable size,
  or a fixed sized data vector like e.g. a bitstring or a bytestring
  representing anything, e.g. a state S of a model, mail or femail :-)

FlashHash for BF3 or BF2 (i.e. no hashcrosstalk) to be run only once per S.
          This version and that in the executive summary above are both
          functionally equivalent but coded differently :

  d := w - c; { use max c < w/2 for k <= 1 +((w-1) div c), w = #bits in h0 }
  M1 := M div k;  { i.e. M := k*M1;    }
  M1min1 := M1 -1;  h0 := HashFun(S); { store THIS h0 now, iff needed  }
  offset := 0;      h  := h0 % M1;    {  h := h0 & M1min1; is faster   }
{ Use/test/set  BF3[h]    for j = 1 , or store in h[1] :=              }
  for j := 2 to k do  { generates k > 1 hashes per item or state S :   }
  begin  { max prime or odd c <= (max c allowed) will be the best  c   }
        h0 := (h0 << c) | (h0 >> d);  { h0 rotated  c bits to the left }
        offset := offset  + M1;
        h := offset + (h0 % M1); { Use/set BF3[h]  for k=j, or: h[j]:= }
  { or: h := offset + (h0 & M1min1);    is allowed iff M1 power of 2   }
  end;{ h :=  might be h[j] := ... ; use  h  before and in the loop    }
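
For readers who prefer real compilable C to edu-C+BP, here is a minimal
self-contained sketch of a flashhashed BF3 doing one test-and-set per
string; the names  bf3_test_and_set, hashfun  and the constants
K, C, W, M1  are my illustrative assumptions ( c = 7 satisfies
k <= 1 + ((w-1) div c)  for  k = 5, w = 32 ):

  #include <stdint.h>
  #include <stdio.h>

  enum { K = 5, C = 7, W = 32, M1 = 116663 };  /* k, c, w, bits/segment */
  static uint8_t bf3[(K * M1 + 7) / 8];        /* k segments of M1 bits */

  static uint32_t hashfun(const char *s)       /* the Bytewise 1-liner  */
  {
      uint32_t h0 = 1;
      for (; *s; s++) h0 = (h0 << 5) + h0 + (uint32_t)(*s & 31);
      return h0;
  }

  /* Returns 1 iff S was (virtually) seen before; inserts S either way. */
  static int bf3_test_and_set(const char *s)
  {
      uint32_t h0 = hashfun(s);        /* store THIS h0 now, iff needed */
      int seen = 1;
      for (int j = 0; j < K; j++) {
          if (j > 0) h0 = (h0 << C) | (h0 >> (W - C)); /* rotate left c */
          uint32_t h = (uint32_t)j * M1 + h0 % M1;     /* own segment   */
          uint8_t mask = (uint8_t)(1u << (h & 7));
          if (!(bf3[h >> 3] & mask)) { seen = 0; bf3[h >> 3] |= mask; }
      }
      return seen;
  }

  int main(void)
  {
      printf("%d %d\n", bf3_test_and_set("abba"), bf3_test_and_set("abba"));
      return 0;   /* prints: 0 1 */
  }

A membership test without insertion is the same loop with the  |=  line
dropped.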

BackHash is like FlashHash with  h0 := HashFun(S)  replaced with
         h0 := FetchFromStore; (e.g. from a stack)

Iff h0's are stored (e.g. stacked, only one h0 per each S), no HashFun(S)
has to be redone for  S's done so far, as BackHash can regenerate all  k
hashes per  S  from its single  h0  by bitshifts or, surprise, even
without  h0  if flashhashing was done with  k*c = w  e.g. 32, and if no
final transformation like  h := offset + (h0 % M1)  was applied, which
would mean that a nonsegmented BF1 would have to be used. This seductive
save-no-h0 trick is not recommended and therefore not discussed.
Quiz: Why?  Having read so far, the answer should not be too difficult.
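
Under the same assumptions as the C sketch above, BackHash is just the
same loop with the hashfun call replaced by a fetch;  backhash  is my
hypothetical name:

  /* BackHash: regenerate the k BF3 indices from a stored h0 which
     comes e.g. from a stack (FetchFromStore); no HashFun(S) is run. */
  static void backhash(uint32_t h0, uint32_t h[K])
  {
      for (int j = 0; j < K; j++) {
          if (j > 0) h0 = (h0 << C) | (h0 >> (W - C)); /* rotate left  */
          h[j] = (uint32_t)j * M1 + h0 % M1;           /* j-th segment */
      }
  }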

FlashHash for monolithic BF1 (with hashcrosstalk) to be run only once per S:

{     M := k*M1;   for comparisons of this BF1 with BF3 or BF2          }
{ Mmin1 := M - 1;  for faster  %  iff  M  is a power of 2 ; see below   }
  d := w - c; { w = #bits in h0; use max c < w/2 for k <= 1 +((w-1) div c) }
  h0 := HashFun(S); { store THIS  h0  now, iff needed                   }
  h := h0 % M;      { h := h0 & Mmin1; is faster; use iff M power of 2  }
{ Use/check/set   BF1[h]    for j = 1 , or store in h[1] :=             }
  for j := 2 to k do  { generates k > 1 hashes per item or state S :    }
  begin  { max prime or odd c <= (max c allowed) will be the best  c    }
        h0 := (h0 << c) | (h0 >> d); { h0 rotated by c bits to the left }
        h :=   h0 % M; { Here use i.e. test/set BF1[h]  for k=j, or: h[j] }
  { or: h :=   h0 & Mmin1;   is faster but allowed iff M is a power of 2  }
  end;{ h :=  might be h[j] := ... ; use  h  before and in the loop     }


FlashHash in edu-C for monolithic BF1 with hashcrosstalk; run it once per S:
/* This version allows concentration of an inline code at one place    */
/*     M = k*M1;    for comparisons between BF1 and BF3 or BF2         */
/* Mmin1 = M - 1;   for faster  %  iff  M  is a power of 2 ; see below */
   d = w - c;  /* w = nr bits in h0; use max c < w/2 for k <= 1 +(w-1)/c */
  h0 = HashFun(S);    /* Store THIS h0 now iff needed                  */
  for j = 1 to k do   /* generates  k  hashes per item or state S :    */
  begin /* max odd or prime  c <= (max c allowed) will be the best  c  */
          if j > 1 then  h0 = (h0 << c) | (h0 >> d); /* rotate bits    */
          h = h0 % M ;      /* put HERE inline code using  h           */
   /* or: h = h0 & Mmin1;    this is faster; use it iff  M  power of 2 */
  end; /* h = might be h[j] = ... ; use  h  before and in the loop     */


-.- +More explanations and hints for programmers :

HashFun(S) above may be your favorite, or this Bytewise 1-liner (or better):

  h0 := 1;  for ix := loB to upB do  h0 := (h0 << 8) + h0 + S[ ix ];

  h0 := 0 would be ok, but h0 := 1 may be better. Quiz: why ?

FH & BH employ a repeated barrelshift which cyclically, step by step, rotates
the bits in the hash word  h0  by  c  bits to the left i.e. by  (j-1)*c bits
w.r.t. the original hash  h0. Each turn of the loop yields one of the  k
hashes per item  S  so that FlashHash works as an ultraefficient
multihashing function. BackHash is even faster.

If you need to remember but do not want to store all  k  hashes per item
( or per state when generating too many states e.g. for model checking
in SPIN ), then you can do with storing only the 0-th hash  h0  and
recover i.e. reconstruct all  k  hashes with BackHash which is much
faster than FlashHash, because no hashfun needs to be ever recomputed
again as long as the 0-th hash  h0  from FlashHash is still stored and
readily accessible e.g. on a stack or in a queue (LIFO / FIFO).

Here is how to generate  c  and  d  for the chosen  k :

  Either precompute  c  for  h0's  word width  w = 32 bits thus:
  for ctry := 1 to w do { huge M=k*M1 may need w = 64 bits i.e. h0 and h00 }
    if k > ( 1 + ((w -1) div ctry) )
      then begin c := ctry -1;  Goto cLa end;
  cLa:
    { c = 13 or 11 are large enough distances for bits' independence      }
  if  c > 13     then c := 13; { prevents c > 32/2, e.g. c = 31 for k = 2 }
  if (c % 2) = 0 then c := c - 1;  { odd  c  is better than even  c       }
  d := w - c;

  Or pre-set  c  as follows, with odd  c  preferred:
  if k > 1 then
  case   k  of { the best  c  seems to be  c = prime or odd: }
      2, 3: c:=13;   4: c:=9;       5: c:=7;
      6, 7: c:= 5;   8: c:=4;   9, 10: c:=3;
      else: { here can be the "Either" loop from above }
  end;
    { c = 13 or 11 are sufficient distances for bits' independence        }
  if  c > 13     then c := 13; { prevents c > 32/2, e.g. c = 31 for k = 2 }
  if (c % 2) = 0 then c := c - 1;  { odd  c  is better than even  c       }
  d := w - c;

  Check( 1 < k <= 1 + ((w - 1) div c ) );
  Check( 1 < c ); { 1 << c  makes hashes less dependent }
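
The same selection of  c  as one compilable C function ( pick_c  is my
name; it assumes  1 < k <= w ):

  /* The largest odd  c <= 13  satisfying  k <= 1 + ((w-1) div c) :   */
  static int pick_c(int k, int w)
  {
      int c = w - 1;                   /* start from the largest shift */
      while (k > 1 + (w - 1) / c) c--; /* stop at max c admitting k    */
      if (c > 13) c = 13;      /* 13 or 11 suffice for independence    */
      if (c % 2 == 0) c--;     /* odd  c  is better than even  c       */
      return c;                /* the caller sets  d := w - c          */
  }

For  w = 32  this yields  c = 13, 13, 9, 7, 5, 5, 3, 3, 3  for
k = 2,..,10 , i.e. the case table above after its final odd-c adjustment.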

By now it should be clear that FlashHash deserves its name as its blitzy
performance in time & space derives from the "incremental" re-computation
based on computationally simple & blitzy bitshifting which can be justified
by not too simplistic reasoning about  k!  permutations of  k  hashes for
k-segmented hence "combinatorially imploded" Bloom filters of the 2nd and
3rd kind i.e. for BF2 and BF3 only. Negative tests on BF1 mentioned above
have empirically confirmed this conclusion. Positive results are in Tab. 3.
By negative test results I mean that a flashhashed BF1 will almost always
make more false positives than a flashhashed BF3 or BF2, but the
difference does not need to be large.

So far on FlashHash. No more spinning about how to obtain  k  hashes in
general and how to do it fast & memory-efficiently in particular i.e. how
(not to have) to save them all in some applications like e.g. in SPIN.
Despite the "Never say never", I firmly believe that there never was
and never will exist a more efficient multihash algorithm than FlashHash
& BackHash, which have reached the absolute limit of what is possible.


-.- +Bitwise rotations aka circular bitshifts aka barrelshifts :

 The << , >>  are bitwise shiftLeft, shiftRight ( shL , shR in BPascal);
 the filling bits are zeroes.
 The | is a bitwise OR in C ( the operator  or  on integers in BP ).

 BarrelshiftLeft by  c  bits of an unsigned 32 bit integer h0:
      d := 32 - c;
      h := (h0 << c) | (h0 >> d );
 e.g. h := (h0 << 8) | (h0 >> 24);
 e.g. h := (h0 << 1) | (h0 >> 31);

 BarrelshiftLeft between two 32 bit words h0 and h00 taken AS IF they
 form a one w = 64 bit word (see its lower losses  Fn  in Tab. 3 above):
      d   := 32 - c;
      h0  := HashFun1( S );  h00 := HashFun2( S );
      t0  :=  h0  >> d;  {  t0  is a temporary aka shunt like for swap }
      h0  := (h0  << c) | (h00 >> d);
      h00 := (h00 << c) |  t0;
      h   := offset + (h0 % M1); { for BF3.  For BF1 use  h := h0 % M; }
    { h   := offset + (h0 & M1min1) iff M1 = 2^i ; is fastest }
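
 Where a true 64 bit unsigned type is available, the hashpair rotation
 collapses into the usual one-word barrelshift; a C sketch ( rot_hashpair
 is my hypothetical name; it requires  0 < c < 64 ):

  #include <stdint.h>

  /* Rotate the concatenated hashpair h0:h00 left by c bits and return
     the new upper word, which plays the role of h0 in  h0 % M1 :      */
  static uint32_t rot_hashpair(uint32_t *h0, uint32_t *h00, unsigned c)
  {
      uint64_t hp = ((uint64_t)*h0 << 32) | *h00;   /* w = 64 bits     */
      hp   = (hp << c) | (hp >> (64 - c));          /* barrelshiftLeft */
      *h0  = (uint32_t)(hp >> 32);
      *h00 = (uint32_t)hp;
      return *h0;
  }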

 BarrelshiftLeft by  cL = 5 bits of a  w = 20 bits integer bitstring
          within a  32 bits word  h0  so that 12 msbs are 000000000000 :
      offbits   :=  32 - w;        {= 12 bits to be put Off i.e. zeroed }
      offvalue  :=  1048575;       {= 2^w - 1 = 12 zero msbs, then 20 ones }
      offpluscL :=  offbits + cL;  {= 12+5 = 17 }
      h := (((h0 & offvalue) << offpluscL) >> offbits)    { safe Version 1 }
                 | ((h0 & offvalue) >> (32 - offpluscL)); { mask here too,
                   lest the 12 msbs of  h0  leak into the result }

  or: iff we know that the 32-w msbits are all 0 then it suffices :
      h := ((h0 << offpluscL) >> offbits) | (h0 >> (32 - offpluscL)); { V2 }

  E.g. when we have computed a  h0  of 32 bits and want to use it AS IF  h0
       had  w = 20 bits only, we use Version 1 for the 1st hash j = 1, and
       V2 for all further hashes  j = 2 to k,  per state S. Of course we
       could use Version 1 for all  j  as well. A new  S  must start with
       Version 1.
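
 The same  w = 20  bits rotation as a compilable C function ( rotl20  is
 my hypothetical name), masking as in the safe Version 1:

  /* Rotate the low w = 20 bits of h0 left by cL bits, 0 < cL < 20;
     the 12 msbs of the result are zero whatever sits in h0's msbs.  */
  static uint32_t rotl20(uint32_t h0, unsigned cL)
  {
      const unsigned w = 20, offbits = 32 - w;
      const uint32_t offvalue = (1u << w) - 1;      /* = 2^20 - 1    */
      uint32_t x = h0 & offvalue;                   /* zero the msbs */
      return ((x << (offbits + cL)) >> offbits) | (x >> (w - cL));
  }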


-.- +On hashing functions :

If you know how to obtain multiple mutually almost independent hashes, then
Bloom filters are quite easy to program, but less easy to
understand i.e. to derive BF-formulas and to choose design parameters in
order to be able to exploit their engineering trade-offs properly for a near
optimal design and operation. Good hashes are the keys to good BFs.

To obtain excellent HashFun(S) just search the www for: "Bob Jenkins"
and hashing. Bob Jenkins works for Oracle and his hashfun is a funnel-
less miracle. Please consult the www on how hashing funnels may harm.
To try out FlashHash you need just one hashfun. If you do not have one
handy and you do not want to study & test hard, you can use the quick
& clean (yet good enough for many not too critical applications) 1-liner
operating Bytewise on an item or a dataslab  S[loB..upB]  treated as a
string or an array of bytes to be hashed. As words consist of characters
or bytes, and bytes consist of bits, this hashfun is a pretty flexible
1-liner. When hashing ASCII strings, masking away 1,2,3 msb's will improve
the quality of hashes and decrease the losses  Fn  and the failrate  f.

h0 is an unsigned integer of, say, 32 bits. Put overflow checking off,
   if any, so that overflows wrap around silently, and for speed anyway:

  h0 := 1;  for ix := loB to upB do  h0 := (h0 << 8) + h0 + S[ ix ];

  h0 := 0  might be worse. Quiz: guess why?

  Specialized 1-liners:              { Masking msb's off with & :   }

  For strings in ASCII < 128 decimal: (h0 << 7) + h0 + (S[ix] & 127);
  for smaller subsets of chars, e.g.: (h0 << 6) + h0 + (S[ix] &  63);
      for even smaller subsets, e.g.: (h0 << 5) + h0 + (S[ix] &  31);
      where some chars may be pre-mapped (i.e. pre-converted to other,
  preferably unused or rarely used characters) which does not harm and
  actually is likely to improve the hashes and lower the faults Fn, f .

On an unsigned integer my 1-liner is just a fast way of repeatedly computing
its mathematical equivalent h0 := h0*257 + S[ix]; note that 257 is a prime
number (this is not so important as long as it is an odd number. Quiz:
why odd ?), but do not forget that the goal of hashing is not math, but an
(ir)reversible compression of a chunk  S  into its much shorter hence
generally non-unique signature aka fingerprint called hash i.e. here
the final numerical value of  h0  which is used for accessing-by-indexing
an array called hash table (of bits for BFs).

If the range of  h0  (e.g.  0..2^32 - 1 ; a table of 2^32 bits would take
536870912 bytes) is larger than the highest index number  M-1  of the last
bit in the hashtable[ 0..M-1 ] of bits, then:

If M is a power of 2  i.e.  M = 2^Int where Int > 0
then the final hash value of  h0  can be reduced to h0/2 (or h0/4 etc)
     by zeroing its msb(s); the AND-masking will do it:
        h0 := h0 & mask;  { where mask = 0..0111..111 }
else clip the final  h0  with the post-final  h0 := h0 % M;
     where  M  is preferably a prime not too close to such "round numbers"
     like powers of 2, 3,.., 10, etc.
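
Both reductions side by side in C ( index_of  is my illustrative name):

  #include <stdint.h>

  /* Reduce a 32 bit h0 into the index range 0..M-1 :             */
  static uint32_t index_of(uint32_t h0, uint32_t M)
  {
      if ((M & (M - 1)) == 0)      /* M is a power of 2           */
          return h0 & (M - 1);     /* AND-mask the msb(s) away    */
      return h0 % M;               /* else clip by the modulus    */
  }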

If  k  different hashfuns are needed then we may obtain them as follows:
Precomputations (done only initially, only once per session):
1st: Generate a sequence of numbers loB..upB and
                store it in  Seq[1, loB..upB];
2nd: Initialize a pseudorandom generator of integers;
3rd: Generate  k  pseudorandomly permuted sequences of the nrs loB..upB:
     for j := 1 to k do pseudorandomly Swap the elements in
       Seq[j, loB..upB]  until  k  pseudorandom sequences (each
          containing all numbers loB..upB permuted) are stored in Seq.
       Note: shake well, mix well i.e. do rather more swaps than less,
             say, 10*(upB - loB + 1) or more swaps.

The hashing proper (done repeatedly,  k  times per hashed item):

4th: The j-th hash is obtained by scanning & hashing the item S[loB..upB]
     in the order given by its precomputed pseudorandom  Seq[j, loB..upB].
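
A compilable C sketch of this classical recipe, with a Fisher-Yates
shuffle standing in for the "shake well" random swaps (it mixes
perfectly in one pass); the names  Seq, init_seqs, hash_j  and the
sizes  K, L  are my illustrative assumptions:

  #include <stdint.h>
  #include <stdlib.h>

  enum { K = 5, L = 64 };    /* k hashfuns, fixed item length in bytes */
  static int Seq[K][L];      /* K pseudorandomly permuted index seqs   */

  static void init_seqs(unsigned seed)    /* 1st-3rd: once per session */
  {
      srand(seed);
      for (int j = 0; j < K; j++) {
          for (int i = 0; i < L; i++) Seq[j][i] = i;
          for (int i = L - 1; i > 0; i--) {  /* Fisher-Yates shuffle   */
              int r = rand() % (i + 1);
              int t = Seq[j][i]; Seq[j][i] = Seq[j][r]; Seq[j][r] = t;
          }
      }
  }

  static uint32_t hash_j(const uint8_t S[L], int j) /* 4th: k per item */
  {
      uint32_t h0 = 1;
      for (int i = 0; i < L; i++)  /* scan S in the j-th permuted order */
          h0 = (h0 << 8) + h0 + S[Seq[j][i]];
      return h0;
  }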
This simple process yields  k  fairly independent hashfuns, albeit at the
price of considerable CPU-time. Such an overhead may be small for
I/O-bound applications which do lots of reading/writing (e.g. of strings)
from/to a disk which is 1000s of times slower than a fast memory. However,
for CPU-bound "generative" applications, like e.g. model checking in SPIN,
which generate millions of states, each hundreds of bytes long, the
(re)computation and storage of hashes are the dominating performance
factors. For such apps I developed flashhashing, which of course can be
used in I/O-bound apps too.


-.- +Summary of FlashHashing (FH) :

FH requires you to have only one (very) good hashfun producing  h0, and it
  will (re)generate more decent hashes from it in no measurable time.
FH makes it unnecessary to store more than one hash  h0, because all  k
  hashes per state or item can be recovered from  h0  in no measurable time
  with a fewliner called BackHash which is almost identical to FlashHash,
  but much faster, as BackHash does not really execute any hashfun, it just
  recovers hashes from the one  h0  stored per state  S.
Of course the hashes will not be as independent as they could be if
  generated separately, but this suboptimality can be partially
  compensated by increasing the number  k  of hashes used by 1. Just
  remember the common-sensical engineering rules of thumb :
  "The bigger the  M = k*M1 , the larger the  w  needed for a low  Fn."
  "The larger the constant  c, the more independent the hashes will be"
  (it rhymes, so our Alzheimers can easily remember it :-)  For example
  if SPIN model checkers want k = 10 or 11 hashes, then they can use
  c = 3  if the unsigned integers  h0, h  have 32 bits. The maximal
  c  can be easily precomputed or predefined for  k = 2, 3, 4,.., 10,
  .. up to 32 in which case it must obviously be poor  c = 1. The general
  condition which must be satisfied is  k <= 1 + ((w - 1) div c) ,
  hence  we may do:

  if  k <  1 + ((w - 1) div c) then
  begin  Increase  k  by  1  and use as the last extra hash:
           h0 := (h0 << (w - 1)) | (h0 >> 1);
   {  e.g. h0 := (h0 <<    31  ) | (h0 >> 1); }
  end.

In fact  c  might vary (according to a fixed reproducible pattern!) during
the scan, but constants make our lives easy and computations breezy :-)


-.- +From mission impossible via combinatorial implosion
       to mission accomplished :

It would be no fun to write & to read this e-paper in the conventional dull,
historically falsified style using reductionistic terms "merely" and
"nothing but" or the phrases I hate, like "it obviously/easily follows",
since not much is obvious to anybody but a few genuine geniuses belonging to
the category of Ideal Observers (IO) or Ideal Receivers (IR) [Kahre 2002].
For such Superior Beings (almost) all mathematics is nothing but a rather
trivial tautology not worth pursuing. Having to rely most of the time
on received wisdoms, I keep being surprised by what finally shows up to be
possible in science and technology in general and in IT in particular, as
IT is less limited by "material" restrictions than most other technologies.
  I am writing this section because initially I believed that for high
quality Bloom filtering it is impossible to avoid generating  k  separate
hashfuns. I believed that because I had read my share of papers by
theoreticians. However, a few times before this, I have done things deemed
impossible:

- Beating down the combinatorial explosions during protocol verification
  by APPROVER = the first Automated & Proven Protocol Verifier, introduced
  at the Lake Arrowhead Workshop [Hajek 1977, 1978a].

- In 1988 I beat the combinatorial explosion down again when generating
  one solution of the  N  nonattacking queens problem for all chessboards
  up to  N = 1800 (by 1800) on an XT PC, while the then world record (on a
  PC ?) was for  N = 96 by an ex-Stanford professor Dr. Harold Stone, then
  the manager of the advanced architectures studies at the IBM Thomas J.
  Watson Research Center [Stone & Stone 1987], [Hajek 1990]. My bad luck
  was that at about the same time two groups of researchers succeeded in
  turning this NP?-problem (yes, I have asked professor Stephen Cook at
  the University of Toronto, who should be called Dr. NP) into N.P. i.e.
  into No Problem finally for  N = 3M  [Sosic & Gu 1990], and the NASA
  researchers working on the scheduling problem for the Hubble telescope
  [Minton, Johnston, Philips, Laird 1990], albeit on fast workstations with
  lots of memory, not on an XT with unexpandable 520 kBytes which were my
  absolutely limiting factor. Again a mission impossible has become a
  routine (for cognoscenti). Unless you are an expert on CSPs, please feel
  free to try N-queens for N > 100 to get a feeling for what I am talking
  about.

- About 1990 I decided to try my hand & head on the problem of in-flight
  deletions from open addressed hash tables which has been described as if
  impossible without their complete rehashing: "The situation can be
  improved, by periodically rehashing the table" [Standish 1980, pp.157-9],
  "The only solution to the problem of degraded search times is the ultimate
  (total) reconstruction of the table by a process called rehashing"
  [Reingold 1986, p.406 (1983, p.361) ]. Despite being explicitly warned
  by these good professors and implicitly by others who offered no true
  solution to such true deletions, I succeeded in constructing a simple but
  not simplistic algorithm for in-flight deletions. Weeks later, I found
  its basic idea described, though not worked out (guess where? :-).
  Only 12 years later when adding this section to this e-paper, I noticed
  a hint (but no more) in [Reingold 1986 & 1983, section 7.4, Exercise 13].

- These three examples, all somehow related to the task at hand, suffice.

I am writing up these experiences mainly to show that one (wo)man's
impossibility is another (wo)man's possibility or even probability, and
that especially in algorithmics there are few known boundaries to human
ingenuity. After all, the Laws of InfoProcessing (LIPs :-) still wait to be
uncovered by an info-Ein Stein, but see [Kahre 2002] for a nice try.

Recalling these hands on-&-in experiences and still believing in the
impossibility of avoiding the computation of  k  hashfuns as independent
as only possible, I searched for indirect inspiration by re-scanning
the works referred to further in this section. Amazingly, it did not take
more than keeping the problem in mind for a week before I saw the light.

    There are impossibilities and "impossibilities"; see [Richards 1975]
who starts with a children's verse: "While one man was proving it couldn't
be done, another man tried and he did IT." See also [Garfield 1977] together
with [Newcomb 1903] reprinted there.  More "down to the PC" remarks are
from [Knuth 1977, p.69]: "interpolation ... works better in spite of the
proof that binary search is best", followed on p. 80 by the Conclusions
of this most eminent professor emeritus (Stanford) of algorithmics: "Even
the 'best possible' algorithm can sometimes be improved if we change the
ground rules." FlashHash does change the classical rules; read on to find
out which one.  Concerning classical approaches: "The word 'Classical'
means only one thing in science: it's wrong!" is how [Frieden 1983,1991,2001,
the 1st page of the chap. 16] has paraphrased J. Robert Oppenheimer, the
co-father of our nuclear age.  Oppenheimer has also said that "There are
children playing in the street who could solve my top problems in physics,
because they have modes of perception that I lost long ago."  His statement
fits perfectly with that of a zen master Shunryu Suzuki in his book Zen
Mind, Beginner's Mind: "In the beginner's mind there are many possibilities,
but in the expert's there are few. ... You should be rather grateful for the
weeds you have in your mind, because eventually they will enrich your
practice.  ... True understanding is actual practice itself."
Om Mani Padme Hum.  Om Wagi Shori Mum.  Om Vajra Pani Hum.

In this e-paper the "classically impossible" was reached :

- by literally "recycled randomness" (which may seem to be an oxymoron)
  i.e. by stepping over the mental threshold of the classical requirement
  of  k  independent hashfuns, by relativization of their independence
  w.r.t. the observer, human, animal, machine or an abstract Gedanken-It.
  In fact, any programmed randomness is an oxymoron as well, as it is
  pseudorandom anyway. It is my firm opinion that the requirement of
  randomness for applications should be defined w.r.t. the observer, who
  does NOT (!) have to be the omniscient Ideal Observer (IO like e.g. the
  well known Maxwell's demon, or Laplace's demon) quite similar to the
  Ideal Receiver (IR) of Bar-Hillel & Carnap employed by [Kahre 2002], as
  long as we do not deal with such 3-letter word Organizations like e.g.
  KGA (= Kommietet of Gruesome Adversaries) and other Manichean devils
  [Wiener 1954, chap. II & XI, spelled Manichaean]. By not being paranoid
  when not necessary i.e. by assuming only entropic demons of confusion
  and not of willful malice, we can compensate for the weak spots in the
  "recycled randomness" by employing what I call "combinatorial implosion"
  explained next. Manichean demons can be coped with by employing miniMax
  strategies which minimize the maximal possible losses. The price paid for
  such (over)cautious miniMaxing is average losses higher than with other
  less conservative strategies.

- by incremental computations pushed to the extreme - to cyclicity, which
  perfectly fits with my "recycled randomness" - which allows blitzy &
  simple constructions of multiple hashes, and their reconstructibility
  or recoverability by means of literal recycling.

- by recognizing that there is "hashcrosstalk" in the usual BF (BF1) and
  by the key insight that the "no crosstalk" property of segmented Bloom
  filters of the 2nd and 3rd kind (BF2 and BF3) can be exploited because
  of  k!  permutations of potential matches (of which always only one will
  actually match i.e. the chances of an unwanted pseudo-match decrease
  hyperexponentially with  k ).  More often than not the combinatorial
  expressions like  k!  (which is about the worst simple one we can get)
  are our enemies we have to fight in combinatorially explosive tasks
  (e.g. in constraint satisfaction problems aka CSP's like the mentioned
  classical N-queens problem). But here the  k!  is our ally which works
  against the weaknesses inherent in "recycled hashes".  Hence my new
  terminus technicus "combinatorial implosion". It would be interesting
  to try to employ it on a wider basis i.e. to try to attack algorithmic
  problems by explicit or even systematic reference to "combinatorial
  implosion".


The synergistic effect of these elements is that  k  hashes are recycled,
and if necessary recovered, from a single result  h0  of a single call per
S  of one HashFun(S) only.

I started this e-paper with the necklace metaphor inspired by Hermann
Hesse's poems, containing such jewels as the following fragments:
 "It is a game whose rules are nice and neat."
.......
 "Those hieroglyphs once so significant                  { Those progs ... }
  that now are only colored bits of glass,                 { bits of chips }
  he lets them roll until their force is spent    { roll i.e. recycle bits }
  and silently they vanish in the sand."       { silicon is made from sand }
.......
 "Proof that for every color may be found  { .. for every state S be found }
  in music a proper corresponding key."    { in an algorYthm a ... hashkey }
.......
 "The pattern sings like crystal constellations,   { crystal clear bitpatt }
  and when we tell our beads, we serve the whole, ..." { we count our bits }

Telling i.e. counting the beads (or bits :-) is what The Tibetan Book of
the Dead mentions exactly at what I have identified as its culmination
point:
"Even at the time that the pebbles are being counted out, be not frightened,
 nor terrified; tell no lies; and fear not the Lord of Death. .....
 The Lords of Death (aka the Executive Furies) are thine own hallucinations.
 ... Apart from one's own hallucinations, in reality there are no such
 things existing outside oneself like Lord of Death, or god, or demon, ...
 Act so as to recognize this."


-.- +References [ refs ] :

Q: Which refs are books and which are papers ?
A: Unlike in the titles of (e)papers here listed, in the titles of books and
   periodicals all words start with a CAP, except for the insignificants like
   e.g.:  a, and, der, die, das, for, from, in, of, or, to, the, with, etc.

Bloom B.H.:  Some techniques and trade-offs affecting large data base
  retrieval times, Proc. ACM Nat. Conf., 24, 1969, 83-95.

Bloom B.H.:  Space/time trade-offs in hash coding with allowable errors,
  CACM 13/7, July 1970, 422-426.

Carter L., Floyd R., Gill J., Markowsky G., Wegman M.:  Exact and
  approximate membership testers, Proc. 10th Annual ACM Symp. on
  Theory of Computation, San Diego, 1978, 59-65; they refer to Bloom 1970.

Cleary J.G.:  Compact hash tables using bidirectional linear probing,
  IEEE Trans. on Computers, C-33/9, Sept. 1984, 828-834.

Czech Z.J. et al: Perfect hashing, Theoretical Computer Science, vol. 182,
  1997, 1-143 = a strong but long story for the strong on long weekends :-)

Frieden Roy. B.:  Probability, Statistical Optics, and Data Testing,
  1st ed. 1983, 2nd ed. 1991 has new chap. 17. In the 3rd ed. of 2001
  start reading the vastly expanded  chap. 17 on p. 413-etc.

Garfield Eugene:  Negative science and "The outlook for the flying
  machine", Current Comments in his Current Contents, Nr. 26, June 27,
  1977, 155-166; followed by [Newcomb 1903].

Gremillion L.L.:  Designing a Bloom filter for differential access,
  CACM 25/9 (1982), 600-604.

Hajek Jan:  the 1st presentation of APPROVER = Automated & Proven PROtocol
  VERifier at the Lake Arrowhead Workshop, California, August 1977.
  From its very beginning, APPROVER was sufficiently advanced to perform
  reachability analysis of protocols specified as procedures in Burroughs
  Algol, so that much more primitive finite state models (FSA) had to be
  translated into Algol and not vice versa.  ARPA's (later: DARPA) TCP =
  Transmission Control Program (later: Protocol) has been specified by Dr.
  Carl Sunshine, Rand Corp., and design bugs were found in TCP by APPROVER
  [Sunshine 1978]. ARPANET was the mother of Internet still running on TCP.

Hajek Jan:  Progress report on the automatic and proven protocol verifier,
  Computer Communication Review, ACM SigComm 8/1, January 1978, 15-16.

Hajek Jan:  Automatically verified data transfer protocols, Proc. 4th
  Intl. Computer Communications Conference, Kyoto, Sept. 1978, 749-756,
  with 4 protocols and their 4 Effective Progress/Stagnation graphs aka
  P/S-graphs. On p. 752 left column, 2nd line from below: please fix the
  otherwise nonexistent ":= phaseA"  into the correct ":= phasesA".

Hajek Jan:  Protocols verified by APPROVER, Computer Communication Review,
  ACM SigComm 9/1, January 1979, 32-34.

Hajek Jan:  A new dynamic backtracking algorithm for constraint satisfaction
  problems, technical report CC-Note 50, 1990, 18 pages.

Holzmann Gerard J.:  An improved protocol reachability analysis technique,
  Software Practice and Experience, 18/2, Feb. 1988, 137-161, on pp. 144
  and 161 refers to [Morris 1968] and to [McIlroy 1982].

Holzmann Gerard J.:  Algorithms for automated protocol verification, AT&T
  Technical Journal, Jan./Feb. 1990, 32-44.

Kahre Jan:  The Mathematical Theory of Information, 2002

Knuth Donald:  The Art of Computer Programming, vol. 3: Sorting and
  Searching, 1973, see his index for Superimposed coding, and for Bloom
  method for which the correct approximate yet sufficiently precise
  formula is on p. 562 mid in the 1st edition, and on p. 573 up in the
  2nd edition.

Knuth D. E.:  Algorithms, Scientific American, April 1977, 63-80.

Lampson B.W., Sproul R.F.:  An operating system for a single-user machine,
  Proc. 7th Symp. on Operating Systems Principles, Dec. 1979, 98-105.

McIlroy M.D.:  Development of a spelling checker, IEEE Trans on Commun.,
  COM-30/1, Jan. 1982, 91-99; hashing, superimposed codes, Bloom, Morris,
  Floyd & Gill on p. 95 and in the references, where [10] is Knuth vol.3.

Minton Steven, Johnston Mark D., Philips Andrew B., Laird Philip: Solving
  large scale constraint satisfaction problems using a heuristic repair
  method, International Joint Conference on Artificial Intelligence,
  1990, IJCAI-90, 17-24.

Morris R.:  Scatter storage techniques, CACM 11/1, 1968, 38-44.

Mullin J.K.:  A second look at Bloom filters, CACM 26/8 (1983), 570-571.

Murray D.M.:  A scatter storage for dictionary lookups, Journal of Library
  Automation, Sept. 1970, 173-201, mathematical analysis on pp. 177-183,
  virtual hashing analyzed on pp. 180-183 by a computer scientist from
  Cornell University.

Newcomb Simon: The outlook for the flying machine, The Independent - A
  Weekly Magazine, Oct. 22, 1903. Reprinted in [Garfield 1977, 167-172].
  Recommended www search: Newcomb AND flying

Nix Robert:  Experience with a space efficient way to store a dictionary,
  CACM 24/5, 1981, 297-298; refers Carter et al., describes Bloom filter
  without directly referring to Bloom (but Carter et al do refer to Bloom).

Ramakrishna M.V.:  Practical performance of Bloom filters and parallel
  free-text searching, CACM 32/10 (1989) 1237-1239; the reference 3. to
  Gremillion should be Sept. 1982 and not July 1982.

Reingold Ed.M., Hansen W.J.:  Data Structures in Pascal, 1986 (there
  was a 1983 edition), section 7.4.3 on Multiple hash functions, 410-413;
  their math on pp. 412-413 starts with the assumption that maximum amount
  of entropy is good (for Bloom filters, but Bloom is not named).

Richards Ian: Impossibility, Mathematics Magazine, Nov-Dec. 1975, 249-262

Robert Ch.S.:  Partial-match retrieval via the method of superimposed
  codes, Proceedings of the IEEE, 67/12, Dec. 1979, 1624-1642.

Shannon C.E.:  The Mathematical Theory of Communication, 1948, 1949.

Sincovec R.F., Wiener R.S.:  Data Structures Using Modula-2, 1986,
  pp. 396-408 on probabilistic hashing and on virtual hashing.

Sosic Rok, Gu Jun:  A polynomial time algorithm for the N-queens problem,
  ACM Sigart Bulletin, vol. 1, no. 3, 1990, 7-11.  Two more articles in
  ACM Sigart Bulletin, vol. 2, no. 2, 1991, 8 and 22-24.  Further in
  ACM Sigart Bulletin, vol. 3, no. 1, 1992, 8-12.

Stallings W.:  Password generation by Bloom filters, Dr. Dobbs Journal,
  Aug. 1994, 119-etc, formulas are ok, but fig.2 is not.

Standish Th. A.:  Data Structure Techniques, 1980

Stone Harold. S., Stone Janice M.: Efficient search techniques - An
  empirical study of the N-queens problem, IBM J. Research & Development,
  vol. 31, no. 4, July 1987, 464-474.
  Private communication by letters with Harold Stone in 1988 & 1989.

Sunshine Carl A.:  Survey of protocol definition and verification
  techniques, ACM Computer Communication Review, vol.8, July 1978, nr.3,
  35-41, see Table 1 for APPROVER & Hajek & ARPA TCP on p. 39.

Vaihinger Hans:  Die Philosophie des Als Ob - Volks-Ausgabe, 1923

Wiener Norbert:  The Human Use of Human Beings - Cybernetics and Society,
  2nd (expanded) edition 1954, (1st in 1950).

Wiener Richard S.:  An efficient virtual hash algorithm for a spelling
  checker, Journal of Pascal, Ada & Modula-2, Jan-Feb. 1986, 23-29;
  with modules programmed in Modula-2

Wolper Pierre, Leroy Denis:  Reliable hashing without collision detection,
  in Computer Aided Verification, Proc. Int. Workshop, Elounda, Crete,
  Lecture Notes in Computer Science, Vol. 697, Springer-Verlag,
  June 1993, pp. 59-70

Wolper Pierre, Stern Ulrich, Leroy Denis, Dill David L.:  Reliable
  probabilistic verification using hash compaction, on www only, 30 pp.
  (temporarily withdrawn by Ulrich Stern)
-.-