A family of hash functions is said to be '''<math>k</math>-independent''' or '''<math>k</math>-universal'''<ref name=CLRS>
{{cite book
| first = Thomas H.
| last = Cormen
| coauthors = Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford
| title = Introduction to Algorithms
| publisher = MIT Press
| year = 2009
| isbn = 0-262-03384-4
| edition = 3rd
}}
</ref> if selecting a [[hash function]] at random from the family guarantees that the hash codes of any designated <math>k</math> keys are [[Independence (probability theory)|independent random variables]] (see precise mathematical definitions below). Such families allow good average-case performance in randomized algorithms or data structures, even if the input data is chosen by an adversary. The trade-offs between the degree of independence and the efficiency of evaluating the hash function are well studied, and many <math>k</math>-independent families have been proposed.

== Introduction ==
{{see also|Hash function}}
The goal of hashing is usually to map keys from some large domain (universe) <math>U</math> into a smaller range, such as <math>m</math> bins (labelled <math>[m] = \{0, \dots, m-1\}</math>). In the analysis of randomized algorithms and data structures, it is often desirable for the hash codes of various keys to "behave randomly". For instance, if the hash code of each key were an independent random choice in <math>[m]</math>, the number of keys per bin could be analyzed using the [[Chernoff bound]]. A deterministic hash function cannot offer any such guarantee in an adversarial setting, as the adversary may choose the keys to be precisely the [[Image (mathematics)|preimage]] of a bin. Furthermore, a deterministic hash function does not allow for ''rehashing'': sometimes the input data turns out to be bad for the hash function (e.g. there are too many collisions), so one would like to change the hash function.
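To make the adversarial argument concrete, the following Python sketch assumes a hypothetical fixed hash function <code>x mod m</code>; the names <code>deterministic_hash</code> and <code>adversarial_keys</code> are illustrative only. Because the function is known in advance, the adversary simply enumerates keys from the preimage of one bin, so every key collides.

```python
def deterministic_hash(x: int, m: int) -> int:
    # A fixed, publicly known hash function (hypothetical): x mod m.
    return x % m


def adversarial_keys(n: int, m: int, target_bin: int = 0) -> list:
    # Scan the universe and keep keys that hash to target_bin, i.e. collect
    # n elements of the preimage of that bin.
    keys = []
    x = 0
    while len(keys) < n:
        if deterministic_hash(x, m) == target_bin:
            keys.append(x)
        x += 1
    return keys


m = 16
keys = adversarial_keys(100, m)
loads = [0] * m
for key in keys:
    loads[deterministic_hash(key, m)] += 1
print(max(loads))  # 100: all keys share bin 0
```

The same search works against any fixed function; only the cost of finding preimages changes. Choosing <math>h</math> at random, independently of the keys, is what removes this attack.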

The solution to these problems is to pick a function ''randomly'' from a large family of hash functions. The randomness in choosing the hash function can be used to guarantee some desired random behavior of the hash codes of any keys of interest. The first definition along these lines was [[universal hashing]], which guarantees a low collision probability for any two designated keys. The concept of <math>k</math>-independent hashing, introduced by Wegman and Carter in 1981,<ref name=WC81>
{{cite journal
| last1 = Wegman
| first1 = Mark N. | author1-link = Mark N. Wegman
| last2 = Carter
| first2 = J. Lawrence
| title = New hash functions and their use in authentication and set equality
| journal = Journal of Computer and System Sciences
| volume = 22
| issue = 3
| pages = 265–279
| year = 1981
| doi = 10.1016/0022-0000(81)90033-7
| id = Conference version in FOCS'79
| url = http://www.fi.muni.cz/~xbouda1/teaching/2009/IV111/Wegman_Carter_1981_New_hash_functions.pdf
| accessdate = 9 February 2011
}}</ref> strengthens the guarantee of random behavior to any <math>k</math> designated keys, and adds a guarantee that the hash codes are uniformly distributed.
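One standard way to obtain a <math>k</math>-independent family, not spelled out in this section, is to draw a random polynomial of degree at most <math>k-1</math> over the field <math>\mathbb{Z}_p</math>, with <math>p</math> prime and keys and hash codes in <math>[p]</math> (so <math>m = p</math>). The Python sketch below assumes that setting; <code>random_poly_hash</code> is a hypothetical helper name.

```python
import random


def random_poly_hash(k: int, p: int, rng: random.Random):
    """Draw h(x) = (a_0 + a_1*x + ... + a_{k-1}*x**(k-1)) mod p at random.

    Assumes p is prime and keys lie in {0, ..., p-1}; hash codes lie in [p].
    """
    coeffs = [rng.randrange(p) for _ in range(k)]  # a_0, ..., a_{k-1}

    def h(x: int) -> int:
        # Horner's rule, reducing modulo p at every step.
        acc = 0
        for a in reversed(coeffs):
            acc = (acc * x + a) % p
        return acc

    return h


rng = random.Random(2024)
h = random_poly_hash(k=4, p=101, rng=rng)  # one random draw from a 4-independent family on [101]
print([h(x) for x in range(5)])
```

For any <math>k</math> distinct keys and any target codes, exactly one polynomial of degree at most <math>k-1</math> interpolates them, so each joint outcome has probability <math>p^{-k}</math>, which matches the strict definition with <math>m = p</math>.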

=== Mathematical definitions ===
The strictest definition, introduced by Wegman and Carter<ref name=WC81 /> under the name "strongly universal<math>_k</math> hash family", is the following. A family of hash functions <math>H=\{ h:U \to [m] \}</math> is <math>k</math>-independent if for any <math>k</math> distinct keys <math>(x_1, \dots, x_k) \in U^k</math> and any <math>k</math> hash codes (not necessarily distinct) <math>(y_1, \dots, y_k) \in [m]^k</math>, we have:
: <math>\Pr_{h \in H} \left[ h(x_1)=y_1 \land \cdots \land h(x_k)=y_k \right] = m^{-k}</math>
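For very small families the definition can be checked exhaustively. The Python sketch below does this with exact fractions; as an assumed example it uses the family of all linear maps <math>h_{a,b}(x) = (ax + b) \bmod p</math> with <math>p</math> prime and <math>U = [m] = [p]</math>, which is 2-independent but not 3-independent. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def joint_probability(family, keys, codes):
    # Pr over a uniformly random h in the family that h(x_i) = y_i for every i.
    hits = sum(all(h(x) == y for x, y in zip(keys, codes)) for h in family)
    return Fraction(hits, len(family))


def is_k_independent(family, universe, m, k):
    # Exhaustive check of the definition: every k distinct keys, every k codes.
    target = Fraction(1, m ** k)
    for keys in permutations(universe, k):
        for codes in product(range(m), repeat=k):
            if joint_probability(family, keys, codes) != target:
                return False
    return True


p = 5
# Assumed example family: all maps x -> (a*x + b) mod p.
linear = [lambda x, a=a, b=b: (a * x + b) % p for a in range(p) for b in range(p)]
print(is_k_independent(linear, range(p), p, 2))  # True
print(is_k_independent(linear, range(p), p, 3))  # False
```

The 2-independence holds because, for distinct <math>x_1, x_2</math>, the equations <math>ax_i + b = y_i</math> have exactly one solution <math>(a, b)</math>; the check fails for <math>k = 3</math> since 25 functions cannot give every triple of codes probability <math>1/125</math>.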

This definition is equivalent to the following two conditions:
# for any fixed <math>x\in U</math>, as <math>h</math> is drawn randomly from <math>H</math>, <math>h(x)</math> is uniformly distributed in <math>[m]</math>;
# for any fixed, distinct keys <math>x_1, \dots, x_k \in U</math>, as <math>h</math> is drawn randomly from <math>H</math>, <math>h(x_1), \dots, h(x_k)</math> are independent random variables.
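The two conditions can also be tested separately. In the Python sketch below (hypothetical helper names, exact arithmetic via <code>fractions.Fraction</code>), the assumed shift family <math>h_b(x) = (x + b) \bmod p</math> satisfies the first condition but not the second, since <math>h_b(x_2) - h_b(x_1) = x_2 - x_1</math> is the same for every <math>b</math>; the linear family <math>(ax + b) \bmod p</math> satisfies both for <math>k = 2</math>.

```python
from fractions import Fraction
from itertools import permutations, product


def distribution(family, keys):
    # Exact joint law of (h(x_1), ..., h(x_k)) for h uniform over the family.
    counts = {}
    for h in family:
        codes = tuple(h(x) for x in keys)
        counts[codes] = counts.get(codes, 0) + 1
    return {c: Fraction(n, len(family)) for c, n in counts.items()}


def uniform_marginals(family, universe, m):
    # Condition 1: each h(x) is uniformly distributed in [m].
    for x in universe:
        law = distribution(family, (x,))
        if any(law.get((y,), 0) != Fraction(1, m) for y in range(m)):
            return False
    return True


def mutually_independent(family, universe, m, k):
    # Condition 2: for distinct keys, the joint law is the product of the marginals.
    for keys in permutations(universe, k):
        joint = distribution(family, keys)
        marginals = [distribution(family, (x,)) for x in keys]
        for codes in product(range(m), repeat=k):
            expected = Fraction(1)
            for law, y in zip(marginals, codes):
                expected *= law.get((y,), 0)
            if joint.get(codes, 0) != expected:
                return False
    return True


p = 5
shifts = [lambda x, b=b: (x + b) % p for b in range(p)]
linear = [lambda x, a=a, b=b: (a * x + b) % p for a in range(p) for b in range(p)]
print(uniform_marginals(shifts, range(p), p), mutually_independent(shifts, range(p), p, 2))  # True False
print(uniform_marginals(linear, range(p), p), mutually_independent(linear, range(p), p, 2))  # True True
```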

Often it is inconvenient to achieve the exact joint probability of <math>m^{-k}</math> because of rounding issues (for example, an exact value requires the number of functions in the family to be a multiple of <math>m^k</math>). Following Siegel,<ref name=Siegel>
{{cite journal
| last1 = Siegel
| first1 = Alan
| title = On universal classes of extremely random constant-time hash functions and their time-space tradeoff
| journal = SIAM Journal on Computing
| volume = 33
| issue = 3
| pages = 505–543
| year = 2004
| id = Conference version in FOCS'89
| url = http://www.cs.nyu.edu/faculty/siegel/FASTH.pdf
| doi = 10.1137/S0097539701386216
}}
</ref> one may define a <math>(\mu, k)</math>-independent family to satisfy:
: <math>\forall</math> distinct <math>(x_1, \dots, x_k) \in U^k</math> and <math>\forall (y_1, \dots, y_k) \in [m]^k</math>, <math>~~\Pr_{h \in H} \left[ h(x_1)=y_1 \land \cdots \land h(x_k)=y_k \right] \le \mu / m^k</math>
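A small computation shows where <math>\mu</math> comes from. The Python sketch below assumes the family <math>h_{a,b}(x) = ((ax + b) \bmod p) \bmod m</math> with <math>p = 7</math> prime and <math>m = 3</math>; because 3 does not divide 7, the hash codes cannot be exactly uniform. The smallest valid <math>\mu</math> is <math>m^k</math> times the largest joint probability; the helper name <code>mu_for_family</code> is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def mu_for_family(family, universe, m, k):
    # Smallest mu with Pr[h(x_1)=y_1 and ... and h(x_k)=y_k] <= mu / m**k over all
    # distinct keys and all codes, i.e. m**k times the largest joint probability.
    worst = Fraction(0)
    for keys in permutations(universe, k):
        counts = {}
        for h in family:
            codes = tuple(h(x) for x in keys)
            counts[codes] = counts.get(codes, 0) + 1
        worst = max(worst, Fraction(max(counts.values()), len(family)))
    return worst * m ** k


p, m = 7, 3  # m does not divide p, so exact uniformity on [m] is impossible
family = [lambda x, a=a, b=b: ((a * x + b) % p) % m for a in range(p) for b in range(p)]
print(mu_for_family(family, range(p), m, 2))  # 81/49
```

Here the code 0 is hit by 3 of the 7 residues and the codes 1 and 2 by 2 each, so the worst pair of codes has probability <math>9/49</math> and <math>\mu = 9 \cdot 9/49 = 81/49</math>.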

Observe that, even if <math>\mu</math> is close to 1, the <math>h(x_i)</math> are no longer guaranteed to be independent random variables, which is often a problem in the analysis of randomized algorithms. Therefore, a more common way of dealing with rounding issues is to prove that the hash family is close in [[statistical distance]] to a <math>k</math>-independent family, which allows black-box use of the independence properties.
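As a rough illustration of the statistical-distance approach, the Python sketch below measures, for each set of <math>k</math> distinct keys, the total variation distance between the exact joint law of the hash codes and the uniform law on <math>[m]^k</math>, which is the law a <math>k</math>-independent family would give. It assumes the family <math>((ax + b) \bmod p) \bmod m</math> with <math>p</math> prime; the helper names are hypothetical, and an actual proof would bound this distance analytically rather than by enumeration.

```python
from fractions import Fraction
from itertools import permutations, product


def mod_family(p, m):
    # Assumed example family: x -> ((a*x + b) mod p) mod m for all a, b in [p], p prime.
    return [lambda x, a=a, b=b: ((a * x + b) % p) % m for a in range(p) for b in range(p)]


def worst_tv_to_uniform(family, universe, m, k):
    # Largest total variation distance, over k distinct keys, between the law of
    # (h(x_1), ..., h(x_k)) and the uniform law on [m]^k.
    uniform = Fraction(1, m ** k)
    worst = Fraction(0)
    for keys in permutations(universe, k):
        counts = {}
        for h in family:
            codes = tuple(h(x) for x in keys)
            counts[codes] = counts.get(codes, 0) + 1
        tv = sum(abs(Fraction(counts.get(c, 0), len(family)) - uniform)
                 for c in product(range(m), repeat=k)) / 2
        worst = max(worst, tv)
    return worst


print(worst_tv_to_uniform(mod_family(7, 3), range(7), 3, 2))    # 52/441
print(worst_tv_to_uniform(mod_family(13, 3), range(13), 3, 2))  # 100/1521
```

The distance drops from about 0.118 to about 0.066 as <math>p</math> grows relative to <math>m</math>; when <math>m = p</math> the family is exactly 2-independent and the distance is 0.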

== See also ==
* [[Universal hashing]]
* [[Tabulation hashing]], a technique for generating 3-independent hash functions

== References ==
<references />

== Further reading ==
* {{cite book
| last1 = Motwani
| first1 = Rajeev
| last2 = Raghavan
| first2 = Prabhakar
| title = Randomized Algorithms
| publisher = Cambridge University Press
| year = 1995
| isbn = 0-521-47465-5
| page = 221
}}

[[Category:Hash functions]]
[[Category:Search algorithms]]
[[Category:Error detection and correction]]