Rolling hash: Difference between revisions

en>FrescoBot
→‎Rabin-Karp rolling hash: De-abbreved chars to characters
 
In [[cryptography]], the '''Merkle–Damgård construction''' or '''Merkle–Damgård hash function''' is a method of building [[Collision resistance|collision-resistant]] [[cryptographic hash function]]s from collision-resistant [[one-way compression function]]s.<ref name=GoldwasserBellare>[[Shafi Goldwasser|Goldwasser, S.]] and [[Mihir Bellare|Bellare, M.]] [http://cseweb.ucsd.edu/~mihir/papers/gb.html "Lecture Notes on Cryptography"]. Summer course on cryptography, MIT, 1996-2001</ref>{{rp|145}} This construction was used in the design of many popular hash algorithms such as [[MD5]], [[SHA1|SHA-1]] and [[SHA2|SHA-2]].
 
The Merkle–Damgård construction was described in Ralph Merkle's [[Doctor of Philosophy|Ph.D.]] [[thesis]] in 1979.<ref>[[Ralph Merkle|R.C. Merkle]]. [http://www.merkle.com/papers/Thesis1979.pdf ''Secrecy, authentication, and public key systems.''] Stanford Ph.D. thesis 1979, pages 13-15.</ref> [[Ralph Merkle]] and [[Ivan Damgård]] independently proved that the structure is sound: that is, if an appropriate [[Padding (cryptography)|padding scheme]] is used and the compression function is [[Collision resistance|collision-resistant]], then the hash function will also be collision resistant.<ref>[[Ralph Merkle|R.C. Merkle]]. ''A Certified Digital Signature''. In Advances in Cryptology - CRYPTO '89 Proceedings, Lecture Notes in Computer Science Vol. 435, G. Brassard, ed, Springer-Verlag, 1989, pp. 218-238.</ref><ref>[[Ivan Damgård|I. Damgård]]. ''A Design Principle for Hash Functions''. In Advances in Cryptology - CRYPTO '89 Proceedings, Lecture Notes in Computer Science Vol. 435, G. Brassard, ed, Springer-Verlag, 1989, pp. 416-427.</ref>
 
The Merkle–Damgård hash function first applies an [[#MD-compliant_padding|MD-compliant padding]] function to produce an input whose size is a multiple of a fixed number of bits (e.g. 512 or 1024), because compression functions cannot handle inputs of arbitrary size. The hash function then breaks the result into blocks of fixed size and processes them one at a time with the compression function, each time combining a block of the input with the output of the previous round.<ref name=GoldwasserBellare />{{rp|146}} To make the construction secure, Merkle and Damgård proposed that messages be padded with a padding that encodes the length of the original message. This is called ''length padding'' or '''Merkle–Damgård strengthening'''.
 
[[Image:Merkle-Damgard hash big.svg|thumb|400px|right|Merkle–Damgård hash construction]]
 
In the diagram, the one-way compression function is denoted by ''f''; it transforms two fixed-length inputs into an output of the same size as one of the inputs. The algorithm starts with an initial value, the [[initialization vector]] (IV). The IV is a fixed value (algorithm- or implementation-specific). For each message block, the compression (or compacting) function ''f'' takes the result so far, combines it with the message block, and produces an intermediate result. The last block is padded with zeros as needed, and bits representing the length of the entire message are appended. (See below for a detailed length padding example.)
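
The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 16-byte block size, the 8-byte chaining value, the all-zero IV, and the toy function <code>compress</code> (built here from truncated SHA-256) are all hypothetical choices made for the example and are not part of any real hash design.

```python
import hashlib

BLOCK_SIZE = 16          # bytes per message block (assumed for illustration)
STATE_SIZE = 8           # bytes of chaining value (assumed; far too small for real use)
IV = bytes(STATE_SIZE)   # fixed initial value; all zeros here by assumption


def compress(state: bytes, block: bytes) -> bytes:
    """Toy stand-in for the compression function f (not a real design)."""
    assert len(state) == STATE_SIZE and len(block) == BLOCK_SIZE
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]


def pad(message: bytes) -> bytes:
    """Length padding: a 1 bit (byte 0x80), zeros, then the 8-byte bit length."""
    bit_length = (8 * len(message)).to_bytes(8, "big")
    zeros = (-(len(message) + 1 + 8)) % BLOCK_SIZE
    return message + b"\x80" + bytes(zeros) + bit_length


def md_hash(message: bytes) -> bytes:
    """Iterate f over the padded message, starting from the IV."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state
```

Each call to <code>compress</code> corresponds to one box labelled ''f'' in the diagram; the value returned by <code>md_hash</code> is the last chaining value, with no finalisation function applied.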
 
To harden the hash further, the last result is sometimes fed through a ''finalisation function''. The finalisation function can serve several purposes, such as compressing a bigger internal state (the last result) into a smaller output hash size, or guaranteeing better mixing and a stronger [[avalanche effect]] on the bits in the hash sum. The finalisation function is often built by using the compression function.{{citation needed|date=January 2011}} (Note that some documents instead use "finalisation" to mean the act of length padding.)
 
== Security characteristics ==
 
The popularity of this construction is due to the fact, proven by [[Ralph Merkle|Merkle]] and [[Ivan Damgård|Damgård]], that if the one-way compression function ''f'' is [[Collision resistance|collision resistant]], then so is the hash function constructed using it. Unfortunately, this construction also has several undesirable properties:
 
* [[Length extension attack|Length extension]]: once an attacker has one collision, more can be found very cheaply.
* Second [[preimage attack]]s against long messages are always much more efficient than brute force.
* Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.<ref>Antoine Joux. ''Multicollisions in iterated hash functions. Application to cascaded construction.'' In Advances in Cryptology - CRYPTO '04 Proceedings, Lecture Notes in Computer Science, Vol. 3152, M. Franklin, ed, Springer-Verlag, 2004, pp. 306–316.</ref>
* "<abbr title='also known as "Nostradamus attack"'>Herding attack</abbr>s" (first committing to an output h, then mapping messages with arbitrary starting values to h) are possible with more work than finding a collision, but with much less than would be expected for a [[random oracle]].<ref>John Kelsey and Tadayoshi Kohno. ''Herding Hash Functions and the Nostradamus Attack'' In Eurocrypt 2006, Lecture Notes in Computer Science, Vol. 4004, pp. 183–200.</ref><ref>{{cite web
|url=http://www.win.tue.nl/hashclash/Nostradamus/|title=Nostradamus
|date=2007-11-30|work=The HashClash Project|publisher=[[Eindhoven University of Technology|TU/e]]
|last1=Stevens|first1=Marc
|last2=Lenstra|first2=Arjen|authorlink2=Arjen Lenstra
|last3=de Weger|first3=Benne|accessdate=2013-03-30}}</ref>
* "Extension attacks": Given the hash ''H(X)'' of an unknown input ''X'', it is easy to find the value of ''H(pad(X) || Y)'', where ''pad'' is the padding function of the hash. That is, it is possible to find hashes of inputs related to ''X'' even though ''X'' remains unknown.<ref>Yevgeniy Dodis, Thomas Ristenpart, Thomas Shrimpton. ''Salvaging Merkle–Damgård for Practical Applications''. Preliminary version in Advances in Cryptology - EUROCRYPT '09 Proceedings, Lecture Notes in Computer Science Vol. 5479, A. Joux, ed, Springer-Verlag, 2009, pp. 371–388.</ref> A random oracle would not have this property, and this may lead to simple attacks even for ''natural'' schemes proven secure in the random oracle model.<ref>J.S. Coron, Y. Dodis, C. Malinaud, and P. Puniya. ''Merkle–Damgård Revisited: How to Construct a Hash Function.'' Advances in Cryptology – CRYPTO '05 Proceedings, Lecture Notes in Computer Science, Vol. 3621, Springer-Verlag, 2005, pp. 21–39.</ref> Length extension attacks have been used against a number of commercial web message authentication schemes, such as the one used by [[Flickr]].<ref>Thai Duong, Juliano Rizzo, [http://netifera.com/research/flickr_api_signature_forgery.pdf Flickr's API Signature Forgery Vulnerability], 2009</ref>
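
The extension attack in the last item can be demonstrated on a toy Merkle–Damgård hash. The sketch below is an illustration only: the block size, state size, IV, and the <code>compress</code> function (truncated SHA-256) are assumptions made for the example, and the helper names <code>extend</code> and <code>padding_for</code> are hypothetical. Here ''pad(X)'' from the text corresponds to the message followed by its padding.

```python
import hashlib

BLOCK = 16           # block size in bytes (assumed)
STATE = 8            # chaining value size in bytes (assumed)
IV = bytes(STATE)    # all-zero IV (assumed)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function; a stand-in, not a real design."""
    return hashlib.sha256(state + block).digest()[:STATE]


def padding_for(message_len: int) -> bytes:
    """Padding bytes for a message of the given length (0x80, zeros, bit length)."""
    zeros = (-(message_len + 1 + 8)) % BLOCK
    return b"\x80" + bytes(zeros) + (8 * message_len).to_bytes(8, "big")


def iterate(state: bytes, data: bytes) -> bytes:
    """Run the compression function over block-aligned data."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def md_hash(message: bytes) -> bytes:
    return iterate(IV, message + padding_for(len(message)))


def extend(known_hash: bytes, known_len: int, suffix: bytes):
    """Compute the hash of X || padding(X) || suffix from H(X) and len(X) only."""
    glue = padding_for(known_len)
    total_len = known_len + len(glue) + len(suffix)
    # Resume the iteration from the known hash, which is the internal state.
    forged = iterate(known_hash, suffix + padding_for(total_len))
    return glue, forged


# The attacker sees only the hash and the length of the secret input.
secret = b"secret-key|user=alice"
glue, forged = extend(md_hash(secret), len(secret), b"&admin=true")
# The forged value matches the real hash of the extended message.
print(forged == md_hash(secret + glue + b"&admin=true"))
```

The attack works because the published hash is exactly the chaining value after the last block, so anyone can continue the iteration from it.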
 
== Wide pipe construction ==
[[Image:WidePipeHashFunction.png|thumb|400px|right|The Wide pipe hash construction. The intermediate chaining values have been doubled.]]
 
Due to several structural weaknesses of the Merkle–Damgård construction, especially the length extension problem and multicollision attacks, Stefan Lucks proposed the use of the wide-pipe hash<ref>S. Lucks, ''Design Principles for Iterated Hash Functions'', In: Cryptology ePrint Archive, Report 2004/253, 2004.</ref> instead of the Merkle–Damgård construction. The wide-pipe hash is very similar to the Merkle–Damgård construction but has a larger internal state size, meaning that the bit-length used internally is larger than the output bit-length. If a hash of ''n'' bits is desired, the compression function ''f'' takes ''2n'' bits of chaining value and ''m'' bits of the message and compresses this to an output of ''2n'' bits.
 
Therefore, in a final step, a second compression function compresses the last internal hash value (''2n'' bits) to the final hash value (''n'' bits). This can be done as simply as discarding half of the last ''2n''-bit output. SHA-224 and SHA-384 take this form since they are derived from SHA-256 and SHA-512, respectively.
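
As a rough illustration of the wide-pipe idea, the sketch below keeps a ''2n''-bit chaining value and discards half of it at the end. The sizes, the all-zero IV, and the toy <code>compress</code> function (truncated SHA-256) are assumptions chosen for the example, not part of Lucks's proposal.

```python
import hashlib

BLOCK = 16          # message block size m in bytes (assumed)
N = 8               # output size n in bytes (assumed)
WIDE = 2 * N        # internal chaining value of 2n
IV = bytes(WIDE)    # all-zero IV (assumed)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function mapping 2n + m bits to 2n bits (a stand-in)."""
    assert len(state) == WIDE and len(block) == BLOCK
    return hashlib.sha256(state + block).digest()[:WIDE]


def pad(message: bytes) -> bytes:
    """Length padding: byte 0x80, zeros, then the 8-byte bit length."""
    zeros = (-(len(message) + 1 + 8)) % BLOCK
    return message + b"\x80" + bytes(zeros) + (8 * len(message)).to_bytes(8, "big")


def final_state(message: bytes) -> bytes:
    """Return the full 2n-bit internal state after the last block."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state


def wide_pipe_hash(message: bytes) -> bytes:
    """Final step: discard half of the 2n-bit state to get an n-bit hash."""
    return final_state(message)[:N]
```

Because the output reveals only half of the internal state, an attacker who sees the hash can no longer resume the iteration directly from it.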
 
== Fast wide pipe construction ==
[[Image:FastWidePipeHashFunction.png|thumb|400px|right|The Fast wide pipe hash construction. Half of chaining value is used in the compression function.]]
It has been demonstrated by Mridul Nandi and [[Souradyuti Paul]] that the wide-pipe hash function can be made approximately twice as fast if the wide-pipe state can be divided in half in the following manner: one half is input to the succeeding compression function while the other half is combined with the output of that compression function.<ref name="NP10">Mridul Nandi and Souradyuti Paul. Speeding Up the Widepipe: Secure and Fast Hashing. In Guang Gong and Kishan Gupta, editor, Indocrypt 2010, Springer, 2010.</ref>
 
The main idea of the hash construction is to feed forward half of the previous chaining value and XOR it with the output of the compression function. In so doing, the construction takes in longer message blocks every iteration than the original wide pipe. Using the same function ''f'' as before, it takes ''n'' bits of chaining value and ''n+m'' bits of the message. However, the price to pay is the extra memory used in the construction for the feed-forward.
 
==MD-compliant padding==
As mentioned in the introduction, the padding scheme used in the Merkle–Damgård construction must be chosen carefully to ensure the security of the scheme. [[Mihir Bellare]] gives sufficient conditions that a padding scheme must satisfy to ensure that the MD construction is secure: the scheme must be "MD-compliant" (the original length-padding scheme used by Merkle is an example of MD-compliant padding).<ref name=GoldwasserBellare />{{rp|145}} The conditions are:
*<math>M</math> is a prefix of <math>\mathsf{Pad}(M).</math>
*If <math>|M_{1}| = |M_{2}|</math> then <math>|\mathsf{Pad}(M_{1})| = |\mathsf{Pad}(M_{2})|.</math>
*If <math>|M_{1}| \neq |M_{2}|</math> then the last block of <math>\mathsf{Pad}(M_{1})</math> is different from the last block of <math>\mathsf{Pad}(M_{2}).</math>
 
With these conditions in place, we find a collision in the MD hash function ''exactly when'' we find a collision in the underlying compression function. Therefore, the Merkle–Damgård construction is provably secure when the underlying compression function is secure.<ref name=GoldwasserBellare />{{rp|147}}
 
== Length padding example ==
 
To be able to feed the message to the compression function, the last block needs to be padded with constant data (generally with zeroes) to a full block.
: ''For example, let's say the message to be hashed is "HashInput" and the block size of the compression function is 8 bytes (64 bits). So we get two blocks looking like this:''
: <tt>HashInpu  t0000000</tt>
 
But this is not enough, since it would mean that distinct messages that start with the same data and end with zero or more bytes of the constant padding data would be fed into the compression function as exactly the same blocks, producing the same final hash sum.
: ''In our example, for instance, the modified message "HashInput00" would generate the same blocks as the original message "HashInput".''
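
The ambiguity can be reproduced with a few lines of Python. This is a minimal sketch that mirrors the example literally: it uses the ASCII character "0" as the padding constant and an 8-character block, and the helper name <code>zero_pad_blocks</code> is hypothetical.

```python
BLOCK_SIZE = 8   # characters per block, as in the example


def zero_pad_blocks(message: bytes, fill: bytes = b"0") -> list:
    """Pad with the constant fill character only, then split into blocks."""
    padded = message + fill * ((-len(message)) % BLOCK_SIZE)
    return [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]


print(zero_pad_blocks(b"HashInput"))     # [b'HashInpu', b't0000000']
print(zero_pad_blocks(b"HashInput00"))   # [b'HashInpu', b't0000000']
```

Both messages produce identical blocks, so any compression function iterated over them would give the same hash.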
 
To prevent this, the first bit of the constant padding data must be changed. As the constant padding is generally made of zeroes, the first padding bit is always changed to "1".
: ''In our example, we get something like this:''
: <tt>HashInpu  t1000000</tt>
 
To harden the hash even further, the length of the message can be added in an extra block.
: ''So in our example, we would get three blocks like this:''
: <tt>HashInpu  t1000000  00000009</tt>
 
To avoid ambiguity, the message length value must itself be resistant to length extensions. Most common implementations use a fixed bit-size (generally 64 or 128 bits in modern algorithms) and a fixed position at the end of the last block for encoding the message length value.
 
This is a bit wasteful, since it means hashing one full extra block for the length value. So there is a slight speed optimisation that most hash algorithms use: if there is enough space among the zeros padded to the last block, the length value can be placed there instead.
 
: ''Let's say that, in our example, the length value is encoded on 5 bytes (40 bits), so it is padded in the final block as "00009", not just "9" and not with more zeroes than necessary. Like this:''
: <tt>HashInpu  t1000009</tt>
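
The full padding rule from this example can be written as a short Python sketch. It follows the example literally and is not how real hash functions encode their padding: the ASCII characters "1" and "0" stand in for padding bits, the length is written as a 5-digit decimal number, and the helper names <code>length_pad</code> and <code>blocks</code> are hypothetical.

```python
BLOCK_SIZE = 8      # characters per block, as in the example
LENGTH_FIELD = 5    # characters reserved for the length, as in the example


def length_pad(message: bytes) -> bytes:
    """Append '1', then '0' filler, then the length as a 5-digit decimal."""
    length = str(len(message)).zfill(LENGTH_FIELD).encode()
    if len(length) > LENGTH_FIELD:
        raise ValueError("message too long for the length field")
    filler = (-(len(message) + 1 + LENGTH_FIELD)) % BLOCK_SIZE
    return message + b"1" + b"0" * filler + length


def blocks(padded: bytes) -> list:
    """Split padded data into fixed-size blocks."""
    return [padded[i:i + BLOCK_SIZE] for i in range(0, len(padded), BLOCK_SIZE)]


print(blocks(length_pad(b"HashInput")))     # [b'HashInpu', b't1000009']
print(blocks(length_pad(b"HashInput00")))   # [b'HashInpu', b't0010000', b'00000011']
```

The two messages that collided under plain zero padding now differ, and the last block always ends with the length field, so messages of different lengths end in different last blocks, as the MD-compliance conditions require.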
 
== References ==
* ''[http://www.cacr.math.uwaterloo.ca/hac/ Handbook of Applied Cryptography]'' by Menezes, van Oorschot and Vanstone (2001), chapter 9.
* ''[http://www.cs.umd.edu/~jkatz/imc.html Introduction to Modern Cryptography]'', by Jonathan Katz and Yehuda Lindell. Chapman and Hall/CRC Press, August 2007, page 134 (construction 4.13).
<references/>
 
{{Cryptography navbox | hash}}
 
{{DEFAULTSORT:Merkle-Damgard Construction}}
[[Category:Cryptographic hash functions]]

Latest revision as of 20:32, 9 July 2014
