Moment of inertia: Difference between revisions

From formulasearchengine
{{more footnotes|date=March 2009}}
In [[cryptography]], '''coincidence counting''' is the technique (invented by [[William F. Friedman]]<ref>{{Cite document |last = Friedman | first = W.F. | author-link = William F. Friedman | title = The index of coincidence and its applications in cryptology | place = Geneva, Illinois, USA | publisher = Riverbank Laboratories | year = 1922 | series = Department of Ciphers. Publ 22 |oclc=55786052 |postscript = <!--None--> }} The original application ignored normalization.</ref>) of putting two texts side-by-side and counting the number of times that identical letters appear in the same position in both texts. This count, either as a ratio of the total or normalized by dividing by the expected count for a random source model, is known as the '''index of coincidence'''.


==Calculation==
The Index of Coincidence provides a measure of how likely it would be to draw two matching letters if you randomly selected two letters from a given text. The chance that you will draw a given letter in the text is (number of times that letter appears / length of the text). The chance of drawing that same letter again, without putting the first one back, is ((appearances - 1) / (text length - 1)). The product of these two values gives you the chance of drawing that letter twice in a row. One can find this product for each letter that appears in the text, then sum these products to get the chance of drawing two of a kind. This probability can then be normalized by multiplying it by some coefficient, typically the alphabet size (26 for English).


:<math>\mathbf{IC} = C \times \left( \frac{n_a}{N} \cdot \frac{n_a - 1}{N - 1} + \frac{n_b}{N} \cdot \frac{n_b - 1}{N - 1} + \cdots + \frac{n_z}{N} \cdot \frac{n_z - 1}{N - 1} \right)</math>
:''Where <math>C</math> is our normalizing coefficient (26 for English), <math>n_a</math> is the number of times <math>a</math> appears in the text, and <math>N</math> is the length of the text.''

We can express the index of coincidence '''IC''' for a given letter-frequency distribution as a summation:


:<math>\mathbf{IC} = \frac{\displaystyle\sum_{i=1}^{c}n_i(n_i -1)}{N(N-1)/c}</math>
where <math>N</math> is the length of the text and <math>n_1</math> through <math>n_c</math> are the [[Letter frequencies|frequencies]] (as integers) of the <math>c</math> letters of the alphabet (<math>c = 26</math> for monocase [[English language|English]]).  The sum of the <math>n_i</math> is necessarily <math>N</math>.


The products <math>n(n-1)</math> count the number of [[combinations]] of <math>n</math> elements taken two at a time. (Actually this counts each pair twice; the extra factors of 2 occur in both numerator and denominator of the formula and thus cancel out.)  Each of the <math>n_i</math> occurrences of the <math>i</math>-th letter matches each of the remaining <math>n_i -1 </math> occurrences of the same letter. There are a total of <math>N(N-1)</math> letter pairs in the entire text, and <math>1/c</math> is the probability of a match for each pair, assuming a uniform [[random]] distribution of the characters (the "null model"; see below).  Thus, this formula gives the ratio of the total number of coincidences observed to the total number of coincidences that one would expect from the null model.<ref>{{cite journal | author=Mountjoy, Marjorie | title= The Bar Statistics | journal=NSA Technical Journal | year=1963 | volume=VII | issue=2,4}} Published in two parts.</ref>

The expected average value for the I.C. can be computed from the relative letter frequencies <math>f_i</math> of the source language:

:<math>\mathbf{IC}_{expected} = \frac{\displaystyle\sum_{i=1}^{c}{f_i}^2}{1/c}.</math>

If all <math>c</math> letters of an alphabet were equally distributed, the expected index would be 1.0.
The actual monographic I.C. for [[telegraph]]ic English text is around 1.73, reflecting the unevenness of [[natural language|natural-language]] letter distributions.

Sometimes values are reported without the normalizing denominator, for example <math>0.067=1.73/26</math> for English; such values may be called <math>\kappa_p</math> ("kappa-plaintext") rather than "I.C.", with <math>\kappa_r</math> ("kappa-random") used to denote the denominator <math>1/c</math> (which is the expected coincidence rate for a uniform distribution of the same alphabet, <math>0.0385=1/26</math> for English).
 
==Application==
 
The index of coincidence is useful both in the analysis of [[natural language|natural-language]] [[plaintext]] and in the analysis of [[encryption|ciphertext]] ([[cryptanalysis]]). Even when only ciphertext is available for testing and plaintext letter identities are disguised, coincidences in ciphertext can be caused by coincidences in the underlying plaintext. This technique is used to [[Vigen%C3%A8re_cipher#Cryptanalysis|cryptanalyze]] the [[Vigenère cipher]], for example. For a repeating-key [[polyalphabetic cipher]] arranged into a matrix, the coincidence rate within each column will usually be highest when the width of the matrix is a multiple of the key length, and this fact can be used to determine the key length, which is the first step in cracking the system.
 
Coincidence counting can help determine when two texts are written in the same language using the same [[alphabet]]. (This technique has been used to examine the purported [[Bible code]]). The ''causal'' coincidence count for such texts will be distinctly higher than the ''accidental'' coincidence count for texts in different languages, or texts using different alphabets, or gibberish texts.
 
To see why, imagine an "alphabet" of only the two letters A and B. Suppose that in our "language", the letter A is used 75% of the time, and the letter B is used 25% of the time. If two texts in this language are laid side by side, then the following pairs can be expected:
{| class="wikitable"
|-
! Pair
! Probability
|-
| AA
| 56.25%
|-
| BB
| 6.25%
|-
| AB
| 18.75%
|-
| BA
| 18.75%
|-
|}
Overall, the probability of a "coincidence" is 62.5% (56.25% for AA + 6.25% for BB).
 
Now consider the case when ''both'' messages are encrypted using the simple monoalphabetic [[substitution cipher]] which replaces A with B and vice versa:
{| class="wikitable"
|-
! Pair
! Probability
|-
| AA
| 6.25%
|-
| BB
| 56.25%
|-
| AB
| 18.75%
|-
| BA
| 18.75%
|-
|}
The overall probability of a coincidence in this situation is 62.5% (6.25% for AA + 56.25% for BB), exactly the same as for the unencrypted "plaintext" case.  In effect, the new alphabet produced by the substitution is just a uniform renaming of the original character identities, which does not affect whether they match.
 
Now suppose that only ''one'' message (say, the second) is encrypted using the same substitution cipher (A,B)→(B,A). The following pairs can now be expected:
{| class="wikitable"
|-
! Pair
! Probability
|-
| AA
| 18.75%
|-
| BB
| 18.75%
|-
| AB
| 56.25%
|-
| BA
| 6.25%
|-
|}
Now the probability of a coincidence is only 37.5% (18.75% for AA + 18.75% for BB). This is noticeably lower than the probability when same-language, same-alphabet texts were used. Evidently, coincidences are more likely when the most frequent letters in each text are the same.
 
The same principle applies to real languages like English, because certain letters, like E, occur much more frequently than other letters—a fact which is used in [[frequency analysis (cryptanalysis)|frequency analysis]] of [[substitution cipher]]s. Coincidences involving the letter E, for example, are relatively likely. So when any two English texts are compared, the coincidence count will be higher than when an English text and a foreign-language text are used.
 
It can easily be imagined that this effect can be subtle. For example, similar languages will have a higher coincidence count than dissimilar languages. Also, it isn't hard to generate random text with a frequency distribution similar to real text, artificially raising the coincidence count. Nevertheless, this technique can be used effectively to identify when two texts are likely to contain meaningful information in the same language using the same alphabet, to discover periods for repeating keys, and to uncover many other kinds of nonrandom phenomena within or among ciphertexts.
 
Expected values for various languages<ref>{{cite book|author=[[William F. Friedman|Friedman, W.F.]] and [[Lambros D. Callimahos|Callimahos, L.D.]]|title=[[Military Cryptanalytics]], Part I – Volume 2|origyear=1956|publisher=Reprinted by Aegean Park Press|isbn=0-89412-074-3|year=1985}}</ref> are:
{| class="wikitable"
|-
! Language
! Index of Coincidence
|-
| English
| 1.73
|-
| French
| 2.02
|-
| German
| 2.05
|-
| Italian
| 1.94
|-
| Portuguese
| 1.94
|-
| Russian
| 1.76
|-
| Spanish
| 1.94
|-
|}
 
==Generalization==
 
The above description is only an introduction to use of the index of coincidence, which is related to the general concept of [[correlation]].  Various forms of Index of Coincidence have been devised; the "delta" I.C. (given by the formula above) in effect measures the [[autocorrelation]] of a single distribution, whereas a "kappa" I.C. is used when matching two text strings.<ref>{{cite book | last=Kahn | first=David | authorlink=David Kahn (writer) | title=The Codebreakers - The Story of Secret Writing | origyear=1967 | publisher=Macmillan | location=New York | isbn=0-684-83130-9 | year=1996}}</ref>  Although in some applications constant factors such as <math>c</math> and <math>N</math> can be ignored, in more general situations there is considerable value in truly ''indexing'' each I.C. against the value to be expected for the [[null hypothesis]] (usually: no match and a uniform random symbol distribution), so that in every situation the [[expected value]] for no correlation is 1.0. Thus, any form of I.C. can be expressed as the ratio of the number of coincidences actually observed to the number of coincidences expected (according to the null model), using the particular test setup.
 
From the foregoing, it is easy to see that the formula for the kappa I.C. is
 
:<math>\mathbf{IC} = \frac{\displaystyle\sum_{j=1}^{N}[a_j=b_j]}{N/c},</math>
 
where <math>N</math> is the common aligned length of the two texts ''A'' and ''B'', and the bracketed term is defined as 1 if the <math>j</math>-th letter of text ''A'' matches the <math>j</math>-th letter of text ''B'', otherwise 0.
 
A related concept, the "bulge" of a distribution, measures the discrepancy between the observed I.C. and the null value of 1.0.  The number of cipher alphabets used in a [[polyalphabetic cipher]] may be estimated by dividing the expected bulge of the delta I.C. for a single alphabet by the observed bulge for the message, although in many cases (such as when a [[Vigenère cipher|repeating key]] was used) better techniques are available.
 
==Example==
 
As a practical illustration of the use of I.C., suppose that we have intercepted the following ciphertext message:
<pre><nowiki>
QPWKA LVRXC QZIKG RBPFA EOMFL  JMSDZ VDHXC XJYEB IMTRQ WNMEA
IZRVK CVKVL XNEIC FZPZC ZZHKM  LVZVZ IZRRQ WDKEC HOSNY XXLSP
MYKVQ XJTDC IOMEE XDQVS RXLRL  KZHOV
</nowiki></pre>
(The grouping into five characters is just a [[Telegraphy|telegraphic]] convention and has nothing to do with actual word lengths.)
Suspecting this to be an English plaintext encrypted using a [[Vigenère cipher]] with normal A–Z components and a short repeating keyword, we can consider the ciphertext "stacked" into some number of columns, for example seven:
<pre><nowiki>
QPWKALV
RXCQZIK
GRBPFAE
OMFLJMS
DZVDHXC
XJYEBIM
TRQWN…
</nowiki></pre>
If the key size happens to have been the same as the assumed number of columns, then all the letters within a single column will have been enciphered using the same key letter, in effect a simple [[Caesar cipher]] applied to a random selection of English plaintext characters.  The corresponding set of ciphertext letters should have a roughness of frequency distribution similar to that of English, although the letter identities have been permuted (shifted by a constant amount corresponding to the key letter).  Therefore if we compute the aggregate delta I.C. for all columns ("delta bar"), it should be around 1.73.  On the other hand, if we have incorrectly guessed the key size (number of columns), the aggregate delta I.C. should be around 1.00.  So we compute the delta I.C. for assumed key sizes from one to ten:
{| class="wikitable"
|-
! Size
! Delta-bar I.C.
|-
| 1
|  1.12
|-
| 2
|  1.19
|-
| 3
|  1.05
|-
| 4
|  1.17
|-
| 5
|  1.82
|-
| 6
|  0.99
|-
| 7
|  1.00
|-
| 8
|  1.05
|-
| 9
|  1.16
|-
| 10
|  2.07
|-
|}
We see that the key size is most likely five.  If the actual size is five, we would expect a width of ten to also report a high I.C., since each of its columns also corresponds to a simple Caesar encipherment, and we confirm this.
So we should stack the ciphertext into five columns:
<pre><nowiki>
QPWKA
LVRXC
QZIKG
RBPFA
EOMFL
JMSDZ
VDH…
</nowiki></pre>
We can now try to determine the most likely key letter for each column considered separately, by performing trial Caesar decryption of the entire column for each of the 26 possibilities A–Z for the key letter, and choosing the key letter that produces the highest correlation between the decrypted column letter frequencies and the relative [[letter frequencies]] for normal English text. That correlation, which we don't need to worry about normalizing, can be readily computed as
 
:<math>\mathbf{\chi} = \sum_{i=1}^{c}n_i f_i</math>
 
where <math>n_i</math> are the observed column letter frequencies and <math>f_i</math> are the relative letter frequencies for English.
When we try this, the best-fit key letters are reported to be "<code>EVERY</code>," which we recognize as an actual word, and using that for Vigenère decryption produces the plaintext:
<pre><nowiki>
MUSTC HANGE MEETI NGLOC ATION FROMB RIDGE TOUND ERPAS
SSINC EENEM YAGEN TSARE BELIE VEDTO HAVEB EENAS SIGNE
DTOWA TCHBR IDGES TOPME ETING TIMEU NCHAN GEDXX
</nowiki></pre>
 
from which one obtains:
<pre><nowiki>
MUST CHANGE MEETING LOCATION FROM BRIDGE TO UNDERPASS
SINCE ENEMY AGENTS ARE BELIEVED TO HAVE BEEN ASSIGNED
TO WATCH BRIDGE STOP  MEETING TIME UNCHANGED  XX
</nowiki></pre>
after word divisions have been restored at the obvious positions. "<code>XX</code>" are evidently "null" characters used to pad out the final group for transmission.
 
This entire procedure could easily be packaged into an automated algorithm for breaking such ciphers. Due to normal statistical fluctuation, such an algorithm will occasionally make wrong choices, especially when analyzing short ciphertext messages.
 
==References==
<references/>
 
==See also==
* [[Kasiski examination]]
* [[Topics in cryptography]]
 
{{Cryptography navbox | classical}}
 
[[Category:Cryptography]]
[[Category:Cryptographic attacks]]
[[Category:Summary statistics for contingency tables]]

Revision as of 17:10, 3 February 2014

In cryptography, coincidence counting is the technique (invented by William F. Friedman[1]) of putting two texts side-by-side and counting the number of times that identical letters appear in the same position in both texts. This count, either as a ratio of the total or normalized by dividing by the expected count for a random source model, is known as the index of coincidence.

Calculation

The Index of Coincidence provides a measure of how likely it would be to draw two matching letters if you randomly selected two letters from a given text. The chance that you will draw a given letter in the text is (number of times that letter appears / length of the text). The chance of drawing that same letter again, without putting the first one back, is ((appearances - 1) / (text length - 1)). The product of these two values gives you the chance of drawing that letter twice in a row. One can find this product for each letter that appears in the text, then sum these products to get the chance of drawing two of a kind. This probability can then be normalized by multiplying it by some coefficient, typically the alphabet size (26 for English).

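<math>\mathbf{IC} = C \times \left( \frac{n_a}{N} \cdot \frac{n_a - 1}{N - 1} + \frac{n_b}{N} \cdot \frac{n_b - 1}{N - 1} + \cdots + \frac{n_z}{N} \cdot \frac{n_z - 1}{N - 1} \right)</math>
Where <math>C</math> is our normalizing coefficient (26 for English), <math>n_a</math> is the number of times <math>a</math> appears in the text, and <math>N</math> is the length of the text.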

We can express the index of coincidence IC for a given letter-frequency distribution as a summation:

<math>\mathbf{IC} = \frac{\displaystyle\sum_{i=1}^{c}n_i(n_i -1)}{N(N-1)/c}</math>

where <math>N</math> is the length of the text and <math>n_1</math> through <math>n_c</math> are the frequencies (as integers) of the <math>c</math> letters of the alphabet (<math>c = 26</math> for monocase English). The sum of the <math>n_i</math> is necessarily <math>N</math>.

The products <math>n(n-1)</math> count the number of combinations of <math>n</math> elements taken two at a time. (Actually this counts each pair twice; the extra factors of 2 occur in both numerator and denominator of the formula and thus cancel out.) Each of the <math>n_i</math> occurrences of the <math>i</math>-th letter matches each of the remaining <math>n_i - 1</math> occurrences of the same letter. There are a total of <math>N(N-1)</math> letter pairs in the entire text, and <math>1/c</math> is the probability of a match for each pair, assuming a uniform random distribution of the characters (the "null model"; see below). Thus, this formula gives the ratio of the total number of coincidences observed to the total number of coincidences that one would expect from the null model.[2]
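
The summation above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name delta_ic and the choice to ignore everything except the letters A to Z are assumptions made here, not part of the article.

```python
from collections import Counter


def delta_ic(text: str, c: int = 26) -> float:
    """Normalized index of coincidence of a single text.

    Implements sum(n_i * (n_i - 1)) / (N * (N - 1) / c), where n_i are the
    letter counts, N is the number of letters and c is the alphabet size.
    """
    # Assumption: only the letters A-Z count; case, spaces and digits are ignored.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n_total = len(letters)
    if n_total < 2:
        raise ValueError("need at least two letters to form a pair")
    counts = Counter(letters)
    coincidences = sum(n * (n - 1) for n in counts.values())
    return c * coincidences / (n_total * (n_total - 1))
```

A text made of one repeated letter gives the maximum value c, and a text in which no letter repeats gives 0.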

The expected average value for the I.C. can be computed from the relative letter frequencies <math>f_i</math> of the source language:

<math>\mathbf{IC}_{expected} = \frac{\displaystyle\sum_{i=1}^{c}{f_i}^2}{1/c}.</math>

If all c letters of an alphabet were equally distributed, the expected index would be 1.0. The actual monographic I.C. for telegraphic English text is around 1.73, reflecting the unevenness of natural-language letter distributions.

Sometimes values are reported without the normalizing denominator, for example <math>0.067 = 1.73/26</math> for English; such values may be called <math>\kappa_p</math> ("kappa-plaintext") rather than "I.C.", with <math>\kappa_r</math> ("kappa-random") used to denote the denominator <math>1/c</math> (which is the expected coincidence rate for a uniform distribution of the same alphabet, <math>0.0385 = 1/26</math> for English).
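
As a rough sketch of the expected-value formula, the following Python computes kappa-plaintext, kappa-random and their ratio from a list of relative frequencies. The function name expected_ic and the two test distributions (a uniform 26-letter alphabet and a hypothetical two-letter alphabet with frequencies 75% and 25%) are illustrative assumptions.

```python
def expected_ic(freqs, c=None):
    """Expected normalized I.C. for relative letter frequencies freqs.

    kappa_p is the raw coincidence rate sum(f_i ** 2); kappa_r = 1 / c is the
    rate for a uniform alphabet of the same size. The result is kappa_p / kappa_r.
    """
    if c is None:
        c = len(freqs)  # assumption: one entry per letter of the alphabet
    kappa_p = sum(f * f for f in freqs)
    kappa_r = 1.0 / c
    return kappa_p / kappa_r


uniform = [1 / 26] * 26   # every letter equally likely
skewed = [0.75, 0.25]     # hypothetical two-letter alphabet

print(expected_ic(uniform))  # close to 1.0, up to floating-point rounding
print(expected_ic(skewed))   # 0.625 / 0.5 = 1.25
```

The uniform case reproduces the null value of 1.0, and any unevenness in the frequencies pushes the result above 1.0.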

Application

The index of coincidence is useful both in the analysis of natural-language plaintext and in the analysis of ciphertext (cryptanalysis). Even when only ciphertext is available for testing and plaintext letter identities are disguised, coincidences in ciphertext can be caused by coincidences in the underlying plaintext. This technique is used to cryptanalyze the Vigenère cipher, for example. For a repeating-key polyalphabetic cipher arranged into a matrix, the coincidence rate within each column will usually be highest when the width of the matrix is a multiple of the key length, and this fact can be used to determine the key length, which is the first step in cracking the system.

Coincidence counting can help determine when two texts are written in the same language using the same alphabet. (This technique has been used to examine the purported Bible code). The causal coincidence count for such texts will be distinctly higher than the accidental coincidence count for texts in different languages, or texts using different alphabets, or gibberish texts.

To see why, imagine an "alphabet" of only the two letters A and B. Suppose that in our "language", the letter A is used 75% of the time, and the letter B is used 25% of the time. If two texts in this language are laid side by side, then the following pairs can be expected:

Pair Probability
AA 56.25%
BB 6.25%
AB 18.75%
BA 18.75%

Overall, the probability of a "coincidence" is 62.5% (56.25% for AA + 6.25% for BB).

Now consider the case when both messages are encrypted using the simple monoalphabetic substitution cipher which replaces A with B and vice versa:

Pair Probability
AA 6.25%
BB 56.25%
AB 18.75%
BA 18.75%

The overall probability of a coincidence in this situation is 62.5% (6.25% for AA + 56.25% for BB), exactly the same as for the unencrypted "plaintext" case. In effect, the new alphabet produced by the substitution is just a uniform renaming of the original character identities, which does not affect whether they match.

Now suppose that only one message (say, the second) is encrypted using the same substitution cipher (A,B)→(B,A). The following pairs can now be expected:

Pair Probability
AA 18.75%
BB 18.75%
AB 56.25%
BA 6.25%

Now the probability of a coincidence is only 37.5% (18.75% for AA + 18.75% for BB). This is noticeably lower than the probability when same-language, same-alphabet texts were used. Evidently, coincidences are more likely when the most frequent letters in each text are the same.
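
The three two-letter scenarios above can be reproduced with a short Python sketch. It assumes, as the tables implicitly do, that the letters in the two texts are drawn independently; the function name coincidence_probability is chosen here for illustration.

```python
def coincidence_probability(p, q):
    """Chance that two independently drawn letters, one from each
    distribution, are identical: the sum over letters of p[x] * q[x]."""
    return sum(prob * q.get(letter, 0.0) for letter, prob in p.items())


plain = {"A": 0.75, "B": 0.25}
swapped = {"A": 0.25, "B": 0.75}  # after the substitution (A,B) -> (B,A)

both_plain = coincidence_probability(plain, plain)          # AA + BB = 0.625
both_encrypted = coincidence_probability(swapped, swapped)  # still 0.625
only_one = coincidence_probability(plain, swapped)          # drops to 0.375

print(both_plain, both_encrypted, only_one)
```

Renaming the letters in both texts leaves the sum unchanged, while renaming them in only one text pairs the frequent letter with the rare one and lowers it.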

The same principle applies to real languages like English, because certain letters, like E, occur much more frequently than other letters—a fact which is used in frequency analysis of substitution ciphers. Coincidences involving the letter E, for example, are relatively likely. So when any two English texts are compared, the coincidence count will be higher than when an English text and a foreign-language text are used.

It can easily be imagined that this effect can be subtle. For example, similar languages will have a higher coincidence count than dissimilar languages. Also, it isn't hard to generate random text with a frequency distribution similar to real text, artificially raising the coincidence count. Nevertheless, this technique can be used effectively to identify when two texts are likely to contain meaningful information in the same language using the same alphabet, to discover periods for repeating keys, and to uncover many other kinds of nonrandom phenomena within or among ciphertexts.

Expected values for various languages[3] are:
Language Index of Coincidence
English 1.73
French 2.02
German 2.05
Italian 1.94
Portuguese 1.94
Russian 1.76
Spanish 1.94

Generalization

The above description is only an introduction to use of the index of coincidence, which is related to the general concept of correlation. Various forms of Index of Coincidence have been devised; the "delta" I.C. (given by the formula above) in effect measures the autocorrelation of a single distribution, whereas a "kappa" I.C. is used when matching two text strings.[4] Although in some applications constant factors such as c and N can be ignored, in more general situations there is considerable value in truly indexing each I.C. against the value to be expected for the null hypothesis (usually: no match and a uniform random symbol distribution), so that in every situation the expected value for no correlation is 1.0. Thus, any form of I.C. can be expressed as the ratio of the number of coincidences actually observed to the number of coincidences expected (according to the null model), using the particular test setup.

From the foregoing, it is easy to see that the formula for the kappa I.C. is

<math>\mathbf{IC} = \frac{\displaystyle\sum_{j=1}^{N}[a_j=b_j]}{N/c},</math>

where N is the common aligned length of the two texts A and B, and the bracketed term is defined as 1 if the j-th letter of text A matches the j-th letter of text B, otherwise 0.
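
A minimal Python sketch of the kappa formula follows. The name kappa_ic is hypothetical, and truncating both texts to the length of the shorter one is an assumption about how the "common aligned length" is obtained.

```python
def kappa_ic(text_a: str, text_b: str, c: int = 26) -> float:
    """Normalized coincidence count of two aligned texts.

    Counts the positions j where a_j == b_j and divides by N / c, the number
    of matches expected for uniformly random letters.
    """
    # Assumption: the texts are aligned at position 0 and compared over the
    # length of the shorter one.
    n = min(len(text_a), len(text_b))
    if n == 0:
        raise ValueError("both texts must be non-empty")
    matches = sum(1 for a, b in zip(text_a, text_b) if a == b)
    return c * matches / n


print(kappa_ic("ABCD", "ABXY"))  # 2 matches in 4 positions -> 26 * 2 / 4 = 13.0
```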

A related concept, the "bulge" of a distribution, measures the discrepancy between the observed I.C. and the null value of 1.0. The number of cipher alphabets used in a polyalphabetic cipher may be estimated by dividing the expected bulge of the delta I.C. for a single alphabet by the observed bulge for the message, although in many cases (such as when a repeating key was used) better techniques are available.
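
The bulge-based estimate described above amounts to a single division. The sketch below assumes the single-alphabet value of 1.73 quoted for English; the function name estimate_alphabet_count is hypothetical.

```python
def estimate_alphabet_count(observed_ic: float, single_alphabet_ic: float = 1.73) -> float:
    """Rough estimate of the number of cipher alphabets.

    Divides the expected bulge for one alphabet (single_alphabet_ic - 1.0)
    by the bulge observed for the whole message (observed_ic - 1.0).
    """
    observed_bulge = observed_ic - 1.0
    if observed_bulge <= 0:
        raise ValueError("observed I.C. shows no bulge above the null value 1.0")
    return (single_alphabet_ic - 1.0) / observed_bulge


# With the whole-message value 1.12 reported in the Example section:
print(round(estimate_alphabet_count(1.12), 2))  # 0.73 / 0.12, about 6.08
```

The result is only a rough guide: for that message it suggests about six alphabets, while the column analysis in the Example section settles on five.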

Example

As a practical illustration of the use of I.C., suppose that we have intercepted the following ciphertext message:

QPWKA LVRXC QZIKG RBPFA EOMFL  JMSDZ VDHXC XJYEB IMTRQ WNMEA
IZRVK CVKVL XNEIC FZPZC ZZHKM  LVZVZ IZRRQ WDKEC HOSNY XXLSP
MYKVQ XJTDC IOMEE XDQVS RXLRL  KZHOV

(The grouping into five characters is just a telegraphic convention and has nothing to do with actual word lengths.) Suspecting this to be an English plaintext encrypted using a Vigenère cipher with normal A–Z components and a short repeating keyword, we can consider the ciphertext "stacked" into some number of columns, for example seven:

QPWKALV
RXCQZIK
GRBPFAE
OMFLJMS
DZVDHXC
XJYEBIM
TRQWN…

If the key size happens to have been the same as the assumed number of columns, then all the letters within a single column will have been enciphered using the same key letter, in effect a simple Caesar cipher applied to a random selection of English plaintext characters. The corresponding set of ciphertext letters should have a roughness of frequency distribution similar to that of English, although the letter identities have been permuted (shifted by a constant amount corresponding to the key letter). Therefore if we compute the aggregate delta I.C. for all columns ("delta bar"), it should be around 1.73. On the other hand, if we have incorrectly guessed the key size (number of columns), the aggregate delta I.C. should be around 1.00. So we compute the delta I.C. for assumed key sizes from one to ten:

Size Delta-bar I.C.
1 1.12
2 1.19
3 1.05
4 1.17
5 1.82
6 0.99
7 1.00
8 1.05
9 1.16
10 2.07
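
A scan like the one tabulated above could be sketched in Python as follows. The article does not say how the per-column values are aggregated into "delta bar", so the plain average used here is an assumption, as are the function names; the printed values are therefore not guaranteed to match the table digit for digit.

```python
from collections import Counter


def delta_ic(letters, c=26):
    """Normalized delta I.C. of one sequence of letters (length >= 2)."""
    n_total = len(letters)
    counts = Counter(letters)
    return c * sum(n * (n - 1) for n in counts.values()) / (n_total * (n_total - 1))


def delta_bar_ic(ciphertext, width, c=26):
    """Average delta I.C. over the columns obtained by stacking the text
    into rows of the given width (assumed aggregation: plain mean)."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    columns = [letters[i::width] for i in range(width)]
    usable = [col for col in columns if len(col) >= 2]  # a pair needs 2 letters
    if not usable:
        raise ValueError("text too short for this width")
    return sum(delta_ic(col, c) for col in usable) / len(usable)


CIPHERTEXT = (
    "QPWKA LVRXC QZIKG RBPFA EOMFL JMSDZ VDHXC XJYEB IMTRQ WNMEA "
    "IZRVK CVKVL XNEIC FZPZC ZZHKM LVZVZ IZRRQ WDKEC HOSNY XXLSP "
    "MYKVQ XJTDC IOMEE XDQVS RXLRL KZHOV"
)

for width in range(1, 11):
    print(width, round(delta_bar_ic(CIPHERTEXT, width), 2))
```

Slicing with letters[i::width] collects every width-th letter starting at offset i, which is exactly one column of the stacked text.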

We see that the key size is most likely five. If the actual size is five, we would expect a width of ten to also report a high I.C., since each of its columns also corresponds to a simple Caesar encipherment, and we confirm this. So we should stack the ciphertext into five columns:

QPWKA
LVRXC
QZIKG
RBPFA
EOMFL
JMSDZ
VDH…

We can now try to determine the most likely key letter for each column considered separately, by performing trial Caesar decryption of the entire column for each of the 26 possibilities A–Z for the key letter, and choosing the key letter that produces the highest correlation between the decrypted column letter frequencies and the relative letter frequencies for normal English text. That correlation, which we don't need to worry about normalizing, can be readily computed as

<math>\mathbf{\chi} = \sum_{i=1}^{c}n_i f_i</math>

where <math>n_i</math> are the observed column letter frequencies and <math>f_i</math> are the relative letter frequencies for English. When we try this, the best-fit key letters are reported to be "EVERY," which we recognize as an actual word, and using that for Vigenère decryption produces the plaintext:

MUSTC HANGE MEETI NGLOC ATION FROMB RIDGE TOUND ERPAS 
SSINC EENEM YAGEN TSARE BELIE VEDTO HAVEB EENAS SIGNE 
DTOWA TCHBR IDGES TOPME ETING TIMEU NCHAN GEDXX

from which one obtains:

MUST CHANGE MEETING LOCATION FROM BRIDGE TO UNDERPASS
SINCE ENEMY AGENTS ARE BELIEVED TO HAVE BEEN ASSIGNED
TO WATCH BRIDGE STOP  MEETING TIME UNCHANGED  XX

after word divisions have been restored at the obvious positions. "XX" are evidently "null" characters used to pad out the final group for transmission.

This entire procedure could easily be packaged into an automated algorithm for breaking such ciphers. Due to normal statistical fluctuation, such an algorithm will occasionally make wrong choices, especially when analyzing short ciphertext messages.
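
One way such an automated procedure might look is sketched below in Python, covering the last two steps: choosing a key letter per column by the correlation chi and decrypting with the resulting key. The English frequency table holds commonly quoted approximate values and is an assumption, not data from the article, and all function names are hypothetical.

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Approximate relative frequencies of A..Z in English (assumed values).
ENGLISH_FREQ = [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
    0.00978, 0.02360, 0.00150, 0.01974, 0.00074,
]


def letters_only(text):
    """Keep only the letters A-Z, upper-cased."""
    return [ch for ch in text.upper() if "A" <= ch <= "Z"]


def shift_back(ch, key_letter):
    """Undo a Caesar shift: ciphertext letter minus key letter, modulo 26."""
    return ALPHABET[(ord(ch) - ord(key_letter)) % 26]


def best_key_letter(column):
    """Key letter whose trial decryption maximizes chi = sum(n_i * f_i)."""
    def chi(key_letter):
        counts = Counter(shift_back(ch, key_letter) for ch in column)
        return sum(n * ENGLISH_FREQ[ALPHABET.index(letter)]
                   for letter, n in counts.items())
    return max(ALPHABET, key=chi)


def recover_key(ciphertext, key_size):
    """Best-fit key, one letter per column of the stacked ciphertext."""
    letters = letters_only(ciphertext)
    return "".join(best_key_letter(letters[i::key_size]) for i in range(key_size))


def vigenere_decrypt(ciphertext, key):
    """Decrypt with a repeating key and standard A-Z alphabets."""
    letters = letters_only(ciphertext)
    return "".join(shift_back(ch, key[i % len(key)]) for i, ch in enumerate(letters))


CIPHERTEXT = (
    "QPWKA LVRXC QZIKG RBPFA EOMFL JMSDZ VDHXC XJYEB IMTRQ WNMEA "
    "IZRVK CVKVL XNEIC FZPZC ZZHKM LVZVZ IZRRQ WDKEC HOSNY XXLSP "
    "MYKVQ XJTDC IOMEE XDQVS RXLRL KZHOV"
)

print(recover_key(CIPHERTEXT, 5))            # best-fit key for five columns
print(vigenere_decrypt(CIPHERTEXT, "EVERY"))  # plaintext for the article's key
```

Because each column of this message holds only 26 letters, the best-fit letters depend somewhat on the frequency table used, which is the statistical fluctuation mentioned above.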

References

  1. Template:Cite document The original application ignored normalization.
