{{Otheruses4|the transmission of data across noisy channels|the storage of text in computers|Variable-width encoding}}
In [[coding theory]], a '''variable-length code''' (VLC) is a [[code]] which maps source symbols to a ''variable'' number of bits.


Variable-length codes can allow sources to be [[data compression|compressed]] and decompressed with ''zero'' error ([[lossless data compression]]) and still be read back symbol by symbol. With the right coding strategy, an [[independent and identically-distributed random variables|independent and identically-distributed source]] may be compressed almost arbitrarily close to its [[information entropy|entropy]]. This is in contrast to fixed-length coding methods, for which data compression is only possible for large blocks of data; any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure.


Some examples of well-known variable-length coding strategies are [[Huffman coding]], [[Lempel–Ziv|Lempel–Ziv coding]] and [[arithmetic coding]].


== Codes and their extensions ==
The extension of a code is the mapping of finite-length source sequences to finite-length bit strings that is obtained by concatenating, for each symbol of the source sequence, the corresponding codeword produced by the original code.


Using terms from [[formal language theory]], the precise mathematical definition is as follows: Let <math>S</math> and <math>T</math> be two finite sets, called the source and target [[alphabet (computer science)|alphabets]], respectively. A '''code''' <math>C: S \to T^*</math> is a [[total function]] mapping each symbol from <math>S</math> to a [[Word (data type)|sequence of symbols]] over <math>T</math>, and the extension of <math>C</math> to a [[Homomorphism#Homomorphisms_and_e-free_homomorphisms_in_formal_language_theory|homomorphism]] of <math>S^*</math> into <math>T^*</math>, which naturally maps each sequence of source symbols to a sequence of target symbols, is referred to as its '''extension'''.
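As a concrete sketch (an illustration of the definitions, not part of the formal theory), the following Python fragment models a code as a dictionary and its extension as concatenation, using the prefix code tabulated under "Prefix codes" below:
<syntaxhighlight lang="python">
# A code is a total function from source symbols to non-empty words
# over the target alphabet {'0', '1'}; a dict is enough to model it.
# This is the prefix code tabulated under "Prefix codes" below.
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def extension(word):
    """Extension of the code: concatenate, for each symbol of the
    source sequence, the codeword produced by the original code."""
    return ''.join(code[symbol] for symbol in word)

# The extension is a homomorphism of S* into T*:
assert extension('aab' + 'acdab') == extension('aab') + extension('acdab')
print(extension('aabacdab'))  # -> 00100110111010
</syntaxhighlight>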


== Classes of variable-length codes ==


Variable-length codes can be strictly nested in order of decreasing generality as non-singular codes, uniquely decodable codes and prefix codes. Prefix codes are always uniquely decodable, and these in turn are always non-singular:


=== Non-singular codes ===


A code is '''non-singular''' if each source symbol is mapped to a different non-empty bit string, i.e. the mapping from source symbols to bit strings is [[injective]].
* For example, the mapping <math>M_1 = \{\, a\mapsto 0, b\mapsto 0, c\mapsto 1\,\}</math> is '''not''' non-singular because both "a" and "b" map to the same bit string "0"; any extension of this mapping will generate a lossy (non-lossless) coding. Such singular coding may still be useful when some loss of information is acceptable (for example when such a code is used in audio or video compression, where a lossy coding becomes equivalent to source [[Quantization (signal processing)|quantization]]).
* However, the mapping <math>M_2 = \{\, a \mapsto 1, b \mapsto 011, c\mapsto 01110, d\mapsto 1110, e\mapsto 10011\,\}</math> is non-singular; its extension will generate a lossless coding, which will be useful for general data transmission (but this feature is not always required). Note that it is not necessary for the non-singular code to be more compact than the source (and in many applications, a larger code is useful, for example as a way to detect and/or recover from encoding or transmission errors, or in security applications to protect a source from undetectable tampering). A programmatic check of non-singularity is sketched after these examples.
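The non-singularity condition is easy to check mechanically. A minimal Python sketch (illustrative only), applied to the two mappings above:
<syntaxhighlight lang="python">
def is_non_singular(code):
    """A code is non-singular iff every codeword is non-empty and no
    two source symbols share a codeword (the mapping is injective)."""
    codewords = list(code.values())
    return all(codewords) and len(set(codewords)) == len(codewords)

M1 = {'a': '0', 'b': '0', 'c': '1'}
M2 = {'a': '1', 'b': '011', 'c': '01110', 'd': '1110', 'e': '10011'}
print(is_non_singular(M1))  # False: 'a' and 'b' both map to '0'
print(is_non_singular(M2))  # True
</syntaxhighlight>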


=== Uniquely decodable codes ===


A code is '''uniquely decodable''' if its extension is non-singular. Whether a given code is uniquely decodable can be decided with the [[Sardinas–Patterson algorithm]].
* The mapping <math>M_3 = \{\, a\mapsto 0, b\mapsto 01, c\mapsto 011\,\}</math> is uniquely decodable (this can be demonstrated by looking at the ''follow-set'' after each target bit string in the map: every codeword begins with a 0 bit and contains no other 0 bits, so a 0 bit cannot continue an existing codeword and instead unambiguously starts a new one, terminating the codeword before it).
* Consider again the code <math>M_2</math> from the previous section. This code, which is based on an example found in Berstel et al.,<ref>Berstel et al. (2009), Example 2.3.1, p. 63</ref> is '''not''' uniquely decodable, since the string ''011101110011'' can be interpreted as the sequence of codewords ''01110 – 1110 – 011'', but also as the sequence of codewords ''011 – 1 – 011 – 10011''. Two possible decodings of this encoded string are thus given by ''cdb'' and ''babe''. However, such a code is useful when the set of all possible source symbols is completely known and finite, or when there are restrictions (for example a formal syntax) that determine whether source elements of this extension are acceptable. Such restrictions permit the decoding of the original message by checking which of the possible source sequences mapped to the same bit string are valid under those restrictions. A sketch of the Sardinas–Patterson test, applied to <math>M_2</math> and <math>M_3</math>, appears below.
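The following is an illustrative Python sketch of the Sardinas–Patterson test — a direct transcription of the dangling-suffix construction, not an optimized implementation — applied to the codes above:
<syntaxhighlight lang="python">
def quotient(A, B):
    """Left quotient of sets of words: { w : a + w is in B for some a in A }."""
    return {b[len(a):] for a in A for b in B if b.startswith(a)}

def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: the code is uniquely decodable iff no
    dangling-suffix set S_i (i >= 2) contains the empty word."""
    C = set(codewords)
    S = quotient(C, C) - {''}                 # S_1: proper dangling suffixes
    seen = set()
    while S and frozenset(S) not in seen:
        if '' in S:                           # some string has two parses
            return False
        seen.add(frozenset(S))
        S = quotient(C, S) | quotient(S, C)   # S_{i+1}
    return True                               # suffix sets died out or cycled

M2 = ['1', '011', '01110', '1110', '10011']
M3 = ['0', '01', '011']
print(is_uniquely_decodable(M2))  # False: 011101110011 has two parses
print(is_uniquely_decodable(M3))  # True
</syntaxhighlight>
Termination is guaranteed because every set <math>S_i</math> consists of suffixes of codewords, of which there are finitely many, so the sets must eventually repeat or become empty.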


=== Prefix codes ===
{{Main|Prefix code}}


A code is a '''prefix code''' if no target bit string in the mapping is a prefix of the target bit string of a different source symbol in the same mapping. This means that symbols can be decoded instantaneously after their entire codeword is received. Other commonly used names for this concept are '''prefix-free code''', '''instantaneous code''', or '''context-free code'''.
* The example mapping <math>M_3</math> in the previous paragraph is '''not''' a prefix code, because we do not know, after reading the bit string "0", whether it encodes an "a" source symbol or is the prefix of the encoding of a "b" or "c" symbol.
* An example of a prefix code is shown below.
{| class="wikitable" style="text-align:center; position: relative; left: 1in;" |
|-
! Symbol !! Codeword
|-
| a || 0
|-
| b || 10
|-
| c || 110
|-
| d || 111
|}
:: Example of encoding and decoding:
::: aabacdab → 00100110111010 → |0|0|10|0|110|111|0|10| → aabacdab
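Instantaneous decoding can be made concrete with a short Python sketch (illustrative only, using the table above): a symbol is emitted as soon as the buffered bits match a codeword, which is safe precisely because no codeword is a prefix of another.
<syntaxhighlight lang="python">
def decode_prefix(code, bits):
    """Instantaneous decoding of a prefix code: emit a symbol as soon
    as the bits read so far match a codeword."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in inverse:            # codeword complete: emit its symbol
            out.append(inverse[buf])
            buf = ''
    if buf:
        raise ValueError('trailing bits do not form a codeword')
    return ''.join(out)

code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(decode_prefix(code, '00100110111010'))  # -> 'aabacdab'
</syntaxhighlight>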


A special case of prefix codes are [[block code]]s. Here all codewords must have the same length. Block codes are not very useful in the context of [[data compression|source coding]], but often serve as [[forward error correction|error correcting codes]] in the context of [[channel coding]].

== Advantages ==

The advantage of a variable-length code is that unlikely source symbols can be assigned longer codewords and likely source symbols can be assigned shorter codewords, thus giving a low [[Expected_value|''expected'']] codeword length. For the above example, if the probabilities of (a, b, c, d) were <math>\textstyle\left(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\right)</math>, the expected number of bits used to represent a source symbol using the code above would be:

:: <math>1\times\frac{1}{2}+2\times\frac{1}{4}+3\times\frac{1}{8}+3\times\frac{1}{8}=\frac{7}{4}</math>.
As the entropy of this source is 1.7500 bits per symbol, this code compresses the source as much as possible so that the source can be recovered with ''zero'' error.
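Both quantities are easy to reproduce mechanically; a minimal Python sketch:
<syntaxhighlight lang="python">
from math import log2

code  = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
probs = {'a': 1/2, 'b': 1/4, 'c': 1/8, 'd': 1/8}

# Expected codeword length: sum of p(s) * len(codeword(s)).
expected_length = sum(p * len(code[s]) for s, p in probs.items())
# Shannon entropy of the source, in bits per symbol.
entropy = -sum(p * log2(p) for p in probs.values())

print(expected_length)  # 1.75
print(entropy)          # 1.75: the code meets the entropy bound exactly
</syntaxhighlight>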
== Notes ==
<references/>

== References ==
* {{cite book | last1=Berstel | first1=Jean | last2=Perrin | first2=Dominique | last3=Reutenauer | first3=Christophe | title=Codes and automata | series=Encyclopedia of Mathematics and its Applications | volume=129 | location=Cambridge | publisher=[[Cambridge University Press]] | year=2010 | isbn=978-0-521-88831-8 | zbl=1187.94001 }} [http://www-igm.univ-mlv.fr/~berstel/LivreCodes/Codes.html Draft available online]

{{Compression Methods}}

[[Category:Coding theory]]
[[Category:Lossless compression algorithms]]

----

{{Refimprove|date=September 2009}}
In [[statistics]], a '''truncated distribution''' is a [[conditional distribution]] that results from restricting the domain of some other [[probability distribution]]. Truncated distributions arise in practical statistics in cases where the ability to record, or even to know about, occurrences is limited to values which lie above or below a given threshold or within a specified range. For example, if the dates of birth of children in a school are examined, these would typically be subject to truncation relative to those of all children in the area, given that the school accepts only children in a given age range on a specific date. There would be no information about how many children in the locality had dates of birth before or after the school's cutoff dates if only a direct approach to the school were used to obtain information.

Where sampling is such as to retain knowledge of items that fall outside the required range, without recording the actual values, this is known as [[Censoring (statistics)|censoring]], as opposed to the [[Truncation (statistics)|truncation]] considered here.<ref>Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms''. OUP. ISBN 0-19-020613-9 {{Please check ISBN|reason=Check digit (9) does not correspond to calculated figure.}}</ref>

==Definition==
{{Probability distribution|
  name       =Truncated distribution|
  type       =density|
  pdf_image  =<small>The red line is a truncated standard normal distribution, truncated at −1 and 1</small>|
  cdf_image  =[[File:Truncation CDF.gif|Cumulative distribution function for a truncated standard normal distribution, truncated at −1 and 1]] |
  parameters =The parameters of <math> f(x) </math>, plus <math> a </math> and <math> b </math>|
  support    =<math>x \in (a,b]</math>|
  pdf        =<math>\frac{g(x)}{F(b)-F(a)} </math>|
  cdf        =<math>\frac{\int_a^x g(t)\,dt}{F(b)-F(a)} </math>|
  mean       =<math>\frac{\int_a^b x g(x)\,dx}{F(b)-F(a)} </math>|
}}
The following discussion is in terms of a random variable having a [[continuous distribution]], although the same ideas apply to [[discrete distribution]]s. Similarly, the discussion assumes that truncation is to a semi-open interval <math>x \in (a,b]</math>, but other possibilities can be handled straightforwardly.

Suppose we have a random variable <math> X </math> that is distributed according to some probability density function <math> f(x) </math>, with cumulative distribution function <math> F(x) </math>, both of which have infinite [[Support (mathematics)|support]]. Suppose we wish to know the probability density of the random variable after restricting its support to lie between two constants, so that the support is <math> (a,b] </math>. That is to say, suppose we wish to know how <math> X </math> is distributed given <math> a < X \leq b </math>:

:<math>f(x|a < X \leq b) = \frac{g(x)}{F(b)-F(a)} = Tr(x)</math>

where <math>g(x) = f(x)</math> for all <math> a < x \leq b </math> and <math> g(x) = 0 </math> everywhere else. Notice that <math>Tr(x)</math> has the same support as <math>g(x)</math>.

There is, unfortunately, an ambiguity about the term "truncated distribution". It can refer to <math> g(x) </math>, where the parts of the distribution <math> f(x) </math> outside the interval have been removed but the remainder has not been rescaled, or it can refer to <math> Tr(x)</math>. In general, <math> g(x) </math> is not a probability density function, since it does not integrate to one, whereas <math> Tr(x)</math> is a probability density function. In this article, a truncated distribution refers to <math> Tr(x)</math>.

Notice that in fact <math>f(x|a < X \leq b)</math> is a distribution:

:<math>\int_{a}^{b} f(x|a < X \leq b)\,dx = \frac{1}{F(b)-F(a)} \int_{a}^{b} g(x)\,dx = 1 </math>.

Truncated distributions need not have parts removed from both the top and the bottom. A truncated distribution where just the bottom of the distribution has been removed is as follows:

:<math>f(x|X>y) = \frac{g(x)}{1-F(y)}</math>

where <math>g(x) = f(x)</math> for all <math> y < x </math> and <math> g(x) = 0 </math> everywhere else, and <math>F(x)</math> is the [[cumulative distribution function]].

A truncated distribution where the top of the distribution has been removed is as follows:

:<math>f(x|X \leq y) = \frac{g(x)}{F(y)}</math>

where <math>g(x) = f(x)</math> for all <math> x \leq y </math> and <math> g(x) = 0 </math> everywhere else, and <math>F(x)</math> is the [[cumulative distribution function]].

== Expectation of truncated random variable ==
Suppose we wish to find the expected value of a random variable distributed according to the density <math> f(x) </math> and cumulative distribution function <math> F(x) </math>, given that the random variable <math> X </math> is greater than some known value <math> y </math>. The expectation of a truncated random variable is thus:
 
:<math> E(X|X>y) = \frac{\int_y^\infty x g(x)\,dx}{1 - F(y)} </math>
 
where again <math>g(x) = f(x)</math> for all <math> y < x </math> and <math> g(x) = 0 </math> everywhere else.
 
Let <math> a </math> and <math> b </math> be the lower and upper limits, respectively, of the support of the original density <math>f(x)</math>, and let <math>u(X)</math> be some continuous function of <math> X </math> with a continuous derivative. Assuming that <math> f(x) </math> is continuous, properties of <math> E(u(X)|X>y) </math> include:
 
(i)  <math> \lim_{y \to a} E(u(X)|X>y) = E(u(X)) </math>
 
(ii)  <math> \lim_{y \to b} E(u(X)|X>y) = u(b) </math>
 
(iii) <math> \frac{\partial}{\partial y}[E(u(X)|X>y)] = \frac{f(y)}{1-F(y)}[E(u(X)|X>y) - u(y)] </math>
 
(iv)  <math> \lim_{y \to a}\frac{\partial}{\partial y}[E(u(X)|X>y)] = f(a)[E(u(X)) - u(a)] </math>
 
(v)  <math> \lim_{y \to b}\frac{\partial}{\partial y}[E(u(X)|X>y)] = \frac{1}{2}u'(b) </math>
 
These hold provided that the limits exist; that is, <math> \lim_{y \to c} u'(y) = u'(c) </math>, <math> \lim_{y \to c} u(y) = u(c) </math> and <math>\lim_{y \to c} f(y) = f(c) </math>, where <math> c </math> represents either <math>a</math> or <math> b</math>.
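The truncated expectation can also be checked numerically. The following sketch (assuming NumPy and SciPy are available) compares the closed form <math>E(X|X>y) = \varphi(y)/(1-\Phi(y))</math> for a standard normal — the inverse Mills ratio — against a Monte Carlo estimate; the truncation point <math>y=1</math> is an arbitrary illustrative choice:
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = 1.0                                   # truncation point (illustrative)

# Closed form for a standard normal: E(X | X > y) = phi(y) / (1 - Phi(y)).
closed_form = norm.pdf(y) / (1 - norm.cdf(y))

# Monte Carlo check: sample X ~ N(0, 1) and keep only the draws above y.
samples = rng.standard_normal(2_000_000)
monte_carlo = samples[samples > y].mean()

print(closed_form, monte_carlo)           # both close to 1.525 for y = 1
</syntaxhighlight>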
 
==Examples==
 
The [[truncated normal distribution]] is an important example.<ref>Johnson, N.L., Kotz, S., Balakrishnan, N. (1994) ''Continuous Univariate Distributions, Volume 1'', Wiley. ISBN 0-471-58495-9 (Section 10.1)</ref>
 
The [[Tobit model]] employs truncated distributions.
 
== Random truncation ==
Suppose we have the following setup: a truncation value, <math>t</math>, is selected at random from a density, <math>g(t)</math>, but this value is not observed. Then a value, <math>x</math>, is selected at random from the truncated distribution, <math>f(x|t)=Tr(x)</math>. Suppose we observe <math>x</math> and wish to update our belief about the density of <math>t</math> given the observation.
 
First, by definition:  
 
:<math>f(x)=\int_{x}^{\infty} f(x|t)g(t)dt </math>, and
:<math>F(a)=\int_{-\infty}^a[\int_{x}^{\infty} f(x|t)g(t)dt]dx .</math>
 
Notice that <math>t</math> must be greater than <math>x</math>, hence when we integrate over <math>t</math>, we set a lower bound of <math>x</math>. The functions <math>f(x)</math> and <math>F(x)</math> are the unconditional density and unconditional cumulative distribution function, respectively.
 
By [[Bayes' rule]],
 
:<math>g(t|x)= \frac{f(x|t)g(t)}{f(x)} ,</math>
 
which expands to
 
:<math>g(t|x) = \frac{f(x|t)g(t)}{\int_{x}^{\infty} f(x|t)g(t)dt} .</math>
 
=== Two uniform distributions (example) ===
Suppose we know that ''t'' is uniformly distributed on [0,''T''] and that ''x''|''t'' is distributed uniformly on [0,''t'']. Let ''g''(''t'') and ''f''(''x''|''t'') be the densities that describe ''t'' and ''x'' respectively. Suppose we observe a value of ''x'' and wish to know the distribution of ''t'' given that value of ''x''.
 
Here <math>g(t)=1/T</math> on <math>[0,T]</math> and <math>f(x|t)=1/t</math> on <math>[0,t]</math>, so the unconditional density of <math>x</math> is

:<math>f(x)=\int_{x}^{T} \frac{1}{t}\cdot\frac{1}{T}\,dt = \frac{\ln(T) - \ln(x)}{T}, \quad 0 < x \leq T ,</math>

and therefore

:<math>g(t|x) =\frac{f(x|t)g(t)}{f(x)} = \frac{1}{t(\ln(T) - \ln(x))} \quad \text{for all } x < t \leq T .</math>
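The density above can be checked by simulating the generative process. The sketch below (illustrative; the values of <math>T</math>, <math>x_0</math> and the tolerance are arbitrary choices) compares the simulated posterior mean of <math>t</math> with the closed form <math>E[t|x] = (T-x)/(\ln T - \ln x)</math> implied by that density:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
T, x0, eps = 2.0, 0.5, 0.005           # illustrative values

# Simulate the generative process: t ~ U(0, T), then x | t ~ U(0, t).
t = rng.uniform(0.0, T, size=5_000_000)
x = rng.uniform(0.0, t)

# Condition on observing x near x0 and compare the posterior mean of t
# with the closed form E[t | x] = (T - x) / (ln T - ln x).
posterior_mean = t[np.abs(x - x0) < eps].mean()
closed_form = (T - x0) / (np.log(T) - np.log(x0))

print(posterior_mean, closed_form)     # both approximately 1.08
</syntaxhighlight>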
 
==See also==
 
*[[Truncated mean]]


== Notes ==
<references/>
 


[[Category:Theory of probability distributions]]
[[Category:Types of probability distributions]]


[[fr:Loi tronquée]]
