Conversion between quaternions and Euler angles: Difference between revisions
Removed incorrect equations
'''Redundancy''' in [[information theory]] is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. [[Data compression]] is a way to reduce or eliminate unwanted redundancy, while [[checksum]]s are a way of adding desired redundancy for purposes of [[error detection]] when communicating over a noisy channel of limited [[channel capacity|capacity]].
==Quantitative definition==
In describing the redundancy of raw data, recall that the '''[[Entropy rate|rate]]''' of a source of information is the average [[Information entropy|entropy]] per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a [[stochastic process]], it is
:<math>r = \lim_{n \to \infty} \frac{1}{n} H(M_1, M_2, \dots, M_n),</math>
the limit, as ''n'' goes to infinity, of the [[joint entropy]] of the first ''n'' symbols divided by ''n''. It is common in information theory to speak of the "rate" or "[[Information entropy|entropy]]" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply <math>H(M)</math>: by definition, the successive messages of a memoryless source are independent, so the joint entropy of the first ''n'' symbols is <math>nH(M)</math>.
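As a minimal sketch of this definition, the Python below computes <math>H(M)</math> for a memoryless source and the block quantity <math>H(M_1, \dots, M_n)/n</math> for a source with memory. The two-state Markov chain, its transition matrix, and the helper names <code>entropy</code> and <code>markov_block_rate</code> are illustrative assumptions, not part of the article. The joint entropy is computed exactly by enumerating every length-''n'' sequence, so the sketch is only practical for small alphabets and small ''n''.

```python
import itertools
import math


def entropy(probs, base=2.0):
    """Shannon entropy of a probability vector; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0) / math.log(base)


def markov_block_rate(pi, P, n, base=2.0):
    """Return H(M_1, ..., M_n) / n for a stationary Markov source (hypothetical example model).

    pi: stationary distribution over states 0..k-1
    P:  transition matrix, P[a][b] = Pr(next symbol is b | current symbol is a)
    The joint distribution is enumerated exactly, so the cost grows as k**n.
    """
    k = len(pi)
    joint = []
    for seq in itertools.product(range(k), repeat=n):
        p = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        joint.append(p)
    return entropy(joint, base) / n


if __name__ == "__main__":
    # Memoryless source: the rate is just H(M).
    print(entropy([0.5, 0.25, 0.25]))
    # Symmetric two-state chain that repeats its last symbol with probability 0.9.
    pi = [0.5, 0.5]
    P = [[0.9, 0.1], [0.1, 0.9]]
    for n in (1, 2, 4, 8):
        print(n, markov_block_rate(pi, P, n))
    # The limit as n grows is the conditional entropy H(M_2 | M_1) = h(0.1).
    print(entropy([0.1, 0.9]))
```

For this chain <math>H(M_1, \dots, M_n) = 1 + (n-1)\,h(0.1)</math> bits, where ''h'' is the binary entropy function, so the block rate decreases toward <math>h(0.1) \approx 0.469</math> bits per symbol, well below the 1 bit per symbol that a memoryless reading of the same marginal distribution would suggest.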
The '''absolute rate''' of a language or source is simply
:<math>R = \log |\mathbb M| ,\,</math>
the [[logarithm]] of the [[cardinality]] of the message space, or alphabet. (This formula is sometimes called the [[Hartley function]].) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use: base 2 for bits, base ''e'' for nats.) The absolute rate is equal to the actual rate if the source is memoryless and has a [[Uniform distribution (discrete)|uniform distribution]].
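A minimal Python sketch of the absolute rate follows. It assumes the alphabet is described only by its size, and <code>absolute_rate</code> and <code>uniform_rate</code> are hypothetical helper names. The second function checks the claim that a uniform memoryless source attains the absolute rate.

```python
import math


def absolute_rate(alphabet_size, base=2.0):
    """Hartley function R = log |M| for an alphabet with alphabet_size symbols."""
    if alphabet_size < 1:
        raise ValueError("the alphabet must contain at least one symbol")
    return math.log(alphabet_size) / math.log(base)


def uniform_rate(alphabet_size, base=2.0):
    """Entropy of a uniform memoryless source over the same alphabet."""
    p = 1.0 / alphabet_size
    return -sum(p * math.log(p) for _ in range(alphabet_size)) / math.log(base)


if __name__ == "__main__":
    print(absolute_rate(26))               # 26-symbol alphabet, in bits
    print(absolute_rate(26, base=math.e))  # the same quantity in nats
    print(uniform_rate(26))                # equals absolute_rate(26) up to rounding
```

The <code>base</code> parameter is the only thing that changes between units, which mirrors the remark above that the choice of logarithm base fixes the unit of measurement.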
The '''absolute redundancy''' can then be defined as
:<math> D = R - r ,\,</math>
the difference between the absolute rate and the rate.
The quantity <math>\frac D R</math> is called the '''relative redundancy'''. Expressed as a percentage, it gives the maximum possible [[data compression ratio]], that is, the largest percentage by which a file size can be decreased. (When the compression ratio is instead expressed as original file size to compressed file size, the quantity <math>R : r</math> gives the maximum ratio that can be achieved.) Complementary to the concept of relative redundancy is '''efficiency''', defined as <math>\frac r R ,</math> so that <math>\frac r R + \frac D R = 1</math>. A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.
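The following sketch computes the quantities defined above for a memoryless source, in bits. It assumes a memoryless source, so that <math>r = H(M)</math>, and at least two symbols, so that <math>R > 0</math>. The function name <code>redundancy_summary</code> and the example distributions are hypothetical.

```python
import math


def redundancy_summary(probs):
    """Rates and redundancy measures (in bits) for a memoryless source with symbol probabilities probs.

    Assumes at least two symbols so that the absolute rate R is positive.
    """
    if len(probs) < 2:
        raise ValueError("need at least two symbols")
    R = math.log2(len(probs))                           # absolute rate
    r = -sum(p * math.log2(p) for p in probs if p > 0)  # rate H(M) of a memoryless source
    D = R - r                                           # absolute redundancy
    return {
        "R": R,
        "r": r,
        "D": D,
        "relative_redundancy": D / R,  # maximum fractional reduction in size
        "efficiency": r / R,           # efficiency + relative redundancy == 1
        "max_ratio": R / r if r > 0 else math.inf,  # best original:compressed ratio, R : r
    }


if __name__ == "__main__":
    skewed = redundancy_summary([0.5, 0.25, 0.125, 0.125])
    uniform = redundancy_summary([0.25] * 4)
    for name, s in (("skewed", skewed), ("uniform", uniform)):
        print(name, {k: round(v, 4) for k, v in s.items()})
```

For the skewed source, ''R'' = 2 bits and ''r'' = 1.75 bits, so the relative redundancy is 12.5% and the best achievable ratio is 2 : 1.75. The uniform source over the same four symbols has zero redundancy.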
== Other notions of redundancy ==
A measure of ''redundancy'' between two variables is the [[mutual information]] or a normalized variant. A measure of redundancy among many variables is given by the [[total correlation]].
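Below is a minimal sketch for discrete variables. It assumes the joint distribution is given explicitly as a Python dict from outcome tuples to probabilities. It uses the standard identities <math>I(X;Y) = H(X) + H(Y) - H(X,Y)</math> and <math>C(X_1, \dots, X_k) = \sum_i H(X_i) - H(X_1, \dots, X_k)</math>, so mutual information appears as the two-variable case of total correlation. Normalized variants are not covered, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, i):
    """Marginal distribution of the i-th coordinate of a joint pmf keyed by tuples."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[i]] += p
    return m


def total_correlation(joint):
    """C(X_1, ..., X_k) = sum_i H(X_i) - H(X_1, ..., X_k)."""
    k = len(next(iter(joint)))
    return sum(entropy(marginal(joint, i)) for i in range(k)) - entropy(joint)


def mutual_information(joint):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), the total correlation of a pair."""
    if len(next(iter(joint))) != 2:
        raise ValueError("mutual information needs outcomes of the form (x, y)")
    return total_correlation(joint)


if __name__ == "__main__":
    copy = {(0, 0): 0.5, (1, 1): 0.5}                            # Y is a copy of X
    indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}       # independent fair bits
    xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}  # Z = X xor Y
    print(mutual_information(copy), mutual_information(indep), total_correlation(xor))
```

In the XOR example every pair of variables is independent, yet the total correlation of the triple is 1 bit. This redundancy only shows up when all three variables are considered together.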
Redundancy of compressed data refers to the difference between the [[expected value|expected]] compressed data length of <math>n</math> messages <math>L(M^n) \,\!</math> (or expected data rate <math>L(M^n)/n \,\!</math>) and the entropy <math>nr \,\!</math> (or entropy rate <math>r \,\!</math>). (Here we assume the data is [[ergodicity|ergodic]] and [[Stationary process|stationary]], e.g., a memoryless source.) Although the rate difference <math>L(M^n)/n-r \,\!</math> can be made arbitrarily small as <math>n \,\!</math> increases, the actual difference <math>L(M^n)-nr \,\!</math> cannot in general, although it can theoretically be bounded above by 1 in the case of finite-entropy memoryless sources.
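The article does not tie this bound to a particular code. As one illustration, the sketch below assumes binary [[Huffman coding]] applied to blocks of ''n'' symbols from a memoryless source. The source probabilities and helper names are hypothetical. The sketch reports both <math>L(M^n)-nr</math> and <math>L(M^n)/n-r</math>. A Huffman code for the block alphabet has expected length less than that alphabet's entropy <math>nr</math> plus one bit. The first quantity therefore stays in [0, 1), while the second is less than 1/''n''.

```python
import heapq
import itertools
import math


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    if len(probs) == 1:
        return [1]  # convention: give a lone symbol a 1-bit codeword
    # Heap entries: (subtree probability, unique tie-breaker, symbols in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        merged = syms1 + syms2
        for s in merged:
            lengths[s] += 1  # each merge pushes these symbols one level deeper
        heapq.heappush(heap, (p1 + p2, next_id, merged))
        next_id += 1
    return lengths


def block_redundancy(probs, n):
    """Return (L(M^n) - n*r, L(M^n)/n - r) in bits for Huffman coding of n-symbol blocks."""
    block_probs = [math.prod(block) for block in itertools.product(probs, repeat=n)]
    lengths = huffman_lengths(block_probs)
    expected_len = sum(p * l for p, l in zip(block_probs, lengths))
    r = -sum(p * math.log2(p) for p in probs if p > 0)  # entropy rate of a memoryless source
    return expected_len - n * r, expected_len / n - r


if __name__ == "__main__":
    source = [0.9, 0.1]  # hypothetical skewed binary source
    for n in range(1, 7):
        total, per_symbol = block_redundancy(source, n)
        print(f"n={n}: L - nr = {total:.4f}, L/n - r = {per_symbol:.4f}")
```

Enumerating all blocks costs <math>|\mathbb M|^n</math> work, so this is a demonstration of the bound rather than a practical coder.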
==See also==
* [[Data compression]]
* [[Hartley function]]
* [[Negentropy]]
* [[Source coding theorem]]
{{Compression Methods}}
[[Category:Information theory]]
Revision as of 22:50, 14 January 2014