# Talk:Information theory/Archive 1

## What's to hope for?

is a question well asked. -- 130.94.162.61 13:52, 19 January 2006 (UTC)

- Something got lost; I don't know how to do edit conflicts. Sorry about that. -- 130.94.162.61 14:00, 19 January 2006 (UTC)

- JA: No, I moved it to the end, per instructions. I always follow advice -- until I start running into walls -- and then I do something else. It's not so much an algorithm, it's more like a guidewire. Jon Awbrey 14:08, 19 January 2006 (UTC)

## Important Information Update

November's *Monthly* just arrived, with an important article by Dan Kalman: "Virtual Empirical Investigation: Concept Formation and Theory Justification". The *Monthly* arrived over a month late, with a **packing ticket** and a **returns policy**.

- The theory got too deep for me, and I ended up in the looney bin. I'll have to leave the article for the time being for others to edit while I recover my sanity. Thank you. -- 130.94.162.61 13:22, 15 January 2006 (UTC)

## Comma inside Quotation mark

O.K., Jheald, logically you are right. The material within the quotation marks *should* represent exactly what it says, without any extraneous punctuation. The English language is not very logical in this respect, however. Punctuation that terminates a clause, such as the comma, period, semicolon, exclamation mark, and question mark, actually belongs *inside* the quotation marks. When a ? or ! appears just before a closing quotation mark, it is considered part of the quoted material, and no additional punctuation is necessary to terminate the clause. Other terminal punctuation is either added or replaced if necessary, but always just inside the closing quotation mark. Yes, this is complicated, totally illogical, and leads to slight ambiguity, but as far as I know, that's just how it is in the English language. -- 130.94.162.61 00:24, 17 December 2005 (UTC)

- That may be the unfortunate standard in some misguided parts of the world, but thankfully it's not the standard on Wikipedia :-) -- Jheald 12:07, 17 December 2005 (UTC).

- O.K. This is getting way off-topic here, but as long as you have a standard, of course you should follow it. I'm sure I've mentioned before that I'm not that familiar with Wikipedia policy. I wasn't even aware that the matter had been duly considered. I know popular publishers all have their quirks, and the sheer volume of material they put out does set de facto standards, so maybe that's where I got the impression I did. -- 130.94.162.64 21:41, 17 December 2005 (UTC)

## Archaic, misnamed, and irrelevant material?

I recommend the following actions:

- Cut terms that were used in 1948 but have fallen out of use, such as 'equivocation', 'Transinformation', and self-information / surprisal. (I think "information content" is fine for the latter.)
- Cut Absolute Rate, Absolute Redundancy, Relative Redundancy, Efficiency. This shopping list of pointless definitions is fortunately still in stub form, so no-one should be offended by cutting them.
- "Fundamental theorem of information theory"? I think this is bad terminology, and I'm not sure that "Shannon-Hartley theorem" is much better. I'd suggest having sections for the main exciting results of information theory. These are:
- THE SOURCE CODING THEOREM. This is skipped over without comment in the current opening section on Entropy. The fact that the entropy measures how much capacity you need to encode a sequence of symbols from a simple redundant source is the first of Shannon's profound results, and it is not obvious: Compression is possible!
- THE NOISY-CHANNEL CODING THEOREM. (currently called "Fundamental theorem of information theory" and "Shannon-Hartley theorem") This needs to be generalized. The Shannon-Hartley page is quite specific to a particular class of channels. There should be a statement of the noisy channel coding theorem for simpler channels such as discrete memoryless channels. When it is stated for simple channels such as the binary symmetric channel, it seems much more amazing. Errors can be detected and corrected reliably, without paying the earth!

155.232.250.35 22:56, 15 December 2005 (UTC) David MacKay
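As a rough numerical illustration of the two theorems named above, here is a minimal Python sketch. The function names `binary_entropy` and `bsc_capacity` are illustrative and not taken from the article. It assumes a memoryless binary source and a binary symmetric channel, and relies on the standard textbook results that such a source can be compressed to about H(p) bits per symbol and that the channel carries at most 1 - H(f) bits per use.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary source that emits 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(flip_prob: float) -> float:
    """Capacity in bits per use of a binary symmetric channel
    that flips each transmitted bit with probability flip_prob."""
    return 1.0 - binary_entropy(flip_prob)


if __name__ == "__main__":
    # Source coding: a source with 10% ones needs well under 1 bit per symbol.
    print("bits/symbol for p=0.1:", binary_entropy(0.1))
    # Noisy-channel coding: with 10% bit flips, reliable communication
    # is still possible at any rate below this capacity.
    print("capacity for f=0.1:", bsc_capacity(0.1))
```

A fair coin (p = 0.5) gives exactly 1 bit per symbol, so nothing can be saved, while a channel that flips half its bits has capacity 0.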

- I have no objection to the cuts of the advanced rate concepts; the article is simply becoming too long. The "archaic" terms, however, should be retained, even if they are only mentioned in parentheses. They appear in a lot of the standard textbooks which were written in the 1960's and are still in use today. A student shouldn't have to pay a lot of money for a textbook just because it uses more "modern" terminology, when there are perfectly good textbooks available for under $20 whose only fault is that they use "archaic" terminology. (I highly recommend Reza's book.) The theory itself hasn't changed one iota since the '60's. Information theory is simply too new as a mathematical field to start calling these terms "archaic." And I specifically object to the removal of the term "self-information" for . "Information content" is far too broad a term and could mean any of the measures of information Shannon defined. (I know, I've ranted about Orwellian "newspeak" before.)

- It would be greatly appreciated if you could elaborate in the article on the source coding theorem and the noisy-channel coding theorem, as you call them. The latter is called the "fundamental theorem of information theory" because traditionally that is the point where information theory proper stops and coding theory takes off. -- 130.94.162.64 01:24, 16 December 2005 (UTC)

- And specifically, the terms *transinformation* and *equivocation* have subtle nuances relating to their fundamental importance in the communications process over a noisy channel. These nuances are lost in the more "modern" terms *mutual information* and *conditional entropy*, which just denote arbitrary functions of random variables. I have taken pains to explain these nuances in the article. -- 130.94.162.64 02:09, 16 December 2005 (UTC)

- By the way, I will have to read your book, *Information Theory, Inference, and Learning Algorithms*. I'm curious. -- 130.94.162.64 03:24, 16 December 2005 (UTC)

I'm happy that the "archaic" terms be mentioned, but I think we should try to make Wikipedia sound brand new too. Enjoy the book! :-)

I suggest moving all the "Information Theory and Physics" material into the linked page on thermodynamics.

- Yes, I agree. Like I mentioned earlier, a short summary should do here. -- 130.94.162.61 22:43, 16 December 2005 (UTC)

- Equivocation and Transinformation should be kept. These terms are still used today, if not in English then in other languages, and will help readers studying the subject in those languages, such as German (Äquivokation und Transinformation). TERdON 19:34, 6 April 2006 (UTC)

## Book recommendation

A wonderful introduction to information theory can be found in Jeremy Campbell's book *Grammatical Man*.

- Add a proper reference to it in "Other Books" if it's appropriate. -- 130.94.162.64 03:03, 16 December 2005 (UTC)

The following are very useful:

Devlin, K. Logic and Information, Cambridge University Press (1991)

Dretske, F. Knowledge and the Flow of Information, Bradford Books, MIT Press (1981)

Barwise, Jon and Seligman, Jerry, Information Flow: The Logic of Distributed Systems, Cambridge University Press (1997).

## Major Revision of Article

O.K., I just did a major reorganization of this article and expanded it considerably. I think it's somewhat more in line with Wiki guidelines than it was before, but I need some help from more experienced Wikipedians: Please **be bold** and prettify the article, correct my capitalization, and wikify things I didn't do correctly. I would like this article to be useful both to the layman and to the person with the requisite math knowledge (mainly probability theory) who wants to learn more about the topic. There is still much room for improvement. All the information that was in the article before was sound to the best of my knowledge, and I tried to include all of it in this revision. The way it was organized, however, was an absolute mess. Please make some specific, concrete suggestions here on how to further improve this article. -- 130.94.162.64 01:55, 2 December 2005 (UTC)

### To Do

Here's a start (130.94.162.64 21:02, 2 December 2005 (UTC)):

- History section needs reorganization (& expansion). Start from the beginning, with Boltzmann's H-function and Gibbs' statistical mechanical formulation of thermodynamic entropy (which von Neumann refers to in that oft-quoted story). These are the forerunners of information entropy in the continuous case and the discrete case, respectively. While Shannon's paper *was* the founding of modern information theory as we know it today, it didn't all just start there.
- Correct capitalization of section headings.
- Fill in some of the stubby subsections in "Related Concepts and Applications."
- We need a diagram of a simple model of communication over a noisy channel, but obviously something better than ASCII art.
- Standardize probability notation. It will be difficult to satisfy the probability theorist and the layman with the same notation! Probability theorists tend to use highly formalized notation, and I'm not quite sure that's what we want here, but what we have now is probably *too* informal. My notation is not gospel, but it had better be (to some standard) correct and reasonably consistent.
- Add a section on the fundamental theorem. I am not satisfied with the article on Shannon–Hartley theorem. There should be at least a reasonably formal statement of the theorem in this article, and a proof in that article.
- Section stubs are broken. They point to an edit of the whole article.
- References to Bamford are a little obscure, but they do explain a lot. Need page numbers, if anybody's reading *Puzzle Palace*. It's a thick dense book, and I forgot what pages they were on. The references were near each other, and concerned some professor studying, nearly word for word, "an obscure branch of a field called cybernetics." Bamford then goes on to mention the well known classification of information sources into the hierarchy described in "source theory" in the article, again nearly word for word, "where each class strictly contains the preceding class." Apparently somebody was feeding Bamford a line about how everything was "classified." If you read this article before you read that book, you would not miss those references.

- Note: Looks like somebody took those references out and reworded, which was actually the right thing to do under the circumstances. Also looks like I'll either look it up myself, or (more probably) leave well enough alone. Sorry for the confusion. -- 130.94.162.61 00:43, 15 December 2005 (UTC)

Deleted the section on "recent research." (Actually, I moved it. See Entropy in thermodynamics and information theory.) It wasn't really relevant here. (Just in case anybody wonders what happened to all those references.) -- 130.94.162.64 03:43, 5 December 2005 (UTC)

I'm a little embarrassed to admit that I don't know what cybernetics is and why information theory belongs to it. The article there doesn't really help much. Can somebody please help explain this to me? Personally, I think of information theory more as a branch of probability theory. -- 130.94.162.64 07:10, 5 December 2005 (UTC)

- Nevertheless, I'm not going to dispute that classification (with all the K-junk, etc.). It might be some kind of gov't standard or official ___information or something like that. Must mean something to somebody. -- 130.94.162.64 01:30, 10 December 2005 (UTC)

Jheald, you've greatly expanded the subsection on "relation with thermodynamic entropy". The article is getting long enough as it is. A short summary is sufficient here. The gory details belong in the main article Entropy in thermodynamics and information theory. The whole reason I put up that article is so that we don't have to clutter up this and other articles with duplicate (and tangential) information. Please incorporate your contributions into the *body* of that article. I know it's exciting stuff, but let's not go off on a tangent in this article with it.

On another note, I am a little hesitant about the quote you added. It's a little misleading (as is Frank's paper in this context). Referring again to Frank's cute little diagram, we can see information entropy and thermodynamic entropy depicted as interchangeable quantities. Frank is mainly interested in *reversible* computing, and therefore he does not consider processes that cause an irreversible rise in total entropy. That is why the total entropy, or "physical information," stays constant in his scheme of things.

A thermodynamically irreversible process causes an increase in the total information content of a closed system, which we call its entropy. This can be generalized to "open" systems by considering the *net* change in entropy caused by a process. No process can ever cause a *net* decrease in entropy. This is the brutal second law of thermodynamics, which has been set in stone for over a hundred years.

On a third note, just a reminder, try to make sure you (or I) are not contributing Original Research. We do need verifiable sources. This means sources that can be verified without highly specialized expertise. -- 130.94.162.64 19:32, 10 December 2005 (UTC)

- Good work, though, and maybe a lot of it should stay. You did explain things a lot more nicely than I did. I'll try not to clobber your edits. -- 130.94.162.64 15:04, 11 December 2005 (UTC)

## Multiple "need attention" requests

This page has three different requests for attention on it.

- On the main page, User:Rvollmert put an attention sticker on it, but didn't (AFAIK) explain what specifically needed attention.
- On Wikipedia:Pages_needing_attention, User:Eequor said, "information theory is leaking information." I don't understand what that means.
- Why would it leak? Because it has holes in it. Kevin Baas | talk 16:51, 2004 Sep 7 (UTC)

- On the Talk page (this page), User:Pcarbonn added a Todo list referring to Wikipedia:WikiProject Science, which, at least, has a clear set of things to do. But what about the other two items?

If anyone knows anything about this, please comment. (or just fix it. :-) )

- I think it's really rather stubby, but I don't find anything grossly objectionable in the article. It's only what's *not* in the article that is the problem. Michael Hardy 15:36, 7 Sep 2004 (UTC)

- I agree that the article has nothing wrong in it. So I don't see why we would put the "page needing attention" alert in the face of the casual reader. The to-do list is there to plan enhancements. That's why I removed the "Attention" tag in the article, but I'm still open to discussion... Pcarbonn 18:50, 7 Sep 2004 (UTC)

## What should the introduction say ?

In response to Michael Hardy: I don't think that information theory is only of interest to persons who study probability theory as you suggest. Anyone from the general public who would like to understand what "Information theory" is about should be able to find a reasonable answer in this article. So, yes, I believe that the introduction is not satisfactory as it is today. It should clarify what issues information theory helps to understand, what it is used for, and everyday examples where information theory had a role (e.g. compression technologies ?). I studied information theory a while ago, and I do not know all these answers anymore, but would like to refresh my mind... Pcarbonn 18:50, 7 Sep 2004 (UTC)

## Article Conflicts

Some of this article needs to be redone. There are conflicts with another article on information entropy. Mainly this article denies the link between thermodynamical entropy and information entropy. However the article on information entropy affirms this link. I'm not specialized in this field but from my own experience many physicists and information theorists believe in such a link. See R. Landauer and C. Bennett. - Paul M. 22 June 2005

- Okay I reverted some edits and made it so the article agrees with the article on information entropy. Also I added some references which back up my edit. - Paul M. 09:19, 22 June 2005

## Great work on the references

The way several parts are properly linked to original sources is great, and perhaps an initiative should be made to encourage this across wikipedia.
I agree that an introduction more accessible to the general public is necessary. syats

Shannon's paper is copyrighted (Bell, i.e. now AT&T) thus should be removed from wikipedia, shouldn't it ?
--Henri de Solages 17:21, 7 November 2005 (UTC)

- As far as I can see, Shannon's paper isn't on the wikipedia; and the pdf linked at the bottom is to a Bell Labs site. So what is it that is giving you copyright angst? Jheald 19:06, 7 November 2005 (UTC)

## Dembski

Correct me if I'm wrong, but I don't think Dembski uses information theory at all. The "information" in his Complex Specified Information is just 1/probability (and even that is a kludge, since his notion of probability assumes the complete absence of any mechanisms). Unless he actually uses the Shannon notion of information somewhere, the section should either be removed or else have a clarification added. — B.Bryant 01:29, 2 October 2005 (UTC)

## Needs Explanations

- In the very first equation mentioned, the equation determining the amount of information in a message, the "probability of message " is not defined. The article states only that is the "probability of message " without saying how this number is arrived at. --203.91.90.238 07:29, 21 October 2005 (UTC)

- It should be said in which sense Shannon's formula can be called a "measure".

- Shannon's formula for entropy does behave a lot like a set-theoretic measure. It is additive for independent rvs. i.e. H(X,Y) = H(X) + H(Y). Joint entropy behaves like the measure of a set union, equivocation behaves like the measure of a set difference, and transinformation like the measure of a set intersection. And independent rvs behave like disjoint sets (wrt the information they carry). But how does one nail down just what kind of "sets" one is "measuring" in information theory? Or how can these ideas be added to the article? -- 130.94.162.64 08:04, 3 December 2005 (UTC)
- I added a section on measure theory per request. It boils down to this: it walks like a duck, it looks like a duck, it quacks like a duck, it satisfies the postulates of duck theory. Is it O.K. to call it a duck now? :-) 130.94.162.64 13:41, 3 December 2005 (UTC)
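The set-like behaviour described above can be checked numerically. The following is a minimal Python sketch, with illustrative helper names (`entropy`, `marginals`, `mutual_information`, `conditional_entropy`) that are not from the article. It assumes a joint distribution given as a dictionary mapping `(x, y)` pairs to probabilities, and uses the standard identities I(X;Y) = H(X) + H(Y) - H(X,Y) (the "intersection") and H(X|Y) = H(X,Y) - H(Y) (the "set difference").

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint):
    """Marginal distributions of X and Y from a {(x, y): p} joint table."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py


def mutual_information(joint):
    """Transinformation: behaves like the measure of a set intersection."""
    px, py = marginals(joint)
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())


def conditional_entropy(joint):
    """Equivocation H(X|Y): behaves like the measure of a set difference."""
    _, py = marginals(joint)
    return entropy(joint.values()) - entropy(py.values())


if __name__ == "__main__":
    # Independent variables: the "sets" are disjoint, so the overlap vanishes
    # (up to floating-point rounding) and H(X,Y) = H(X) + H(Y).
    independent = {(x, y): px * py
                   for x, px in {0: 0.5, 1: 0.5}.items()
                   for y, py in {0: 0.25, 1: 0.75}.items()}
    print("I(X;Y), independent:", mutual_information(independent))

    # Perfectly correlated variables: the "sets" coincide.
    copied = {(0, 0): 0.5, (1, 1): 0.5}
    print("I(X;Y), Y = X:", mutual_information(copied))
```

This only illustrates the analogy for two variables; it does not settle the question of what underlying "sets" are being measured.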

## Maxwell's Demon

Just curious on this quote: "This has been applied to the paradox of Maxwell's demon which would need to process information to reverse thermodynamic entropy; but erasing that information, to begin again, exactly balances out the thermodynamic gain that the demon would otherwise achieve."...Does this imply that a Maxwell's demon could never erase information on the fly, passively using energy applied to the process of reading the information to erase as it goes? From the wiki article on Maxwell's demon, it would seem that it's not the use of energy to erase that is the problem, so much as the act of erasure itself that increases entropy in the system. Or am I missing something?

- I think that's right - at least that is what I was trying to convey, based on what I'd read elsewhere - but I confess I haven't read the paper by Bennett, so I don't know the detail of his argument; and I don't know of any reason to believe it.

- The most straightforward connection (IMO) between information entropy and thermodynamic entropy is Jaynes's point of view (see *MaxEnt thermodynamics*): simply that the thermodynamic entropy is the quantity of Shannon information which you *don't* know about a physical system; particularly, but not *necessarily*, the Shannon information you don't have if all you do have is a description in terms of the classic thermodynamic variables.

- Note that this entropy depends on *what you know*, rather than being a property of the system.

- The classic demonstration of this is Leo Szilard setting Maxwell's demon to work a Szilard engine (1929), which really ought to have a wikipedia entry of its own. The idea is that you have a box at temperature T, with a single particle rattling around inside it. The box has two plungers, one at each end. If a demon can know that at an instant the particle is in the right hand side of the box, he can push in the left hand plunger without the particle hitting it - ie with no work. Similarly, if the particle is in the left hand side, the demon can push in the right hand plunger for free. The demon can then extract an energy (kT ln 2) from the particle isothermally pushing the plunger out again.

- The question then (supposing the demon can see through the walls of the box), is what stops the demon repeating the process again and again, essentially creating a perpetual energy pump.

- This I think is where Bennett tries to save the day by insisting the demon must spend some entropy to reset its memory.

- But I haven't mentioned any memory. I think if the demon can see the particle through the walls of the box, it can quite happily go on extracting the energy -- because if the demon knows exactly where the particle is, it doesn't have any entropy, so the demon is free to spend the full entropy over-estimated by assuming all it knew was the classical description.

- However, as I said, I haven't read Bennett's paper, and I don't really know how his argument works. Maybe somebody can tell us? All best, Jheald 22:30, 3 November 2005 (UTC)

- Some of the Landauer and Bennett argument is in the reversible computing article; but maybe the page could use a sharpening. As it stands, I'm still not 100% won over. Jheald 20:46, 4 November 2005 (UTC)
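For a sense of scale, the energy kT ln 2 quoted in the Szilard engine description above can be evaluated directly. This is a minimal Python sketch; the function name `szilard_work_joules` and the 300 K example temperature are illustrative assumptions, and the only physical input is the SI value of the Boltzmann constant.

```python
import math

BOLTZMANN_K = 1.380649e-23  # joules per kelvin (exact SI definition)


def szilard_work_joules(temperature_kelvin: float) -> float:
    """Energy k*T*ln(2) associated with one bit of information
    about the particle's position at the given temperature."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Roughly room temperature; the result is on the order of 1e-21 J per bit.
    print(szilard_work_joules(300.0))
```

The number says nothing about which side of the Bennett argument is right; it only shows how small the per-bit energy is.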

## Shannon information and Turing

According to the current version of the article:

- Recently, however, it has emerged that entropy was defined and used during the Second World War by Alan Turing at Bletchley Park. Turing named it "weight of evidence" and measured it in units called bans and decibans. (This is not to be confused with the weight of evidence defined by I.J. Good and described in the article statistical inference, which Turing also tackled and named "log-odds" or "lods".) Turing and Shannon collaborated during the war but it appears that they independently created the concept. (References are given in Alan Turing: The Enigma by Andrew Hodges.)

I think this doesn't quite hit the mark.

Shannon's contribution was the whole compass of information theory, which is what this article should be focussed on; and not just the idea of a quantitative measure of information.

Shannon's theory gave us the whole communication model, the idea of mutual information, the source coding theorem, the entropy definition, and the noisy channel coding theorem. The work at Bletchley Park didn't address any of this.

What is true is that the development of Shannon's theory at Bell Labs between 1940 and 1945 was hugely influenced by the challenges of cryptography. Without doubt, it was the challenges of encrypting and decrypting of text which got Shannon to focus on discrete, digital information. From then on, the two papers, on information theory and on cryptography, developed together inseparably; and the work for both was substantially complete by 1944.

The work at Bletchley Park in 1940 was focussed almost entirely on a quantity like $\log P(x \mid H_1) - \log P(x \mid H_2)$, ie the log of a Bayes factor -- which we can recognise today as the difference between two information-theoretical description lengths -- and whether that was enough to identify a setting of an Enigma machine. Considering expectation values of this, as Turing and Good did, gives a quantity which is essentially the Kullback information (ie the expected information gain), but only in this one particular narrow context.
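The log Bayes factor described above, and its expectation, can be sketched in a few lines of Python. The function names and the toy distributions are illustrative assumptions, not anything from the Bletchley Park material; the only convention used is that a ban is a base-10 logarithmic unit and a deciban is one tenth of a ban.

```python
import math


def weight_of_evidence_decibans(p_x_given_h1: float, p_x_given_h2: float) -> float:
    """Log Bayes factor for one observation x, in decibans:
    10 * (log10 P(x|H1) - log10 P(x|H2))."""
    return 10.0 * math.log10(p_x_given_h1 / p_x_given_h2)


def expected_weight_decibans(dist_h1: dict, dist_h2: dict) -> float:
    """Expectation of the log Bayes factor when H1 is true.
    This is the Kullback information between the two distributions,
    expressed in decibans. Assumes dist_h2[x] > 0 wherever dist_h1[x] > 0."""
    return sum(p1 * weight_of_evidence_decibans(p1, dist_h2[x])
               for x, p1 in dist_h1.items() if p1 > 0)


if __name__ == "__main__":
    # Hypothetical example: H1 says a symbol is heavily biased, H2 says uniform.
    h1 = {"a": 0.9, "b": 0.1}
    h2 = {"a": 0.5, "b": 0.5}
    print("evidence from seeing 'a':", weight_of_evidence_decibans(h1["a"], h2["a"]))
    print("expected evidence per symbol under H1:", expected_weight_decibans(h1, h2))
```

Individual observations can count against H1 (a negative weight), but the expected weight under H1 is never negative.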

Shannon certainly covers this ground, in his discussion of "unicity distance" in the cryptography paper. How much he knew of the details of Turing's work, I have no idea. Kullback certainly knew every last thing. But would Shannon have known the statistical methodology of the Banburismus process? I don't know that he would.

But it's surely Shannon who deserves the palm, because of the generality of his communications model (which he had in mind for analogue signals even in 1939, even though he could not then quantify anything) and consequently the generality of his information measure, which he could then apply to messages generally, giving his $H = -\sum p \ln p$ expectation formula, which is uniquely his, to quantify "how much information is there to communicate, on average"; rather than the much narrower question of how much information could be learnt about a particular inference problem.

And that's also why, if this discussion belongs anywhere, it belongs as a footnote to the "Information Entropy" article, rather than here in the "Information Theory" article, which ought to concentrate on the much bigger picture.

-- Jheald 23:54, 3 November 2005 (UTC)

- I disagree. I think Turing was pretty right on, especially in light of the fact that he discovered computation (i.e. invented the computer) - *the* information processor. Kevin Baas | talk 13:04, 4 November 2005 (UTC)

- That's as maybe. But Turing's theory of the logic of computation doesn't have a lot to do either with Turing's statistical analysis work at Bletchley Park, nor the *statistical* analysis of "how you reproduce at one point a message selected at another", which, as the article says, is what information theory is about.

- Did Turing much influence Shannon? We know that Turing did meet Shannon at Bell Labs in 1943. But it seems most Americans had little interest in the details of the statistical methods he had been using against Enigma. Turing wrote of his U.S. visit:

- "I find that comparatively little interest is taken in the Enigma over here apart from the production of the Bombes. I suppose this is natural enough seeing that they do not intercept any of the trafic other than the Shark. Apparently it is also partly on security grounds. Nobody seems to be told about rods or Offizier or Banburismus unless they are really going to do something about it".

- In the event, it was Shannon, nobody else, who established the entropy as a general property of the ensemble of expected messages. And Shannon who gave us the noisy channel coding theorem. Jheald 20:46, 4 November 2005 (UTC)

## "Signal (information theory)" Vs. "Information theory" articles

I was looking for an article on “electrical signals” which is redirected to “Signal (information theory)”. Is the “Signal (information theory)” subject matter related to this article and if so why is there no mention of signals in this article? Freestyle 20:48, 12 November 2005 (UTC)

- I did mention signal and noise in the article. -- 130.94.162.61 13:02, 19 January 2006 (UTC)

## Once over quickly?

JA: I will put some time in over the next couple of weeks with the aim of constructing an "on-ramp" to the information autobahn that we have here presently. I'll start with an intuitive conceptual framework that has informed my own thinking for lo! these many years. It may be rough hitting the right expository level at first, so give me a few tries. Jon Awbrey 14:54, 16 January 2006 (UTC)

- Sounds good to me. Getting better and better. There is more interest in the article now, so I think something good will eventually develop. This article is getting rather long, though, and some of the more detailed sections could be moved or merged into linked articles, with a short blurb left here. Like the template says, we don't want to remove the technical information, but make it more accessible. Feel free to edit or move information to information entropy, joint entropy, mutual information, conditional entropy, etc., or create new articles, even one on channel capacity (communications), for example. -- 130.94.162.61 13:15, 19 January 2006 (UTC)

## Pending tasks

JA: Francis, please read the notice at the top of the talk page, and the referral to WP:WPSci. I personally don't consider guidance like that as anything cast in stone, but it is at the very least worth considering and discussing. The suggested structure is to put the dedicated or detailed history section toward the end. I think that probably makes sense for this type of article, though I think it also makes sense to intersperse apt bits of historical sidelights throughout the article, as the occasion arises. Jon Awbrey 04:20, 19 January 2006 (UTC)

- Hey, sorry if I trod on anyone's toes :) - feel free to revert my edits, move it around. Perhaps a brief history could be included in the introduction? I actually came here to try and find out whether information theory could be used in evaluating machine translation. I'm still looking :) - FrancisTyers 13:42, 19 January 2006 (UTC)

## References & Bibliography

JA: The distinction that was "catechized" on me is that the "reference section" (sometimes "endnotes") contains sources cited in the article, while the bibliography contains a more generic invitation to or exhaustion of the pertinent literature. I was getting around to sorting the sources into those two bins, but there is only so much time. Jon Awbrey 05:15, 19 January 2006 (UTC)

- Although that approach is a common one, it is not the only approach, and in fact, it is not the standard approach in most academic journals nor on Wikipedia. Per WP Policy, the section entitled **References** is supposed to contain a list of bibliographic references, whereas the sources specifically cited within the article are listed in a separate section called **Footnotes.** WP articles do not include a section called **Bibliography.** See WP:CITE for more information. -- Metacomet 05:23, 19 January 2006 (UTC)

JA: Well of course it's not standard in research journals, since there's not supposed to be anything in the references that's not cited in the text, unless, of course, if it's a survey article, which, now that I think of it, is very like an encyclopedia article. But I'm glad we agree that it's just a guideline, which leaves the only question being: What helps the reader? I don't care all that much whether the difference is marked Ref/Bib or Note/Refs, the practical diff for the reader is what sort of list does he or she have to scan to track down the source of a citation, and what sort of list is handy to peruse by itself for further reading. Those are two different sorts of lists. Jon Awbrey 05:56, 19 January 2006 (UTC)

- Yes, they are two different types of lists, and each one is useful for a different purpose. That is why WP policy supports *both* types of lists. But the policy is quite clear as to the naming conventions for the section headers, as I mentioned above. Moreover, I personally think that the WP policy in this case is effective in terms of providing information to the reader in a form that is easily accessible. -- Metacomet 00:20, 20 January 2006 (UTC)

JA: Ok, then, agreement on the basic principle. "Works cited" and "Further reading" are perhaps the clearest descriptives. It's just that "footnotes" sounds a little archaic in a web context, and many folks expect footnotes to be a numbered list, with the correspondingly superscripted citations in the text. Which is nice if you can get time for it, but a lot of work. Jon Awbrey 03:40, 20 January 2006 (UTC)

- I don't mean to be harsh, but I think you are missing something fundamental here. It really doesn't matter what you or I think about this issue. WP has adopted a *standard* way of handling citations, and that standard is articulated as a guideline. There is really nothing to argue about, unless you want to try to change the standard, which I would guess is pretty unlikely. Again, if you don't believe me, please read the policy at WP:CITE. In addition, take a look at virtually *any* WP article, and you will see that the terminology that virtually *everyone* is using is as I have described. -- Metacomet 03:46, 20 January 2006 (UTC)

- Instead of all the non-constructive criticism and bickering, it would be nice if someone would help edit the references section (and citations) in the article. It is admittedly a mess, but even some help with the punctuation and italicization would be nice. Much of the material in the article, especially the mathematical part, is (at least intended to be) fairly standard and can be found in virtually any textbook on the subject. Such material is not usually cited, but I have listed most of the standard textbooks since this is a somewhat obscure subject. Other material I have tried to cite more specifically, but I am not really familiar with Wikipedia standards for citations (I find them confusing) so any help with references will be appreciated. Thank you. -- 130.94.162.64 17:45, 16 February 2006 (UTC)

- JA: I can do that in the odd bits of time. I will put "works cited" under "References" and "further reading" under "Bibliography" and try to convert in-text refs to real refs. The WP standards are guidelines, not mandates, and the main thing is to use common sense and provide full data in a way that is easy on the webby-eyes. I'll stick to the citation styles that I'm used to from math and science journals, with a bit of "historical layering" where needed, and try to avoid some of the nonsense of APA style. Jon Awbrey 17:58, 16 February 2006 (UTC)

Metacomet's claims are not supported by the guideline he cites, which says (in bold) *"The system of presenting references in a Wikipedia article may change over time; it is more important to have clarity and consistency in an article than to adhere to any particular system."* He is himself close to violating actual policy: see WP:Civil. This should stop. Septentrionalis 00:13, 25 February 2006 (UTC)

- I apologize if I was harsh or uncivil. I came on too strong in my earlier posting. I do, however, still maintain that the WP guideline on Citations does provide standards regarding the format and the naming of sections for references and footnotes. I also disagree with the idea that it is okay to have varying standards from one article to the next. -- Metacomet 02:16, 25 February 2006 (UTC)

- One other thing: I have moved on to other topics. So feel free to do whatever you want. -- Metacomet 02:16, 25 February 2006 (UTC)

I agree with Metacomet here, at least about the section names. Jon Awbrey brought up the key question above: "What helps the reader?". It seems pretty clear to me that the best thing for a reader on Wikipedia is for every article to use the same naming scheme for the reference sections, to help them find what they are looking for quickly. It doesn't really matter if JA thinks some other scheme would be better, or if I do. The utility of having a common standard format for all Wikipedia articles outweighs the merits of any other scheme.--Srleffler 04:19, 25 February 2006 (UTC)

## What's to hope?

is a question well asked here and I believe it can be answered by a general statement of the noisy-channel coding theorem. Thank you. -- 130.94.162.61 13:32, 19 January 2006 (UTC)

- JA: May be so. Classically, that question is associated with the questions of "regulative principles" and "abductive reasoning". It comes up in a big way with the "reference class problem" and in a more modest but constant way with the question of "who determines the prior distribution" in bayesian-type reasoning. But I do have but faint hope of getting to all that here, especially at the rate I'm currently working, and a lot of it would probably go better elsewhere. Jon Awbrey 13:47, 19 January 2006 (UTC)

## Headings...

Why are some of the headings so nonstandard (probably not the right word)? I refer to the headings 1, 1.1, 1.2, and 2. I'm not sure what else it should be (maybe 1 should be "Introduction" 2.1 and 2.2 should be moved up one level and "Looking more closely" should be removed). Superdosh 22:27, 19 January 2006 (UTC)

- JA: Hi, Superdosh. See the "Scarlet Letter" at the top of the talk page. We're in the middle of trying to arrange a more gentle introduction to the heavy slogging that follows. I'll be pre-queling some material that has worked for me in this regard before, plus making some connections to larger horizons, Wiener-schnitzel style. I'm sure it will take a few iterations, an expansion phase, followed by a series of contractions. Watch this space. Feel free to skip the navigation. Jon Awbrey 22:38, 19 January 2006 (UTC)

- I get that. I'm referring to the particular wording of the headings though. They don't sound quite encyclopedic? Superdosh 23:03, 20 January 2006 (UTC)

- JA: I'm not sure that "sounding like an encyclopedia" is the most important thing here, but, yes, the "invitational heads" are quite deliberately colloquial, the second of course being an allusion to W.S. McCulloch. 23:12, 20 January 2006 (UTC)

### Advanced applications?

- JA: Once again pursuant to the requested "attention", I added the adjective "advanced" in anticipation of having two separate sections, one for simple applications and one for more advanced applications, but we can let it ride till we get to that. Jon Awbrey 04:04, 22 January 2006 (UTC)

- The concepts of self-information, conditional entropy or equivocation, joint entropy, and mutual information or transinformation are rather basic to information theory, in my opinion. Not that they are easy or simple to understand, but the theory is based on them, so to speak. This article is getting too long. Don't deconstruct everything. Please try to remember this is an encyclopedia article, not a textbook. Nonetheless I did gloss over some even more basic concepts that you are covering now in great detail. How about another article: Introduction to Information Theory? That may not be quite standard for an encyclopedia article, either, I suppose. Hmm... -- 130.94.162.61 21:05, 8 February 2006 (UTC)

### Cultural horizons, motivational stuff

- JA: Having let it set out in the sun a while, I'll look at these sections again over the next few days and see if they can be KICAS-ified some more. I was just trying to "comply" with the request to provide a bit of a leg-up. Forgive me, I was so much younger and naiver then. Insistence is futile. Jon Awbrey 21:34, 8 February 2006 (UTC)

## Proposal for an Information theory (overview) article

- JA: Mulling over recent discussions, along with the general impress of the various "attention" stickers/stigmata, it does begin to strike me that it might be a good idea to have a gateway, hub, introduction, roundhouse, or overview article to the effect of "Information theory (Overview)". I just tried this out with a similar tangle of articles associated with Relational databases and it seems to be working pretty well so far. But rather than make up a new article title for the overview, with the concomitant Uncertainty over whether to capitalize "overview" or not, and the chance that seekers would not think to look there first, it seems to be a better strategy to make the un-epitheted article, this one, the overview, leaving the introduction, motivation, and history bits here, and moving the rest to a more specific name. So maybe we could think about this for a while. Jon Awbrey 14:20, 10 February 2006 (UTC)

- I think that's a great idea. I'd love to help, if you like. - grubber 18:32, 10 February 2006 (UTC)

- JA: Thanks, the mechanics of it should be pretty straightforward, as I won't touch any of the technical content — though various kinds of housekeeping and reorg may occur to others afterward. The place where I'm drawing a blank is on what to call the new technical article. In the Relational database case that I worked on, they already had motley crue of disjoint technical articles with dedicated/devoted names. Jon Awbrey 19:15, 10 February 2006 (UTC)

- I would suggest that an overview *replace* this article and the explicit details be relegated to/merged into separate articles, such as information entropy, channel capacity, etc. And I think many sections should be shortened (i.e., the section "Overview" isn't much of an overview). - grubber 19:36, 10 February 2006 (UTC)


- JA: Okay, will do, but maybe after the weekend to allow time for more comment. The current "overview" -- that was not my first choice of a name for it -- was wrought by me from some stuff that I have used several times before to motivate the subject for non-math audiences starting from an everyday life context. It was not supposed to be a survey or a synopsis or anything like that but was intended to address the recommendation posted above that we add "more context regarding applications or importance to make it more accessible to a general audience, or at least to technical readers outside this specialty". If there's an admin about, can you tell me the best way to preserve all of the relevant histories in splitting an article into separate articles? Thanks in advance, Jon Awbrey 20:14, 10 February 2006 (UTC)

- As far as "the histories" go, I assume you mean edit history? The current version will forever be in that history, but since this is a *major* rehaul of the page, I would recommend starting an Information theory/draft page or the like to play around with until it is ready to go "live". There will be a lot of changes, and I doubt it's gonna get done in one sitting. This will allow us to play with it and revisit it a few times without "breaking" the main page for a while. - grubber 23:07, 10 February 2006 (UTC)


## Misdefinition of equivocation?

If I'm not mistaken, equivocation is currently misdefined as H(X|Y). It should be H(Y|X). Comments? 171.66.222.154 02:46, 15 February 2006 (UTC)

- That depends on how *X* and *Y* are assigned to the input and output of the channel. I have arbitrarily set *Y* as the input of the channel and *X* as the output, as depicted in the ASCII art diagram (perhaps a confusing choice). I certainly would welcome any assistance setting this straight. I will have to take another look at Shannon's paper and some of my textbooks. (The ones by Reza, Ash, and Pierce are all I have at hand.) Now I'm confused. -- 130.94.162.64 22:53, 15 February 2006 (UTC)
- It appears that you are right, though, at least according to Shannon. Thanks for the correction. -- 130.94.162.64 23:06, 15 February 2006 (UTC)
- It also appears that my scheme of *X* for the received message space and *Y* for the transmitted message space is counterintuitive and confusing, and should be swapped. Any comments? I don't want to make it any more confusing than it needs to be. -- 130.94.162.64 00:02, 16 February 2006 (UTC)

- Much better to stick to "conditional entropy" (or conditional information) wherever possible (as I think you have been changing it to); it's then much more explicit what's conditional on what. I don't think the term "equivocation" has been much used since the 60s.

- IMO it's more intuitive for *X* to be the original (transmitted) message space, and *Y* to be the space of received (noise-altered) messages. Without looking it up, I can't remember which way round the equivocation is defined. I suspect the equivocation of a channel is usually the channel's equivocation about what will be received given what has been transmitted, ie H(received|transmitted), but I could be wrong. Let's namecheck equivocation once; but after that let's stick to using conditional entropy or conditional information, and not worry any more. -- Jheald 12:25, 16 February 2006 (UTC).


- That was my original thought, to define "H(received|transmitted)", the a priori conditional entropy, as the "equivocation of the channel", but I believe Shannon uses the a posteriori conditional entropy "H(transmitted|received)", which he simply calls the "equivocation." In any case, the phrase "**the equivocation of *X* about *Y***" is unambiguous and means the same as "**the conditional entropy of *X* given *Y***." We should probably drop mention of the phrase "equivocation of the channel", which has apparently generated the confusion. However, "equivocation" is a valid synonym for the "conditional entropy" of one random variable given another as I have just described. Sometimes I just prefer the shorter term.
- Luckily we don't need to have the same a priori / a posteriori discussion about the transinformation / mutual information since it is symmetric. However, I do like the term "transinformation" in the context of a communications channel since it conveys the idea of the amount of information that is carried "across" the channel. "Mutual information" is IMHO more for gambling and other uses. :) -- 130.94.162.64 17:22, 16 February 2006 (UTC)
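
To make the distinction in this thread concrete, here is a minimal Python sketch. It assumes *X* is the transmitted symbol and *Y* the received symbol of a hypothetical binary channel; the joint distribution and all variable names are made up for illustration and are not taken from the article or from Shannon's paper. It computes both conditional entropies, which in general differ, and the mutual information from either side.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y): rows index the transmitted
# symbol X, columns the received symbol Y. A 0 always arrives intact;
# a 1 is received as 0 half the time.
joint = [[0.50, 0.00],
         [0.25, 0.25]]

p_x = [sum(row) for row in joint]          # marginal of transmitted X
p_y = [sum(col) for col in zip(*joint)]    # marginal of received Y
h_xy = entropy(p for row in joint for p in row)

# Chain rule: H(A|B) = H(A,B) - H(B).
h_x_given_y = h_xy - entropy(p_y)   # H(transmitted | received)
h_y_given_x = h_xy - entropy(p_x)   # H(received | transmitted)

# Mutual information computed from either side.
mi_from_x = entropy(p_x) - h_x_given_y
mi_from_y = entropy(p_y) - h_y_given_x

print(f"H(X|Y) = {h_x_given_y:.4f} bits")
print(f"H(Y|X) = {h_y_given_x:.4f} bits")
print(f"I(X;Y) = {mi_from_x:.4f} = {mi_from_y:.4f} bits")
```

With this table the two conditional entropies differ while the two mutual-information expressions agree, which is why the a priori / a posteriori question only arises for the equivocation.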


## Proposal for splitting off some sections

- JA: Okay, I just saw some directives about being sure to preserve the edit histories, as it was a provision of GFDL to do so. I wasn't intending to start with a major overhaul myself. A minimal chunking would be to spin off the Mathematical theory section and the Applications section to separate articles, leaving the rest here. The "references" (works actually cited) would go with the citing texts, leaving a "bibliography" (further reading and general references) here. Maybe forming just a draft outline here would be a start. So far, I've got:

1. Overview (shoehorn, whatever) stays
2. Mathematical theory >---> moves
3. Applications >---> moves
4. History stays
5. General bibliography stays
6. See also stays
7. External links stays

- JA: Jon Awbrey 04:20, 15 February 2006 (UTC)

- Myself, I would suggest almost the exact opposite. If you want to create a slimmed-down hub here, I would suggest it is the present introductory material that most needs to be drastically slimmed down to 3-4 short paragraphs max. The history could also be cleanly split out to its own article. It should be easy for people looking up "information theory" to get to the key mathematical definitions: those should stay, and one should not have to wade through shedloads of verbiage to have to get to them. A short listing of some of the main applications (mostly pointing to articles of their own) is useful and should stay.

- The article *Information* currently gives quite a good rather short succinct description of the meaning of information in information theory (alongside descriptions of how the word is used in other contexts). -- Jheald 13:57, 15 February 2006 (UTC).

- JA: I seem to be getting conflicting recommendations here. But I don't see any particular hurry, so maybe we'll learn something from talking about the issues. It's kind of on the back burner for me, though, as I'm finding these pop-vs-tek nomens-lands to be very unproductive as a rule -- there is just way too much ambivalence in the Wiki City about how or whether to do what it takes. For the moment, I still have some hope, so will persist a while. The last time I looked at the *Information* article, it was just a motley mush, and I don't plan to wade into that except to supply a few [see also]'s for those who want a way out of it. So let's not go there. Back later. Jon Awbrey 14:54, 15 February 2006 (UTC)

- I don't think it's too conflicting. The basics should be here, like the essential definitions of IT. But, perhaps the equations are listed rather than explained thoroughly. I believe that this article should be largely non-technical. There is PLENTY of technical stuff to discuss in an article all on its own. - grubber 16:18, 15 February 2006 (UTC)

- JA: Maybe I misunderstood. My initial suggestion, based on the model of what seemed to be working for the previous tangle of articles on *Relational database*, *Relational algebra*, *Relational model*, etc., was to create a new article called *Theory of information (overview)*, or the like, and I thought folks said No, keep the overview part here. Now, I'm hearing something different, more like, Return to where we were at the start, write 3 or 4 token intro paragraphs, and fuhgettabotit. Fine by me. Jon Awbrey 17:04, 15 February 2006 (UTC)

- My suggestion is that the rewrite *replace* Information Theory. This article should be highly accessible and easy to read. The nitty-gritty could be moved to other articles, like Channel capacity and Shannon information, with the subfields being explained more succinctly, leaving the details of Kolmogorov complexity and Coding theory to those articles.


- As an example, consider the Mathematics page. It serves as a jumping-off point for everything else, and the Mathematics page is actually *shorter* than the Info Theory page! Ideally, I would suggest the Info Theory page be no more than, say, 3 pages or so. I think we could get across the essentials and the interesting stuff in that much length. - grubber 18:06, 15 February 2006 (UTC)


## Should I go or should I stay

- JA: That's what I thought I heard from 1 contingent the 1st time. In terms of the proposed reorg, you are saying this:

1. Hub (on-ramp to info autobahn) stays
2. Mathematical theory >---> new article
   - Channel capacity >---> new section
   - Shannon information >---> new section
3. Applications >---> new article
4. History stays
5. General bibliography stays
6. See also stays
7. External links stays

- JA: But Jheald seems to be saying something else. (NB. I personally will not touch the *Information* article any time soon -- the last time I tried to help with a pop article like that, even with folks who are supposed to know what a function or a manifold is, it was sheer grief -- better to tend our own garden first.) Jon Awbrey 18:42, 15 February 2006 (UTC)

- I really don't see Jheald saying anything substantially different. I agree with what he says. I'd say: if it's more appropriate elsewhere, trim it out. For example, state what channel capacity is, but leave Channel capacity with the job to explain it thoroughly. To be safe, if you cut something, make sure you link somehow to an article that has what you cut. I'll help you do all this too... Sounds like fun! - grubber 19:48, 15 February 2006 (UTC)

- JA: Well, I leave you folks to it. I took the "making technical articles accessible" bit seriously. From previous experience with math-anxious readers, I took the notion that you can do that in 3 or 4 paragraphs as a wildly optimistic underestimate. I am rather well-acquainted with the sort of reader who genuinely wants to understand the "real info" but takes one look at a sum-over-neg-probs-and-logs-thereof and simply loses heart. Now, we all know that there's a perfectly natural reason for contemplating such a thing, but merely plumping down a Book of Common Prayer in a North of Tweed Kirk and demanding folks to worship accordingly -- well, na, I rechnen we all wit what comes a that. Jon Awbrey 20:14, 15 February 2006 (UTC)

- I like the extra stuff that you've added to this page, though I think they should be made into their own article so that we can trim this one up quite a bit. What would be a good name for a new article to cover this information? - grubber 19:57, 16 February 2006 (UTC)

The essentials, IMHO: The communication model. Information sent from sender to receiver. The idea that for information theory, "The significant aspect [of a message] is that the actual message is one selected from a set of possible messages". (Shannon). Measuring information in bits. The equal probability case. Non-equal probabilities: entropy - the average number of yes/no questions. Information <-> probabilities; => compression, redundancy. Mutual information.

Channel capacity later, after the mathematical definitions, as the start of the applications section. (Though flagged previously in the opening lead). Equals mutual information because of the noisy channel coding theorem.

That's what I would want to see in the article, to be quite concisely and shortly summarised. There may be a place for more lengthy, more discursive, more gently introductory material; but I think not in the main encyclopaedia "hub" article, which should be tight and to the point, but handed off to a separate page. -- Jheald 21:50, 15 February 2006 (UTC).
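
As an illustration of the "equal probability case" and the "average number of yes/no questions" items in the list above, here is a minimal Python sketch. The two message sets and the questioning strategy are hypothetical examples chosen for round numbers; they do not come from the article.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Equal probability case: 8 equally likely messages.
uniform = [1 / 8] * 8
h_uniform = entropy_bits(uniform)

# Non-equal probabilities: four messages with a skewed distribution.
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
h_skewed = entropy_bits(skewed)

# A questioning strategy for the skewed case: ask "is it message 1?",
# then "is it message 2?", then "is it message 3?". The number of
# yes/no questions needed to identify each message is therefore:
questions_needed = [1, 2, 3, 3]
avg_questions = sum(p * q for p, q in zip(skewed, questions_needed))

print(f"uniform: {h_uniform} bits")
print(f"skewed:  {h_skewed} bits, {avg_questions} questions on average")
```

For these dyadic probabilities the average number of questions matches the entropy exactly; for general distributions the entropy is only a lower bound on the average number of yes/no questions.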

## Extra stuff

- JA: Do you mean the intro to the main page or something on the talk page? I'm starting to think about absorbing some of the overview (what's it good for & what's information, etc.) into the article on *Logic of information*, as it fits in with a kind of "semiotic information theory" that C.S. Peirce was lecturing on in the 1860's -- it was at colonial backwaters like Harvard & Lowell, so of course nobody ever heard of it. Jon Awbrey 20:08, 16 February 2006 (UTC)

- Nice idea. I think that would be appropriate. - grubber 20:56, 16 February 2006 (UTC)

## Terse-ifying

I've been working at shortening and condensing the main page... Take a look. It's tough to figure out what is essential -- and I'm sure that will change as we go. - grubber 01:05, 17 February 2006 (UTC)

- Looking better. -- 130.94.162.64 05:24, 17 February 2006 (UTC)

### Not so terse, there!

- The three fundamental questions in the introduction deal somewhat more with coding theory than information theory per se. Information theory is the foundation for coding theory, but it is not so easy to boil it down to the magic number of three fundamental questions. We need questions more like "What is information?" and "How does a random variable bear information?" Someone, not so briefly, tried to deal with these issues, which are now being ignored. Some of the information in the information entropy article perhaps belongs in the introduction here. And the three fundamental questions fail to mention error correction or redundancy at all. The channel capacity is a very important concept, but the questions of lossy and lossless compression could at least be combined and error correction added as one of the fundamental questions, if we wish to retain the focus on coding theory. Also, many of the advances in information and coding theory and applications could be added to the history section of this article "Developments since 1948". -- 130.94.162.64 15:38, 17 February 2006 (UTC)

- I'm not sure you are likely to find anything that cannot be categorized under those three statements. Lossless compression deals with what information is and how much information something fundamentally contains; lossy compression deals with approximations and distortion-rate and rate-distortion; and channel capacity deals with all the coding issues (including error correction and redundancy - there are no errors if there is not a noisy medium, ie a channel). When I took graduate info theory, these three questions were the ones used to introduce me to the field, and as I've studied IT, those three questions seem to partition the field quite well. My aim was to give a very lay introduction to give people an idea of what IT is all about. You are right that there are many subtle ideas that are not apparent in such a terse overview. But I think that is the aim of the article, not the introduction. - grubber 18:10, 17 February 2006 (UTC)

- O.K., I'm conceding on this one. I never had the benefit of a college course to introduce me to information theory, so I am quite lacking on how it should be introduced. -- 130.94.162.64 19:26, 18 February 2006 (UTC)

- After seeing one mistake from this author (the converse to capacity is a strict inequality), I randomly checked another alteration made by him/her, which also revealed a lack of understanding about information theory (confusing mutual information with relative entropy). The former mistake was actually a correction to a significant mistake already in the articles on the Shannon–Hartley theorem and the noisy channel coding theorem, so I would not want to discourage the author from contributing to Wikipedia. However, I believe it would be better if the author put suggestions in the talk page for review before putting them on the article. I don't mean to be rude or out of line, but two out of two mistakes should give some pause. (Also, regarding the "three" question, I changed it to two with one having two subquestions, as in Elements of Information Theory.) Calbaer 21:01, 26 April 2006 (UTC)

- That second mistake looks like it has been corrected by adding the word "average" to the offending sentence in the section on mutual information in this article. I think it is clear now (especially if one reads the linked article) what is being referred to, but perhaps it could be made a little more clear here in this article. This article needs more major work, though, including cleanup and removal of anything (including, I am sure, some of my contributions) that is not encyclopedic and verifiable. I've done a lot of work on this article in the past, but I'm getting out of my depth when it comes to these fine (but nonetheless important) details. I hope Calbaer (who seems quite qualified) will continue to do some more work on this and related articles, and not be put off by my mistakes. I know I've made a lot of mistakes, and I appreciate the corrections. By the way, this talk page is getting too long and should be archived or something. 130.94.162.64 01:02, 17 May 2006 (UTC)

- I was probably hasty in calling it a "mistake," though I still feel it should be "corrected" (and, indeed, am about to change it after it was added back in). Yes, mutual information and Kullback–Leibler divergence are related. But it is not clear why "information gained" should be linked to Kullback–Leibler divergence, which is not -- in my mind, let alone the mind of someone new to the subject -- automatically associated with information gain; to me, the association with information gain is with mutual information. It makes about as much sense to link to information entropy, since I(X;Y) = H(X) − H(X|Y). Besides, why should a word like "gained" link to a concept being explained? (By the way, I won't be put off by mistakes. Perhaps I will be by non-Wikipedia commitments, however....) Calbaer 01:37, 7 June 2006 (UTC)
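
For readers following this exchange, the relationships being discussed can be checked numerically. The sketch below is only an illustration on a made-up joint distribution (the numbers and names are assumptions, not from the article): it computes mutual information once as the Kullback–Leibler divergence between the joint distribution and the product of its marginals, and once as a difference of entropies. It does not settle which link target reads best.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """D(p || q) in bits for two aligned lists of probabilities."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginals p(x) and p(y).
p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# I(X;Y) as the KL divergence from p(x)p(y) to p(x, y).
pairs = list(joint)
mi_as_kl = kl_divergence(
    [joint[k] for k in pairs],
    [p_x[x] * p_y[y] for x, y in pairs],
)

# I(X;Y) as H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y).
h_x_given_y = entropy(joint.values()) - entropy(p_y.values())
mi_as_entropy_difference = entropy(p_x.values()) - h_x_given_y

print(f"I(X;Y) via KL divergence:  {mi_as_kl:.6f} bits")
print(f"I(X;Y) via H(X) - H(X|Y): {mi_as_entropy_difference:.6f} bits")
```

Both expressions give the same value, so mutual information can be described through either relative entropy or entropy.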

- Well, although those are good partitions of the topic, maybe there's a better way to call them? - grubber 01:49, 19 February 2006 (UTC)

## Outsider comments

As an outsider seeing that a fairly major re-write of this article seems to be in progress, I'll make some comments for your consideration, instead of jumping in to edit the article myself:

- The intro paragraph talks a lot about "data", but *data* is never defined in the article (as compared to *entropy*, *message*, etc., which are carefully defined).

- Watch out for peacock terms, for example "an interesting and illuminating connection", and "notorious equation".

- In the history section, bare years seem to be more heavily linked than recommended, see Wikipedia:Manual of Style (dates and numbers).

- If you're looking for a place to shorten the article, the "Related concepts" section would seem to be a natural; it could easily be reduced to a couple of sentences or even two bullet items in the "See also".

- There's a giant box at the bottom of the page that declares information theory to be a sub-field of cybernetics; but in this article, cybernetics seems to be defined as a sub-field of information theory. Could the relationship be defined more clearly?

I hope this unsolicited peer review is helpful instead of just annoying. -- The Photon 06:09, 25 February 2006 (UTC)

- This is indeed helpful, and not annoying at all. Thank you for the valuable input. IMHO:

- "Data" is information.
- By all means edit away the "peacock terms".
- Feel free to remove excessive year links in the history section.
- Section on Kolmogorov complexity should disappear and become a bullet point somewhere. Measure theory section should be shortened & revised.
- I have no idea what cybernetics is other than a buzzword from the sixties meaning anything to do with computers, communications, and control.

- I hope this is a helpful reply to your comment. Have at it. -- 130.94.162.61 01:06, 15 March 2006 (UTC)

- JA: "Data is information"? You can't seriously mean that? Jon Awbrey 02:00, 15 March 2006 (UTC)

- I was under the impression that information was data in a context or with meaning (i.e. there is a hierarchy: data -> information -> knowledge). But this might be my *aged* understanding of one class of information management failing to shine through :) - FrancisTyers 13:10, 15 March 2006 (UTC)
- In the context of information theory, not semiotics, not philosophy, yes, data is information. I refer to the disambiguating comment at the very beginning of the article: "*The topic of this article is distinct from the topics of Library and information science and Information technology*." 130.94.162.61 21:56, 16 March 2006 (UTC)



- JA: In the context of information theory as Shannon formulated it, confusing data with information is like confusing water with quarts. Jon Awbrey 22:10, 16 March 2006 (UTC)

- Do you mean information is a measure of data or vice versa? Data and information can both be measured quantitatively in bits, nats, hartleys, etc., so in a sense they are the same, even though they carry different connotations. See *Redundancy (information theory)* for a definition of what is perhaps data but not information. 130.94.162.61 01:32, 17 March 2006 (UTC)


I'm new to this, but I thought I might point out that the ISBN is 0486682102 (a 2 has been missed off) for the reference to Reza's book.

- Thanks for pointing that out. I just fixed that. 130.94.162.64 00:31, 21 May 2006 (UTC)

“Shannon’s theory, although it is sometimes referred to as a “theory of information,” is nothing of the sort. It is a theory of coding and of channel capacity. To achieve his desired aim, Shannon concentrated on the average amount of data generated by a source or transmitted through a channel. What that data represented was irrelevant to his purposes. Indeed, whereas it is possible to average numerical amounts of data, there is no such thing as an “average” of information content.” a quotation from Keith Devlin

I wish someone would write about the following important contributions to information theory:

Bar-Hillel & Carnap on the Theory of Semantic Information; Fred Dretske’s theory; Barwise and Seligman’s situation-theoretic approach and ‘channel theory’.

(I am not conversant with this subject, just interested in it.)

## The intro

As of March 2006, the intro to this article left much to be desired; e.g., channel capacity is not about quickness but throughput and reliability; lossy and lossless compression are generally considered under the general header of source coding; cryptography could be considered under channel coding but not always strictly; and channel capacity and compression shouldn't generally be considered as separate topics. After User:R. Koot got rid of the flawed intro, I redid it so that the three topics are not separate, that the first two are traditionally linked, and that matters not cleanly falling within these topics (e.g., cryptography) are often considered information theoretic due to their connection, albeit incomplete, to information theoretic concepts. It's shorter and more structured now, and thus should make for an easier read. Calbaer 17:57, 10 April 2006 (UTC)

- Much better. I'd like to see those bullet points go, though. And I think the reference to lossy compression is a bit dubious. — *Ruud* 18:03, 10 April 2006 (UTC)

- Thanks. I think the bullet points help in terms of understanding so that the intro is neither too dense nor too busy. Also, I'm not sure why the reference to lossy compression is dubious; rate distortion theory was part of Shannon's original research. Calbaer 19:38, 12 April 2006 (UTC)

- Nice, didn't know that. I'd still suggest starting with at least one (preferably two) paragraphs of plain text before the bullet points. I find bulleted text harder to read and visually less attractive. That wouldn't be much of a problem in the main text, but you'd have to capture the reader first. — *Ruud* 20:07, 12 April 2006 (UTC)

- I'm not sure the best way to do this, but here's an example:

**Information theory** is a field of mathematics concerning the storage and transmission of data, beginning with the fundamental considerations of source coding and channel coding. Source coding involves compression (abbreviation) of data in a manner such that another person can recover either an *identical* copy of the uncompressed data (lossless data compression, which uses the concept of entropy) or an *approximate* copy of the uncompressed data (lossy data compression, which uses the theory of rate distortion). Channel coding considers how to transmit such (ideally compressed) data, asking at how high a rate data can be communicated to someone else through a noisy medium with an arbitrarily small probability of error.

- Is that better? Calbaer 23:20, 12 April 2006 (UTC)

- I'm sorry to see the bullets go, since I find that they help make parallels clearer. But, the intro sounds good! This article sure needs more work, and I'm glad there's a few others interested in helping. Keep up the good work! - grubber 13:57, 18 April 2006 (UTC)

- Well, if you want to revert, I wouldn't be offended. I didn't hear anyone standing up for the bullet points; if this were a more trafficked discussion page, I'd ask for an unofficial vote, but, as it is, I guess I'll just be patient and see if anyone else has an opinion. I'm happy with both the bullet point version [1] and the current one. Alternatively, it could have both:

**Information theory** is a field of mathematics concerning the storage and transmission of data and includes the fundamental concepts of source coding and channel coding. Source coding involves compression (abbreviation) of data in a manner such that another person can recover either an *identical* copy of the uncompressed data (lossless data compression, which uses the concept of entropy) or an *approximate* copy of the uncompressed data (lossy data compression, which uses the theory of rate distortion). Channel coding considers how to transmit such data, asking at how high a rate data can be communicated to someone else through a noisy medium with an arbitrarily small probability of error.

- These topics are rigorously addressed using mathematics introduced by Claude Shannon in 1948. His papers spawned the field of information theory. Although originally divided into the aforementioned topics of
- Source coding (lossless and lossy)
- Channel coding

- the field now addresses extended and combined problems such as those in network information theory and related problems including portfolio theory and cryptography.... Calbaer 18:52, 19 April 2006 (UTC)

- No, the intro sounds good this way; I was just stating my preference. The one thing I notice is that the intro paragraph still feels dense, but as I read it, I can't really offer any suggestions on how it would be better. Hopefully as we get more eyes, we can massage this into a great article. Info Theory is too cool to let this page feel dull! :) - grubber 21:11, 19 April 2006 (UTC)

- Yeah, hopefully others will be as opinionated. It is dense, but I'd like the article (or at least the introduction) to be in the style of Cover and Thomas, which is both very dense yet very understandable. Perhaps the first sentence should be split in two, though: "**Information theory** is a field of mathematics concerning the storage and transmission of data. The fundamental concepts of information theory are source coding (data compression) and channel coding (channel capacity). Source coding involves compression (abbreviation) of data in a manner such that another person can recover either an *identical* copy of the uncompressed data (lossless data compression, which uses the concept of entropy) or an *approximate* copy of the uncompressed data (lossy data compression, which uses the theory of rate distortion). Channel coding considers how one can transmit such data, asking at how high a rate data can be communicated to someone else through a noisy medium with an arbitrarily small probability of error." Like you say, we should make sure the page isn't dull, so getting into the heart of the matter in the introduction is important. Calbaer 22:09, 19 April 2006 (UTC)

- Well, I think WP should have all the required info in it somewhere, but I think we need to make sure that at least one IT page is approachable even by a lay person. For that reason, I think there's a lot of work that needs to be done on this page. - grubber 13:31, 20 April 2006 (UTC)

## Figure-Text agreement

JA: Here is how the channel capacity section looked 24 Dec 2005. Note: X = received, Y = transmitted. At some point somebody switched the words "received" and "transmitted". Probably best for several eyes to recheck consistency throughout this section. Jon Awbrey 18:42, 17 April 2006 (UTC)

**Channel Capacity**

Let us return for the time being to our consideration of the communications process over a discrete channel. At this time it will be helpful to have a simple model of the process:

                          +-------+
                          | Noise |
                          +-------+
                              |
                              V
    +-----------+    Y    +-------+    X    +--------+
    |Transmitter|-------->|Channel|-------->|Receiver|
    +-----------+         +-------+         +--------+

Here again *X* represents the space of messages received, and *Y* the space of messages transmitted during a unit time over our channel. However, this time we shall be much more explicit about our use of probability distribution functions. Let $p(x|y)$ be the conditional probability distribution function of *X* given *Y*. We will consider $p(x|y)$ to be an inherent fixed property of our communications channel (representing the nature of the **noise** of our channel). Then the joint distribution of *X* and *Y* is completely determined by our channel and by our choice of $p(y)$, the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the amount of information, or the **signal**, we can communicate over the channel. The appropriate measure for this is the transinformation, and this maximum transinformation is called the **channel capacity** and is given by:

$$C = \max_{p(y)} I(X;Y)$$

- I'm used to seeing X -> Y, but everyone gets a different notation. I would suggest that we try to make an image (or find one in WP that is already there) for this figure. ASCII text is a bit distracting in my opinion. - grubber 21:12, 19 April 2006 (UTC)
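To make the quoted definition concrete, here is a minimal Python sketch that maximizes the transinformation over the input distribution for a binary symmetric channel. It follows the quoted section's convention (*Y* transmitted, *X* received). The crossover probability 0.1, the grid search, and all function names are illustrative assumptions, not anything taken from the article.

```python
import math

def mutual_information(p_y, channel):
    """Transinformation I(X;Y) in bits.

    p_y[j]        = P(Y = j), the distribution of transmitted symbols.
    channel[j][i] = P(X = i | Y = j), the fixed noise model of the channel.
    """
    n_x = len(channel[0])
    # Marginal distribution of the received symbol X.
    p_x = [sum(p_y[j] * channel[j][i] for j in range(len(p_y)))
           for i in range(n_x)]
    total = 0.0
    for j, py in enumerate(p_y):
        for i in range(n_x):
            joint = py * channel[j][i]
            if joint > 0:
                total += joint * math.log2(joint / (py * p_x[i]))
    return total

def capacity_binary_input(channel, steps=1000):
    """Crude grid search over P(Y = 0); adequate for a two-symbol input."""
    return max(mutual_information([k / steps, 1 - k / steps], channel)
               for k in range(steps + 1))

# Hypothetical binary symmetric channel that flips a bit with probability 0.1.
eps = 0.1
bsc = [[1 - eps, eps],
       [eps, 1 - eps]]
capacity = capacity_binary_input(bsc)
print(f"capacity = {capacity:.4f} bits per channel use")
```

For this symmetric channel the maximum is attained at the uniform input distribution, which is why a coarse grid that includes 0.5 suffices; a general channel would call for something like the Blahut–Arimoto algorithm instead.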

## Notational consistency & other errors in article

In the article both and are used for the binary entropy function. Some consistency would be desired here. There is a very nice graph of this function at Information entropy, but not much explanation of the notation that should be used for this important function. 130.94.162.64 21:25, 15 May 2006 (UTC)

- Neither is correct. In sources I've seen, is used, as an entropy function on a single scalar can be assumed to be the binary entropy function; and are bad because many information theorists would assume that these are expressions for Rényi entropy of order or , respectively. If anyone has a respected yet conflicting source, please let me know before I change this. Calbaer 19:29, 30 May 2006 (UTC)
- The notation seems to come from David J. C. MacKay's textbook. I don't know where the comes from. But by all means do what you need to do to make things clear. 130.94.162.64 00:47, 31 May 2006 (UTC)
- Ah, I see. I replaced all instances with , which doesn't make the b scriptsize in the PNGs (so it looks kind of odd), but at least can't be confused with Rényi entropy. would be better in theory, but might confuse first-time readers. Calbaer 00:23, 1 June 2006 (UTC)
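The ambiguity discussed in this thread can be seen numerically. The sketch below compares the binary entropy function with the Rényi entropy of order 2 of the same two-point distribution; the function names and the sample value p = 0.1 are assumptions chosen for illustration.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def renyi_entropy(dist, alpha):
    """Renyi entropy of order alpha (alpha != 1), in bits."""
    return math.log2(sum(q ** alpha for q in dist)) / (1 - alpha)

p = 0.1
shannon = binary_entropy(p)
renyi2 = renyi_entropy([p, 1 - p], 2)
print(f"binary entropy at p={p}: {shannon:.4f} bits")
print(f"Renyi order-2 entropy:  {renyi2:.4f} bits")
```

The two quantities agree at p = 0.5 but differ elsewhere, so a subscripted H that could be read as either one is a genuine source of confusion.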

Somewhere along the line someone switched the order of "ergodic" and "stationary" in the classification of sources of information in the article. Reza shows a diagram (p. 349) where

and also provides an example of a process that is stationary but not ergodic (on the same page). This ought to be checked out more carefully. 130.94.162.64 20:29, 22 May 2006 (UTC)

- The example in Reza's book is , where *A* and *B* are i.i.d. normal random variables. This is stationary, but not ergodic. On the other hand, the article Measurable function notes that "an ergodic process that is not stationary can, for example, be generated by running an ergodic Markov chain with an initial distribution other than its stationary distribution." This is true, but any such ergodic process will approach stationarity as the effects of the initial distribution fall off exponentially. 130.94.162.64 22:41, 25 May 2006 (UTC)

Also, I am having second thoughts about keeping some material: the measure theory section is rather long and esoteric, and the first paragraph on the intelligence section is not very well sourced. Both could probably go without hurting the article too much. I do like the example of GPS, though. Any comments? Even though I am mostly responsible for the material in question, I do not want to delete this stuff if others think it should stay or could be improved somehow. 130.94.162.64 15:33, 24 May 2006 (UTC)

## Self Teaching

One thing that might be useful would be sort of a flow chart of mathematical concepts and tools that a person would need if they wanted to learn how to understand Shannon's theory. That way those of us who think that it might be useful in our work, could carry out a study program that would allow us to utilize information theory.

I.e. something like...okay first you need Statistics...this/this/this concept, then you need X, then you need Y and so on.

- Information theory is mainly based on probability theory. 130.94.162.64 23:49, 9 May 2006 (UTC)

## The intro 2

The introduction was rewritten and the meaning is completely different. I believe it confuses many of the topics of information theory by claiming the fundamental quantities in information theory are "transfer cost" (presumably bandwidth and/or power and/or time) and "transfer loss" (error). This doesn't seem to be a good description of information theory, so, if no one argues for it, I'll revert. If someone sees something improved about it, please reply. Calbaer 23:51, 1 June 2006 (UTC)

- The introduction was getting worse and worse with every re-edit, so I reverted it to the status as of a few days ago. Changes by Ww made the intro too short and overly broad, not pegging down what information theory is, seemingly excluding topics such as storage out of the realm of "information theory." Chancemill noticed this and rewrote the intro, correcting these problems, but making it too long and unfocused. Heidijane changed a disambiguation, introducing some broken markup.
- I don't mind it being changed for clarity - and I just added a few changes based on what I believe to be the intentions of those who modified it - but if we could discuss complete rewrites here, rather than unilaterally making them, that would be good. The intro sets the tone for the article, so if people find it too confusing, long, or poorly written, they won't read the rest of the article. Calbaer 18:01, 2 June 2006 (UTC)
- The changes I made to the introduction were in fact to make it less specific and that was deliberate. The narrative arc for the intro of a technical article such as this should be something like general statement, brief significance, perhaps a bit of history, importance, and then the TOC. And it should read clearly (as should the entire article), even for those innocent of much math. This article as a whole did not reach this last point at all, and still does not. The changes I made in the introduction (and here and there throughout the non-mathematical parts) were intended to fit into the structure noted and to improve clarity or phrasing elsewhere. Editors here must, in my view, keep in mind that they are NOT writing for others informed on some subject (except secondarily) but must include the Average Reader as well. That means that the 'density' (mentioned above) should be confined to technical sections the Average Reader may skip. Said Reader should be able to exit the article (ideally early on) with a reasonable conceptual understanding of at least some part of the topic, and a sense of why it's important to others (disciplines and people). My changes only began to approach satisfactory quality along those lines.
- Changes since then have been (unanimously it appears) for the worse, from this perspective. The intro jumps immediately into specific detail without explanation, or even hint, of what's being jumped into, and is therefore inadequate.
- I would revert to the intro structure I left, and invite phrasing and improvements in that. In present state it's chaotic and confusing to non-experts. Unsatisfactory, and seriously so. Comments from others? Suggestions? Disagreements? As to basic strategy, inadequate phrasing, ...? ww 00:57, 3 June 2006 (UTC)

- The intro, as rewritten, is not as much "average reader"-oriented as it is a rewrite of the previous intro with about half the details left out. A more complete yet accessible intro would be:
**Information theory** is a discipline in applied mathematics involving the storage and communication of data, assuring that as much as possible can be stored and/or communicated with a near-certain probability of either no or little loss of data. Applications of fundamental topics of information theory include ZIP files (lossless data compression), MP3s (lossy data compression), and DSL (channel coding). The field is at the crossroads of mathematics, statistics, computer science, and electrical engineering, and its impact has been crucial to success of the Voyager missions to deep space, the invention of the CD, the feasibility of mobile phones, the development of the Internet, the study of human perception, the understanding of black holes, and numerous other fields.

- although the article would still need the source coding/channel coding division after the introduction. Calbaer 02:16, 3 June 2006 (UTC)

- Yes, the intro was probably a bit too confused and looks better now, but I am not sure if the current opening line captures the concept sufficiently.

**Information theory** is a field of applied mathematics concerning the storage and transmission of data

- I am not sure "storage and transmission of data" is the right way to describe information theory, because that would also be the answer for "What do packet-switched networks do". Information theory, as I understand, is concerned with communication at a higher level of abstraction, whose actual goal is an optimized transfer of information which is adequately comprehensible to the receiver. IMO, the following (slightly) modified paragraph looks better.

**Information theory** is a field of applied mathematics concerning the optimized storage and transmission of information over any medium and includes the following fundamental concepts.

- Source coding involves compression (abbreviation) of data in a manner such that another person can recover either an *identical* copy of the uncompressed data (lossless data compression — e.g., ZIP files — using the concept of entropy) or an *approximate* copy of the uncompressed data (lossy data compression — e.g., MP3s — as in the theory of rate distortion).

- Channel coding considers how to transmit such data, asking at how high a rate data can be communicated to someone else through a noisy medium with an arbitrarily small probability of error, as in communication with modems.

- The field is at the crossroads of mathematics, statistics, computer science, and electrical engineering. Chancemill 06:15, 3 June 2006 (UTC)
- Well, some like longer intros, some like shorter intros; some like bullet points, some don't (see above discussion). It would be nice to have something that everyone can agree upon enough to minimize rewrites. Regarding the goal of information theory, I suppose it would be better to start out, "**Information theory** is a discipline in applied mathematics *quantifying* the storage and communication of data...." Measuring data is what separates information theory from other fields. Calbaer 06:34, 3 June 2006 (UTC)

<---

All: I think Calbaer's offering is a considerable improvement on what we had after folks had at my version of a couple of days ago. And so is Chancemill's. But I do agree with Chancemill's observation about higher abstraction level. The intro should, in my view, make an easily understood point, then expand that point so the average Reader will see the ubiquity and importance of IT. A teaching approach, as it were, at the expense of 'jump off the cliff' scientific paper style. In fact, I'm in favor of bluntly saying that IT is critical across wide swaths of modern engineering. Since this is WP, the cliff style doesn't really fit. The article content, including quite technical stuff, is still appropriate, but we should make more than a small effort not to obfuscate the reasons why anyone would bother with all the technicalities.

So, I suggest we adopt Calbaer's suggestion and adjust/fine-tune from there. Or mine of a few days ago, and adjust from it. The cat we keep tripping over in the dark is an inability to consider those who are coming new to this. They require a less abrupt conceptual cliff, especially in the intro stuff. Not somehow wrong (or unencyclopedic) to 'kowtow' to them. (<-- writer speaks) ww 16:35, 3 June 2006 (UTC)

- Okay, let me take another shot at this based on all of the above. I suppose it's not that important that people see the difference between source coding and channel coding in the first paragraph; that the two types of compression are closer conceptually than the idea of communication is fairly self-evident.
**Information theory** is a discipline in applied mathematics involving the quantification of data with the goal of enabling as much data as possible to be reliably stored on a medium and/or communicated over a channel. The usual quantity for data is the statistical bit, which measures information entropy. Applications of fundamental topics of information theory include ZIP files (lossless data compression), MP3s (lossy data compression), and DSL (channel coding). The field is at the crossroads of mathematics, statistics, computer science, and electrical engineering, and its impact has been crucial to success of the Voyager missions to deep space, the invention of the CD, the feasibility of mobile phones, the development of the Internet, the study of human perception, the understanding of black holes, and numerous other fields.

- To the end of helping the Average Reader, let me also propose paragraphs that could follow this, either in the introduction or overview:
- Consider communication via language. Two important aspects of a good language are as follows: First, the most common words (e.g., "a," "the," "I") should be shorter than less common words (e.g., "benefit," "generation," "mediocre"), so that sentences will not be too long. Such a tradeoff in word length is analogous to data compression and is the essential aspect of source coding. Second, if part of a sentence is unheard or misheard due to noise — e.g., a passing car — the listener should still be able to glean the meaning of the underlying message. Such robustness is as essential for a communication system as it is for a language; properly building such robustness into communications is done by channel coding. Source coding and channel coding are the fundamental goals of information theory.

- Note that nothing has been said about the *importance* of the sentences in question. A platitude such as "Thank you; come again" has as many syllables as the urgent plea, "Call an ambulance!" Information theory, however, does not address this, as this is a matter of the quality of data rather than the quantity of data.

- Information theory is generally considered to have been founded in 1948 by Claude Shannon in his seminal work, *A Mathematical Theory of Communication*. The central... Calbaer 15:54, 4 June 2006 (UTC)
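The word-length analogy in the proposal above can be sketched with a Huffman code, one standard source-coding construction. The word frequencies below are made-up illustrative numbers, not measured English statistics, and the function name is hypothetical.

```python
import heapq

def huffman_code_lengths(freqs):
    """Return {symbol: codeword length in bits} for a Huffman code."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    # Each heap entry is (weight, tie_breaker, {symbol: depth_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical relative frequencies: common words first, rare words last.
word_freqs = {"the": 50, "a": 25, "I": 15,
              "benefit": 5, "generation": 3, "mediocre": 2}
lengths = huffman_code_lengths(word_freqs)
for word, bits in sorted(lengths.items(), key=lambda kv: kv[1]):
    print(f"{word:>10}: {bits} bits")
```

Under these assumed frequencies the most common word gets the shortest codeword and the rarest words get the longest, which is the tradeoff the analogy describes.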

- In general, I like your proposal. I have quibbles about phrasing, but that's an artistic issue, not a content one. The only thing I would observe is that you go awfully quickly into a technical consideration in the first sentence or two. I say, let's install your current proposal and have at it. I'll have some phrasing suggestions. ww 21:49, 4 June 2006 (UTC)
- Will do. Yes, I was a bit uncomfortable with the mention of a "bit" so early on, but, as Chancemill pointed out, we want to differentiate between information theory and other aspects of data in the introduction. Hopefully the sentence is short enough that the reader will, if confused, read past it at first, then follow the links if there's greater interest. Calbaer 03:10, 5 June 2006 (UTC)

- Good job on the opening lines! We should now probably look at the rest of the article. I'd like to see some more "dumbing down" to keep the article as interesting as possible for the lay user (not at the cost of removing technical content) Chancemill 05:37, 6 June 2006 (UTC)

- Actually, I'd hope we can avoid any dumbing down. We should do it the hard way, and write so well that even a tricky topic like this can be seen by the Average Reader -- if not in full technical detail -- for the centrally important topic it is. And without any wool pulling either. It's the right thing to do. ww 02:25, 7 June 2006 (UTC)

## Kullback–Leibler divergence and mutual information

I would like to see some mention in the article of the expression of the mutual information as the expected Kullback–Leibler divergence of the posterior probability distribution of *X* after observing *Y* to the prior distribution on *X*. By this I mean that

$$I(X;Y) = \mathbb{E}_{p(y)}\left[ D_{\mathrm{KL}}\big( p(X \mid Y=y) \,\|\, p(X) \big) \right].$$

There was a link in the article that just barely hinted at this relationship, but it was twice deleted by Calbaer as being confusing. I'm not sure how best to express this in the article, but it seems important to me, as it models the communications process as a sort of Bayesian updating of one's imperfect knowledge (at the receiving end of the communications channel) of the symbol *X* that was transmitted based on observing *Y*, the received symbol. Based on each *y* that might be received, one can quantify exactly how much information is "gained" about *X*. This expression actually seems more fundamental to me than that of the divergence of the joint distribution from the product of the marginals, even though the latter may be more obvious. Sorry to ramble on so much, but this has really got me thinking. 130.94.162.64 23:31, 7 June 2006 (UTC)
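The equivalence described above can be checked numerically. The following Python sketch computes the mutual information both ways for a small joint distribution; the joint probabilities and all names are assumptions chosen only for illustration.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for two distributions on the same finite alphabet."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical joint distribution: joint[x][y] = P(X = x, Y = y).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
xs, ys = range(2), range(2)
p_x = [sum(joint[x][y] for y in ys) for x in xs]   # prior on X
p_y = [sum(joint[x][y] for x in xs) for y in ys]   # marginal of Y

# Form 1: divergence of the joint distribution from the product of marginals.
mi_joint = sum(joint[x][y] * math.log2(joint[x][y] / (p_x[x] * p_y[y]))
               for x in xs for y in ys if joint[x][y] > 0)

# Form 2: expected divergence of the posterior P(X | Y = y) from the prior P(X).
mi_bayes = sum(p_y[y] * kl_divergence([joint[x][y] / p_y[y] for x in xs], p_x)
               for y in ys)

print(f"joint vs. product of marginals: {mi_joint:.6f} bits")
print(f"expected posterior-prior KL:    {mi_bayes:.6f} bits")
```

Both forms give the same number up to rounding, since the second is just the first with the sum over *y* pulled outside.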

- I deleted it because it was a link without any descriptive explanation, mathematical or otherwise; as you say, it barely hinted. If you want to state the above, it would likely be best to add to the end of the section, although you may want to bring the "Mutual information is closely..." sentence to the end. Note that Cover and Thomas define mutual information mathematically and then immediately note the relation to relative entropy. Soon thereafter, they state , while the observation you make is omitted (likely because it is implied by the other relative entropy property as well as by the relation to entropy). Thus, while it may be worthwhile, I don't think it needs to be near the top of the (rather short) section. So you could either state it after all this, or combine the various definitions in one equation block. The latter would likely be more illuminating, but also more foreboding. Calbaer 00:44, 8 June 2006 (UTC)
- Actually, divergence should be defined before it is used! So why not insert, before the section in question:
- The **Kullback–Leibler divergence** (or **information divergence**, or **information gain**, or **relative entropy**) between two distributions is a natural distance measure from a "true" probability distribution *p* to an arbitrary probability distribution *q*. Typically *p* represents data, observations, or a precise calculated probability distribution. The measure *q* typically represents a theory, a model, a description or an approximation of *p*. It can be interpreted as the expected extra message-length per datum that must be communicated if a code that is optimal for a given (wrong) distribution *q* is used, compared to using a code based on the true distribution *p*. This is not a true metric due to its not being symmetric.
- The Kullback–Leibler divergence of distributions *p* and *q* is defined to be

$$D_{\mathrm{KL}}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}.$$
- I hadn't noticed its missing before! (The above is largely "stolen" from Kullback–Leibler divergence, except the metric comment, which is briefer than the one in that article.) Calbaer 00:55, 8 June 2006 (UTC)
- It looks better now, but the disputed link is back, this time redirecting to *Information gain in decision trees*. Now there is a lot more context for it, of course, but I'm not sure if the very beginning of that section is the right place to make that link, especially when that relation is explained towards the bottom of the section. 130.94.162.64 20:25, 8 June 2006 (UTC)
- I put "information gain" right next to the link to KL divergence; hopefully that's a good compromise, and hopefully the language is a bit more accessible now. Calbaer 03:03, 9 June 2006 (UTC)
- I'm not sure I agree with the placement of the phrase about being able to save an average of bits when encoding X if Y is known. That follows directly from but in general even though 130.94.162.64 08:09, 9 June 2006 (UTC)
- I agree. I'll change it back, but what would you say to get your point across? Calbaer 23:23, 13 June 2006 (UTC)
- I tried. What I said was, "this is a measure of how much, on the average, the probability distribution on *X* will change if we are given the value of *Y*." That's probably the best I can do, since I don't really want to ramble on about communications channels and Bayesian inference. Feel free to improve that or cut it out if it's not up to snuff. 130.94.162.64 02:51, 14 June 2006 (UTC)
- Sounds good as is to me. Calbaer 23:00, 14 June 2006 (UTC)

## Pending tasks (what do we want?)

Here are how I interpret the pending tasks:

- Make sure intro and overview are understandable with daily life examples. Making sure the reader can subjectively tell what a "bit" is, for example, is a good start. Discussion of elephant trumpets, whale songs, and black holes does not contribute in this manner, although their mere mention (with links, say, to the no hair theorem) may be useful to the curious.

- Move applications up before mathematics (just after Overview) and make it independent. I'm not sure this is for the best, as Overview states some applications and, for the rest, it's useful to know some theory. Also, we need to be careful not to conflate security and information theory, since information knowledge is necessary for, yet not sufficient for, breaking ciphers. (For example, a Diffie-Hellman key exchange assumes that an adversary will have full knowledge, but the form of that knowledge assures that the adversary will not be able to use it within a reasonable amount of time. Thus, although secure in practice, Diffie-Hellman key exchange is information-theoretically insecure).

- A separate section on the theory already exists (although it could use a mention of a few more concepts, e.g., the asymptotic equipartition property).

- Move history down to end (just before References). Again, I'm not sure why this is best, and some "good" articles, like Thermodynamics, list history first. If moved, we need to check that this is just as readable, but the reader doesn't need to know scientific history to understand scientific phenomena. Clearly more post-1948 history is needed. Most applications of information theory, even elementary ones, are post-1948 (Huffman codes, arithmetic codes, LDPC codes, Turbo codes), and much of the theory is too, e.g., information divergence, information inequalities, Fisher sufficient statistics, Kolmogorov complexity, network information theory, etc. However, it might be best to move the history to its own article. The current section is far too in-depth.

There's still a lot to do and a lot to add, and we might reasonably ask if certain topics should be shortened or omitted, e.g., Kullback–Leibler divergence (which, although important, can generally be omitted from elementary explanation), differential entropy (which is most important for transinformation, which can be interpreted as a limit of discrete transinformation), gambling (which most information theorists I know love but is a somewhat fringe topic). Again, the thermodynamics article is shorter and far less mathematical. Do we want this article to be long with full mathematical explanations or medium-sized with general explanations and links to the math? I don't know the right answer, but it might be nice to get some ideas of what people would like out of this.

A few final things to keep in mind:

- Diversions or examples absent of explanation confuse, not clarify.
- Sentences should be as concise as possible, but no more so, for easy reading.
- Proofread before (or at least immediately after) changes.
- Others might not read the same way as you.

Due to these factors, it's good to discuss significant changes on the talk page. I find my "peer edited" sections are a lot better than my first attempts. And too much unilateral change can result in an unstable, unreadable, and flat-out wrong article. Calbaer 01:44, 15 June 2006 (UTC)

- I say keep the K–L divergence. Not only is it important, but it is really useful in understanding the mutual information. The treatment of other stuff like gambling, intelligence, and measure theory could be cut. The measure theory section shows a useful mnemonic device, but the analogy is incomplete due to the failure of the transinformation to remain non-negative in the case of three or more random variables. It could probably be cut out entirely. History of information theory could indeed have its own article, but keep a short summary of history in this article here. Also some mention of Gibbs' inequality would be nice as it shows the K–L divergence (and thus the bivariate mutual information) to be non-negative. The AEP could be mentioned in the source theory section, since it is a property of certain sources. Don't hesitate to add what you think should be added. --130.94.162.64 17:17, 20 June 2006 (UTC)
- K–L divergence is certainly fundamental, so it should stay. Measure theory is also fundamental, but it is not necessary for a good understanding of information theory, so it could go. History could be summarized and moved. I'll get to this when I can, though if someone else wants to do so first, go ahead. Calbaer 18:36, 20 June 2006 (UTC)
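The two points raised above (the bivariate mutual information is a K–L divergence and hence non-negative, while the three-variable analogue can go negative) can be checked numerically. This is a small sketch, not anything from the article; the helper names and the XOR example are my own choices, and the three-variable quantity uses the convention I(X;Y;Z) = I(X;Y) − I(X;Y|Z).

```python
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, idx):
    """Marginal distribution of the coordinates listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def kl_divergence(p, q):
    """D(p || q) in bits; assumes q is positive wherever p is."""
    return sum(pv * log2(pv / q[k]) for k, pv in p.items() if pv > 0)


# X and Y are independent fair bits, Z = X xor Y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}


def H(*idx):
    return entropy(marginal(joint, idx))


# I(X;Y) written as a K-L divergence between the joint and the product
# of the marginals, so Gibbs' inequality makes it non-negative.
pxy = marginal(joint, (0, 1))
px, py = marginal(joint, (0,)), marginal(joint, (1,))
prod_xy = {(x, y): px[(x,)] * py[(y,)] for (x,) in px for (y,) in py}
i_xy = kl_divergence(pxy, prod_xy)

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
i_xy_given_z = H(0, 2) + H(1, 2) - H(0, 1, 2) - H(2)

# Three-variable "mutual information" under the convention stated above.
interaction = i_xy - i_xy_given_z

# A generic K-L divergence between two different distributions.
example_kl = kl_divergence({"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1})

print(i_xy, i_xy_given_z, interaction, example_kl)
```

For the XOR triple, X and Y share no information on their own but one full bit once Z is known, which is what drives the three-variable quantity below zero.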

- The section that mentions measure theory should be kept, or moved somewhere else, rather than deleted. Also, the following sentence in that section is misleading/incorrect: "it justifies, in a certain formal sense, the practice of calling Shannon's entropy a "measure" of information." Entropy is not a measure; given a random variable *f*, one integrates *f* ln *f* against a suitable measure (the Lebesgue measure, the counting measure, etc.) to get the entropy. Mct mht 18:52, 20 June 2006 (UTC)
- I think that the section confuses more than it helps; most readers will not have a working knowledge of measure theory, and, as you say, the section needs at least a little reworking. I'll delete it and add a link to Information theory and measure theory. Calbaer 19:47, 20 June 2006 (UTC)
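For reference, the integral described above can be written out as follows. This is a sketch in standard notation, with the usual minus sign included; here *f* is taken to be a density with respect to a reference measure μ, which is my reading of the comment rather than wording from the article.

$$
h(f) = -\int f \ln f \, d\mu
$$

With μ the Lebesgue measure this is the differential entropy, and with μ the counting measure on a discrete set it reduces to the familiar sum

$$
H(p) = -\sum_{x} p(x) \ln p(x).
$$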

Seems to me a big priority should be to break out all the details into other articles. Information Theory is very much a science. If you look at the "Mathematics" Wikipedia article, you'll see that there are few actual mathematics concepts discussed. It's more a history and categorization of the sub-sciences of mathematics, with links. I picture "Information Theory" as a fairly small article (only because the science is young) giving only a bit more than a dictionary definition of "information theory", and then a ton of links with brief descriptions to categories, related theories, practical applications, algorithms, coding methods, etc. I think this article should be more of a starting point. Details should be elsewhere. Also, until this can be done, at least a set of links might be nice. I guess I'm saying here I'm surprised that a search of the "Information Theory" article in Wikipedia doesn't find the phrase "forward error correction"! qz27, 22 June 2006

- Good point, and one that argues for the elimination of the following as all but links: Self-information, Kullback–Leibler divergence, differential entropy. We shouldn't be scared to have a little math in the article, but regular, joint, and conditional entropy can be defined together and mutual information in terms of them. That's enough for the idealizations of source and channel coding. If people want to know why the math is as it is, there could be a "definitions in information theory" article or something like that. Two things though: 1) "error correction" is mentioned three times in the article (and I don't think it's that important to specifically state FEC in a "starting point" article) and 2) are you volunteering to do this? Calbaer 06:13, 23 July 2006 (UTC)