# Binary symmetric channel

A binary symmetric channel (or BSC) is a common communications channel model used in coding theory and information theory. In this model, a transmitter wishes to send a bit (a zero or a one), and the receiver receives a bit. It is assumed that the bit is usually transmitted correctly, but that it will be "flipped" with a small probability (the "crossover probability"). This channel is used frequently in information theory because it is one of the simplest channels to analyze.

## Description

The BSC is a binary channel; that is, it can transmit only one of two symbols (usually called 0 and 1). (A non-binary channel would be capable of transmitting more than 2 symbols, possibly even an infinite number of choices.) The transmission is not perfect, and occasionally the receiver gets the wrong bit.

This channel is often used by theorists because it is one of the simplest noisy channels to analyze. Many problems in communication theory can be reduced to a BSC. Conversely, being able to transmit effectively over the BSC can give rise to solutions for more complicated channels.

## Definition

A binary symmetric channel with crossover probability $p$, denoted by $BSC_{p}$, is a channel with binary input and binary output and probability of error $p$; that is, if $X$ is the transmitted random variable and $Y$ the received variable, then the channel is characterized by the conditional probabilities

Pr(Y = 0 | X = 0) = 1 − p
Pr(Y = 0 | X = 1) = p
Pr(Y = 1 | X = 0) = p
Pr(Y = 1 | X = 1) = 1 − p

It is assumed that 0 ≤ p ≤ 1/2. If p > 1/2, then the receiver can swap the output (interpret 1 when it sees 0, and vice versa) and obtain an equivalent channel with crossover probability 1 − p ≤ 1/2.
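The channel defined by these conditional probabilities is easy to simulate. The following minimal sketch (the function name `bsc` and all parameter values are ours, chosen for illustration) flips each transmitted bit independently with probability $p$:

```python
import random

def bsc(bits, p, rng):
    """Pass a sequence of bits through a BSC: flip each bit
    independently with crossover probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(42)
sent = [rng.randint(0, 1) for _ in range(100_000)]
received = bsc(sent, p=0.1, rng=rng)

# The empirical flip rate should be close to the crossover probability.
flips = sum(s != r for s, r in zip(sent, received)) / len(sent)
print(round(flips, 3))
```

Over many transmitted bits, the fraction of flipped bits concentrates around $p$, which is exactly the sense in which the channel is "usually correct."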

### Capacity of BSCp

The capacity of the channel is 1 − H(p), where H(p) is the binary entropy function.
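The capacity formula can be evaluated directly. This short sketch (function names are ours) computes $H(p)$ and the capacity $1 - H(p)$:

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of BSC_p in bits per channel use."""
    return 1 - binary_entropy(p)

print(bsc_capacity(0.0))  # noiseless channel: capacity 1.0
print(bsc_capacity(0.5))  # output independent of input: capacity 0.0
print(bsc_capacity(0.11))  # roughly 0.5 bits per channel use
```

The two extremes are instructive: at $p = 0$ the channel is perfect, and at $p = 1/2$ the output carries no information about the input.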

The converse can be shown by a sphere-packing argument. Given a codeword, there are roughly $2^{nH(p)}$ typical output sequences. There are $2^{n}$ total possible outputs, and the input chooses from a codebook of size $2^{nR}$. Therefore, the receiver would choose to partition the space into "spheres" with $2^{n}/2^{nR} = 2^{n(1-R)}$ potential outputs each. If $R > 1 - H(p)$, then the spheres will be packed too tightly asymptotically, and the receiver will not be able to identify the correct codeword with vanishing error probability.
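The counting behind the sphere-packing argument can be checked numerically in the exponent. In this sketch (all numbers are illustrative, not from the text), packing is asymptotically possible precisely when the noise-sphere exponent $nH(p)$ fits within the per-codeword budget $n(1-R)$:

```python
from math import log2

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def spheres_fit(n, R, p):
    """Compare the noise-sphere volume ~2^{nH(p)} per codeword against
    the 2^{n(1-R)} outputs available per codeword (in log2)."""
    return n * binary_entropy(p) <= n * (1 - R)

p = 0.11  # H(0.11) is about 0.5, so capacity is about 0.5
print(spheres_fit(n=1000, R=0.4, p=p))  # R < 1 - H(p): spheres fit
print(spheres_fit(n=1000, R=0.6, p=p))  # R > 1 - H(p): packed too tightly
```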

## Shannon's channel capacity theorem for BSCp

Shannon's noisy coding theorem is general for all kinds of channels. We consider a special case of this theorem for a binary symmetric channel with an error probability p.

### Noisy coding theorem for BSCp

What this theorem actually implies is that, when a message is picked from $\{0,1\}^{k}$, encoded with a random encoding function $E$, and sent across a noisy $BSC_{p}$, there is a very high probability of recovering the original message by decoding, provided $k$, or in effect the rate of the code, is bounded by the quantity stated in the theorem. The decoding error probability is exponentially small.
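The random-coding setup of the theorem can be sketched as a toy Monte Carlo experiment (all sizes here are ours, chosen small for illustration): draw a uniformly random codebook, transmit over a simulated $BSC_{p}$, and decode to the nearest codeword in Hamming distance.

```python
import random

rng = random.Random(1)
k, n, p = 4, 32, 0.1  # rate k/n = 0.125, well below 1 - H(0.1) ~ 0.53

# Random encoding function E: each of the 2^k messages is assigned a
# uniformly random codeword of length n.
codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(2 ** k)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode(y):
    """Minimum-distance decoding: return the message whose codeword
    is closest to the received word y."""
    return min(range(2 ** k), key=lambda m: hamming(codebook[m], y))

trials, errors = 2000, 0
for _ in range(trials):
    m = rng.randrange(2 ** k)
    y = [b ^ (rng.random() < p) for b in codebook[m]]  # BSC_p noise
    if decode(y) != m:
        errors += 1
print(errors / trials)  # empirical decoding error rate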

We shall now prove Theorem 1.

Proof We shall first describe the encoding function $E$ and the decoding function $D$ used in the theorem. We will use the probabilistic method to prove this theorem. Shannon's theorem was one of the earliest applications of this method.

Ultimately, we will show (by averaging over the probabilities) that at least one choice $(E,D)$ satisfies the conclusion of the theorem; that is what is meant by the probabilistic method.

The proof runs as follows. Suppose $p$ and $\epsilon$ are fixed. First we show, for a fixed $m\in \{0,1\}^{k}$ and $E$ chosen randomly, that the probability of failure over $BSC_{p}$ noise is exponentially small in $n$. At this point, the proof works for a fixed message $m$. Next we extend this result to work for all messages $m$. We achieve this by eliminating half of the codewords from the code, with the argument that the bound on the decoding error probability holds for at least half of the codewords. The latter method is called expurgation. This gives the total process the name random coding with expurgation.

Let $y$ be the received word. In order for the decoded codeword $D(y)$ not to be equal to the message $m$, one of the following events must occur: either $y$ lies outside the Hamming ball of radius $(p+\epsilon)n$ centered at $E(m)$, or some other codeword $E(m')$ with $m'\neq m$ lies within Hamming distance $(p+\epsilon)n$ of $y$.

At this point, the proof works for a fixed message $m$. But we need to make sure that the above bound holds for all the messages $m$ simultaneously. For that, let us sort the $2^{k}$ messages by their decoding error probabilities. Now, by applying Markov's inequality, we can show the decoding error probability for the first $2^{k-1}$ messages to be at most $2\cdot 2^{-\delta n}$. Thus, to ensure that the above bound holds for every message $m$, we can simply trim off the last $2^{k-1}$ messages from the sorted order. This essentially gives us another encoding function $E^{\prime }$ with a corresponding decoding function $D^{\prime }$ with a decoding error probability of at most $2^{-\delta n+1}$ at the same rate. Taking $\delta ^{\prime }$ to be equal to $\delta -{\frac {1}{n}}$, we bound the decoding error probability by $2^{-\delta ^{\prime }n}$. This expurgation process completes the proof of Theorem 1.
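The expurgation step can be illustrated numerically. The per-message error probabilities below are synthetic (drawn at random, not from a real code); the point is only the Markov-inequality guarantee that, after discarding the worst half of the messages, every survivor has error probability at most twice the original average:

```python
import random

rng = random.Random(7)

# Synthetic per-message decoding error probabilities with average ~ eps.
K, eps = 1024, 0.01
errs = [rng.uniform(0, 2 * eps) for _ in range(K)]

avg = sum(errs) / K
# Markov's inequality: at most half of the K values can be >= 2*avg,
# so the smallest K/2 values are all below 2*avg.
errs.sort()
survivors = errs[: K // 2]
print(max(survivors) <= 2 * avg)  # True
```

Discarding half the messages costs one bit of dimension ($k$ drops to $k-1$), which is why the rate is essentially unchanged for large $n$.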

## Converse of Shannon's capacity theorem

The converse of the capacity theorem essentially states that $1-H(p)$ is the best rate one can achieve over a binary symmetric channel. Formally the theorem states:

For a detailed proof of this theorem, the reader is referred to the bibliography. The intuition behind the proof, however, is that the number of errors grows rapidly as the rate grows beyond the channel capacity. The idea is that the sender generates messages of dimension $k$, while the channel $BSC_{p}$ introduces transmission errors. Since the channel flips each bit with probability $p$, the number of typical error patterns is about $2^{H(p+\epsilon )n}$ for a code of block length $n$. The maximum number of messages is $2^{k}$. The output of the channel, on the other hand, has $2^{n}$ possible values. If there is any confusion between any two messages, it is likely that $2^{k}2^{H(p+\epsilon )n}\geq 2^{n}$. Hence we would have $k\geq \lceil (1-H(p+\epsilon ))n\rceil$, a case we would like to avoid in order to keep the decoding error probability exponentially small.
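The counting inequality above can be checked in the exponent (taking $\log_{2}$ of both sides). In this sketch all parameter values are illustrative:

```python
from math import ceil, log2

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Illustrative parameters: block length n, crossover probability p, slack eps.
n, p, eps = 1000, 0.11, 0.01
# Message dimensions at or above this threshold overflow the output space.
k_crit = ceil((1 - binary_entropy(p + eps)) * n)

results = {}
for k in (k_crit - 50, k_crit + 50):
    # In log2: does 2^k * 2^{H(p+eps)n} reach 2^n?
    results[k] = k + binary_entropy(p + eps) * n >= n
print(results)
```

Below the threshold the message count times the typical-error count fits inside the $2^{n}$ possible outputs; above it, distinct messages must collide.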

## Codes for BSCp

A great deal of work has been done, and continues to be done, on designing explicit error-correcting codes that achieve the capacities of several standard communication channels. The motivation behind designing such codes is to relate the rate of the code to the fraction of errors it can correct.

The approach behind the design of codes which meet the channel capacities of the $BSC$ and the $BEC$ has been to correct a small fraction of errors with high probability while achieving the highest possible rate. Shannon's theorem gives us the best rate which can be achieved over a $BSC_{p}$, but it does not give us any explicit codes which achieve that rate. In fact, such codes are typically constructed to correct only a small fraction of errors with a high probability, but achieve a very good rate. The first such code was due to George D. Forney in 1966; it is a concatenated code, obtained by composing two different kinds of codes. We shall briefly discuss the construction of Forney's code for the binary symmetric channel and analyze its rate and decoding error probability. Various explicit codes for achieving the capacity of the binary erasure channel have also been developed recently.

## Forney's code for BSCp

For the outer code $C_{\text{out}}$, a Reed–Solomon code would be the first code to come to mind. However, as we will see, the construction of such a code cannot be done in polynomial time. This is why a binary linear code is used for $C_{\text{out}}$.

### Decoding error probability for C*

We have given a general technique to construct $C^{*}$. For more detailed descriptions of $C_{\text{in}}$ and $C_{\text{out}}$, please read the following references. Recently, a few other codes have also been constructed for achieving capacity. LDPC codes have been considered for this purpose because of their faster decoding time.