Dirac delta function



Schematic representation of the Dirac delta function by a line surmounted by an arrow. The height of the arrow is usually used to specify the value of any multiplicative constant, which will give the area under the function. The other convention is to write the area next to the arrowhead.

In mathematics, the Dirac delta function, or δ function, is a generalized function, or distribution, on the real number line that is zero everywhere except at zero, with an integral of one over the entire real line.[1][2][3] The delta function is sometimes thought of as an infinitely high, infinitely thin spike at the origin, with total area one under the spike, and physically represents the density of an idealized point mass or point charge.[4] It was introduced by theoretical physicist Paul Dirac. In the context of signal processing it is often referred to as the unit impulse symbol (or function).[5] Its discrete analog is the Kronecker delta function, which is usually defined on a finite domain and takes values 0 and 1.

From a purely mathematical viewpoint, the Dirac delta is not strictly a function, because any extended-real function that is equal to zero everywhere but a single point must have total integral zero.[6] The delta function only makes sense as a mathematical object when it appears inside an integral. While from this perspective the Dirac delta can usually be manipulated as though it were a function, formally it must be defined as a distribution that is also a measure. In many applications, the Dirac delta is regarded as a kind of limit (a weak limit) of a sequence of functions having a tall spike at the origin. The approximating functions of the sequence are thus "approximate" or "nascent" delta functions.


The graph of the delta function is usually thought of as following the whole x-axis and the positive y-axis. Despite its name, the delta function is not truly a function, at least not a usual one with range in real numbers. For example, the objects f(x) = δ(x) and g(x) = 0 are equal everywhere except at x = 0 yet have integrals that are different. According to Lebesgue integration theory, if f and g are functions such that f = g almost everywhere, then f is integrable if and only if g is integrable and the integrals of f and g are identical. Rigorous treatment of the Dirac delta requires measure theory or the theory of distributions.

The Dirac delta is used to model a tall narrow spike function (an impulse), and other similar abstractions such as a point charge, point mass or electron point. For example, to calculate the dynamics of a baseball being hit by a bat, one can approximate the force of the bat hitting the baseball by a delta function. In doing so, one not only simplifies the equations, but one also is able to calculate the motion of the baseball by only considering the total impulse of the bat against the ball rather than requiring knowledge of the details of how the bat transferred energy to the ball.

In applied mathematics, the delta function is often manipulated as a kind of limit (a weak limit) of a sequence of functions, each member of which has a tall spike at the origin: for example, a sequence of Gaussian distributions centered at the origin with variance tending to zero.
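The weak limit described above can be illustrated numerically. The following is a minimal sketch, not a definitive implementation: the helper names `gaussian`, `integrate` and `smeared_value`, the midpoint rule, and the integration window [-10, 10] are all assumptions made for this example. It integrates a test function against centered Gaussians of shrinking variance and shows the result approaching the value of the test function at 0.

```python
import math

def gaussian(x, variance):
    """Density of a centered normal distribution with the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def integrate(func, a, b, n=200_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def smeared_value(f, variance):
    """Integral of f against the Gaussian; tends to f(0) as variance -> 0."""
    return integrate(lambda x: gaussian(x, variance) * f(x), -10.0, 10.0)

if __name__ == "__main__":
    # For f = cos the values approach cos(0) = 1 as the variance shrinks.
    for variance in (1.0, 0.1, 0.01, 0.001):
        print(variance, smeared_value(math.cos, variance))
```

For f = cos the integral is known in closed form, exp(−variance/2), which gives an independent check on the quadrature.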


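Joseph Fourier presented what is now called the Fourier integral theorem in his treatise Théorie analytique de la chaleur in the form:[7]

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty} d\alpha\, f(\alpha) \int_{-\infty}^{\infty} dp\, \cos(px-p\alpha),$$

which is tantamount to the introduction of the δ-function in the form:[8]

$$\delta(x-\alpha)=\frac{1}{2\pi}\int_{-\infty}^{\infty} dp\, \cos(px-p\alpha).$$

Later, Augustin Cauchy expressed the theorem using exponentials:[9][10]

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ipx}\left(\int_{-\infty}^{\infty} e^{-ip\alpha} f(\alpha)\,d\alpha\right) dp.$$

Cauchy pointed out that in some circumstances the order of integration in this result was significant.[11][12]

As justified using the theory of distributions, the Cauchy equation can be rearranged to resemble Fourier's original formulation and expose the δ-function as:

$$f(x)=\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ipx}\left(\int_{-\infty}^{\infty} e^{-ip\alpha} f(\alpha)\,d\alpha\right) dp = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\int_{-\infty}^{\infty} e^{ipx} e^{-ip\alpha}\,dp\right) f(\alpha)\,d\alpha = \int_{-\infty}^{\infty} \delta(x-\alpha)\, f(\alpha)\,d\alpha,$$

where the δ-function is expressed as:

$$\delta(x-\alpha)=\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ip(x-\alpha)}\,dp.$$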

A rigorous interpretation of the exponential form and the various limitations upon the function f necessary for its application extended over several centuries. The problems with a classical interpretation are explained as follows:[13]

The greatest drawback of the classical Fourier transformation is a rather narrow class of functions (originals) for which it can be effectively computed. Namely, it is necessary that these functions decrease sufficiently rapidly to zero (in the neighborhood of infinity) in order to insure the existence of the Fourier integral. For example, the Fourier transform of such simple functions as polynomials does not exist in the classical sense. The extension of the classical Fourier transformation to distributions considerably enlarged the class of functions that could be transformed and this removed many obstacles.

Further developments included generalization of the Fourier integral, "beginning with Plancherel's pathbreaking L^2-theory (1910), continuing with Wiener's and Bochner's works (around 1930) and culminating with the amalgamation into L. Schwartz's theory of distributions (1945) ...",[14] and leading to the formal development of the Dirac delta function.

An infinitesimal formula for an infinitely tall, unit impulse delta function (infinitesimal version of Cauchy distribution) explicitly appears in an 1827 text of Augustin Louis Cauchy.[15] Siméon Denis Poisson considered the issue in connection with the study of wave propagation as did Gustav Kirchhoff somewhat later. Kirchhoff and Hermann von Helmholtz also introduced the unit impulse as a limit of Gaussians, which also corresponded to Lord Kelvin's notion of a point heat source. At the end of the 19th century, Oliver Heaviside used formal Fourier series to manipulate the unit impulse.[16] The Dirac delta function as such was introduced as a "convenient notation" by Paul Dirac in his influential 1930 book Principles of Quantum Mechanics.[17] He called it the "delta function" since he used it as a continuous analogue of the discrete Kronecker delta.


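The Dirac delta can be loosely thought of as a function on the real line which is zero everywhere except at the origin, where it is infinite,

$$\delta(x) = \begin{cases} +\infty, & x = 0 \\ 0, & x \ne 0 \end{cases}$$

and which is also constrained to satisfy the identity

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1.$$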


This is merely a heuristic characterization. The Dirac delta is not a function in the traditional sense as no function defined on the real numbers has these properties.[17] The Dirac delta function can be rigorously defined either as a distribution or as a measure.

As a measure

One way to rigorously define the delta function is as a measure, which accepts as an argument a subset A of the real line R, and returns δ(A) = 1 if 0 ∈ A, and δ(A) = 0 otherwise.[19] If the delta function is conceptualized as modeling an idealized point mass at 0, then δ(A) represents the mass contained in the set A. One may then define the integral against δ as the integral of a function against this mass distribution. Formally, the Lebesgue integral provides the necessary analytic device. The Lebesgue integral with respect to the measure δ satisfies

$$\int_{-\infty}^{\infty} f(x)\,\delta\{dx\} = f(0)$$

for all continuous compactly supported functions f. The measure δ is not absolutely continuous with respect to the Lebesgue measure; in fact, it is a singular measure. Consequently, the delta measure has no Radon–Nikodym derivative: there is no true function δ(x) for which the property

$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = f(0)$$

holds.[20] As a result, the latter notation is a convenient abuse of notation, and not a standard (Riemann or Lebesgue) integral.

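As a probability measure on R, the delta measure is characterized by its cumulative distribution function, which is the unit step function[21]

$$H(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0. \end{cases}$$

This means that H(x) is the integral of the cumulative indicator function 1_{(−∞, x]} with respect to the measure δ; to wit,

$$H(x) = \int_{\mathbf{R}} \mathbf{1}_{(-\infty,x]}(t)\,\delta\{dt\} = \delta(-\infty,x].$$

Thus in particular the integral of the delta function against a continuous function can be properly understood as a Stieltjes integral:[22]

$$\int_{-\infty}^{\infty} f(x)\,\delta\{dx\} = \int_{-\infty}^{\infty} f(x)\,dH(x).$$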

All higher moments of δ are zero. In particular, characteristic function and moment generating function are both equal to one.

As a distribution

In the theory of distributions a generalized function is thought of not as a function itself, but only in relation to how it affects other functions when it is "integrated" against them. In keeping with this philosophy, to define the delta function properly, it is enough to say what the "integral" of the delta function against a sufficiently "good" test function is. If the delta function is already understood as a measure, then the Lebesgue integral of a test function against that measure supplies the necessary integral.

A typical space of test functions consists of all smooth functions on R with compact support. As a distribution, the Dirac delta is a linear functional on the space of test functions and is defined by[23]

$$\delta[\varphi] = \varphi(0)$$


for every test function φ.

For δ to be properly a distribution, it must be "continuous" in a suitable sense. In general, for a linear functional S on the space of test functions to define a distribution, it is necessary and sufficient that, for every positive integer N, there is an integer M_N and a constant C_N such that for every test function φ, one has the inequality[24]

$$|S[\varphi]| \le C_N \sum_{k=0}^{M_N} \sup_{x\in[-N,N]} |\varphi^{(k)}(x)|.$$

With the δ distribution, one has such an inequality (with C_N = 1) with M_N = 0 for all N. Thus δ is a distribution of order zero. It is, furthermore, a distribution with compact support (the support being {0}).

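The delta distribution can also be defined in a number of equivalent ways. For instance, it is the distributional derivative of the Heaviside step function. This means that, for every test function φ, one has

$$\delta[\varphi] = -\int_{-\infty}^{\infty} \varphi'(x)\,H(x)\,dx.$$

Intuitively, if integration by parts were permitted, then the latter integral should simplify to

$$\int_{-\infty}^{\infty} \varphi(x)\,H'(x)\,dx = \int_{-\infty}^{\infty} \varphi(x)\,\delta(x)\,dx,$$

and indeed, a form of integration by parts is permitted for the Stieltjes integral, and in that case one does have

$$-\int_{-\infty}^{\infty} \varphi'(x)\,H(x)\,dx = \int_{-\infty}^{\infty} \varphi(x)\,dH(x).$$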

In the context of measure theory, the Dirac measure gives rise to a distribution by integration. Conversely, the defining equation δ[φ] = φ(0) defines a Daniell integral on the space of all compactly supported continuous functions φ which, by the Riesz representation theorem, can be represented as the Lebesgue integral of φ with respect to some Radon measure.
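The statement that δ is the distributional derivative of the Heaviside step function, δ[φ] = −∫ φ′(x) H(x) dx, can be checked numerically for one concrete test function. This is only a sketch under stated assumptions: the bump function, the offset `SHIFT`, the helper names and the midpoint rule are choices made for this example and do not come from the text.

```python
import math

def bump(x):
    """Smooth test function supported on (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_derivative(x):
    """Exact derivative of bump, obtained by the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def heaviside(x):
    """Unit step function H(x)."""
    return 1.0 if x >= 0.0 else 0.0

SHIFT = 0.3  # arbitrary offset so that phi is not symmetric about 0

def phi(x):
    return bump(x - SHIFT)

def phi_prime(x):
    return bump_derivative(x - SHIFT)

def pair_with_derivative_of_heaviside(n=400_000, a=-2.0, b=2.0):
    """Midpoint-rule value of -integral of phi'(x) H(x) dx over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total -= phi_prime(x) * heaviside(x)
    return total * h

if __name__ == "__main__":
    # Both numbers should agree: the pairing reproduces phi(0).
    print(pair_with_derivative_of_heaviside(), phi(0.0))
```

The interval [-2, 2] contains the support of phi, and the even number of subintervals places the jump of H on a grid node, so the midpoint rule never samples the discontinuity itself.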


The delta function can be defined in n-dimensional Euclidean space R^n as the measure such that

$$\int_{\mathbf{R}^n} f(\mathbf{x})\,\delta\{d\mathbf{x}\} = f(\mathbf{0})$$

for every compactly supported continuous function f. As a measure, the n-dimensional delta function is the product measure of the 1-dimensional delta functions in each variable separately. Thus, formally, with x = (x_1, x_2, ..., x_n), one has[5]

$$\delta(\mathbf{x}) = \delta(x_1)\,\delta(x_2)\cdots\delta(x_n).$$

The delta function can also be defined in the sense of distributions exactly as above in the one-dimensional case.[25] However, despite widespread use in engineering contexts, the product formula δ(x) = δ(x_1)δ(x_2)...δ(x_n) should be manipulated with care, since the product of distributions can only be defined under quite narrow circumstances.[26]

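The notion of a Dirac measure makes sense on any set whatsoever.[19] Thus if X is a set, x_0 ∈ X is a marked point, and Σ is any sigma algebra of subsets of X, then the measure defined on sets A ∈ Σ by

$$\delta_{x_0}(A) = \begin{cases} 1 & \text{if } x_0 \in A \\ 0 & \text{if } x_0 \notin A \end{cases}$$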

is the delta measure or unit mass concentrated at x0.
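Because the Dirac measure only asks whether the marked point belongs to a set, it can be sketched on arbitrary finite Python sets. The helper names `dirac_measure` and `integral_against_dirac` are hypothetical and exist only for this illustration; the example also checks additivity on two disjoint sets.

```python
def dirac_measure(x0):
    """Return the set function A -> 1 if x0 is in A, else 0."""
    def measure(subset):
        return 1 if x0 in subset else 0
    return measure

def integral_against_dirac(f, x0):
    """Integral of f with respect to the unit mass at x0, i.e. f(x0)."""
    return f(x0)

if __name__ == "__main__":
    delta_b = dirac_measure("b")
    left, right = {"a", "b"}, {"c", "d"}   # two disjoint sets
    print(delta_b(left), delta_b(right))   # mass sits in the set containing "b"
    # Additivity on disjoint sets: measure of the union is the sum of measures.
    print(delta_b(left | right) == delta_b(left) + delta_b(right))
    print(integral_against_dirac(len, "b"))  # evaluates the function at "b"
```

Nothing here depends on the points being numbers, which matches the remark that the notion makes sense on any set.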

Another common generalization of the delta function is to a differentiable manifold where most of its properties as a distribution can also be exploited because of the differentiable structure. The delta function on a manifold M centered at the point x_0 ∈ M is defined as the following distribution:

$$\delta_{x_0}[\varphi] = \varphi(x_0)$$


for all compactly supported smooth real-valued functions φ on M.[27] A common special case of this construction is when M is an open set in the Euclidean space Rn.

On a locally compact Hausdorff space X, the Dirac delta measure concentrated at a point x is the Radon measure associated with the Daniell integral δ_x[φ] = φ(x) on compactly supported continuous functions φ. At this level of generality, calculus as such is no longer possible; however, a variety of techniques from abstract analysis are available. For instance, the mapping x_0 ↦ δ_{x_0} is a continuous embedding of X into the space of finite Radon measures on X, equipped with its vague topology. Moreover, the convex hull of the image of X under this embedding is dense in the space of probability measures on X.[28]


Scaling and symmetry

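The delta function satisfies the following scaling property for a non-zero scalar α:[29]

$$\int_{-\infty}^{\infty} \delta(\alpha x)\,dx = \int_{-\infty}^{\infty} \delta(u)\,\frac{du}{|\alpha|} = \frac{1}{|\alpha|}$$

and so

$$\delta(\alpha x) = \frac{\delta(x)}{|\alpha|}.$$

In particular, the delta function is an even distribution, in the sense that

$$\delta(-x) = \delta(x),$$

which is homogeneous of degree −1.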
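The scaling property δ(αx) = δ(x)/|α| can be made plausible numerically by replacing δ with a narrow Gaussian. This is a sketch under assumptions: the Gaussian stand-in, its variance 1e-4, the integration window [-1, 1] and all helper names are choices made for this example.

```python
import math

def nascent_delta(x, variance=1e-4):
    """Narrow centered Gaussian used as a stand-in for the delta function."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def integrate(func, a, b, n=200_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def scaled_pairing(f, alpha):
    """Approximates the integral of delta(alpha * x) f(x) dx."""
    return integrate(lambda x: nascent_delta(alpha * x) * f(x), -1.0, 1.0)

if __name__ == "__main__":
    for alpha in (-3.0, 0.5, 2.0):
        # Second column should be close to f(0) / |alpha| with f = exp.
        print(alpha, scaled_pairing(math.exp, alpha), 1.0 / abs(alpha))
```

The negative value of alpha illustrates the absolute value in the formula, and hence the evenness of δ.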

Algebraic properties

The distributional product of δ with x is equal to zero:

$$x\,\delta(x) = 0.$$

Conversely, if xf(x) = xg(x), where f and g are distributions, then

$$f(x) = g(x) + c\,\delta(x)$$

for some constant c.[30]


The integral of the time-delayed Dirac delta is given by:

$$\int_{-\infty}^{\infty} f(t)\,\delta(t-T)\,dt = f(T).$$

This is sometimes referred to as the sifting property[31] or the sampling property. The delta function is said to "sift out" the value at t = T.

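It follows that the effect of convolving a function f(t) with the time-delayed Dirac delta is to time-delay f(t) by the same amount:

$$(f * \delta(\cdot - T))(t) = \int_{-\infty}^{\infty} f(\tau)\,\delta(t-T-\tau)\,d\tau = \int_{-\infty}^{\infty} f(\tau)\,\delta(\tau-(t-T))\,d\tau = f(t-T),$$

where the second equality uses the evenness of the delta function, δ(−x) = δ(x).

This holds under the precise condition that f be a tempered distribution (see the discussion of the Fourier transform below). As a special case, for instance, we have the identity (understood in the distribution sense)

$$\int_{-\infty}^{\infty} \delta(\xi - x)\,\delta(x-\eta)\,dx = \delta(\xi-\eta).$$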
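The time-delay effect of convolving with a shifted delta has a direct discrete counterpart, in which the Kronecker delta plays the role of the Dirac delta. The sketch below is an illustration of that discrete analog only; the function names `convolve` and `shifted_unit_impulse` and the sample values are assumptions made for this example.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def shifted_unit_impulse(delay, length):
    """Kronecker delta located at index `delay` in a sequence of given length."""
    return [1.0 if k == delay else 0.0 for k in range(length)]

if __name__ == "__main__":
    signal = [1.0, 4.0, 9.0, 16.0]
    # Convolving with an impulse at index 2 delays the signal by 2 samples.
    print(convolve(signal, shifted_unit_impulse(2, 3)))
    # An impulse at index 0 is the identity for convolution.
    print(convolve(signal, shifted_unit_impulse(0, 1)))
```

The second call shows the unshifted impulse acting as an identity element, the discrete version of f ∗ δ = f.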

Composition with a function

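More generally, the delta distribution may be composed with a smooth function g(x) in such a way that the familiar change of variables formula holds, that

$$\int_{\mathbf{R}} \delta\bigl(g(x)\bigr)\, f\bigl(g(x)\bigr)\, |g'(x)|\,dx = \int_{g(\mathbf{R})} \delta(u)\,f(u)\,du$$

provided that g is a continuously differentiable function with g′ nowhere zero.[32] That is, there is a unique way to assign meaning to the distribution δ∘g so that this identity holds for all compactly supported test functions f. This distribution satisfies δ(g(x)) = 0 if g is nowhere zero, and otherwise if g has a real root at x_0, then

$$\delta(g(x)) = \frac{\delta(x-x_0)}{|g'(x_0)|}.$$

It is natural therefore to define the composition δ(g(x)) for continuously differentiable functions g by

$$\delta(g(x)) = \sum_i \frac{\delta(x-x_i)}{|g'(x_i)|}$$

where the sum extends over all roots x_i of g(x), which are assumed to be simple.[32] Thus, for example

$$\delta\left(x^2-\alpha^2\right) = \frac{1}{2|\alpha|}\Bigl[\delta(x+\alpha)+\delta(x-\alpha)\Bigr].$$

In the integral form the generalized scaling property may be written as

$$\int_{-\infty}^{\infty} f(x)\,\delta(g(x))\,dx = \sum_i \frac{f(x_i)}{|g'(x_i)|}.$$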
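The root-sum formula for δ(g(x)) can be compared against a direct numerical integration in which δ is replaced by a narrow Gaussian. This is a hedged sketch: the choice g(x) = x² − a², the Gaussian stand-in with variance 1e-4, the integration window and the helper names are all assumptions made for this example.

```python
import math

def nascent_delta(u, variance=1e-4):
    """Narrow centered Gaussian used as a stand-in for the delta function."""
    return math.exp(-u * u / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def integrate(func, a, b, n=400_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def pairing_numeric(f, a):
    """Approximates the integral of f(x) delta(x^2 - a^2) dx."""
    return integrate(lambda x: nascent_delta(x * x - a * a) * f(x), -2.0 * a, 2.0 * a)

def pairing_formula(f, a):
    """Sum of f(x_i) / |g'(x_i)| over the simple roots x_i = -a, +a of g."""
    return sum(f(root) / abs(2.0 * root) for root in (-a, a))

if __name__ == "__main__":
    # The two values should agree to several digits.
    print(pairing_numeric(math.exp, 2.0), pairing_formula(math.exp, 2.0))
```

With f = exp and a = 2 the formula gives (e² + e⁻²)/4, and the numerical value differs from it only by a small correction that shrinks with the variance.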

Properties in n dimensions

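The delta distribution in an n-dimensional space satisfies the following scaling property instead:

$$\delta(\alpha\mathbf{x}) = |\alpha|^{-n}\,\delta(\mathbf{x}),$$

so that δ is a homogeneous distribution of degree −n. Under any reflection or rotation ρ, the delta function is invariant:

$$\delta(\rho\mathbf{x}) = \delta(\mathbf{x}).$$

As in the one-variable case, it is possible to define the composition of δ with a bi-Lipschitz function[33] g: R^n → R^n uniquely so that the identity

$$\int_{\mathbf{R}^n} \delta(g(\mathbf{x}))\, f(g(\mathbf{x}))\, \left|\det g'(\mathbf{x})\right| d\mathbf{x} = \int_{g(\mathbf{R}^n)} \delta(\mathbf{u})\,f(\mathbf{u})\,d\mathbf{u}$$

holds for all compactly supported functions f.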

Using the coarea formula from geometric measure theory, one can also define the composition of the delta function with a submersion from one Euclidean space to another one of different dimension; the result is a type of current. In the special case of a continuously differentiable function g: R^n → R such that the gradient of g is nowhere zero, the following identity holds[34]

$$\int_{\mathbf{R}^n} f(\mathbf{x})\,\delta(g(\mathbf{x}))\,d\mathbf{x} = \int_{g^{-1}(0)} \frac{f(\mathbf{x})}{|\nabla g|}\,d\sigma(\mathbf{x})$$

where the integral on the right is over g^{−1}(0), the (n − 1)-dimensional surface defined by g(x) = 0 with respect to the Minkowski content measure. This is known as a simple layer integral.

More generally, if S is a smooth hypersurface of R^n, then we can associate to S the distribution that integrates any compactly supported smooth function g over S:

$$\delta_S[g] = \int_S g(\mathbf{s})\,d\sigma(\mathbf{s})$$

where σ is the hypersurface measure associated to S. This generalization is associated with the potential theory of simple layer potentials on S. If D is a domain in R^n with smooth boundary S, then δ_S is equal to the normal derivative of the indicator function of D in the distribution sense:

$$-\int_{\mathbf{R}^n} g(\mathbf{x})\,\frac{\partial \mathbf{1}_D(\mathbf{x})}{\partial n}\,d\mathbf{x} = \int_S g(\mathbf{s})\,d\sigma(\mathbf{s}),$$

where n is the outward normal.[35][36] For a proof, see e.g. the article on the surface delta function.

Fourier transform

The delta function is a tempered distribution, and therefore it has a well-defined Fourier transform. Formally, one finds[37]

$$\hat{\delta}(\xi) = \int_{-\infty}^{\infty} e^{-2\pi i x\xi}\,\delta(x)\,dx = 1.$$

Properly speaking, the Fourier transform of a distribution is defined by imposing self-adjointness of the Fourier transform under the duality pairing ⟨·,·⟩ of tempered distributions with Schwartz functions. Thus δ̂ is defined as the unique tempered distribution satisfying

$$\langle \hat{\delta}, \varphi\rangle = \langle \delta, \hat{\varphi}\rangle$$

for all Schwartz functions φ. And indeed it follows from this that

$$\hat{\delta} = 1.$$

As a result of this identity, the convolution of the delta function with any other tempered distribution S is simply S:

$$S * \delta = S.$$

That is to say that δ is an identity element for the convolution on tempered distributions, and in fact the space of compactly supported distributions under convolution is an associative algebra with identity the delta function. This property is fundamental in signal processing, as convolution with a tempered distribution is a linear time-invariant system, and applying the linear time-invariant system measures its impulse response. The impulse response can be computed to any desired degree of accuracy by choosing a suitable approximation for δ, and once it is known, it characterizes the system completely. See LTI system theory:Impulse response and convolution.

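The inverse Fourier transform of the tempered distribution f(ξ) = 1 is the delta function. Formally, this is expressed

$$\int_{-\infty}^{\infty} 1\cdot e^{2\pi i x\xi}\,d\xi = \delta(x)$$

and more rigorously, it follows since

$$\langle 1, f^{\vee}\rangle = f(0) = \langle \delta, f\rangle$$

for all Schwartz functions f.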
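The formal statement that the Fourier transform of δ is the constant 1 can be illustrated with Gaussians of shrinking variance, whose transforms flatten toward 1. This is a sketch, not a proof: the Gaussian stand-in, the truncation at 12 standard deviations, and the helper names are assumptions made for this example. It uses the convention with kernel e^{−2πixξ}; for a centered Gaussian of variance v the transform is exp(−2π²vξ²).

```python
import math

def gaussian(x, variance):
    """Density of a centered normal distribution with the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def integrate(func, a, b, n=100_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def fourier_transform_of_gaussian(xi, variance):
    """Numerical Fourier transform at frequency xi.

    Only the cosine part is integrated: the sine part vanishes because
    the Gaussian is even.
    """
    half_width = 12.0 * math.sqrt(variance)
    return integrate(
        lambda x: gaussian(x, variance) * math.cos(2.0 * math.pi * xi * x),
        -half_width,
        half_width,
    )

if __name__ == "__main__":
    for variance in (1e-2, 1e-4, 1e-6):
        # At fixed frequency xi = 1 the values approach 1 as variance -> 0.
        print(variance, fourier_transform_of_gaussian(1.0, variance))
```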

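In these terms, the delta function provides a suggestive statement of the orthogonality property of the Fourier kernel on R. Formally, one has

$$\int_{-\infty}^{\infty} e^{i 2\pi \xi_1 t}\left[e^{i 2\pi \xi_2 t}\right]^{*} dt = \int_{-\infty}^{\infty} e^{-i 2\pi (\xi_2-\xi_1) t}\,dt = \delta(\xi_1-\xi_2).$$

This is, of course, shorthand for the assertion that the Fourier transform of the tempered distribution

$$f(t) = e^{i 2\pi \xi_1 t}$$

is

$$\hat{f}(\xi_2) = \delta(\xi_1-\xi_2),$$

which again follows by imposing self-adjointness of the Fourier transform.

By analytic continuation of the Fourier transform, the Laplace transform of the delta function is found to be[38]

$$\int_{0}^{\infty} \delta(t-a)\,e^{-st}\,dt = e^{-sa}.$$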

Distributional derivatives

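The distributional derivative of the Dirac delta distribution is the distribution δ′ defined on compactly supported smooth test functions φ by[39]

$$\delta'[\varphi] = -\delta[\varphi'] = -\varphi'(0).$$

The first equality here is a kind of integration by parts, for if δ were a true function then

$$\int_{-\infty}^{\infty} \delta'(x)\,\varphi(x)\,dx = -\int_{-\infty}^{\infty} \delta(x)\,\varphi'(x)\,dx.$$

The k-th derivative of δ is defined similarly as the distribution given on test functions by

$$\delta^{(k)}[\varphi] = (-1)^k \varphi^{(k)}(0).$$

In particular δ is an infinitely differentiable distribution.

The first derivative of the delta function is the distributional limit of the difference quotients:[40]

$$\delta'(x) = \lim_{h\to 0} \frac{\delta(x+h)-\delta(x)}{h}.$$

More properly, one has

$$\delta' = \lim_{h\to 0} \frac{1}{h}\left(\tau_h\delta - \delta\right)$$

where τ_h is the translation operator, defined on functions by τ_hφ(x) = φ(x+h), and on a distribution S by

$$(\tau_h S)[\varphi] = S[\tau_{-h}\varphi].$$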

In the theory of electromagnetism, the first derivative of the delta function represents a point magnetic dipole situated at the origin. Accordingly, it is referred to as a dipole or the doublet function.[41]

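The derivative of the delta function satisfies a number of basic properties, including:

$$\delta'(-x) = -\delta'(x), \qquad x\,\delta'(x) = -\delta(x).$$

Furthermore, the convolution of δ′ with a compactly supported smooth function f is

$$\delta' * f = \delta * f' = f',$$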

which follows from the properties of the distributional derivative of a convolution.
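The description of δ′ as a limit of difference quotients can be tried out numerically: replacing δ by a narrow Gaussian and forming (δ(x+h) − δ(x))/h, the pairing with a smooth function φ should approach −φ′(0). This is a minimal sketch under assumptions; the Gaussian stand-in, the step h, the window [-1, 1] and the helper names are all choices made for this example.

```python
import math

def nascent_delta(x, variance=1e-4):
    """Narrow centered Gaussian used as a stand-in for the delta function."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def integrate(func, a, b, n=200_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def difference_quotient_pairing(phi, step):
    """Pairs (delta(x + step) - delta(x)) / step with phi, delta being smoothed."""
    def integrand(x):
        return (nascent_delta(x + step) - nascent_delta(x)) / step * phi(x)
    return integrate(integrand, -1.0, 1.0)

if __name__ == "__main__":
    # For phi = sin, phi'(0) = 1, so the values should approach -1.
    for step in (1e-1, 1e-2, 1e-3):
        print(step, difference_quotient_pairing(math.sin, step))
```

The sign of the limit reflects the minus sign in δ′[φ] = −φ′(0).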

Higher dimensions

More generally, on an open set U in the n-dimensional Euclidean space R^n, the Dirac delta distribution centered at a point a ∈ U is defined by[43]

$$\delta_a[\varphi] = \varphi(a)$$

for all φ ∈ S(U), the space of all smooth compactly supported functions on U. If α = (α_1, ..., α_n) is any multi-index and ∂^α denotes the associated mixed partial derivative operator, then the αth derivative ∂^αδ_a of δ_a is given by[43]

$$\left\langle \partial^{\alpha}\delta_a, \varphi\right\rangle = (-1)^{|\alpha|}\left\langle \delta_a, \partial^{\alpha}\varphi\right\rangle = (-1)^{|\alpha|}\,\partial^{\alpha}\varphi(x)\Big|_{x=a}.$$

That is, the αth derivative of δ_a is the distribution whose value on any test function φ is the αth derivative of φ at a (with the appropriate positive or negative sign).

The first partial derivatives of the delta function are thought of as double layers along the coordinate planes. More generally, the normal derivative of a simple layer supported on a surface is a double layer supported on that surface, and represents a laminar magnetic monopole. Higher derivatives of the delta function are known in physics as multipoles.

Higher derivatives enter into mathematics naturally as the building blocks for the complete structure of distributions with point support. If S is any distribution on U supported on the set {a} consisting of a single point, then there is an integer m and coefficients c_α such that[44]

$$S = \sum_{|\alpha|\le m} c_\alpha\,\partial^{\alpha}\delta_a.$$

Representations of the delta function

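The delta function can be viewed as the limit of a sequence of functions

$$\delta(x) = \lim_{\varepsilon\to 0^+} \eta_\varepsilon(x),$$

where η_ε(x) is sometimes called a nascent delta function. This limit is meant in a weak sense: either that

$$\lim_{\varepsilon\to 0^+} \int_{-\infty}^{\infty} \eta_\varepsilon(x)\,f(x)\,dx = f(0)$$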


for all continuous functions f having compact support, or that this limit holds for all smooth functions f with compact support. The difference between these two slightly different modes of weak convergence is often subtle: the former is convergence in the vague topology of measures, and the latter is convergence in the sense of distributions.

Approximations to the identity

Typically a nascent delta function η_ε can be constructed in the following manner. Let η be an absolutely integrable function on R of total integral 1, and define

$$\eta_\varepsilon(x) = \varepsilon^{-1}\,\eta\!\left(\frac{x}{\varepsilon}\right).$$

In n dimensions, one uses instead the scaling

$$\eta_\varepsilon(x) = \varepsilon^{-n}\,\eta\!\left(\frac{x}{\varepsilon}\right).$$

Then a simple change of variables shows that η_ε also has integral 1.[45] One shows easily that the weak limit ∫ η_ε(x) f(x) dx → f(0) holds for all continuous compactly supported functions f, and so η_ε converges weakly to δ in the sense of measures.

The η_ε constructed in this way are known as an approximation to the identity.[46] This terminology is because the space L^1(R) of absolutely integrable functions is closed under the operation of convolution of functions: f ∗ g ∈ L^1(R) whenever f and g are in L^1(R). However, there is no identity in L^1(R) for the convolution product: no element h such that f ∗ h = f for all f. Nevertheless, the sequence η_ε does approximate such an identity in the sense that

$$f * \eta_\varepsilon \to f \quad \text{as } \varepsilon\to 0.$$

This limit holds in the sense of mean convergence (convergence in L1). Further conditions on the ηε, for instance that it be a mollifier associated to a compactly supported function,[47] are needed to ensure pointwise convergence almost everywhere.

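If the initial η = η_1 is itself smooth and compactly supported then the sequence is called a mollifier. The standard mollifier is obtained by choosing η to be a suitably normalized bump function, for instance

$$\eta(x) = \begin{cases} e^{-\frac{1}{1-|x|^2}} & \text{if } |x| < 1 \\ 0 & \text{if } |x| \ge 1. \end{cases}$$

In some situations such as numerical analysis, a piecewise linear approximation to the identity is desirable. This can be obtained by taking η_1 to be a hat function. With this choice of η_1, one has

$$\eta_\varepsilon(x) = \varepsilon^{-1}\max\left(1-\left|\frac{x}{\varepsilon}\right|, 0\right)$$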

which are all continuous and compactly supported, although not smooth and so not a mollifier.
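The hat-function approximation to the identity is simple enough to check directly. The sketch below assumes the scaling η_ε(x) = η_1(x/ε)/ε with η_1(x) = max(1 − |x|, 0); the helper names, the midpoint rule and the sample value ε = 0.01 are choices made for this example. It verifies that η_ε has total integral 1 and that pairing with a continuous function returns approximately its value at 0.

```python
import math

def hat_nascent_delta(x, eps):
    """Piecewise linear nascent delta supported on [-eps, eps]."""
    return max(1.0 - abs(x / eps), 0.0) / eps

def integrate(func, a, b, n=2_000):
    """Composite midpoint rule on [a, b]; even n puts the kink at 0 on a node."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def total_mass(eps):
    """Integral of the hat over its support; should equal 1."""
    return integrate(lambda x: hat_nascent_delta(x, eps), -eps, eps)

def pairing(f, eps):
    """Integral of f against the hat; tends to f(0) as eps -> 0."""
    return integrate(lambda x: hat_nascent_delta(x, eps) * f(x), -eps, eps)

if __name__ == "__main__":
    eps = 0.01
    print(total_mass(eps))         # close to 1
    print(pairing(math.cos, eps))  # close to cos(0) = 1
```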

Probabilistic considerations

In the context of probability theory, it is natural to impose the additional condition that the initial η_1 in an approximation to the identity should be positive, as such a function then represents a probability distribution. Convolution with a probability distribution is sometimes favorable because it does not result in overshoot or undershoot, as the output is a convex combination of the input values, and thus falls between the maximum and minimum of the input function. Taking η_1 to be any probability distribution at all, and letting η_ε(x) = η_1(x/ε)/ε as above will give rise to an approximation to the identity. In general this converges more rapidly to a delta function if, in addition, η has mean 0 and has small higher moments. For instance, if η_1 is the uniform distribution on [−1/2, 1/2], also known as the rectangular function, then:[48]

$$\eta_\varepsilon(x) = \frac{1}{\varepsilon}\operatorname{rect}\!\left(\frac{x}{\varepsilon}\right) = \begin{cases} \frac{1}{\varepsilon}, & -\frac{\varepsilon}{2} < x < \frac{\varepsilon}{2} \\ 0, & \text{otherwise.} \end{cases}$$

Another example is with the Wigner semicircle distribution

$$\eta_\varepsilon(x) = \begin{cases} \frac{2}{\pi\varepsilon^2}\sqrt{\varepsilon^2-x^2}, & -\varepsilon < x < \varepsilon \\ 0, & \text{otherwise.} \end{cases}$$

This is continuous and compactly supported, but not a mollifier because it is not smooth.


Nascent delta functions often arise as convolution semigroups. This amounts to the further constraint that the convolution of η_ε with η_δ must satisfy

$$\eta_\varepsilon * \eta_\delta = \eta_{\varepsilon+\delta}$$

for all ε, δ > 0. Convolution semigroups in L1 that form a nascent delta function are always an approximation to the identity in the above sense, however the semigroup condition is quite a strong restriction.

In practice, semigroups approximating the delta function arise as fundamental solutions or Green's functions to physically motivated elliptic or parabolic partial differential equations. In the context of applied mathematics, semigroups arise as the output of a linear time-invariant system. Abstractly, if A is a linear operator acting on functions of x, then a convolution semigroup arises by solving the initial value problem

$$\begin{cases} \dfrac{\partial}{\partial t}\eta(t,x) = A\,\eta(t,x), & t > 0 \\ \displaystyle\lim_{t\to 0^+}\eta(t,x) = \delta(x) \end{cases}$$

in which the limit is as usual understood in the weak sense. Setting ηε(x) = η(ε, x) gives the associated nascent delta function.

Some examples of physically important convolution semigroups arising from such a fundamental solution include the following.

The heat kernel

The heat kernel, defined by

$$\eta_\varepsilon(x) = \frac{1}{\sqrt{2\pi\varepsilon}}\, e^{-\frac{x^2}{2\varepsilon}},$$

represents the temperature in an infinite wire at time t > 0, if a unit of heat energy is stored at the origin of the wire at time t = 0. This semigroup evolves according to the one-dimensional heat equation:

$$\frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2 u}{\partial x^2}.$$

In probability theory, η_ε(x) is a normal distribution of variance ε and mean 0. It represents the probability density at time t = ε of the position of a particle starting at the origin following a standard Brownian motion. In this context, the semigroup condition is then an expression of the Markov property of Brownian motion.

In higher-dimensional Euclidean space R^n, the heat kernel is

$$\eta_\varepsilon = \frac{1}{(2\pi\varepsilon)^{n/2}}\, e^{-\frac{x\cdot x}{2\varepsilon}},$$

and has the same physical interpretation, mutatis mutandis. It also represents a nascent delta function in the sense that ηε → δ in the distribution sense as ε → 0.
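The semigroup condition, that convolving the kernels for two parameters gives the kernel for their sum, can be checked numerically for the one-dimensional heat kernel. This sketch assumes the normalization in which η_ε is the normal density of variance ε and mean 0; the helper names, the evaluation point, and the integration window are choices made for this example.

```python
import math

def heat_kernel(x, t):
    """Normal density with mean 0 and variance t (one-dimensional heat kernel)."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def convolve_at(x, s, t, half_width=30.0, n=200_000):
    """Midpoint-rule value of the convolution of two heat kernels at x."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * h
        total += heat_kernel(x - y, s) * heat_kernel(y, t)
    return total * h

if __name__ == "__main__":
    # The two printed values should agree: kernels at s and t combine to s + t.
    print(convolve_at(0.7, 0.5, 1.5), heat_kernel(0.7, 2.0))
```

In probabilistic terms this is the statement that the sum of independent centered normal variables with variances s and t is normal with variance s + t.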

The Poisson kernel

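The Poisson kernel

$$\eta_\varepsilon(x) = \frac{1}{\pi}\operatorname{Im}\left\{\frac{1}{x - i\varepsilon}\right\} = \frac{1}{\pi}\frac{\varepsilon}{\varepsilon^2+x^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\xi x - |\varepsilon\xi|}\,d\xi$$

is the fundamental solution of the Laplace equation in the upper half-plane.[49] It represents the electrostatic potential in a semi-infinite plate whose potential along the edge is held fixed at the delta function. The Poisson kernel is also closely related to the Cauchy distribution. This semigroup evolves according to the equation

$$\frac{\partial u}{\partial t} = -\left(-\frac{\partial^2}{\partial x^2}\right)^{\frac{1}{2}} u(t,x)$$

where the operator is rigorously defined as the Fourier multiplier

$$\mathcal{F}\left[\left(-\frac{\partial^2}{\partial x^2}\right)^{\frac{1}{2}} f\right](\xi) = |2\pi\xi|\,\mathcal{F}f(\xi).$$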

Oscillatory integrals

In areas of physics such as wave propagation and wave mechanics, the equations involved are hyperbolic and so may have more singular solutions. As a result, the nascent delta functions that arise as fundamental solutions of the associated Cauchy problems are generally oscillatory integrals. An example, which comes from a solution of the Euler–Tricomi equation of transonic gas dynamics,[50] is the rescaled Airy function

$$\varepsilon^{-1/3}\operatorname{Ai}\left(x\,\varepsilon^{-1/3}\right).$$

Although using the Fourier transform, it is easy to see that this generates a semigroup in some sense, it is not absolutely integrable and so cannot define a semigroup in the above strong sense. Many nascent delta functions constructed as oscillatory integrals only converge in the sense of distributions (an example is the Dirichlet kernel below), rather than in the sense of measures.

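Another example is the Cauchy problem for the wave equation in R^{1+1}:[51]

$$c^{-2}\frac{\partial^2 u}{\partial t^2} - \Delta u = 0, \qquad u = 0,\quad \frac{\partial u}{\partial t} = \delta \quad \text{for } t = 0.$$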

The solution u represents the displacement from equilibrium of an infinite elastic string, with an initial disturbance at the origin.

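Other approximations to the identity of this kind include the sinc function (used widely in electronics and telecommunications)

$$\eta_\varepsilon(x) = \frac{1}{\pi x}\sin\left(\frac{x}{\varepsilon}\right) = \frac{1}{2\pi}\int_{-\frac{1}{\varepsilon}}^{\frac{1}{\varepsilon}} \cos(kx)\,dk$$

and the Bessel function

$$\eta_\varepsilon(x) = \frac{1}{\varepsilon}\,J_{\frac{1}{\varepsilon}}\!\left(\frac{x+1}{\varepsilon}\right).$$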
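The sinc kernel sin(x/ε)/(πx) is oscillatory rather than positive, yet its pairing with a smooth, rapidly decaying function still approaches the value of that function at 0. The sketch below illustrates this with a Gaussian test function, which is an assumption of this example (it is a Schwartz function rather than a compactly supported one); the helper names and the integration window are likewise illustrative. For this particular test function the pairing equals erf(1/(ε√2)), which serves as an independent check.

```python
import math

def sinc_kernel(x, eps):
    """Oscillatory nascent delta sin(x / eps) / (pi * x), for x != 0."""
    return math.sin(x / eps) / (math.pi * x)

def gaussian_test_function(x):
    """Smooth, rapidly decaying test function with value 1 at the origin."""
    return math.exp(-x * x / 2.0)

def sinc_pairing(eps, half_width=12.0, n=400_000):
    """Midpoint-rule pairing; with even n no midpoint falls on x = 0."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += sinc_kernel(x, eps) * gaussian_test_function(x)
    return total * h

if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.2):
        # Numerical pairing next to the closed form erf(1 / (eps * sqrt(2))).
        print(eps, sinc_pairing(eps), math.erf(1.0 / (eps * math.sqrt(2.0))))
```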

Plane wave decomposition

One approach to the study of a linear partial differential equation

$$L[u] = f,$$

where L is a differential operator on R^n, is to seek first a fundamental solution, which is a solution of the equation

$$L[u] = \delta.$$

When L is particularly simple, this problem can often be resolved using the Fourier transform directly (as in the case of the Poisson kernel and heat kernel already mentioned). For more complicated operators, it is sometimes easier first to consider an equation of the form

$$L[u] = h$$

where h is a plane wave function, meaning that it has the form

$$h = h(x\cdot\xi)$$

for some vector ξ. Such an equation can be resolved (if the coefficients of L are analytic functions) by the Cauchy–Kovalevskaya theorem or (if the coefficients of L are constant) by quadrature. So, if the delta function can be decomposed into plane waves, then one can in principle solve linear partial differential equations.

Such a decomposition of the delta function into plane waves was part of a general technique first introduced essentially by Johann Radon, and then developed in this form by Fritz John (1955).[52] Choose k so that n + k is an even integer, and for a real number s, put

Then δ is obtained by applying a power of the Laplacian to the integral with respect to the unit sphere measure dω of g(x · ξ) for ξ in the unit sphere S^{n−1}:

The Laplacian here is interpreted as a weak derivative, so that this equation is taken to mean that, for any test function φ,

The result follows from the formula for the Newtonian potential (the fundamental solution of Poisson's equation). This is essentially a form of the inversion formula for the Radon transform, because it recovers the value of φ(x) from its integrals over hyperplanes. For instance, if n is odd and k = 1, then the integral on the right hand side is

where Rφ(ξ, p) is the Radon transform of φ:

An alternative equivalent expression of the plane wave decomposition, from Template:Harvtxt, is

for n even, and

for n odd.

Fourier kernels

In the study of Fourier series, a major question consists of determining whether and in what sense the Fourier series associated with a periodic function converges to the function. The nth partial sum of the Fourier series of a function f of period 2π is defined by convolution (on the interval [−π,π]) with the Dirichlet kernel: