{{Use dmy dates|date=October 2012}}
The '''smoothing spline''' is a method of [[smoothing]] (fitting a [[smooth curve]] to a set of noisy [[observation]]s) using a [[Spline (mathematics)|spline]] function.
==Definition== | |||
Let <math>(x_i,Y_i),\ i=1,\dots,n</math>, with <math>x_1<x_2<\dots<x_n</math>, be a sequence of observations, modeled by the relation <math>Y_i = \mu(x_i) + \epsilon_i</math>, where the <math>\epsilon_i</math> are independent zero-mean errors. The smoothing spline estimate <math>\hat\mu</math> of the function <math>\mu</math> is defined to be the minimizer (over the class of twice differentiable functions) of<ref>{{Cite book|title=Generalized Additive Models|last=Hastie|first=T. J.|coauthors=Tibshirani, R. J.|year=1990|publisher=Chapman and Hall|isbn=0-412-34390-8}}</ref>
:<math> | |||
\sum_{i=1}^n (Y_i - \hat\mu(x_i))^2 + \lambda \int_{x_1}^{x_n} \hat\mu''(x)^2 \,dx. | |||
</math> | |||
Remarks: | |||
# <math>\lambda \ge 0</math> is a smoothing parameter, controlling the trade-off between fidelity to the data and roughness of the function estimate.
# The integral is evaluated over the range of the <math>x_i</math>.
# As <math>\lambda\to 0</math> (no smoothing), the smoothing spline converges to the interpolating spline.
# As <math>\lambda\to\infty</math> (infinite smoothing), the roughness penalty becomes paramount and the estimate converges to a [[Ordinary least squares|linear least squares]] estimate.
# The roughness penalty based on the [[second derivative]] is the most common in modern statistics literature, although the method can easily be adapted to penalties based on other derivatives.
# In early literature, with equally spaced <math>x_i</math>, second- or third-order differences were used in the penalty, rather than derivatives.
# When the sum-of-squares term is replaced by a log-likelihood, the resulting estimate is termed ''penalized likelihood''. The smoothing spline is the special case of penalized likelihood resulting from a Gaussian likelihood.
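The criterion above can be minimized numerically; the following Python sketch uses SciPy's <code>make_smoothing_spline</code>, which minimizes the same penalized sum of squares for a given <math>\lambda</math> (the toy data and the values of <math>\lambda</math> are illustrative assumptions):
<syntaxhighlight lang="python">
import numpy as np
from scipy.interpolate import make_smoothing_spline  # requires SciPy >= 1.10

rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 100)                  # strictly increasing x_i
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)  # noisy observations Y_i

# Small lambda approaches the interpolating spline; large lambda approaches
# a linear least-squares fit (remarks 3 and 4 above).
for lam in (1e-4, 1e-1, 1e3):
    spl = make_smoothing_spline(x, y, lam=lam)
    rss = np.sum((y - spl(x)) ** 2)                 # fidelity term
    print(f"lambda = {lam:g}, residual sum of squares = {rss:.3f}")
</syntaxhighlight>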
==Derivation of the smoothing spline== | |||
It is useful to think of fitting a smoothing spline in two steps: | |||
# First, derive the values <math>\hat\mu(x_i);i=1,\ldots,n</math>. | |||
# From these values, derive <math>\hat\mu(x)</math> for all ''x''. | |||
Now, treat the second step first. | |||
Given the vector <math>\hat{m} = (\hat\mu(x_1),\ldots,\hat\mu(x_n))^T</math> of fitted values, the sum-of-squares part of the spline criterion is fixed. It remains only to minimize <math>\int \hat\mu''(x)^2 \, dx</math>, and the minimizer is a natural cubic [[Spline (mathematics)|spline]] that interpolates the points <math>(x_i,\hat\mu(x_i))</math>. This interpolating spline is a linear operator, and can be written in the form
:<math> | |||
\hat\mu(x) = \sum_{i=1}^n \hat\mu(x_i) f_i(x) | |||
</math> | |||
where <math>f_i(x)</math> are a set of spline basis functions. As a result, the roughness penalty has the form
:<math> | |||
\int \hat\mu''(x)^2 \, dx = \hat{m}^T A \hat{m},
</math> | |||
where the elements of ''A'' are <math>\int f_i''(x) f_j''(x)\,dx</math>. The basis functions, and hence the matrix ''A'', depend on the configuration of the predictor variables <math>x_i</math>, but not on the responses <math>Y_i</math> or <math>\hat m</math>.
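Step two can be carried out directly: the minimizer is the natural cubic spline interpolant of the fitted values, available for instance through SciPy's <code>CubicSpline</code>. In this sketch the fitted values <math>\hat\mu(x_i)</math> are hypothetical:
<syntaxhighlight lang="python">
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical output of step one: fitted values at the sites x_i.
x = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
m_hat = np.array([0.2, 0.9, 0.1, -0.7, 0.3])

# The minimizer of the roughness penalty through given points is the natural
# cubic spline (zero second derivative at both ends).
spline = CubicSpline(x, m_hat, bc_type='natural')
print(spline(1.7))                     # evaluate mu_hat between the sites
print(np.allclose(spline(x), m_hat))   # True: the spline interpolates m_hat
</syntaxhighlight>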
Now back to the first step. The penalized sum-of-squares can be written as
:<math> | |||
\|Y - \hat m\|^2 + \lambda \hat{m}^T A \hat m, | |||
</math> | |||
where <math>Y=(Y_1,\ldots,Y_n)^T</math>.
Minimizing over <math>\hat m</math> gives
:<math> | |||
\hat m = (I + \lambda A)^{-1} Y. | |||
</math> | |||
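This solution can be sketched numerically. The true penalty matrix <math>A</math> is built from the spline basis functions; the sketch below substitutes a squared second-difference matrix, a standard stand-in for equally spaced <math>x_i</math> (compare remark 6 in the definition):
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = np.linspace(0, 1, n)
Y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

# Second-difference operator D of shape (n-2, n); D^T D plays the role of A.
D = np.diff(np.eye(n), n=2, axis=0)
A = D.T @ D

lam = 1.0
m_hat = np.linalg.solve(np.eye(n) + lam * A, Y)  # m_hat = (I + lam A)^{-1} Y
</syntaxhighlight>
Because <math>I + \lambda A</math> is banded for such penalties, the system can also be solved in linear time with a banded solver, which is how smoothing-spline software typically proceeds.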
==De Boor's approach== | |||
De Boor's approach exploits the same idea, of finding a balance between having a smooth curve and being close to the given data.<ref name="DeBoor2001">{{Cite book|title=A Practical Guide to Splines (Revised Edition)|last=De Boor|first=C.|year=2001|publisher=Springer|pages=207–214|isbn=0-387-90356-9}}</ref> The criterion to be minimized is
:<math>p\sum_{i=1}^n \left ( \frac{Y_i - \hat\mu \left (x_i \right )}{\delta_i} \right )^2+\left ( 1-p \right )\int \left ( \hat\mu^{\left (m \right )}\left ( x \right ) \right )^2 \, dx</math>
where <math>p</math> is a parameter called the smoothing factor, belonging to the interval <math>[0,1]</math>, and <math>\delta_i;i=1,\dots,n</math> are quantities controlling the extent of smoothing (each point <math>Y_i</math> receives the weight <math>\delta_i^{-2}</math>). In practice, since [[cubic splines]] are mostly used, <math>m</math> is usually <math>2</math>. The solution for <math>m=2</math> was proposed by Reinsch in 1967.<ref name="Reinsch1967" /> For <math>m=2</math>, as <math>p</math> approaches <math>1</math>, <math>\hat\mu</math> converges to the "natural" spline interpolant of the given data.<ref name="DeBoor2001" /> As <math>p</math> approaches <math>0</math>, <math>\hat\mu</math> converges to a straight line (the smoothest possible curve). Since finding a suitable value of <math>p</math> is a matter of trial and error, an additional constant <math>S</math> was introduced for convenience.<ref name="Reinsch1967">{{Cite web|author=Reinsch, Christian H.|title=Smoothing by Spline Functions|url=http://www.cise.ufl.edu/class/cap5416fa10/resources/Reinsch_1967.pdf|accessdate=11 March 2011}}</ref>
<math>S</math> is used to numerically determine the value of <math>p</math> so that the function <math>\hat\mu</math> meets the following condition:
:<math>\sum_{i=1}^n \left ( \frac{Y_i - \hat\mu \left (x_i \right )}{\delta_i} \right )^2 \le S</math>
The algorithm described by de Boor starts with <math>p=0</math> and increases <math>p</math> until the condition is met.<ref name="DeBoor2001" /> If <math>\delta_i</math> is an estimate of the standard deviation of <math>Y_i</math>, the constant <math>S</math> is recommended to be chosen in the interval <math>\left [ n-\sqrt{2n},n+\sqrt{2n} \right ]</math>. Having <math>S=0</math> means the solution is the "natural" spline interpolant.<ref name="Reinsch1967" /> Increasing <math>S</math> yields a smoother curve that lies farther from the given data.
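The same <math>S</math>-based condition is enforced by SciPy's <code>UnivariateSpline</code> (a wrapper around the Fortran library FITPACK), although FITPACK places knots adaptively rather than following de Boor's algorithm. In this sketch the data and the noise level <math>\delta_i</math> are assumptions:
<syntaxhighlight lang="python">
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
n = 80
x = np.linspace(0, 10, n)
sigma = 0.3                                  # assumed noise level delta_i
y = np.exp(-0.3 * x) * np.sin(x) + rng.normal(scale=sigma, size=n)

# With weights w_i = 1/delta_i, the fitted spline satisfies
# sum(((y_i - mu(x_i)) / delta_i)^2) <= S.
S = n                                        # inside [n - sqrt(2n), n + sqrt(2n)]
spl = UnivariateSpline(x, y, w=np.full(n, 1.0 / sigma), k=3, s=S)
print(spl.get_residual())                    # weighted residual sum, <= S
</syntaxhighlight>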
==Creating a multidimensional spline== | |||
The constraint <math>x_1<x_2< \dots <x_n</math> in the definition means the algorithm cannot be applied to arbitrary sets of data. To use it for scattered points in a multidimensional space, the data must first be transformed so that the constraint holds. One solution is to introduce a parameter and represent the input data as single-valued functions of that parameter; the smoothing is then performed for each function separately. In two dimensions, <math>x</math> and <math>y</math> can be parametrized as <math>x(t)</math> and <math>y(t)</math> with <math>t_1<t_2< \dots <t_n</math>. A convenient choice of <math>t</math> is the cumulative distance <math>t_{i+1}=t_{i}+\sqrt{(x_{i+1}-x_{i})^2+(y_{i+1}-y_{i})^2}</math> with <math>t_1=0</math>.<ref>{{Cite web|author=Robert E. Smith Jr., Joseph M. Price and Lona M. Howser|title=A Smoothing Algorithm Using Cubic Spline Functions|url=http://www.pdas.com/refs/tnd7397.pdf|accessdate=31 May 2011}}</ref><ref>{{Cite web|author=N. Y. Graham|title=Smoothing with Periodic Cubic Splines|url=http://www.alcatel-lucent.com/bstj/vol62-1983/articles/bstj62-1-101.pdf|accessdate=31 May 2011}}</ref>
A more detailed analysis of parametrization is given by E. T. Y. Lee.<ref>{{Cite web|author=E. T. Y. Lee|title=Choosing nodes in parametric curve interpolation|url=http://www.cs.bgu.ac.il/~leonid/na105/Splines/Lee.pdf|accessdate=28 June 2011}}</ref>
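A sketch of this parametrization (the sample points and smoothing factors are illustrative; the smoother stands in for any one-dimensional smoothing spline routine):
<syntaxhighlight lang="python">
import numpy as np
from scipy.interpolate import UnivariateSpline

# Sample 2-D points that are not single-valued in x.
x = np.array([0.0, 1.0, 1.5, 1.2, 0.5, -0.3])
y = np.array([0.0, 0.4, 1.3, 2.1, 2.6, 2.4])

# Cumulative distance: t_1 = 0, t_{i+1} = t_i + ||P_{i+1} - P_i||.
t = np.concatenate(([0.0], np.cumsum(np.hypot(np.diff(x), np.diff(y)))))

# Smooth x(t) and y(t) separately; t is strictly increasing by construction.
sx = UnivariateSpline(t, x, k=3, s=0.01)
sy = UnivariateSpline(t, y, k=3, s=0.01)

tt = np.linspace(t[0], t[-1], 200)
curve = np.column_stack((sx(tt), sy(tt)))    # the smoothed 2-D curve
</syntaxhighlight>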
==Related methods== | |||
Smoothing splines are related to, but distinct from:
* Regression splines. In this method, the data is fitted to a set of spline basis functions with a reduced set of knots, typically by least squares, and no roughness penalty is used (see the sketch after this list).
* Penalized splines. This combines the reduced knots of regression splines with the roughness penalty of smoothing splines.<ref>{{Cite book|title=Semiparametric Regression|last=Ruppert|first=David|coauthors=Wand, M. P. and Carroll, R. J.|publisher=Cambridge University Press|year=2003|isbn=0-521-78050-0}}</ref>
* The [[elastic map]]s method for [[manifold learning]]. This method combines the [[least squares]] penalty for approximation error with bending and stretching penalties of the approximating manifold, and uses a coarse discretization of the optimization problem.
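A sketch of the regression-spline fit mentioned above (the data and knot placement are illustrative): the data are projected by unpenalized least squares onto a B-spline basis with a reduced set of knots:
<syntaxhighlight lang="python">
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 200)
y = np.sin(6 * x) + rng.normal(scale=0.2, size=x.size)

# Reduced set of interior knots (far fewer than data points), with the
# boundary knots repeated k+1 times (a clamped knot vector).
k = 3
interior = np.linspace(0.1, 0.9, 8)
t = np.r_[[x[0]] * (k + 1), interior, [x[-1]] * (k + 1)]

# Ordinary least squares on the B-spline design matrix: no roughness penalty.
B = BSpline.design_matrix(x, t, k).toarray()  # SciPy >= 1.8
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fit = BSpline(t, coef, k)                     # the fitted regression spline
</syntaxhighlight>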
==Source code== | |||
Source code for [[Spline (mathematics)|spline]] smoothing can be found in the examples from [[Carl R. de Boor|Carl de Boor's]] book ''A Practical Guide to Splines''. The examples are written in [[Fortran]]. Updated sources are also available from Carl de Boor's official site [http://pages.cs.wisc.edu/~deboor/].
==Further reading== | |||
* Wahba, G. (1990). ''Spline Models for Observational Data''. SIAM, Philadelphia.
* Green, P. J. and Silverman, B. W. (1994). ''Nonparametric Regression and Generalized Linear Models''. CRC Press.
* De Boor, C. (2001). ''A Practical Guide to Splines (Revised Edition)''. Springer.
==References== | |||
{{Reflist}} | |||
[[Category:Regression analysis]] | |||
[[Category:Splines]] | |||
[[Category:Statistical methods]]