Fizeau experiment: Difference between revisions

Revision as of 21:25, 21 October 2013

Segmented regression, also known as piecewise regression or 'broken-stick regression', is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval. Segmented regression analysis can also be performed on multivariate data by partitioning the various independent variables. Segmented regression is useful when the independent variables, clustered into different groups, exhibit different relationships between the variables in these regions. The boundaries between the segments are breakpoints.

Segmented linear regression is segmented regression whereby the relations in the intervals are obtained by linear regression.

Segmented linear regression, two segments

Segmented linear regression with two segments separated by a breakpoint can be useful to quantify an abrupt change in the response function (Yr) of a varying influential factor (x). The breakpoint can be interpreted as a critical, safe, or threshold value beyond or below which (un)desired effects occur. The breakpoint can be important in decision making.^[1]

The figures illustrate some of the results and regression types obtainable.

A segmented regression analysis requires a set of (y, x) data, in which y is the dependent variable and x the independent variable.

The least squares method is applied separately to each segment, so that the two regression lines fit the data set as closely as possible by minimizing the sum of squares of the differences (SSD) between the observed (y) and calculated (Yr) values of the dependent variable. This results in the following two equations:

Yr = A₁.x + K₁ for x < BP (breakpoint)
Yr = A₂.x + K₂ for x > BP (breakpoint)

where:

Yr is the expected (predicted) value of y for a certain value of x;

A₁ and A₂ are regression coefficients (indicating the slope of the line segments);

K₁ and K₂ are regression constants (indicating the intercept at the y-axis).
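The two equations can be sketched in code as follows. This is a minimal illustration rather than a reference implementation: the function names `fit_line`, `fit_two_segments` and `predict` are hypothetical, the breakpoint is assumed to be known in advance, and points with x exactly equal to BP are assigned to the second segment (the equations above leave that case open).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + k; returns (a, k)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                    # slope (A1 or A2)
    return a, mean_y - a * mean_x    # intercept (K1 or K2)


def fit_two_segments(xs, ys, bp):
    """Fit one line left of the breakpoint and one line right of it.

    Assumption: each side holds at least two distinct x values, and
    points with x == bp go to the right-hand segment.
    """
    left = [(x, y) for x, y in zip(xs, ys) if x < bp]
    right = [(x, y) for x, y in zip(xs, ys) if x >= bp]
    seg1 = fit_line(*zip(*left))     # (A1, K1)
    seg2 = fit_line(*zip(*right))    # (A2, K2)
    return seg1, seg2


def predict(x, bp, seg1, seg2):
    """Return Yr for a given x using the segment that x falls in."""
    a, k = seg1 if x < bp else seg2
    return a * x + k
```

Fitting each side independently means the two lines need not meet at the breakpoint; the sketch makes no attempt to force continuity.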

The data may show many types of trends;^[2] see the figures.

The method also yields two coefficients of determination (R², the squared correlation coefficients), one per segment:

$R_{1}^{2}=1-{\frac {\sum (y-Y_{r})^{2}}{\sum (y-Y_{a1})^{2}}}$ for x < BP (breakpoint)

and

$R_{2}^{2}=1-{\frac {\sum (y-Y_{r})^{2}}{\sum (y-Y_{a2})^{2}}}$ for x > BP (breakpoint)

where:

$\sum (y-Y_{r})^{2}$ is the minimized SSD per segment, and

$Y_{a1}$ and $Y_{a2}$ are the average values of y in the respective segments.
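As a small sketch of the per-segment formula, the function below computes R² from the observed y values of one segment and the corresponding predicted values Yr. The name `r_squared` is hypothetical, and the choice to raise an error when all y values in a segment are identical (the denominator is then zero) is an assumption of this sketch.

```python
def r_squared(ys, preds):
    """R^2 = 1 - SSD / sum((y - Ya)^2) for the points of one segment."""
    mean_y = sum(ys) / len(ys)                            # Ya1 or Ya2
    ssd = sum((y - p) ** 2 for y, p in zip(ys, preds))    # residual sum of squares
    total = sum((y - mean_y) ** 2 for y in ys)            # spread around the mean
    if total == 0:
        raise ValueError("R^2 is undefined when all y values are identical")
    return 1 - ssd / total
```

The function is called once per segment, each time with only the points lying on that side of the breakpoint.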

In the determination of the most suitable trend, statistical tests must be performed to ensure that this trend is reliable (significant).

When no significant breakpoint can be detected, one must fall back on a regression without breakpoint.

Example

For the blue figure at the right, which gives the relation between the yield of mustard (Yr = Ym, t/ha) and soil salinity (x = Ss, expressed as the electrical conductivity of the soil solution, EC, in dS/m), it is found that:^[3]

BP = 4.93, A₁ = 0, K₁ = 1.74, A₂ = −0.129, K₂ = 2.38, R₁² = 0.0035 (insignificant), R₂² = 0.395 (significant) and:

Ym = 1.74 t/ha for Ss < 4.93 (breakpoint)
Ym = −0.129 Ss + 2.38 t/ha for Ss > 4.93 (breakpoint)

indicating that soil salinities < 4.93 dS/m are safe and that soil salinities > 4.93 dS/m reduce the yield at a rate of 0.129 t/ha per unit (dS/m) increase of soil salinity.

The figure also shows confidence intervals and uncertainty, as elaborated below.
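The fitted relation from this example can be written as a small function. The coefficients are the ones quoted above; the function name `mustard_yield` is hypothetical, and treating a salinity exactly equal to the breakpoint as part of the sloping segment is an assumption of this sketch.

```python
BP = 4.93  # breakpoint, dS/m


def mustard_yield(ss):
    """Predicted mustard yield Ym (t/ha) for soil salinity ss (dS/m)."""
    if ss < BP:
        return 1.74                 # horizontal first segment (A1 = 0)
    return -0.129 * ss + 2.38       # declining second segment
```

For instance, `mustard_yield(7.0) - mustard_yield(6.0)` evaluates to about −0.129, the yield loss per unit increase in salinity beyond the breakpoint.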

Test procedures

The following statistical tests are used to determine the type of trend:

1. Significance of the breakpoint (BP), by expressing BP as a function of the regression coefficients A₁ and A₂, the means Y₁ and Y₂ of the y data, and the means X₁ and X₂ of the x data (to the left and right of BP), using the laws of propagation of errors in additions and multiplications to compute the standard error (SE) of BP, and applying Student's t-test.
2. Significance of A₁ and A₂, applying Student's t-distribution and the standard error (SE) of A₁ and A₂.
3. Significance of the difference of A₁ and A₂, applying Student's t-distribution using the SE of their difference.
4. Significance of the difference of Y₁ and Y₂, applying Student's t-distribution using the SE of their difference.
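The slope tests in the list above can be sketched as follows. The article does not spell out the formulas, so this uses the textbook standard error of a least squares slope and assumes that the two segment estimates are independent, so that the squared SE of their difference is the sum of the squared SEs. The function names are hypothetical, and the resulting t values still have to be compared with a critical value from Student's t-distribution (n − 2 degrees of freedom for a single slope fitted to n points).

```python
import math


def slope_with_se(xs, ys):
    """Least squares slope of one segment and its standard error.

    Needs at least three points, not all with the same x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    k = mean_y - a * mean_x
    ssd = sum((y - (a * x + k)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(ssd / (n - 2) / sxx)   # residual variance divided by Sxx
    return a, se


def t_slope(a, se):
    """t statistic for the hypothesis that the slope is zero."""
    return a / se


def t_slope_difference(a1, se1, a2, se2):
    """t statistic for A1 - A2, assuming independent segment estimates."""
    return (a1 - a2) / math.sqrt(se1 ** 2 + se2 ** 2)
```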

In addition, use is made of the correlation coefficient of all data (Ra), the coefficient of determination (or coefficient of explanation), confidence intervals of the regression functions, and analysis of variance (ANOVA).^[4]

The coefficient of determination for all data (Cd), which is to be maximized under the conditions set by the significance tests, is found from:

$C_{d}=1-{\sum (y-Y_{r})^{2} \over \sum (y-Y_{a})^{2}}$

where Yr is the expected (predicted) value of y according to the former regression equations and Ya is the average of all y values.

The Cd coefficient ranges from 0 (no explanation at all) to 1 (full explanation, perfect match).
In a pure, unsegmented linear regression, the values of Cd and Ra² are equal. In a segmented regression, Cd needs to be significantly larger than Ra² to justify the segmentation.

The optimal value of the breakpoint may be found by choosing the breakpoint for which the Cd coefficient is at its maximum.

References

[1] Frequency and Regression Analysis. Chapter 6 in: H.P. Ritzema (ed., 1994), Drainage Principles and Applications, Publ. 16, pp. 175-224, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. ISBN 90-70754-33-9. Free download from the webpage http://www.waterlog.info/articles.htm, under nr. 13, or directly as PDF: http://www.waterlog.info/pdf/regtxt.pdf
[2] Drainage research in farmers' fields: analysis of data. Part of project “Liquid Gold” of the International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. Download as PDF: http://www.waterlog.info/pdf/analysis.pdf
[3] R.J. Oosterbaan, D.P. Sharma, K.N. Singh and K.V.G.K. Rao, 1990, Crop production and soil salinity: evaluation of field data from India by segmented linear regression. In: Proceedings of the Symposium on Land Drainage for Salinity Control in Arid and Semi-Arid Regions, February 25th to March 2nd, 1990, Cairo, Egypt, Vol. 3, Session V, pp. 373-383.
[4] Statistical significance of segmented linear regression with break-point using variance analysis and F-tests. Download from http://www.waterlog.info/faqs.htm under nr. 13, or directly as PDF: http://www.waterlog.info/pdf/anova.pdf


@@ Line 1: / Line 1: @@
+'''Segmented regression''', also known as '''piecewise regression''' or 'broken-stick regression', is a method in [[regression analysis]] in which the [[independent variable]] is partitioned into intervals and a separate line segment is fit to each interval. Segmented regression analysis can also be performed on multivariate data by partitioning the various independent variables. Segmented regression is useful when the independent variables, clustered into different groups, exhibit different relationships between the variables in these regions. The boundaries between the segments are ''breakpoints''.
+'''Segmented linear regression''' is segmented regression whereby the relations in the intervals  are obtained by [[linear regression]].
+==Segmented linear regression, two segments==
+[[File:SegReg3.gif|thumb|200px|1st limb horizontal]]
+[[File:SegReg1.gif|thumb|200px|1st limb sloping up]]
+[[File:SegReg2.gif|thumb|200px|1st limb sloping down]]
+Segmented linear regression with two segments separated by a ''breakpoint'' can be useful to quantify an abrupt change of the response function (Yr) of a varying influential factor ('''x''').  The breakpoint can be interpreted as a ''critical'', ''safe'', or ''threshold'' value beyond or below which (un)desired effects occur. The breakpoint can be important in decision making <ref>''Frequency and Regression Analysis''. Chapter 6 in: H.P.Ritzema (ed., 1994), ''Drainage Principles and Applications'', Publ. 16, pp. 175-224, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. ISBN 90-70754-33-9 . Free download from the webpage [http://www.waterlog.info/articles.htm] , under nr. 13, or directly as PDF : [http://www.waterlog.info/pdf/regtxt.pdf]</ref>
+The figures illustrate some of the results and regression types obtainable.
+A segmented regression analysis is based on the presence of a set of ( '''y, x''' ) data, in which '''y''' is the [[dependent variable]] and '''x''' the [[independent variable]].
+The [[least squares]] method applied separately to each segment, by which the two regression lines are made to fit the data set as closely as possible while minimizing the ''sum of squares of the differences'' (SSD) between observed ('''y''') and calculated (Yr) values of the dependent variable, results in the following two equations:
+* Yr = A<sub>1</sub>.'''x''' + K<sub>1</sub> &nbsp; &nbsp; for '''x''' < BP (breakpoint)
+* Yr = A<sub>2</sub>.'''x''' + K<sub>2</sub> &nbsp; &nbsp; for '''x''' > BP (breakpoint)
+where:<br>
+:Yr is the expected (predicted) value of '''y''' for a certain value of '''x''';
+:A<sub>1</sub> and A<sub>2</sub> are [[regression coefficient]]s (indicating the slope of the line segments);
+:K<sub>1</sub> and K<sub>2</sub> are ''regression constants'' (indicating the intercept at the '''y'''-axis).
+The data may show many types or trends,<ref>'' Drainage research in farmers' fields: analysis of data''. Part of project “Liquid Gold” of the International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. Download as PDF : [http://www.waterlog.info/pdf/analysis.pdf]</ref> see the figures.
+The method also yields two [[Pearson product-moment correlation coefficient|correlation coefficients]] (R):
+*<math>R_1 ^ 2 = 1 - \frac{\sum (y - Y_r) ^ 2 }{ \sum (y - Y_{a1})^2}</math> &nbsp; &nbsp; for '''x''' < BP (breakpoint)
+and
+*<math>R_2 ^ 2 = 1 - \frac{\sum (y - Y_r) ^ 2 }{ \sum (y - Y_{a2})^2}</math>  &nbsp; &nbsp; for '''x''' > BP (breakpoint)
+where:<br>
+: <math> \sum (y - Y_r) ^2 </math> is the minimized SSD per segment
+and
+:<big>Y<sub>a1</sub></big> and <big>Y<sub>a2</sub></big> are the average values of '''y''' in the respective segments.
+In the determination of the most suitable trend, [[statistical tests]] must be performed to ensure that this trend is reliable (significant).
+When no significant breakpoint can be detected, one must fall back on a regression without breakpoint.
+==Example==
+[[File:MUSTARD.JPG|thumb|250px|Segmented linear regression, type 3b]]
+For the blue figure at the right that gives the relation between yield of mustard (Yr = Ym, t/ha) and soil salinity ('''x''' = Ss, expressed as electric conductivity of the soil solution EC in dS/m) it is found that:<ref>R.J.Oosterbaan, D.P.Sharma, K.N.Singh and K.V.G.K.Rao, 1990, ''Crop production and soil salinity: evaluation of field data from India by segmented linear regression''. In: Proceedings of the Symposium on Land Drainage for Salinity Control in Arid and Semi-Arid Regions, February 25th to March 2nd, 1990, Cairo, Egypt, Vol. 3, Session V, p. 373 - 383.</ref>
+BP = 4.93, A<sub>1</sub> = 0, K<sub>1</sub> = 1.74, A<sub>2</sub> = &minus;0.129, K<sub>2</sub> = 2.38, R<sub>1</sub><sup>2</sup> = 0.0035 (insignificant), R<sub>2</sub><sup>2</sup> = 0.395 (significant) and:
+* Ym = 1.74 t/ha &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;for Ss < 4.93 (breakpoint)
+* Ym = &minus;0.129 Ss + 2.38 t/ha &nbsp; &nbsp; for Ss > 4.93 (breakpoint)
+indicating that soil salinities < 4.93 dS/m are safe and that soil salinities > 4.93 dS/m reduce the yield at a rate of 0.129 t/ha per unit (dS/m) increase of soil salinity.
+The figure also shows confidence intervals and uncertainty as elaborated hereunder.
+==Test procedures==
+[[File:CHAO.gif|thumb|250px|Example time series, type 5]]
+{{Regression bar}}
+The following ''statistical tests'' are used to determine the type of trend:
+# significance of the breakpoint (BP) by expressing BP as a function of  ''regression coefficients'' A<sub>1</sub> and A<sub>2</sub> and the means Y<sub>1</sub> and Y<sub>2</sub> of the '''y'''-data and the means X<sub>1</sub> and X<sub>2</sub> of the '''x''' data (to the left and right of BP), using the laws of [[propagation of uncertainty|propagation of errors]] in additions and multiplications to compute the [[standard error]] (SE) of BP, and applying [[Student's t-test]]
+# significance of A<sub>1</sub> and A<sub>2</sub> applying Student's t-distribution and the ''standard error'' SE of A<sub>1</sub> and A<sub>2</sub>
+# significance of the difference of A<sub>1</sub> and A<sub>2</sub> applying Student's t-distribution using the SE of their difference.
+# significance of the difference of Y<sub>1</sub> and Y<sub>2</sub> applying Student's t-distribution using the SE of their difference.
+In addition, use is made of the [[Pearson product-moment correlation coefficient|correlation coefficient]] of all data (Ra), the [[coefficient of determination]] or coefficient of explanation, [[confidence interval]]s of the regression functions, and [[Anova]] analysis.<ref>''Statistical significance of segmented linear regression with break-point using variance analysis and F-tests''. Download from [http://www.waterlog.info/faqs.htm] under nr. 13, or directly as PDF : [http://www.waterlog.info/pdf/anova.pdf]</ref>
+The coefficient of determination for all data (Cd), that is to be maximized under the conditions set by the significance tests, is found from:
+*<math>C_d=1-{\sum (y-Y_r)^2\over\sum (y-Y_a)^2}</math>
+where Yr is the expected (predicted) value of '''y''' according to the former regression equations and Ya is the average of all '''y''' values.
+The Cd coefficient ranges from 0 (no explanation at all) to 1 (full explanation, perfect match). <br>
+In a pure, unsegmented, linear regression, the values of Cd and Ra<sup>2</sup> are equal. In a segmented regression, Cd needs to be significantly larger than Ra<sup>2</sup> to justify the segmentation.
+The [[Optimization (mathematics)|optimal]] value of the breakpoint may be found such that the Cd coefficient is [[Maxima and minima|maximum]].
+==See also==
+* [[Simple regression]]
+* [[Linear regression]]
+* [[Ordinary least squares]]
+* [[Multivariate adaptive regression splines]]
+* [[Local regression]]
+* [[Regression discontinuity design]]
+* [[SegReg|SegReg (software)]] for segmented regression
+==References==
+<references />
+{{DEFAULTSORT:Segmented Regression}}
+[[Category:Regression analysis]]
+[[Category:Statistical models]]
+[[Category:Data analysis]]
