The '''iterative proportional fitting procedure''' ('''IPFP''', also known as '''biproportional fitting''' in statistics, '''RAS algorithm'''<ref>{{cite journal |last=Bacharach |first=M. |year=1965 |title=Estimating Nonnegative Matrices from Marginal Data |journal=International Economic Review |volume=6 |issue=3 |pages=294–310 |doi=10.2307/2525582 |jstor=2525582 |publisher=Blackwell Publishing}}</ref> in economics and '''matrix raking''' or '''matrix scaling''' in computer science) is an [[iterative algorithm]] for estimating cell values of a [[contingency table]] such that the marginal totals remain fixed and the estimated table decomposes into an [[outer product]].

First introduced by [[W. Edwards Deming|Deming]] and Stephan in 1940<ref>{{cite journal |last=Deming |first=W. E. |authorlink=W. Edwards Deming |last2=Stephan |first2=F. F. |year=1940 |title=On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known |journal=[[Annals of Mathematical Statistics]] |volume=11 |issue=4 |pages=427–444 |mr=3527 |doi=10.1214/aoms/1177731829}}</ref> (they proposed IPFP as an algorithm leading to a minimizer of the [[Pearson X-squared statistic]], which it ''does not'',<ref>{{cite journal |last=Stephan |first=F. F. |year=1942 |title=Iterative method of adjusting frequency tables when expected margins are known |journal=[[Annals of Mathematical Statistics]] |volume=13 |issue=2 |pages=166–178 |mr=6674 |zbl=0060.31505 |doi=10.1214/aoms/1177731604}}</ref> and even failed to prove convergence), it has seen various extensions and related research. A rigorous proof of convergence by means of [[differential geometry]] is due to [[Stephen Fienberg|Fienberg]] (1970).<ref>{{cite journal |last=Fienberg |first=S. E. |authorlink=Stephen Fienberg |year=1970 |title=An Iterative Procedure for Estimation in Contingency Tables |journal=[[Annals of Mathematical Statistics]] |volume=41 |issue=3 |pages=907–917 |mr=266394 |zbl=0198.23401 |jstor=2239244 |doi=10.1214/aoms/1177696968}}</ref> He interpreted the family of contingency tables with constant crossproduct ratios as a particular (''IJ'' − 1)-dimensional manifold of constant interaction and showed that the IPFP is a fixed-point iteration on that manifold. However, he assumed strictly positive observations. Generalization to tables with zero entries remains a hard and only partly solved problem.

An exhaustive treatment of the algorithm and its mathematical foundations can be found in the book of Bishop et al. (1975).<ref>{{cite book |last=Bishop |first=Y. M. M. |first2=S. E. |last2=Fienberg |authorlink2=Stephen Fienberg |first3=P. W. |last3=Holland |year=1975 |title=Discrete Multivariate Analysis: Theory and Practice |publisher=MIT Press |isbn=978-0-262-02113-5 |mr=381130}}</ref> The first general proof of convergence, built on non-trivial measure-theoretic theorems and entropy minimization, is due to Csiszár (1975).<ref>{{cite journal |last=Csiszár |first=I. |authorlink=Imre Csiszár |year=1975 |title=''I''-Divergence of Probability Distributions and Minimization Problems |journal=Annals of Probability |volume=3 |issue=1 |pages=146–158 |mr=365798 |zbl=0318.60013 |jstor=2959270 |doi=10.1214/aop/1176996454}}</ref>

More recent results on convergence and error behavior were published by Pukelsheim and Simeone (2009).<ref>{{cite web |title=On the Iterative Proportional Fitting Procedure: Structure of Accumulation Points and L1-Error Analysis |url=http://opus.bibliothek.uni-augsburg.de/volltexte/2009/1368/ |publisher=Pukelsheim, F. and Simeone, B. |accessdate=2009-06-28}}</ref> They proved simple necessary and sufficient conditions for the convergence of the IPFP for arbitrary two-way tables (i.e. tables with zero entries) by analysing an <math>L_1</math>-error function.

Other general algorithms can be modified to yield the same limit as the IPFP, for instance the [[Newton–Raphson method]] and the [[EM algorithm]]. In most cases, IPFP is preferred due to its computational speed, numerical stability and algebraic simplicity.

== Algorithm 1 (classical IPFP) ==

Given a two-way (''I'' × ''J'')-table of counts <math>(x_{ij})</math>, where the cell values are assumed to be Poisson or multinomially distributed, we wish to estimate a decomposition <math>\hat{m}_{ij} = a_i b_j</math> for all ''i'' and ''j'' such that <math>(\hat{m}_{ij})</math> is the [[maximum likelihood]] estimate (MLE) of the expected values <math>(m_{ij})</math> leaving the marginals <math>\textstyle x_{i+} = \sum_j x_{ij}\,</math> and <math>\textstyle x_{+j} = \sum_i x_{ij}\,</math> fixed. The assumption that the table factorizes in such a manner is known as the ''model of independence'' (I-model). Written in terms of a [[log-linear model]], we can write this assumption as <math>\log\ m_{ij} = u + v_i + w_j + z_{ij}</math>, where <math>m_{ij} := \mathbb{E}(x_{ij})</math>, <math>\sum_i v_i = \sum_j w_j = 0</math> and the interaction term vanishes, that is <math>z_{ij} = 0</math> for all ''i'' and ''j''.

Choose initial values <math>\hat{m}_{ij}^{(0)} := 1</math> (different choices of initial values may lead to changes in convergence behavior), and for <math>\eta \geq 1</math> set

: <math>\hat{m}_{ij}^{(2\eta - 1)} = \frac{\hat{m}_{ij}^{(2\eta-2)}x_{i+}}{\sum_{k=1}^J \hat{m}_{ik}^{(2\eta-2)}},</math>

: <math>\hat{m}_{ij}^{(2\eta)} = \frac{\hat{m}_{ij}^{(2\eta-1)}x_{+j}}{\sum_{k=1}^I \hat{m}_{kj}^{(2\eta-1)}}.</math>

Notes:

* Convergence does not depend on the actual distribution. Distributional assumptions are necessary only for inferring that the limit <math>(\hat{m}_{ij}) := \lim_{\eta\rightarrow\infty} (\hat{m}^{(\eta)}_{ij})</math> is indeed an MLE.
* IPFP can be manipulated to generate any positive marginals by replacing <math>x_{i+}</math> by the desired row marginal <math>u_i</math> (analogously for the column marginals).
* IPFP can be extended to fit the ''model of quasi-independence'' (Q-model), where <math>m_{ij} = 0</math> is known a priori for <math>(i,j)\in S</math>. Only the initial values have to be changed: set <math>\hat{m}_{ij}^{(0)} = 0</math> if <math>(i,j)\in S</math> and 1 otherwise.

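The two update steps translate directly into a short program. The following is a minimal Python sketch (the function name <code>ipfp</code> and the convergence check are our own illustrative choices, not part of any library), run on the 2×2 handedness table used in the example section:

```python
def ipfp(x, tol=1e-9, max_iter=100):
    """Classical IPFP for the I-model: start from a table of ones and
    alternately rescale rows and columns to the observed marginals."""
    I, J = len(x), len(x[0])
    row = [sum(r) for r in x]                                   # x_{i+}
    col = [sum(x[i][j] for i in range(I)) for j in range(J)]    # x_{+j}
    m = [[1.0] * J for _ in range(I)]                           # m^(0) := 1
    for _ in range(max_iter):
        for i in range(I):                                      # odd step: fit row sums
            s = sum(m[i])
            m[i] = [m[i][j] * row[i] / s for j in range(J)]
        for j in range(J):                                      # even step: fit column sums
            s = sum(m[i][j] for i in range(I))
            for i in range(I):
                m[i][j] *= col[j] / s
        if max(abs(sum(m[i]) - row[i]) for i in range(I)) < tol:
            break
    return m

# the 2x2 handedness table from the example section below
fitted = ipfp([[43, 9], [44, 4]])
```

For a 2×2 table the direct estimator <math>x_{i+}x_{+j}/n</math> exists, so the loop terminates after a single row-and-column sweep.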
== Algorithm 2 (factor estimation) ==

Assume the same setting as in the classical IPFP. Alternatively, we can estimate the row and column factors separately: choose initial values <math>\hat{b}_j^{(0)} := 1</math>, and for <math>\eta \geq 1</math> set

: <math>\hat{a}_i^{(\eta)} = \frac{x_{i+}}{\sum_j \hat{b}_j^{(\eta-1)}},</math>

: <math>\hat{b}_j^{(\eta)} = \frac{x_{+j}}{\sum_i \hat{a}_i^{(\eta)}}.</math>

Setting <math>\hat{m}_{ij}^{(2\eta)} = \hat{a}_i^{(\eta)}\hat{b}_j^{(\eta)}</math>, the two variants of the algorithm are mathematically equivalent (as can be seen by formal induction).

Notes:

* In matrix notation, we can write <math>(\hat{m}_{ij}) = \hat{a}\hat{b}^T</math>, where <math>\hat{a} = (\hat{a}_1,\ldots,\hat{a}_I)^T = \lim_{\eta\rightarrow\infty} \hat{a}^{(\eta)}</math> and <math>\hat{b} = (\hat{b}_1,\ldots,\hat{b}_J)^T = \lim_{\eta\rightarrow\infty} \hat{b}^{(\eta)}</math>.
* The factorization is not unique, since <math>m_{ij} = a_i b_j = (\gamma a_i)\left(\tfrac{1}{\gamma}b_j\right)</math> for all <math>\gamma > 0</math>.
* The factor totals remain constant, i.e. <math>\sum_i \hat{a}_i^{(\eta)} = \sum_i \hat{a}_i^{(1)}</math> for all <math>\eta \geq 1</math> and <math>\sum_j \hat{b}_j^{(\eta)} = \sum_j \hat{b}_j^{(0)}</math> for all <math>\eta \geq 0</math>.
* To fit the Q-model, where <math>m_{ij} = 0</math> a priori for <math>(i,j)\in S</math>, set <math>\delta_{ij} = 0</math> if <math>(i,j)\in S</math> and <math>\delta_{ij} = 1</math> otherwise. Then

:: <math>\hat{a}_i^{(\eta)} = \frac{x_{i+}}{\sum_j \delta_{ij}\hat{b}_j^{(\eta-1)}},</math>

:: <math>\hat{b}_j^{(\eta)} = \frac{x_{+j}}{\sum_i \delta_{ij}\hat{a}_i^{(\eta)}},</math>

:: <math>\hat{m}_{ij}^{(2\eta)} = \delta_{ij}\hat{a}_i^{(\eta)}\hat{b}_j^{(\eta)}.</math>

Obviously, the I-model is a particular case of the Q-model.

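The factor updates above, including the <math>\delta_{ij}</math> mask for the Q-model, can be sketched as follows (a minimal illustration; the function name and the fixed iteration count are our own choices):

```python
def ipf_factors(x, S=frozenset(), n_iter=100):
    """Algorithm 2: estimate the row factors a_i and column factors b_j
    directly; S holds the (i, j) positions of structural zeros (Q-model)."""
    I, J = len(x), len(x[0])
    delta = [[0.0 if (i, j) in S else 1.0 for j in range(J)] for i in range(I)]
    row = [sum(r) for r in x]                                   # x_{i+}
    col = [sum(x[i][j] for i in range(I)) for j in range(J)]    # x_{+j}
    b = [1.0] * J                                               # b^(0) := 1
    for _ in range(n_iter):
        a = [row[i] / sum(delta[i][j] * b[j] for j in range(J)) for i in range(I)]
        b = [col[j] / sum(delta[i][j] * a[i] for i in range(I)) for j in range(J)]
    return a, b

a, b = ipf_factors([[43, 9], [44, 4]])
# reassemble the fitted table as m_ij = a_i * b_j
fitted = [[a[i] * b[j] for j in range(2)] for i in range(2)]
```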
== Algorithm 3 (RAS) ==

The problem: Let <math>M := (m^{(0)}_{ij}) \in \mathbb{R}^{I\times J}</math> be the initial matrix with nonnegative entries, <math>u \in \mathbb{R}^I</math> a vector of specified row marginals (i.e. row sums) and <math>v \in \mathbb{R}^J</math> a vector of column marginals. We wish to compute a matrix <math>\hat{M} = (\hat{m}_{ij}) \in \mathbb{R}^{I\times J}</math> similar to ''M'' with the predefined marginals, meaning

: <math>\hat{m}_{i+} = \sum_{j=1}^J \hat{m}_{ij} = u_i</math>

and

: <math>\hat{m}_{+j} = \sum_{i=1}^I \hat{m}_{ij} = v_j.</math>

Define the diagonalization operator <math>\operatorname{diag}: \mathbb{R}^k \longrightarrow \mathbb{R}^{k\times k}</math>, which produces a (diagonal) matrix with its input vector on the main diagonal and zeros elsewhere. Then, for <math>\eta \geq 0</math>, set

: <math>M^{(2\eta + 1)} = \operatorname{diag}(r^{(\eta+1)})\,M^{(2\eta)}</math>

: <math>M^{(2\eta + 2)} = M^{(2\eta+1)}\operatorname{diag}(s^{(\eta+1)})</math>

where

: <math>r_i^{(\eta + 1)} = \frac{u_i}{\sum_j m_{ij}^{(2\eta)}},</math>

: <math>s_j^{(\eta + 1)} = \frac{v_j}{\sum_i m_{ij}^{(2\eta+1)}}.</math>

Finally, we obtain <math>\hat{M} = \lim_{\eta\rightarrow\infty} M^{(\eta)}.</math>

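Taken literally, the RAS recursion pre- and postmultiplies by diagonal matrices. The sketch below does exactly that (deliberately inefficient, to mirror the formulas; all names are our own):

```python
def diag(v):
    """diag: R^k -> R^{k x k}, input vector on the main diagonal."""
    k = len(v)
    return [[v[i] if i == j else 0.0 for j in range(k)] for i in range(k)]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def ras(M, u, v, n_iter=50):
    """RAS: premultiply by diag(r) to fix row sums, postmultiply by
    diag(s) to fix column sums, and iterate."""
    I, J = len(M), len(M[0])
    for _ in range(I and n_iter):
        r = [u[i] / sum(M[i]) for i in range(I)]                       # r^(eta+1)
        M = matmul(diag(r), M)
        s = [v[j] / sum(M[i][j] for i in range(I)) for j in range(J)]  # s^(eta+1)
        M = matmul(M, diag(s))
    return M

# starting from a matrix of ones, RAS reproduces the classical IPFP fit
M_hat = ras([[1.0, 1.0], [1.0, 1.0]], [52, 48], [87, 13])
```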
== Discussion and comparison of the algorithms ==

Although RAS seems to be the solution of an entirely different problem, it is indeed identical to the classical IPFP. In practice, one would not implement actual matrix multiplication, since diagonal matrices are involved; reducing the operations to the necessary ones, it can easily be seen that RAS does the same as IPFP. The vaguely demanded 'similarity' can be explained as follows: IPFP (and thus RAS) maintains the crossproduct ratios, i.e.

: <math>\frac{m^{(0)}_{ij}m^{(0)}_{hk}}{m^{(0)}_{ik}m^{(0)}_{hj}} = \frac{m^{(\eta)}_{ij}m^{(\eta)}_{hk}}{m^{(\eta)}_{ik}m^{(\eta)}_{hj}}\quad \forall\ \eta \geq 0\text{ and }i\neq h,\quad j\neq k,</math>

since <math>m^{(\eta)}_{ij} = a_i^{(\eta)}b_j^{(\eta)}.</math>

This property is sometimes called '''structure conservation''' and directly leads to the geometrical interpretation of contingency tables and the proof of convergence in the seminal paper of Fienberg (1970).

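Structure conservation is easy to verify numerically: row and column scalings multiply numerator and denominator of the crossproduct ratio by the same factors, so the ratio is untouched. A small check, with an arbitrary starting matrix and target marginals of our own choosing:

```python
def scale_rows_cols(M, u, v):
    """One IPFP double step: scale rows to targets u, then columns to v."""
    I, J = len(M), len(M[0])
    M = [[M[i][j] * u[i] / sum(M[i]) for j in range(J)] for i in range(I)]
    colsums = [sum(M[i][j] for i in range(I)) for j in range(J)]
    return [[M[i][j] * v[j] / colsums[j] for j in range(J)] for i in range(I)]

def crossproduct_ratio(M):
    """Crossproduct ratio of a 2x2 matrix."""
    return (M[0][0] * M[1][1]) / (M[0][1] * M[1][0])

M0 = [[1.0, 2.0], [3.0, 4.0]]              # arbitrary positive start
M1 = scale_rows_cols(M0, [52, 48], [87, 13])
# the ratio (1*4)/(2*3) = 2/3 survives the scaling step unchanged
```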
Nevertheless, direct factor estimation (algorithm 2) is under all circumstances the preferable way to perform IPF: whereas classical IPFP needs

: <math>IJ(2+J) + IJ(2+I) = I^2J + IJ^2 + 4IJ \, </math>

elementary operations in each iteration step (including a row and a column fitting step), factor estimation needs only

: <math>I(1+J) + J(1+I) = 2IJ + I + J \, </math>

operations, being at least one order of magnitude faster than classical IPFP.

== Existence and uniqueness of MLEs ==

Necessary and sufficient conditions for the existence and uniqueness of MLEs are complicated in the general case (see Haberman 1974<ref>{{cite book |last=Haberman |first=S. J. |year=1974 |title=The Analysis of Frequency Data |publisher=Univ. Chicago Press |isbn=978-0-226-31184-5}}</ref>), but sufficient conditions for 2-dimensional tables are simple:

* the marginals of the observed table do not vanish (that is, <math>x_{i+} > 0,\ x_{+j} > 0</math>) and
* the observed table is inseparable (i.e. the table does not permute to a block-diagonal shape).

If unique MLEs exist, IPFP exhibits linear convergence in the worst case (Fienberg 1970), but exponential convergence has also been observed (Pukelsheim and Simeone 2009). If a direct estimator (i.e. a closed form of <math>(\hat{m}_{ij})</math>) exists, IPFP converges after 2 iterations. If unique MLEs do not exist, IPFP converges toward the so-called ''extended MLEs'' by design (Haberman 1974), but convergence may be arbitrarily slow and often computationally infeasible.

If all observed values are strictly positive, existence and uniqueness of MLEs and therefore convergence are ensured.

== Goodness of fit ==

To check whether the assumption of independence is adequate, one uses the [[Pearson X-squared statistic]]

: <math>X^2 = \sum_{i,j}\frac{(x_{ij}-\hat{m}_{ij})^2}{\hat{m}_{ij}}</math>

or alternatively the [[likelihood-ratio test]] ([[G-test]]) statistic

: <math>G = 2\sum_{i,j} x_{ij}\log \frac{x_{ij}}{\hat{m}_{ij}}.</math>

Both statistics are asymptotically <math>\chi^2_r</math>-distributed, where <math>r = (I-1)(J-1)</math> is the number of degrees of freedom. That is, if the [[p-value]]s <math>1 - \chi^2_r(X^2)</math> and <math>1 - \chi^2_r(G)</math> are not too small (> 0.05 for instance), there is no indication to discard the hypothesis of independence.

== Interpretation ==

If the rows correspond to different values of property A, and the columns correspond to different values of property B, and the hypothesis of independence is not discarded, the properties A and B are considered independent.

== Example ==

Consider a table of observations (taken from the entry on [[contingency table]]s):

<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 43 || 9 || 52
|-----
| female || 44 || 4 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>

For executing the classical IPFP, we first initialize the matrix with ones, leaving the marginals untouched:

<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 1 || 1 || 52
|-----
| female || 1 || 1 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>

Of course, the marginal sums no longer correspond to the matrix, but this is fixed in the next two iterations of IPFP. The first iteration deals with the row sums:

<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 26 || 26 || 52
|-----
| female || 24 || 24 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>

Note that, by definition, the row sums always constitute a perfect match after odd iterations, as do the column sums after even ones. The subsequent iteration updates the matrix column-wise:

<center>
{| class="wikitable"
|-----
|
|| right-handed || left-handed || TOTAL
|-----
| male || 45.24 || 6.76 || 52
|-----
| female || 41.76 || 6.24 || 48
|-----
| TOTAL || 87 || 13 || 100
|}</center>

Now both row and column sums of the matrix match the given marginals again.

The [[p-value]] of this matrix is approximately <math>p(X^2) \approx 0.1824671</math>, meaning that gender and left-handedness/right-handedness can be considered independent.

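This p-value can be reproduced with a few lines of Python. For a 2×2 table there is <math>r = 1</math> degree of freedom, and the survival function of <math>\chi^2_1</math> reduces to <math>\operatorname{erfc}(\sqrt{X^2/2})</math>, so only the standard library is needed (a sketch; variable names are ours):

```python
from math import erfc, log, sqrt

x = [[43, 9], [44, 4]]                    # observed table
m = [[45.24, 6.76], [41.76, 6.24]]        # fitted values from the last step

# Pearson X^2 and likelihood-ratio G statistics
X2 = sum((x[i][j] - m[i][j]) ** 2 / m[i][j] for i in range(2) for j in range(2))
G = 2 * sum(x[i][j] * log(x[i][j] / m[i][j]) for i in range(2) for j in range(2))

# survival function of chi^2 with r = (2-1)(2-1) = 1 degree of freedom
p = erfc(sqrt(X2 / 2))                    # approx. 0.1825, well above 0.05
```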
== Notes ==
{{reflist}}

{{DEFAULTSORT:Iterative Proportional Fitting}}
[[Category:Categorical data]]
[[Category:Statistical algorithms]]