The '''curse of dimensionality''' refers to various phenomena that arise when analyzing and organizing data in [[high-dimensional space]]s (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the [[three-dimensional space|three-dimensional]] [[physical space]] of everyday experience.

There are multiple phenomena referred to by this name in domains such as [[numerical analysis]], [[Sampling (statistics)|sampling]], [[combinatorics]], [[machine learning]], [[data mining]] and [[database]]s. The common theme of these problems is that when the dimensionality increases, the [[volume]] of the space increases so fast that the available data becomes sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
 
The term ''curse of dimensionality'' was coined by [[Richard E. Bellman]] when considering problems in dynamic optimization.<ref>{{Cite book|author1=Richard Ernest Bellman|author2=Rand Corporation|title=Dynamic programming|url=http://books.google.com/books?id=7omhQgAACAAJ|year=1957|publisher=Princeton University Press|isbn=978-0-691-07951-6|postscript=}},<br />Republished: {{Cite book|author=Richard Ernest Bellman|title=Dynamic Programming|url=http://books.google.com/books?id=fyVtp3EMxasC|year=2003|publisher=Courier Dover Publications|isbn=978-0-486-42809-3}}</ref><ref>{{Cite book|author=Richard Ernest Bellman|title=Adaptive control processes: a guided tour|url=http://books.google.com/books?id=POAmAAAAMAAJ|year=1961|publisher=Princeton University Press}}</ref>
 
== The "curse of dimensionality" as open problem ==
 
The "curse of dimensionality" is often used{{Citation needed|date=October 2012}} as a blanket excuse for not dealing with high-dimensional data. However, the effects are not yet completely understood by the scientific community, and there is ongoing research. On one hand, the notion of [[intrinsic dimension]] refers to the fact that any low-dimensional data space can trivially be turned into a higher dimensional space by adding redundant (e.g.&nbsp;duplicate) or randomized dimensions, and in turn many high-dimensional data sets can be reduced to lower dimensional data without significant information loss. This is also reflected by the effectiveness of [[dimension reduction]] methods such as [[principal component analysis]] in many situations. For distance functions and nearest neighbor search, recent research also showed that data sets that exhibit the curse of dimensionality properties can still be processed unless there are too many irrelevant dimensions, while relevant dimensions can make some problems such as [[cluster analysis]] actually easier.<ref name="houle-ssdbm10" /><ref name="houle-sstd11" /> Secondly, methods such as [[Markov chain Monte Carlo]] or shared nearest neighbor methods<ref name="houle-ssdbm10" /> often work very well on data that were considered intractable by other methods due to high dimensionality.
 
== Curse of dimensionality in different domains ==
 
=== Combinatorics ===
In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is also known as the [[combinatorial explosion]]. Even in the simplest case of <math>d</math> binary variables, the number of possible combinations is already <math>O(2^d)</math>, exponential in the dimensionality. Naively, each additional dimension doubles the effort needed to try all combinations.
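
As a rough illustration (not from any particular source), the following Python sketch prints how quickly the number of joint configurations of <math>d</math> binary variables grows, and enumerates them explicitly for a small <math>d</math>:

<syntaxhighlight lang="python">
import itertools

# The number of joint configurations of d binary variables is 2**d,
# so exhaustive enumeration is only feasible for very small d.
for d in (5, 10, 20, 30, 40):
    print(d, 2 ** d)

# Explicit enumeration for d = 3 (8 combinations):
for combo in itertools.product([0, 1], repeat=3):
    print(combo)
</syntaxhighlight>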
 
=== Sampling ===
There is an exponential increase in [[volume]] associated with adding extra dimensions to a [[Space (mathematics)|mathematical space]]. For example, 10<sup>2</sup>=100 evenly spaced sample points suffice to sample a [[unit interval]] (a "1-dimensional cube") with no more than 10<sup>-2</sup>=0.01 distance between points; an equivalent sampling of a 10-dimensional [[unit hypercube]] with a lattice that has a spacing of 10<sup>-2</sup>=0.01 between adjacent points would require (10<sup>2</sup>)<sup>10</sup>=10<sup>20</sup> sample points. In general, with a spacing distance of 10<sup>-n</sup>, the unit interval needs 10<sup>n</sup> points while the 10-dimensional hypercube needs (10<sup>n</sup>)<sup>10</sup>=10<sup>10n</sup> points, so the 10-dimensional hypercube appears to be a factor of 10<sup>10n</sup>/10<sup>n</sup>=10<sup>n(10-1)</sup> "larger" than the 1-dimensional hypercube, which is the unit interval. In the above example, n=2: with a sampling distance of 0.01 the 10-dimensional hypercube appears to be 10<sup>18</sup> times "larger" than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below.
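
The arithmetic behind this example can be checked with a short Python sketch (the function name here is chosen purely for illustration):

<syntaxhighlight lang="python">
# Lattice points needed to sample the unit d-cube with spacing 10**-n:
# 10**n points per axis, hence (10**n)**d = 10**(n*d) points in total.
def lattice_points(d, n):
    per_axis = 10 ** n
    return per_axis ** d

for d in (1, 2, 5, 10):
    print(d, lattice_points(d, n=2))
# d = 1 needs 10**2 points, d = 10 needs 10**20 points:
# a factor of 10**18 = 10**(2*(10-1)) more, as stated above.
</syntaxhighlight>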
 
=== Optimization ===
When solving dynamic [[optimization (mathematics)|optimization]] problems by numerical [[backward induction]], the objective function must be computed for each combination of values.  This is a significant obstacle when the dimension of the "state variable" is large.
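A small sketch of the scaling (the grid sizes are arbitrary, illustrative choices): the number of objective-function evaluations per backward-induction stage grows exponentially in the number of state variables.

<syntaxhighlight lang="python">
# With k state variables, each discretised into m grid points,
# every stage of backward induction must visit m**k combinations.
m = 100  # grid points per state variable (illustrative choice)
for k in (1, 2, 3, 4, 6):
    print(k, m ** k)  # evaluations required per stage
</syntaxhighlight>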
 
=== Machine learning ===
In [[machine learning]] problems that involve learning a "state-of-nature" (maybe an infinite distribution) from a finite number of data samples in a high-dimensional [[feature space]] with each feature having a number of possible values, an enormous amount of training data is required to ensure that there are several samples with each combination of values. With a fixed number of training samples, the predictive power decreases as the dimensionality increases; this is known as the ''Hughes effect''<ref>{{cite doi|10.1007/s11004-008-9156-6}}</ref> or ''Hughes phenomenon'' (named after Gordon F. Hughes).<ref>{{cite journal |last=Hughes |first=G.F. |title=On the mean accuracy of statistical pattern recognizers |journal=IEEE Transactions on Information Theory |volume=14 |issue=1 |pages=55–63 |date=January 1968 |doi=10.1109/TIT.1968.1054102 |url=http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1054102}}</ref><ref>Not to be confused with the unrelated, but similarly named, ''Hughes effect in [[electromagnetism]]'' (named after [http://spiedl.aip.org/vsearch/servlet/VerityServlet?KEY=SPIEDL&possible1=Hughes%2C+Declan+C.&possible1zone=author&maxdisp=25&smode=strresults&pjournals=OPEGAR%2CJBOPFO%2CPSISDG%2CJEIME5%2CJMMMGF%2CJARSC4%2CJNOACQ&deliveryType=spiedl&aqs=true Declan C. Hughes]), which refers to an asymmetry in the [[hysteresis]] curves of [[Magnetic core|laminated cores]] made of certain [[magnetic materials]], such as [[permalloy]] or [[mu-metal]], in alternating magnetic fields.</ref>
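
A minimal simulation (illustrative only; the sample size and feature cardinalities are arbitrary) shows how the fraction of feature-value combinations covered by a fixed number of training samples collapses as the number of features grows:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
N, v = 1000, 4           # training samples, values per feature (assumed)
for d in (2, 4, 8, 16):  # number of features
    samples = rng.integers(0, v, size=(N, d))
    seen = len({tuple(row) for row in samples})
    total = v ** d
    print(d, total, seen / total)  # coverage shrinks rapidly with d
</syntaxhighlight>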
 
=== Bayesian statistics ===
The curse of dimensionality has often been a difficulty with [[Bayesian statistics]], for which the [[posterior distribution]]s often have many parameters.
 
However, this problem has been largely overcome by the advent of simulation-based Bayesian inference, especially using [[Markov chain Monte Carlo]] methods, which suffice for many practical problems. Nevertheless, simulation-based methods converge slowly and are therefore not a panacea for high-dimensional problems.
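
As a rough illustration of why Markov chain Monte Carlo is not a panacea, the following sketch runs a random-walk Metropolis sampler on a <math>d</math>-dimensional standard normal target; with a fixed step size the acceptance rate collapses as <math>d</math> grows, so the chain mixes more slowly (all settings here are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np

def metropolis_acceptance(d, n_steps=5000, step=0.5, seed=0):
    """Acceptance rate of random-walk Metropolis on a d-dim standard normal."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)          # start at a draw from the target
    log_p = -0.5 * x @ x                # unnormalised log-density
    accepted = 0
    for _ in range(n_steps):
        proposal = x + step * rng.standard_normal(d)
        log_p_new = -0.5 * proposal @ proposal
        if np.log(rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
    return accepted / n_steps

for d in (1, 10, 100):
    print(d, metropolis_acceptance(d))
</syntaxhighlight>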
 
=== Distance functions ===
When a measure such as a [[Euclidean distance]] is defined using many coordinates, there is little difference in the distances between different pairs of samples.
 
One way to illustrate the "vastness" of high-dimensional Euclidean space is to compare the volume of a [[hypersphere]] with radius <math>r</math> in dimension <math>d</math> to that of a [[hypercube]] with sides of length <math>2r</math> in the same dimension.
The volume of such a sphere is
:<math>V_\text{sphere} = \frac{2r^d\pi^{d/2}}{d\,\Gamma(d/2)},</math>
while the volume of the cube is
:<math>V_\text{cube} = (2r)^d.</math>
As the dimension <math>d</math> of the space increases, the hypersphere becomes an insignificant volume relative to that of the hypercube.  This can clearly be seen by comparing the proportions as the dimension <math>d</math> goes to infinity:
:<math>\frac{\pi^{d/2}}{d2^{d-1}\Gamma(d/2)}\rightarrow 0</math> as <math>d \rightarrow \infty</math>.
 
Thus, in some sense, nearly all of the high-dimensional space is "far away" from the centre, or, to put it another way, the high-dimensional unit space can be said to consist almost entirely of the "corners" of the hypercube, with almost no "middle". This is an important intuition for understanding the [[chi-squared distribution]].{{Why?|date=May 2011}}
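
The vanishing ratio can be evaluated numerically with a few lines of Python (a sketch based on the formula above; the function name is chosen here for illustration):

<syntaxhighlight lang="python">
import math

# Ratio of hypersphere volume (radius r) to hypercube volume (side 2r);
# r cancels, leaving pi**(d/2) / (d * 2**(d-1) * Gamma(d/2)).
def sphere_to_cube_ratio(d):
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))

for d in (2, 3, 5, 10, 20, 50):
    print(d, sphere_to_cube_ratio(d))
# d = 2 gives pi/4 ~ 0.785; by d = 20 the ratio is already about 2.5e-8.
</syntaxhighlight>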
 
Given a single distribution, the minimum and maximum distances become indiscernible, because the difference between them, relative to the minimum distance, converges to zero:<ref>{{cite doi | 10.1007/3-540-49257-7_15 }}</ref>
:<math>\lim_{d \to \infty} \frac{\operatorname{dist}_\max - \operatorname{dist}_\min}{\operatorname{dist}_\min} = 0.</math>
 
This is often cited as evidence that distance functions lose their usefulness in high dimensionality.
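
An empirical sketch of this concentration effect, using uniform random points in the unit cube (the sample sizes and dimensions are arbitrary illustrative choices):

<syntaxhighlight lang="python">
import numpy as np

# For points drawn uniformly in the unit d-cube, the relative spread
# (max - min) / min of distances from a query point shrinks as d grows.
rng = np.random.default_rng(0)
n = 1000
for d in (2, 10, 100, 1000):
    points = rng.random((n, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    print(d, (dists.max() - dists.min()) / dists.min())
</syntaxhighlight>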
 
=== Nearest neighbor search ===
The effect complicates [[nearest neighbor search]] in high dimensional space.  It is not possible to quickly reject candidates by using the difference in one coordinate as a lower bound for a distance based on all the dimensions.<ref>{{cite journal |first1=R.B. |last1=Marimont |first2=M.B. |last2=Shapiro |title=Nearest Neighbour Searches and the Curse of Dimensionality |journal=IMA J Appl Math |volume=24 |issue=1 |pages=59–70 |year=1979 |doi=10.1093/imamat/24.1.59 |url=http://imamat.oxfordjournals.org/content/24/1/59.short}}</ref><ref>{{cite journal |first1=Edgar |last1=Chávez |first2=Gonzalo |last2=Navarro |first3=Ricardo |last3=Baeza-Yates |first4=José Luis |last4=Marroquín |title=Searching in Metric Spaces |journal=ACM Computing Surveys |volume=33 |issue=3 |pages=273–321 |year=2001 |doi=10.1145/502807.502808 |url=http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.7845&rep=rep1&type=pdf |format=PDF}}</ref>
 
However, recent research indicates that the mere number of dimensions does not necessarily result in problems,<ref name="houle-ssdbm10">{{cite doi | 10.1007/978-3-642-13818-8_34 }}</ref> since relevant additional dimensions can also increase the contrast. In addition, the resulting ranking remains useful to discern close and far neighbors. Irrelevant ("noise") dimensions however reduce the contrast as expected. In [[time series analysis]], where the data are inherently high-dimensional, distance functions also work reliably as long as the [[signal-to-noise ratio]] is high enough.<ref name="houle-sstd11">{{cite doi | 10.1007/978-3-642-22922-0_25 }}</ref>
 
====''k''-nearest neighbor classification====
Another effect of high dimensionality on distance functions concerns ''k''-nearest neighbor (''k''-NN) [[Graph (mathematics)|graphs]] constructed from a [[data set]] using some distance function. As dimensionality increases, the [[indegree]] distribution of the ''k''-NN [[Directed graph|digraph]] becomes [[Skewness|skewed]] to the right, resulting in the emergence of hubs: data instances that appear in many more ''k''-NN lists of other instances from the [[data set]] than expected. This phenomenon can have a considerable impact on various techniques for [[Classification (machine learning)|classification]] (including the [[K-nearest neighbor algorithm|''k''-NN classifier]]), [[semi-supervised learning]], and [[Cluster analysis|clustering]],<ref>{{Cite journal
| last1=Radovanović | first1=Miloš
| last2=Nanopoulos | first2=Alexandros
| last3=Ivanović | first3=Mirjana
| year=2010
| title=Hubs in space: Popular nearest neighbors in high-dimensional data
| journal=Journal of Machine Learning Research
| volume=11
| pages=2487–2531
| url=http://www.jmlr.org/papers/volume11/radovanovic10a/radovanovic10a.pdf
| format=PDF
}}</ref> and it also affects [[information retrieval]].<ref>{{Cite doi | 10.1145/1835449.1835482 }}</ref>
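
The skewing of the in-degree distribution can be reproduced with a small simulation (a sketch on uniform random data; the parameters are illustrative):

<syntaxhighlight lang="python">
import numpy as np

# Build the k-NN digraph of uniform random data and inspect the in-degree
# distribution; in high dimensions a few points ("hubs") appear in
# unusually many k-NN lists, skewing the distribution to the right.
rng = np.random.default_rng(0)
n, k = 1000, 10
for d in (3, 50, 500):
    X = rng.random((n, d))
    norms = (X ** 2).sum(axis=1)
    sq = norms[:, None] + norms[None, :] - 2 * X @ X.T   # squared distances
    np.fill_diagonal(sq, np.inf)                          # exclude self-matches
    knn = np.argsort(sq, axis=1)[:, :k]                   # k nearest neighbours
    indegree = np.bincount(knn.ravel(), minlength=n)
    skew = ((indegree - indegree.mean()) ** 3).mean() / indegree.std() ** 3
    print(d, indegree.max(), round(skew, 2))
</syntaxhighlight>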
 
=== Anomaly detection ===
 
In a recent survey, Zimek et al. identified the following problems when searching for [[anomaly detection|anomalies]] in high-dimensional data:<ref name="survey">{{cite doi | 10.1002/sam.11161}}</ref>
 
# Concentration of scores and distances: derived values such as distances become numerically similar
# Irrelevant attributes: in high dimensional data, a significant amount of attributes may be irrelevant
# Definition of reference sets: for local methods, reference sets are often nearest-neighbor based
# Incomparable scores for different dimensionalities: different subspaces produce incomparable scores
# Interpretability of scores: the scores often no longer convey a semantic meaning
# Exponential search space: the search space can no longer be systematically scanned
# [[Data snooping]] bias: given the large search space, for every desired significance a hypothesis can be found
# Hubness: certain objects occur more frequently in neighbor lists than others.
 
Many of the analyzed specialized methods tackle one or another of these problems, but there remain many open research questions.
 
==See also==
{{Div col|cols=3}}
*[[Bellman equation]]
*[[Backwards induction]]
*[[Cluster analysis]]
*[[Clustering high-dimensional data]]
*[[Combinatorial explosion]]
*[[Concentration of measure]]
*[[Dimension reduction]]
*[[Dynamic programming]]
*[[Fourier-related transforms]]
*[[High-dimensional space]]
*[[Linear least squares (mathematics)|Linear least squares]]
*[[Multilinear principal component analysis|Multilinear PCA]]
*[[Multilinear subspace learning]]
*[[Principal component analysis]]
*[[Quasi-random]]
*[[Singular value decomposition]]
*[[Time series]]
*[[Wavelet]]
{{Div col end}}
 
==References==
{{Reflist}}
 
[[Category:Numerical analysis]]
[[Category:Dynamic programming]]
[[Category:Machine learning]]
