Venturi flume

For a class of predicates $H$ defined on a set $X$ and a sequence of samples $x = (x_{1}, x_{2}, \dots, x_{m})$, where $x_{i} \in X$, the empirical frequency of $h \in H$ on $x$ is $\hat{Q}_{x} (h) = \frac{1}{m} | \{i : 1 \leq i \leq m, h (x_{i}) = 1\} |$. The Uniform Convergence Theorem states, roughly, that if $H$ is "simple" and we draw samples independently (with replacement) from $X$ according to a distribution $P$, then with high probability every empirical frequency will be close to its expectation, where the expectation is given by $Q_{P} (h) = P \{y \in X : h (y) = 1\}$. Here "simple" means that the Vapnik–Chervonenkis dimension of the class $H$ is small relative to the size of the sample.
In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
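As an illustration of the definitions above, the following minimal sketch computes empirical frequencies for a hypothetical class of threshold predicates $h_{t} (y) = 1$ iff $y \leq t$, with $P$ assumed uniform on $[0, 1]$ so that $Q_{P} (h_{t}) = t$. The class, the distribution, and all function names are assumptions chosen for the example and are not part of the theorem.

```python
import random

def empirical_frequency(h, sample):
    """Fraction of sample points x_i with h(x_i) == 1."""
    return sum(1 for x in sample if h(x) == 1) / len(sample)

def threshold(t):
    """Hypothetical predicate h_t(y) = 1 if y <= t, else 0."""
    return lambda y: 1 if y <= t else 0

def max_deviation(sample, thresholds):
    """Largest |Q_P(h_t) - Q_x(h_t)| over the given thresholds.

    Assumes P is uniform on [0, 1], so that Q_P(h_t) = t.
    """
    return max(
        abs(t - empirical_frequency(threshold(t), sample)) for t in thresholds
    )

rng = random.Random(0)
thresholds = [k / 100 for k in range(101)]
for m in (10, 100, 1000):
    sample = [rng.random() for _ in range(m)]
    # The worst-case deviation over the class tends to shrink as m grows.
    print(m, round(max_deviation(sample, thresholds), 3))
```

The quantity printed is the deviation maximized over the whole class, which is exactly what the theorem controls, rather than the deviation for a single fixed predicate.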

Uniform convergence theorem statement^[1]

If $H$ is a set of $\{0, 1\}$-valued functions defined on a set $X$ and $P$ is a probability distribution on $X$, then for $ϵ > 0$ and $m$ a positive integer, we have

$$P^{m} \{ | Q_{P} (h) - \hat{Q}_{x} (h) | \geq ϵ \text{ for some } h \in H \} \leq 4 Π_{H} (2 m) e^{- \frac{ϵ^{2} m}{8}} ,$$

where, for any $x \in X^{m}$,

$$Q_{P} (h) = P \{ y \in X : h (y) = 1 \} , \qquad \hat{Q}_{x} (h) = \frac{1}{m} | \{ i : 1 \leq i \leq m, h (x_{i}) = 1 \} | ,$$

and $| x | = m$. Here $P^{m}$ indicates that the probability is taken over $x$ consisting of $m$ i.i.d. draws from the distribution $P$.

$Π_{H}$ is defined as follows. For any class $H$ of $\{0, 1\}$-valued functions over $X$ and any $D \subseteq X$,

$$Π_{H} (D) = \{ h \cap D : h \in H \} ,$$

where each $h$ is identified with the set $\{ y \in X : h (y) = 1 \}$. For any natural number $m$, the shattering number $Π_{H} (m)$ is defined as

$$Π_{H} (m) = \max_{D \subseteq X, | D | = m} | \{ h \cap D : h \in H \} | .$$
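For a finite domain, the shattering number can be computed by brute force. The sketch below is only an illustration: it represents $h \cap D$ by the tuple of labels that $h$ assigns to the points of $D$, and it uses a hypothetical class of thresholds on $X = \{0, \dots, 5\}$. The function names are assumptions made for the example.

```python
from itertools import combinations

def restriction(hypotheses, D):
    """Pi_H(D): the distinct labelings of the points of D induced by H."""
    return {tuple(h(x) for x in D) for h in hypotheses}

def shattering_number(hypotheses, X, m):
    """Pi_H(m): the maximum of |Pi_H(D)| over all m-point subsets D of X."""
    return max(len(restriction(hypotheses, D)) for D in combinations(X, m))

# Hypothetical example: thresholds h_t(x) = 1 iff x <= t on X = {0, ..., 5}.
X = list(range(6))
thresholds = [lambda x, t=t: 1 if x <= t else 0 for t in range(-1, 6)]

# Thresholds induce m + 1 labelings on any m distinct points.
print([shattering_number(thresholds, X, m) for m in range(1, 5)])
```

Enumerating all subsets is exponential in $m$, so this is useful only for checking small cases by hand.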

From the point of view of learning theory, one can consider $H$ to be the concept/hypothesis class defined over the instance set $X$. Before getting into the details of the proof of the theorem, we state the Sauer–Shelah lemma, which the proof needs.

Sauer–Shelah lemma

The Sauer–Shelah lemma^[2] relates the shattering number $Π_{H} (m)$ to the VC dimension.

Lemma: $Π_{H} (m) \leq {(\frac{e m}{d})}^{d}$ for $m \geq d \geq 1$, where $d$ is the VC dimension of the concept class $H$.

Corollary: $Π_{H} (m) \leq m^{d}$ .
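To get a feel for the numbers, the lemma can be substituted into the right-hand side of the Uniform Convergence Theorem. The sketch below does this under the assumption $2 m \geq d \geq 1$. The function names and the sample values of $m$, $d$ and $ϵ$ are illustrative only.

```python
import math

def sauer_bound(m, d):
    """Upper bound (e*m/d)**d on the shattering number, valid for m >= d >= 1."""
    return (math.e * m / d) ** d

def uniform_convergence_bound(m, d, eps):
    """4 * Pi_H(2m) * exp(-eps^2 * m / 8), with Pi_H(2m) replaced by the Sauer bound."""
    return 4 * sauer_bound(2 * m, d) * math.exp(-eps ** 2 * m / 8)

for m in (10_000, 100_000, 1_000_000):
    # A value above 1 means the bound says nothing for that sample size.
    print(m, uniform_convergence_bound(m, d=3, eps=0.1))
```

Because the polynomial factor grows only like $m^{d}$ while the exponential factor decays like $e^{- ϵ^{2} m / 8}$, the bound eventually becomes small for any fixed $d$ and $ϵ$.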

Proof of uniform convergence theorem ^[1]

Before we get into the details of the proof of the Uniform Convergence Theorem, we present a high-level overview of the proof.

Symmetrization: We transform the problem of analyzing $| Q_{P} (h) - \hat{Q}_{x} (h) | \geq ϵ$ into the problem of analyzing $| \hat{Q}_{r} (h) - \hat{Q}_{s} (h) | \geq ϵ / 2$, where $r$ and $s$ are i.i.d. samples of size $m$ drawn according to the distribution $P$. One can view $r$ as the original randomly drawn sample of length $m$, while $s$ may be thought of as the testing sample which is used to estimate $Q_{P} (h)$.
Permutation: Since $r$ and $s$ are drawn identically and independently, swapping elements between them does not change the probability distribution on $r$ and $s$. So we bound the probability of $| \hat{Q}_{r} (h) - \hat{Q}_{s} (h) | \geq ϵ / 2$ for some $h \in H$ by considering the effect of a specific collection of permutations of the joint sample $x = r | | s$, where $r | | s$ denotes the concatenation of $r$ and $s$. Specifically, we consider permutations $σ (x)$ which swap $x_{i}$ and $x_{m + i}$ for every $i$ in some subset of $\{1, 2, \dots, m\}$.
Reduction to a finite class: We can now restrict the function class $H$ to a fixed joint sample. Hence, if $H$ has finite VC dimension, the problem reduces to one involving a finite function class.

We present the technical details of the proof.

Symmetrization

Lemma: Let

$$V = \{ x \in X^{m} : | Q_{P} (h) - \hat{Q}_{x} (h) | \geq ϵ \text{ for some } h \in H \}$$

and

$$R = \{ (r, s) \in X^{m} \times X^{m} : | \hat{Q}_{r} (h) - \hat{Q}_{s} (h) | \geq ϵ / 2 \text{ for some } h \in H \} .$$

Then for $m \geq \frac{2}{ϵ^{2}}$, $P^{m} (V) \leq 2 P^{2 m} (R)$.

Proof: By the triangle inequality, if $| Q_{P} (h) - \hat{Q}_{r} (h) | \geq ϵ$ and $| Q_{P} (h) - \hat{Q}_{s} (h) | \leq ϵ / 2$ then $| \hat{Q}_{r} (h) - \hat{Q}_{s} (h) | \geq ϵ / 2$. Therefore,

$$\begin{aligned}
P^{2 m} (R) &\geq P^{2 m} \{ \exists h \in H : | Q_{P} (h) - \hat{Q}_{r} (h) | \geq ϵ \text{ and } | Q_{P} (h) - \hat{Q}_{s} (h) | \leq ϵ / 2 \} \\
&= \int_{V} P^{m} \{ s : \exists h \in H : | Q_{P} (h) - \hat{Q}_{r} (h) | \geq ϵ \text{ and } | Q_{P} (h) - \hat{Q}_{s} (h) | \leq ϵ / 2 \} \, d P^{m} (r) = A ,
\end{aligned}$$

since $r$ and $s$ are independent.

Now for $r \in V$ fix an $h \in H$ such that $| Q_{P} (h) - \hat{Q}_{r} (h) | \geq ϵ$. For this $h$, we shall show that

$$P^{m} \{ s : | Q_{P} (h) - \hat{Q}_{s} (h) | \leq \frac{ϵ}{2} \} \geq \frac{1}{2} .$$

Given this claim, the integrand in $A$ is at least $\frac{1}{2}$ for every $r \in V$, so $A \geq \frac{P^{m} (V)}{2}$ and hence $P^{2 m} (R) \geq \frac{P^{m} (V)}{2}$, which completes the first step of the high-level plan.

To prove the claim, notice that $m \cdot \hat{Q}_{s} (h)$ is a binomial random variable with expectation $m \cdot Q_{P} (h)$ and variance $m \cdot Q_{P} (h) (1 - Q_{P} (h))$. By Chebyshev's inequality,

$$P^{m} \{ s : | Q_{P} (h) - \hat{Q}_{s} (h) | > \frac{ϵ}{2} \} \leq \frac{m \cdot Q_{P} (h) (1 - Q_{P} (h))}{(ϵ m / 2)^{2}} \leq \frac{1}{ϵ^{2} m} \leq \frac{1}{2}$$

for $m \geq \frac{2}{ϵ^{2}}$. Here we use the fact that $x (1 - x) \leq 1 / 4$ for $x \in [0, 1]$.
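The Chebyshev step can be checked numerically for small cases by summing the binomial distribution exactly. This is a sanity check under assumed values of $ϵ$ and $Q_{P} (h)$, not a replacement for the proof. The function name `prob_within` is hypothetical.

```python
import math

def prob_within(m, q, eps):
    """Exact P{|q - k/m| <= eps/2} for k ~ Binomial(m, q)."""
    return sum(
        math.comb(m, k) * q ** k * (1 - q) ** (m - k)
        for k in range(m + 1)
        if abs(q - k / m) <= eps / 2
    )

eps = 0.2
m = math.ceil(2 / eps ** 2)  # smallest sample size allowed by the lemma
for q in (0.1, 0.3, 0.5):
    # Chebyshev guarantees each of these probabilities is at least 1/2.
    print(q, round(prob_within(m, q, eps), 3))
```

In practice the exact probabilities are well above $\frac{1}{2}$, which reflects how loose Chebyshev's inequality is. The proof only needs the constant $\frac{1}{2}$.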

Permutations

Let $Γ_{m}$ be the set of all permutations of $\{1, 2, 3, \dots, 2 m\}$ that swap $i$ and $m + i$ for every $i$ in some subset of $\{1, 2, 3, \dots, m\}$.

Lemma: Let $R$ be any subset of $X^{2 m}$ and $P$ any probability distribution on $X$. Then,

$$P^{2 m} (R) = E [\Pr [σ (x) \in R]] \leq \max_{x \in X^{2 m}} (\Pr [σ (x) \in R]) ,$$

where the expectation is over $x$ chosen according to $P^{2 m}$, and the probability is over $σ$ chosen uniformly from $Γ_{m}$.

Proof: For any $σ \in Γ_{m}$,

$$P^{2 m} (R) = P^{2 m} \{ x : σ (x) \in R \} ,$$

since coordinate permutations preserve the product distribution $P^{2 m}$. Therefore,

$$\begin{aligned}
P^{2 m} (R) &= \int_{X^{2 m}} 1_{R} (x) \, d P^{2 m} (x) \\
&= \frac{1}{| Γ_{m} |} \sum_{σ \in Γ_{m}} \int_{X^{2 m}} 1_{R} (σ (x)) \, d P^{2 m} (x) \\
&= \int_{X^{2 m}} \frac{1}{| Γ_{m} |} \sum_{σ \in Γ_{m}} 1_{R} (σ (x)) \, d P^{2 m} (x) \quad \text{(because } | Γ_{m} | \text{ is finite)} \\
&= \int_{X^{2 m}} \Pr [σ (x) \in R] \, d P^{2 m} (x) \quad \text{(the expectation)} \\
&\leq \max_{x \in X^{2 m}} (\Pr [σ (x) \in R]) .
\end{aligned}$$

The maximum is guaranteed to exist since there is only a finite set of values that the probability under a random permutation can take.
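For small $m$, the set $Γ_{m}$ can be enumerated directly, since it contains one permutation per subset of $\{1, \dots, m\}$ and therefore $2^{m}$ elements. The sketch below computes $\Pr [σ (x) \in R]$ for a fixed $x$ by full enumeration. The membership test `in_R` is a hypothetical stand-in for $R$ built from a single predicate $h (y) = y$ on $X = \{0, 1\}$, and indices are 0-based.

```python
from itertools import product

def swap_permutations(m):
    """Yield each element of Gamma_m as a list of 0-based indices into a 2m-tuple."""
    for swaps in product([False, True], repeat=m):
        sigma = list(range(2 * m))
        for i, do_swap in enumerate(swaps):
            if do_swap:
                sigma[i], sigma[m + i] = sigma[m + i], sigma[i]
        yield sigma

def prob_in_R(x, in_R):
    """Pr over sigma uniform in Gamma_m that sigma(x) lies in R, by full enumeration."""
    m = len(x) // 2
    perms = list(swap_permutations(m))
    hits = sum(1 for sigma in perms if in_R(tuple(x[k] for k in sigma)))
    return hits / len(perms)

def in_R(z, eps=1.0):
    """Hypothetical R for the single predicate h(y) = y: halves differ by >= eps/2."""
    m = len(z) // 2
    return abs(sum(z[:m]) - sum(z[m:])) / m >= eps / 2

print(prob_in_R((1, 1, 0, 0), in_R))
```

For the joint sample $(1, 1, 0, 0)$, only the identity and the permutation swapping both pairs leave the two halves unbalanced, so two of the four permutations land in $R$.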

Reduction to a finite class

Lemma: For the set $R$ defined in the symmetrization lemma and $σ$ chosen uniformly from $Γ_{m}$,

$$\max_{x \in X^{2 m}} (\Pr [σ (x) \in R]) \leq 2 Π_{H} (2 m) e^{- \frac{ϵ^{2} m}{8}} .$$

Proof: Let $x = (x_{1}, x_{2}, \dots, x_{2 m})$ and let $t = | H_{| x} |$ be the number of distinct restrictions of functions in $H$ to the points of $x$, which is at most $Π_{H} (2 m)$. This means there are functions $h_{1}, h_{2}, \dots, h_{t} \in H$ such that for any $h \in H$ there exists $i$ between $1$ and $t$ with $h_{i} (x_{k}) = h (x_{k})$ for $1 \leq k \leq 2 m$.
We see that $σ (x) \in R$ iff some $h \in H$ satisfies $| \frac{1}{m} | \{1 \leq i \leq m : h (x_{σ (i)}) = 1\} | - \frac{1}{m} | \{m + 1 \leq i \leq 2 m : h (x_{σ (i)}) = 1\} | | \geq \frac{ϵ}{2}$. Hence, define $w_{i}^{j} = 1$ if $h_{j} (x_{i}) = 1$ and $w_{i}^{j} = 0$ otherwise, for $1 \leq i \leq 2 m$ and $1 \leq j \leq t$.
Then $σ (x) \in R$ iff some $j$ in $\{1, \dots, t\}$ satisfies $| \frac{1}{m} (\sum_{i = 1}^{m} w_{σ (i)}^{j} - \sum_{i = 1}^{m} w_{σ (m + i)}^{j}) | \geq \frac{ϵ}{2}$. By the union bound we get

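$$\begin{aligned}
\Pr [σ (x) \in R] &\leq t \cdot \max_{1 \leq j \leq t} \Pr \left[ \left| \frac{1}{m} \left( \sum_{i = 1}^{m} w_{σ (i)}^{j} - \sum_{i = 1}^{m} w_{σ (m + i)}^{j} \right) \right| \geq \frac{ϵ}{2} \right] \\
&\leq Π_{H} (2 m) \cdot \max_{1 \leq j \leq t} \Pr \left[ \left| \frac{1}{m} \left( \sum_{i = 1}^{m} w_{σ (i)}^{j} - \sum_{i = 1}^{m} w_{σ (m + i)}^{j} \right) \right| \geq \frac{ϵ}{2} \right] .
\end{aligned}$$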

Since the distribution over the permutations $σ$ is uniform, for each $i$ the difference $w_{σ (i)}^{j} - w_{σ (m + i)}^{j}$ equals $\pm | w_{i}^{j} - w_{m + i}^{j} |$, each sign with equal probability and independently across $i$. Thus,

$$\Pr \left[ \left| \frac{1}{m} \sum_{i = 1}^{m} (w_{σ (i)}^{j} - w_{σ (m + i)}^{j}) \right| \geq \frac{ϵ}{2} \right] = \Pr \left[ \left| \frac{1}{m} \sum_{i = 1}^{m} | w_{i}^{j} - w_{m + i}^{j} | β_{i} \right| \geq \frac{ϵ}{2} \right] ,$$

where the probability on the right is over independent $β_{i} \in \{- 1, + 1\}$, with both values equally likely. By Hoeffding's inequality, this is at most $2 e^{- \frac{m ϵ^{2}}{8}}$.
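The Hoeffding step can also be checked exactly for small $m$ by enumerating all sign vectors $(β_{1}, \dots, β_{m})$. In the sketch below, the list `a` plays the role of the coefficients $| w_{i}^{j} - w_{m + i}^{j} | \in \{0, 1\}$. The function names and the choice $m = 12$ are assumptions made for the example.

```python
import math
from itertools import product

def exact_tail(a, eps):
    """Exact Pr[|(1/m) sum_i a_i * beta_i| >= eps/2], beta_i uniform in {-1, +1}."""
    m = len(a)
    hits = sum(
        1
        for betas in product((-1, 1), repeat=m)
        if abs(sum(ai * bi for ai, bi in zip(a, betas))) / m >= eps / 2
    )
    return hits / 2 ** m

def hoeffding_bound(m, eps):
    """The bound 2 * exp(-m * eps^2 / 8) used in the proof."""
    return 2 * math.exp(-m * eps ** 2 / 8)

a = [1] * 12  # every pair (x_i, x_{m+i}) is labeled differently by h_j
for eps in (0.5, 1.0, 1.5):
    print(eps, exact_tail(a, eps), hoeffding_bound(len(a), eps))
```

Setting some coefficients to $0$ only removes terms from the sum, so the all-ones case shown here is the one in which the tail probability is hardest to control.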

Finally, combining the three parts of the proof, we get the Uniform Convergence Theorem.
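Written out as one chain, the combination looks as follows. This is a sketch assembled from the three steps above: the factor $2$ comes from symmetrization, and the factor $Π_{H} (2 m) \cdot 2 e^{- ϵ^{2} m / 8}$ from the union bound together with Hoeffding's inequality.

```latex
\begin{aligned}
P^{m}(V)
  &\leq 2\, P^{2m}(R)
     && \text{(symmetrization, } m \geq 2/\epsilon^{2}\text{)} \\
  &\leq 2 \max_{x \in X^{2m}} \Pr[\sigma(x) \in R]
     && \text{(permutations)} \\
  &\leq 2 \cdot \Pi_{H}(2m) \cdot 2\, e^{-\epsilon^{2} m / 8}
     && \text{(union bound and Hoeffding)} \\
  &= 4\, \Pi_{H}(2m)\, e^{-\epsilon^{2} m / 8}.
\end{aligned}
```

For $m < 2 / ϵ^{2}$ the symmetrization lemma does not apply, but then $e^{- ϵ^{2} m / 8} > e^{- 1 / 4}$ and $Π_{H} (2 m) \geq 1$, so the right-hand side exceeds $1$ and the bound holds trivially.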

References

[1] Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations, pp. 46–50. First edition, 1999. Cambridge University Press. ISBN 0-521-57353-X

[2] Sham Kakade and Ambuj Tewari. CMSC 35900 (Spring 2008) Learning Theory, Lecture 11
