Nagata–Smirnov metrization theorem: Difference between revisions

From formulasearchengine
{{multiple issues|
{{disputed|date=December 2012}}
{{Refimprove|date=November 2012}}
{{confusing|date=September 2012}}
}}
 
In [[machine learning]], the '''delta rule''' is a [[gradient descent]] learning rule for updating the weights of the inputs to [[artificial neurons]] in a [[Feedforward neural network#Single-layer_perceptron|single-layer neural network]].<ref>{{cite web|last=Russell|first=Ingrid|title=The Delta Rule|url=http://uhavax.hartford.edu/compsci/neural-networks-delta-rule.html|publisher=University of Hartford|accessdate=5 November 2012}}</ref> It is a special case of the more general [[backpropagation]] algorithm. For a neuron <math>j \,</math> with [[activation function]] <math>g(x) \,</math>, the delta rule for <math>j \,</math>'s <math>i \,</math>th weight <math>w_{ji} \,</math> is given by
 
:<math>\Delta w_{ji}=\alpha(t_j-y_j) g'(h_j) x_i  \,</math>,
 
where
 
{| cellpadding="2"
|
| <math>\alpha \,</math> is a small constant called the ''learning rate''
|-
|
| <math>g(x) \,</math> is the neuron's activation function
|-
|
| <math>t_j \,</math> is the target output
|-
|
| <math>h_j \,</math> is the weighted sum of the neuron's inputs
|-
|
| <math>y_j \,</math> is the actual output
|-
|
| <math>x_i \,</math> is the <math>i \,</math>th input.
|}
 
It holds that <math>h_j=\sum_i x_i w_{ji} \,</math> and <math>y_j=g(h_j) \,</math>.
 
The delta rule is commonly stated in simplified form for a neuron with a linear activation function as
 
:<math>\Delta w_{ji}=\alpha(t_j-y_j) x_i  \,</math>
 
Although the delta rule resembles the [[perceptron]]'s update rule, its derivation is different. The perceptron uses the [[Heaviside step function]] as its activation function <math>g(h)</math>, so <math>g'(h)</math> does not exist at zero and equals zero everywhere else, which makes direct application of the delta rule impossible.
 
==Derivation of the delta rule==
The delta rule is derived by attempting to minimize the error in the output of the neural network through [[gradient descent]]. For a neural network whose outputs are indexed by <math>j \,</math>, the error can be measured as
 
:<math>E=\sum_{j} \frac{1}{2}(t_j-y_j)^2 \,</math>.  
 
In this case, we wish to move through "weight space" of the neuron (the space of all possible values of all of the neuron's weights) in proportion to the gradient of the error function with respect to each weight. In order to do that, we calculate the [[partial derivative]] of the error with respect to each weight. For the <math>i \,</math>th weight, this derivative can be written as  
 
:<math>\frac{\partial E}{ \partial w_{ji} } \,</math>.
 
Because only the <math>j \,</math>th neuron's output depends on <math>w_{ji} \,</math>, every other term of the sum has zero derivative, so we can substitute the error formula above while omitting the summation:
 
:<math>\frac{\partial E}{ \partial w_{ji} } = \frac{ \partial \left ( \frac{1}{2} \left( t_j-y_j \right ) ^2 \right ) }{ \partial w_{ji} } \,</math>
 
Next we use the [[chain rule]] to split this into two derivatives:
 
:<math>= \frac{ \partial \left ( \frac{1}{2} \left( t_j-y_j \right ) ^2 \right ) }{ \partial y_j } \frac{ \partial y_j }{ \partial w_{ji} } \,</math>
 
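To find the left derivative, we apply the general [[power rule]]; the inner derivative of <math>t_j-y_j \,</math> with respect to <math>y_j \,</math> is <math>-1 \,</math>, which produces the minus sign: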
 
:<math>= - \left ( t_j-y_j \right ) \frac{ \partial y_j }{ \partial w_{ji} } \,</math>
 
To find the right derivative, we again apply the chain rule, this time differentiating with respect to the total input to <math>j \,</math>, <math>h_j \,</math>:
 
:<math>= - \left ( t_j-y_j \right ) \frac{ \partial y_j }{ \partial h_j } \frac{ \partial h_j }{ \partial w_{ji} } \,</math>
 
Note that the output of the neuron <math>y_j \,</math> is just the neuron's activation function <math>g \,</math> applied to the neuron's input <math>h_j \,</math>. We can therefore write the derivative of <math>y_j \,</math> with respect to <math>h_j \,</math> simply as <math>g \,</math>'s first derivative:
 
:<math>= - \left ( t_j-y_j \right ) g'(h_j) \frac{ \partial h_j }{ \partial w_{ji} } \,</math>
 
Next we rewrite <math>h_j \,</math> in the last term as the sum, over all weights indexed by <math>k \,</math>, of each weight <math>w_{jk} \,</math> times its corresponding input <math>x_k \,</math>:
 
:<math>= - \left ( t_j-y_j \right ) g'(h_j) \frac{ \partial \left ( \sum_{k} x_k w_{jk} \right ) }{ \partial w_{ji} } \,</math>
 
Because we are only concerned with the <math>i \,</math>th weight, the only term of the summation that is relevant is <math>x_i w_{ji} \,</math>. Clearly,
 
:<math>\frac{ \partial x_i w_{ji} }{ \partial w_{ji} }=x_i \,</math>,
 
giving us our final equation for the gradient:
 
:<math>\frac{\partial E}{ \partial w_{ji} } = - \left ( t_j-y_j \right ) g'(h_j) x_i \,</math>
 
As noted above, gradient descent tells us that the change in each weight should be proportional to the gradient. Choosing a proportionality constant <math>\alpha \,</math> and dropping the minus sign, so that the weight moves in the negative direction of the gradient and the error decreases, we arrive at our target equation:
 
:<math>\Delta w_{ji}=\alpha(t_j-y_j) g'(h_j) x_i \,</math>.
 
==See also==
* [[Stochastic gradient descent]]
 
* [[Backpropagation]]
 
==References==
{{Reflist}}
 
{{DEFAULTSORT:Delta Rule}}
[[Category:Neural networks]]
 
[[de:LMS-Algorithmus]]

Revision as of 07:00, 6 May 2013


In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network.[1] It is a special case of the more general backpropagation algorithm. For a neuron <math>j</math> with activation function <math>g(x)</math>, the delta rule for <math>j</math>'s <math>i</math>th weight <math>w_{ji}</math> is given by

<math>\Delta w_{ji}=\alpha(t_j-y_j) g'(h_j) x_i</math>,

where

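<math>\alpha</math> is a small constant called the learning rate
<math>g(x)</math> is the neuron's activation function
<math>t_j</math> is the target output
<math>h_j</math> is the weighted sum of the neuron's inputs
<math>y_j</math> is the actual output
<math>x_i</math> is the <math>i</math>th input.

It holds that <math>h_j=\sum_i x_i w_{ji}</math> and <math>y_j=g(h_j)</math>.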
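As a minimal sketch of a single update under this rule, the following Python computes the weighted sum, the output, and the new weights for one neuron. The sigmoid activation, the function names, and the sample values are assumptions chosen for illustration; the rule itself does not prescribe them.

```python
import math

def sigmoid(h):
    # Assumed activation function g for this sketch.
    return 1.0 / (1.0 + math.exp(-h))

def sigmoid_prime(h):
    # Derivative g'(h) of the sigmoid.
    s = sigmoid(h)
    return s * (1.0 - s)

def delta_rule_update(weights, inputs, target, alpha, g=sigmoid, g_prime=sigmoid_prime):
    # h is the weighted sum of the inputs, y = g(h) is the actual output.
    h = sum(x * w for x, w in zip(inputs, weights))
    y = g(h)
    # Each weight changes by alpha * (t - y) * g'(h) * x_i.
    return [w + alpha * (target - y) * g_prime(h) * x
            for w, x in zip(weights, inputs)]

# Hypothetical sample: two inputs, zero initial weights.
new_weights = delta_rule_update([0.0, 0.0], [1.0, 2.0], target=1.0, alpha=0.5)
print(new_weights)
```

With zero initial weights the weighted sum is 0, the sigmoid output is 0.5 and its derivative is 0.25, so each weight moves by 0.0625 times its input.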

The delta rule is commonly stated in simplified form for a neuron with a linear activation function as

<math>\Delta w_{ji}=\alpha(t_j-y_j) x_i</math>
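The simplified form for a linear neuron can be sketched as a small training loop. This is only an illustration: the function name, the learning rate, the number of epochs, and the toy samples (generated from the hypothetical target 2·x1 − x2) are all assumptions.

```python
def train_linear_neuron(samples, alpha=0.1, epochs=200):
    # Start from zero weights, one weight per input.
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, target in samples:
            # Linear activation: the output is the weighted sum itself.
            y = sum(x * w for x, w in zip(inputs, weights))
            error = target - y
            # Simplified delta rule: change each weight by alpha * (t - y) * x_i.
            weights = [w + alpha * error * x for w, x in zip(weights, inputs)]
    return weights

# Hypothetical samples consistent with target = 2*x1 - x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
learned = train_linear_neuron(samples)
print(learned)
```

Because the samples are exactly consistent with a linear target, the learned weights approach 2 and −1 for this choice of learning rate.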

Although the delta rule resembles the perceptron's update rule, its derivation is different. The perceptron uses the Heaviside step function as its activation function <math>g(h)</math>, so <math>g'(h)</math> does not exist at zero and equals zero everywhere else, which makes direct application of the delta rule impossible.
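The point about the step function can be illustrated with a short sketch. The function names, the convention that the step function returns 1 at zero, and the sample values are assumptions made for this example.

```python
def heaviside(h):
    # Step activation; returning 1 at h == 0 is an assumed convention.
    return 1.0 if h >= 0 else 0.0

def heaviside_prime(h):
    # The derivative is 0 away from zero and undefined at zero.
    if h == 0:
        raise ValueError("derivative of the step function is undefined at 0")
    return 0.0

def delta_rule_changes(weights, inputs, target, alpha):
    h = sum(x * w for x, w in zip(inputs, weights))
    y = heaviside(h)
    # The factor g'(h) = 0 wipes out every weight change.
    return [alpha * (target - y) * heaviside_prime(h) * x for x in inputs]

def perceptron_changes(weights, inputs, target, alpha):
    h = sum(x * w for x, w in zip(inputs, weights))
    y = heaviside(h)
    # The perceptron rule has no derivative factor.
    return [alpha * (target - y) * x for x in inputs]

# Hypothetical misclassified sample: h = -0.25, so y = 0 while the target is 1.
weights, inputs = [-0.5, 0.25], [1.0, 1.0]
delta_changes = delta_rule_changes(weights, inputs, target=1.0, alpha=0.1)
percep_changes = perceptron_changes(weights, inputs, target=1.0, alpha=0.1)
print(delta_changes, percep_changes)
```

Even though the sample is misclassified, the delta rule applied literally produces no weight change, whereas the perceptron rule still moves the weights.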

Derivation of the delta rule

The delta rule is derived by attempting to minimize the error in the output of the neural network through gradient descent. For a neural network whose outputs are indexed by <math>j</math>, the error can be measured as

<math>E=\sum_{j} \frac{1}{2}(t_j-y_j)^2</math>.
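A minimal sketch of this error measure, with an assumed function name and made-up target and output values:

```python
def squared_error(targets, outputs):
    # Sum over the outputs j of one half of the squared difference (t_j - y_j).
    return sum(0.5 * (t - y) ** 2 for t, y in zip(targets, outputs))

# Hypothetical two-output example: each output misses its target by 0.5.
example_error = squared_error([1.0, 0.0], [0.5, 0.5])
print(example_error)
```

Each output contributes 0.5 × 0.25 = 0.125, so the total error for this example is 0.25.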

In this case, we wish to move through "weight space" of the neuron (the space of all possible values of all of the neuron's weights) in proportion to the gradient of the error function with respect to each weight. In order to do that, we calculate the partial derivative of the error with respect to each weight. For the <math>i</math>th weight, this derivative can be written as

<math>\frac{\partial E}{\partial w_{ji}}</math>.

Because only the <math>j</math>th neuron's output depends on <math>w_{ji}</math>, every other term of the sum has zero derivative, so we can substitute the error formula above while omitting the summation:

<math>\frac{\partial E}{\partial w_{ji}} = \frac{\partial \left( \frac{1}{2} \left( t_j-y_j \right)^2 \right)}{\partial w_{ji}}</math>

Next we use the chain rule to split this into two derivatives:

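<math>= \frac{\partial \left( \frac{1}{2} \left( t_j-y_j \right)^2 \right)}{\partial y_j} \frac{\partial y_j}{\partial w_{ji}}</math>

To find the left derivative, we apply the general power rule; the inner derivative of <math>t_j-y_j</math> with respect to <math>y_j</math> is <math>-1</math>, which produces the minus sign:

<math>= - \left( t_j-y_j \right) \frac{\partial y_j}{\partial w_{ji}}</math>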

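To find the right derivative, we again apply the chain rule, this time differentiating with respect to the total input to <math>j</math>, <math>h_j</math>:

<math>= - \left( t_j-y_j \right) \frac{\partial y_j}{\partial h_j} \frac{\partial h_j}{\partial w_{ji}}</math>

Note that the output of the neuron <math>y_j</math> is just the neuron's activation function <math>g</math> applied to the neuron's input <math>h_j</math>. We can therefore write the derivative of <math>y_j</math> with respect to <math>h_j</math> simply as <math>g</math>'s first derivative:

<math>= - \left( t_j-y_j \right) g'(h_j) \frac{\partial h_j}{\partial w_{ji}}</math>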

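Next we rewrite <math>h_j</math> in the last term as the sum, over all weights indexed by <math>k</math>, of each weight <math>w_{jk}</math> times its corresponding input <math>x_k</math>:

<math>= - \left( t_j-y_j \right) g'(h_j) \frac{\partial \left( \sum_{k} x_k w_{jk} \right)}{\partial w_{ji}}</math>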

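Because we are only concerned with the <math>i</math>th weight, the only term of the summation that is relevant is <math>x_i w_{ji}</math>. Clearly,

<math>\frac{\partial x_i w_{ji}}{\partial w_{ji}}=x_i</math>,

giving us our final equation for the gradient:

<math>\frac{\partial E}{\partial w_{ji}} = - \left( t_j-y_j \right) g'(h_j) x_i</math>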
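One way to sanity-check the gradient formula is to compare it with a central finite-difference estimate. The sketch below does this for a single neuron; the sigmoid activation, the helper names, the step size, and the sample values are assumptions made for illustration.

```python
import math

def sigmoid(h):
    # Assumed activation function g for this check.
    return 1.0 / (1.0 + math.exp(-h))

def neuron_error(weights, inputs, target):
    # E = 1/2 * (t - y)^2 for a single neuron.
    h = sum(x * w for x, w in zip(inputs, weights))
    return 0.5 * (target - sigmoid(h)) ** 2

def analytic_gradient(weights, inputs, target):
    # dE/dw_i = -(t - y) * g'(h) * x_i, with g'(h) = y * (1 - y) for the sigmoid.
    h = sum(x * w for x, w in zip(inputs, weights))
    y = sigmoid(h)
    return [-(target - y) * y * (1.0 - y) * x for x in inputs]

def numeric_gradient(weights, inputs, target, eps=1e-6):
    # Central difference in each weight, one at a time.
    grads = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += eps
        down[i] -= eps
        diff = neuron_error(up, inputs, target) - neuron_error(down, inputs, target)
        grads.append(diff / (2.0 * eps))
    return grads

# Hypothetical sample values.
weights, inputs, target = [0.3, -0.2], [1.0, 2.0], 1.0
analytic = analytic_gradient(weights, inputs, target)
numeric = numeric_gradient(weights, inputs, target)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, numeric, max_gap)
```

If the derivation is right, the two gradients should agree up to the small error inherent in the finite-difference approximation.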

As noted above, gradient descent tells us that the change in each weight should be proportional to the gradient. Choosing a proportionality constant <math>\alpha</math> and dropping the minus sign, so that the weight moves in the negative direction of the gradient and the error decreases, we arrive at our target equation:

<math>\Delta w_{ji}=\alpha(t_j-y_j) g'(h_j) x_i</math>.
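To see the effect of moving against the gradient, the following sketch applies the update repeatedly to one sample and records the error after each step. The sigmoid activation, the function names, the learning rate, and the sample are assumptions chosen for this example.

```python
import math

def sigmoid(h):
    # Assumed activation function g.
    return 1.0 / (1.0 + math.exp(-h))

def error(weights, inputs, target):
    y = sigmoid(sum(x * w for x, w in zip(inputs, weights)))
    return 0.5 * (target - y) ** 2

def descend(weights, inputs, target, alpha, steps):
    # Record the error before the first step and after every update.
    history = [error(weights, inputs, target)]
    for _ in range(steps):
        h = sum(x * w for x, w in zip(inputs, weights))
        y = sigmoid(h)
        g_prime = y * (1.0 - y)
        # Delta rule: add alpha * (t - y) * g'(h) * x_i to each weight.
        weights = [w + alpha * (target - y) * g_prime * x
                   for w, x in zip(weights, inputs)]
        history.append(error(weights, inputs, target))
    return weights, history

# Hypothetical single training sample.
final_weights, history = descend([0.0, 0.0], [1.0, 2.0], target=1.0, alpha=0.5, steps=50)
print(history[0], history[-1])
```

For this sample the output climbs toward the target on every step, so the recorded error never increases.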

See also

Stochastic gradient descent
Backpropagation

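References

1. Russell, Ingrid. "The Delta Rule". University of Hartford. http://uhavax.hartford.edu/compsci/neural-networks-delta-rule.html. Retrieved 5 November 2012.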

