Perceptron convergence theorem

Making the following assumptions:

From (4) we can write:

\begin{displaymath}
\ve{w}_{n+1} = \ve{w}_n + \ve{x}_n
\end{displaymath} (5)

And solving the recurrent equation:

\begin{displaymath}
\ve{w}_{n+1} = \sum_{k=1}^n \ve{x}_n
\end{displaymath} (6)

Because of the linearly separable assumption, $\exists \quad \ve{w}_o \quad \vert \quad \ve{w}_o^t\ve{x}_n > 0 \quad \forall n$ then we can define:

\begin{displaymath}
\alpha = \min_{\ve{x}_n \in C_1} \ve{w}_o^t\ve{x}_n
\end{displaymath} (7)

Multiplying both sides of (7) by $\ve{w}_o^t$:

\begin{displaymath}
\ve{w}_o^t \ve{w}_{n+1} = \ve{w}_o^t \sum_{k=0}^n \ve{x}_n
\end{displaymath} (8)

And applying (8):

\begin{displaymath}
\ve{w}_o^t \ve{w}_{n+1} \geq n\alpha
\end{displaymath} (9)

Applying Cauchy-Schwarz:

\begin{displaymath}
\vert\vert\ve{w}_o\vert\vert^2\vert\vert\ve{w}_{n+1}\vert\vert^2 \geq (\ve{w}_o^t \ve{w}_{n+1})^2
\end{displaymath} (10)


\begin{displaymath}
\vert\vert\ve{w}_{n+1}\vert\vert^2 \geq \frac{(n\alpha)^2}{\vert\vert\ve{w}_o\vert\vert^2}
\end{displaymath} (11)

From (6): ||w_k+1||^2 = ||w_k||^2 + ||x_k||^2 + 2w_k^tx_k         for         k=1,,n         and     x_n C_1

||w_k+1||^2 ||w_k||^2 + ||x_k||^2

||w_k+1||^2 - ||x_k||^2 ||w_k||^2         for         k=1,,n

Defining $\beta$ as: = _x_k C_1 ||x_k||^2

||w_k+1||^2 _k=1^n ||x_k||^2 n

So the squared norm of the weight vector grows at most linearly with the number of iterations $n$, which for large $n$ conflicts with (12), so to satisfy both (12) and (17):

(n_max)^2||w_o||^2 = n_max

n_max = ||w_o||^2^2

The adaptation of the perceptron thus, shall terminate after $n_{max}$ steps. A variation consists in presenting the same pattern to then network, until $\ve{w}_{n+1}^t\ve{x}_n$ has the correct sign. The choice of $\ve{w}_o$ just decreases or increases the number of steps required to converge.

Pedro Larroy 2005-04-29