
Understanding Convolutions (2014)

A fully connected layer has a weight matrix connecting every input to every output:

\[W = \left[\begin{array}{ccccc} W_{0,0} & W_{0,1} & W_{0,2} & W_{0,3} & ...\\ W_{1,0} & W_{1,1} & W_{1,2} & W_{1,3} & ...\\ W_{2,0} & W_{2,1} & W_{2,2} & W_{2,3} & ...\\ W_{3,0} & W_{3,1} & W_{3,2} & W_{3,3} & ...\\ ... & ... & ... & ... & ...\\ \end{array}\right]\]

so each output depends on every input; for example,

\[y_1 = \sigma(W_{1,0}x_0 + W_{1,1}x_1 + W_{1,2}x_2 + ...)\]

A convolutional layer, by contrast, reuses the same small set of weights, shifted one position per row, with zeros elsewhere:

\[W = \left[\begin{array}{ccccc} w_0 & w_1 & 0 & 0 & ...\\ 0 & w_0 & w_1 & 0 & ...\\ 0 & 0 & w_0 & w_1 & ...\\ ... & ... & ... & ... & ...\\ \end{array}\right]\]
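The correspondence between the banded weight matrix and a sliding kernel can be checked numerically. A minimal sketch in Python (the kernel and input values here are made up for illustration); note that NumPy's `convolve` flips its kernel, so the unflipped sliding of the neural-network matrix corresponds to convolving with the reversed kernel:

```python
import numpy as np

# Hypothetical shared weights and input, chosen only for illustration.
w = np.array([2.0, 3.0])           # shared weights w_0, w_1
x = np.array([1.0, 4.0, 2.0, 5.0])

# Build the banded "convolutional" weight matrix:
# each row reuses the same w_0, w_1, shifted one column to the right.
n_out = len(x) - len(w) + 1
W = np.zeros((n_out, len(x)))
for i in range(n_out):
    W[i, i:i + len(w)] = w

# Multiplying by W slides w across x without flipping it;
# np.convolve flips the kernel, so we compare against w reversed.
print(W @ x)                                       # [14. 14. 19.]
print(np.convolve(x, w[::-1], mode="valid"))       # [14. 14. 19.]
```

This also shows why convolutional layers are often described as matrix multiplication with heavy weight sharing.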
Acknowledgments: Eliana Lorch, Michael Nielsen, Dario Amodei.
Convolution is associative: regrouping the sum over \(a+b+c=d\) either way gives the same result,

\[((f\ast g)\ast h)(d) ~=~ \sum_{(a+b)+c=d} f(a)\cdot g(b)\cdot h(c) ~~=~ \sum_{a+(b+c)=d} f(a)\cdot g(b)\cdot h(c) ~=~ (f\ast (g\ast h))(d)\]
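Associativity of convolution is easy to verify numerically. A quick check with NumPy (the three distributions are made up for illustration):

```python
import numpy as np

# Made-up distributions, purely for illustration.
f = np.array([0.5, 0.5])
g = np.array([0.25, 0.5, 0.25])
h = np.array([0.1, 0.9])

left = np.convolve(np.convolve(f, g), h)    # (f * g) * h
right = np.convolve(f, np.convolve(g, h))   # f * (g * h)
print(np.allclose(left, right))             # True
```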

\[y_0 = \sigma(W_0x_0 + W_1x_1 - b)\]\[y_1 = \sigma(W_0x_1 + W_1x_2 - b)\]





The ball could go any distances \(a\) and \(b\), as long as they add to 3. The probabilities are \(f(1) \cdot g(2)\) and \(f(0) \cdot g(3)\), respectively.

In order to find the total likelihood of the ball reaching a total distance of \(c\), we can't consider only one possible way of reaching \(c\). So, summing over every solution to \(a+b=c\), we can denote the total likelihood as:

\[\sum_{a+b=c} f(a) \cdot g(b)\]

It turns out we're doing a convolution! In particular, the convolution of \(f\) and \(g\), evaluated at \(c\), is defined:

\[(f\ast g)(c) = \sum_{a+b=c} f(a) \cdot g(b)\]

If we substitute \(b = c-a\), we get:

\[(f\ast g)(c) = \sum_a f(a) \cdot g(c-a)\]

This is the standard definition of convolution.

To make this a bit more concrete, we can think about this in terms of the positions the ball might land. The probability that the ball lands a distance \(x\) from where it started is \(f(x)\). Then, afterwards, the probability that it started a distance \(x\) from where it landed is \(f(-x)\).

If we know the ball lands at a position \(c\) after the second drop, what is the probability that the previous position was \(a\)? The second drop moved it a distance \(c-a\), so the probability that the previous position was \(a\) is \(g(-(a-c)) = g(c-a)\). Now, consider the probability each intermediate position \(a\) contributes to the ball finally landing at \(c\): it is \(f(a) \cdot g(c-a)\), and summing over all \(a\) recovers the convolution.

The same trick works in two dimensions: drop the ball from above, and now, as it falls, its position shifts not only in one dimension, but in two. Convolution is the same as before:

\[(f\ast g)(c) = \sum_{a+b=c} f(a) \cdot g(b)\]

Except, now \(a\), \(b\) and \(c\) are vectors. To be more explicit,

\[(f\ast g)(c_1, c_2) = \sum_{\begin{array}{c}a_1+b_1=c_1\\a_2+b_2=c_2\end{array}} f(a_1,a_2) \cdot g(b_1,b_2)\]

Or in the standard definition:

\[(f\ast g)(c_1, c_2) = \sum_{a_1, a_2} f(a_1, a_2) \cdot g(c_1-a_1,~ c_2-a_2)\]

Just like one-dimensional convolutions, we can think of a two-dimensional convolution as sliding one function on top of another, multiplying and adding. One common application of this is image processing.
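The sum over \(a+b=c\) can be written directly as code. A minimal sketch (the two per-drop distributions are made up for illustration), checked against NumPy's built-in `convolve`:

```python
import numpy as np

def conv(f, g):
    """Direct convolution: (f*g)(c) = sum over a+b=c of f(a)*g(b).

    f and g are probability tables indexed from 0."""
    out = np.zeros(len(f) + len(g) - 1)
    for c in range(len(out)):
        for a in range(len(f)):
            b = c - a                    # enforce a + b = c
            if 0 <= b < len(g):
                out[c] += f[a] * g[b]
    return out

# Hypothetical per-drop distributions (values made up).
f = np.array([0.1, 0.6, 0.3])   # P(first drop moves distance a)
g = np.array([0.4, 0.4, 0.2])   # P(second drop moves distance b)

print(conv(f, g))               # [0.04 0.28 0.38 0.24 0.06]
print(np.convolve(f, g))        # agrees with the direct sum
```

Since \(f\) and \(g\) are probability distributions, the output also sums to 1, as a sanity check.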
A kernel that takes the difference of neighboring pixels detects edges: in smooth regions of an image, neighboring pixels are similar, so the differences are close to zero. On edges, however, adjacent pixels are very different in the direction perpendicular to the edge. The GIMP documentation has many other examples.

So, how does convolution relate to convolutional neural networks? Consider a 1-dimensional convolutional layer with inputs \(\{x_n\}\) and outputs \(\{y_n\}\), like we discussed in the previous post. As we observed, we can describe the outputs in terms of the inputs:

\[y_n = A(x_{n}, x_{n+1}, ...)\]

Generally, \(A\) would be multiple neurons.
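For the simplest case, where \(A\) is a single logistic neuron applied to a window of two inputs, the layer can be sketched as follows (the weights, bias, and inputs here are made up for illustration):

```python
import numpy as np

def sigma(z):
    """Logistic nonlinearity."""
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(x, w, b):
    """y_n = sigma(w_0*x_n + w_1*x_{n+1} + ... - b):
    the same weights w slide across the input x."""
    k = len(w)
    return np.array([sigma(np.dot(w, x[n:n + k]) - b)
                     for n in range(len(x) - k + 1)])

# Hypothetical values, purely for illustration.
x = np.array([0.0, 1.0, 0.5, -1.0, 2.0])
w = np.array([1.0, -1.0])
b = 0.0
print(conv_layer(x, w, b))      # 4 outputs, one per window position
```

Each output reuses the same two weights, which is exactly the weight-sharing structure that makes the layer convolutional.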
