Convolutional neural network
Pioneered by Yann LeCun
- Multiple copies of the same neuron: shared activation function, weights, and biases (weight sharing)
- Each neuron is connected only to a local region of the input, instead of a full connection
- Performs feature engineering automatically
- Kernel: e.g. element-wise multiplication followed by summation
- Architecture:
- Input → Convolutional layer → ReLU → Pooling (dimensionality reduction) → Fully connected layer
- Drawback: a large data set is needed
Applications:
- Photo tagging
Convolution operation
Example with zero padding:
Given
x[i] = [6, 2], h[i] = [1, 2, 5, 4]
Zero padding is applied and the filter x is flipped (inverted); without the flip, the operation would be a cross-correlation.
First step: flipped filter [2 6] over padded input [0 1]: 2 * 0 + 6 * 1 = 6
Second step: [2 6] over [1 2]: 2 * 1 + 6 * 2 = 14
Third step: [2 6] over [2 5]: 2 * 2 + 6 * 5 = 34
Fourth step: [2 6] over [5 4]: 2 * 5 + 6 * 4 = 34
Fifth step: [2 6] over padded input [4 0]: 2 * 4 + 6 * 0 = 8
(at each step the flipped kernel is multiplied element-wise with the input values it overlaps, and the products are summed)
The result of the convolution for this case, listing all the steps, would then be: Y = [6 14 34 34 8]
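The sliding steps above can be reproduced with a short NumPy loop (a sketch of the "full" convolution: pad the input with zeros, flip the filter, slide and sum):

```python
import numpy as np

x = np.array([6, 2])        # filter (flipped to [2, 6] before sliding)
h = np.array([1, 2, 5, 4])  # input signal

padded = np.pad(h, len(x) - 1)  # zero-pad both sides so every overlap is covered
flipped = x[::-1]

# Slide the flipped filter across the padded input, one position at a time.
y = np.array([np.sum(flipped * padded[i:i + len(x)])
              for i in range(len(h) + len(x) - 1)])
print(y)  # [ 6 14 34 34  8]
```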
Result size
$n \times n$ image, $f \times f$ kernel, $(n-f+1) \times (n-f+1)$ result (valid padding = no padding)
With padding:
For padding $p$: $(n+2p-f+1) \times (n+2p-f+1)$
Same padding: $p=(f-1)/2$, so the output has the same size as the input
With strides
$\left(\lfloor (n+2p-f)/s \rfloor + 1\right) \times \left(\lfloor (n+2p-f)/s \rfloor + 1\right)$ (the floor discards positions where the filter would overhang the input)
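The size formulas above can be collected into one helper (the function name `conv_output_size` is just an illustrative choice):

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """Output side length for an n×n input, f×f kernel, padding p, stride s."""
    return math.floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))        # valid padding: 6 - 3 + 1 = 4
print(conv_output_size(6, 3, p=1))   # same padding, p = (f-1)/2 = 1: 6
print(conv_output_size(10, 3, s=2))  # stride 2: floor(7/2) + 1 = 4
```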
With Volumes:
$6 \times 6 \times 3$ * $3 \times 3 \times 3$ = $4 \times 4$ (the channel dimensions of image and filter must match)
$(n-f+1) \times (n-f+1) \times n_c'$, where $n_c'$ is the number of filters
Use many filters, to detect multiple features
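A minimal NumPy sketch of convolution over volumes with several filters (random data for illustration; no kernel flipping here, i.e. cross-correlation, which is what deep learning frameworks actually compute):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))       # 6×6 image with 3 channels
filters = rng.standard_normal((2, 3, 3, 3))  # n_c' = 2 filters, each 3×3×3

n, f, n_c_out = 6, 3, 2
out = np.zeros((n - f + 1, n - f + 1, n_c_out))
for k in range(n_c_out):            # one feature map per filter
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            # multiply the 3×3×3 patch with the filter and sum over all channels
            out[i, j, k] = np.sum(image[i:i+f, j:j+f, :] * filters[k])
print(out.shape)  # (4, 4, 2)
```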
In Python with NumPy
# full method
np.convolve(x, h, "full")
array([ 6, 14, 34, 34,  8])
# same method (no zero padding at end)
np.convolve(x, h, "same")
array([ 6, 14, 34, 34])
# valid method (no zero padding)
np.convolve(x, h, "valid")
array([14, 34, 34])
In TensorFlow
- 3×3 filter (4D tensor = [3,3,1,1] = [height, width, in channels, number of filters])
- 10×10 image (4D tensor = [1,10,10,1] = [batch size, height, width, number of channels])
- The output size in zero-padding 'SAME' mode is the same as the input: 10×10
- The output size in 'VALID' mode (no zero padding): input size - kernel size + 1 = 10 - 3 + 1 = 8, i.e. 8×8
import tensorflow as tf

# Building the graph (TensorFlow 1.x API)
input = tf.Variable(tf.random_normal([1, 10, 10, 1]))
filter = tf.Variable(tf.random_normal([3, 3, 1, 1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

# Initialization and session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    print("Input \n")
    print('{0} \n'.format(input.eval()))
    print("Filter/Kernel \n")
    print('{0} \n'.format(filter.eval()))
    print("Result/Feature Map with valid positions \n")
    result = sess.run(op)
    print(result)
    print('\n')
    print("Result/Feature Map with padding \n")
    result2 = sess.run(op2)
    print(result2)
Max Pooling
Fixed hyper-parameters:
- Filter size f
- Stride s
Typical values: $f=2, s=2$
Usually no padding is used.
The number of channels (depth) is unchanged; pooling is applied to each channel independently
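A small NumPy sketch of max pooling with the typical $f=2, s=2$, no padding, applied per channel (function name and test data are illustrative):

```python
import numpy as np

def max_pool(x, f=2, s=2):
    """Max pooling over an H×W×C volume; channels pass through unchanged."""
    h, w, c = x.shape
    out_h, out_w = (h - f) // s + 1, (w - f) // s + 1
    out = np.zeros((out_h, out_w, c))
    for i in range(out_h):
        for j in range(out_w):
            # take the maximum of each f×f window, separately per channel
            out[i, j] = x[i*s:i*s+f, j*s:j*s+f].max(axis=(0, 1))
    return out

x = np.arange(32, dtype=float).reshape(4, 4, 2)
print(max_pool(x).shape)  # (2, 2, 2): spatial dims halved, channels kept
```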
Average Pooling
In deep networks, used as global average pooling: e.g. 7×7×1000 ⇒ 1×1×1000
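The 7×7×1000 ⇒ 1×1×1000 reduction above amounts to averaging each 7×7 feature map down to a single number; a one-line NumPy sketch with random data:

```python
import numpy as np

feature_maps = np.random.randn(7, 7, 1000)  # last conv layer output: 7×7×1000

# Global average pooling: mean over the spatial dimensions, per channel.
gap = feature_maps.mean(axis=(0, 1), keepdims=True)
print(gap.shape)  # (1, 1, 1000)
```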
Winning competitions
- Ensembling of outputs
- Multi-crop at test-time: Run classifier on multiple versions of test images (cropped, mirrored, …) and average results