====== Convolutional neural network ======

Author: Yann LeCun

  * Multiple copies of the same neuron, sharing the same activation function, weights, and biases
  * Each neuron is connected only to a local patch of the input, instead of being fully connected
  * Does feature engineering automatically
  * Kernel: e.g. element-wise multiplication followed by summation
  * Architecture (see the sketch below):
    * Input -> Convolutional layer -> ReLU -> Pooling (dimensionality reduction) -> Fully connected layer
  * Drawback: a large data set is needed

Applications:
  * Photo tagging
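As a rough illustration of the architecture bullet above, here is a minimal sketch of the Input -> Conv -> ReLU -> Pooling -> Fully connected pipeline using tf.keras. The sizes (28x28 input, 8 filters, 10 classes) are arbitrary choices for the example, not from the original note.

<code python>
import tensorflow as tf

# Minimal Input -> Conv -> ReLU -> Pooling -> Fully connected pipeline.
# All sizes here are arbitrary example values.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, kernel_size=3, activation='relu',
                           input_shape=(28, 28, 1)),      # conv + ReLU
    tf.keras.layers.MaxPooling2D(pool_size=2),            # pooling (dim. reduction)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),      # fully connected layer
])
model.summary()
</code>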
===== Convolution operation =====

Example with zero padding. Given

  x[i] = [6, 2]
  h[i] = [1, 2, 5, 4]

with zero padding, and the kernel x flipped (otherwise the operation would be cross-correlation).

First step:

<code>
2  6
|  |
v  v
0 [1 2 5 4]      2*0 + 6*1 = 6
</code>

Second step:

<code>
   2 6
   | |
   v v
0 [1 2 5 4]      2*1 + 6*2 = 14
</code>

(the arrows represent the connection between the kernel and the input)

Third step:

<code>
     2 6
     | |
     v v
0 [1 2 5 4]      2*2 + 6*5 = 34
</code>

Fourth step:

<code>
       2 6
       | |
       v v
0 [1 2 5 4]      2*5 + 6*4 = 34
</code>

Fifth step:

<code>
         2  6
         |  |
         v  v
0 [1 2 5 4] 0    2*4 + 6*0 = 8
</code>

The result of the convolution, listing all the steps, is then:

  Y = [6 14 34 34 8]
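The flip-and-slide steps above can be reproduced in a few lines of NumPy. This is a minimal sketch; the name conv1d_full is ours, and the function simply mirrors the manual computation, not any library internals.

<code python>
import numpy as np

def conv1d_full(x, h):
    """Full 1-D convolution: flip the kernel x, zero-pad h, slide and sum."""
    x = np.asarray(x)[::-1]                 # flip the kernel (else: cross-correlation)
    n = len(x)
    h_padded = np.pad(h, (n - 1, n - 1))    # zero padding on both ends
    return np.array([np.dot(x, h_padded[i:i + n])
                     for i in range(len(h_padded) - n + 1)])

print(conv1d_full([6, 2], [1, 2, 5, 4]))   # [ 6 14 34 34  8]
</code>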
==== Result size ====

$n \times n$ image, $f \times f$ kernel: $(n-f+1) \times (n-f+1)$ result ("valid" padding = no padding)

With padding $p$: $(n+2p-f+1) \times (n+2p-f+1)$

"Same" padding (output size equals input size): $p = (f-1)/2$

=== With strides ===

For stride $s$: $\left( \left\lfloor \frac{n+2p-f}{s} \right\rfloor + 1 \right) \times \left( \left\lfloor \frac{n+2p-f}{s} \right\rfloor + 1 \right)$

=== With volumes ===

A $6 \times 6 \times 3$ image convolved with a $3 \times 3 \times 3$ filter gives a $4 \times 4$ result: the filter spans all input channels, so each filter produces a 2-D output.

In general: $(n-f+1) \times (n-f+1) \times n_c'$, where $n_c'$ is the number of filters.

Use many filters to detect multiple features (the helper below evaluates these formulas).
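A small helper, as a sketch, that evaluates the output-size formula above; the floor handles strides that do not divide evenly, and conv_output_size is our own name for it.

<code python>
def conv_output_size(n, f, p=0, s=1):
    """Output side length for an n x n input, f x f kernel, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))            # 4  -> the 6x6x3 * 3x3x3 = 4x4 example
print(conv_output_size(10, 3))           # 8  -> 'VALID' in the TensorFlow example
print(conv_output_size(10, 3, p=1))      # 10 -> 'SAME' with p = (f-1)/2
print(conv_output_size(7, 3, s=2))       # 3  -> strided case
</code>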
==== In Python with NumPy ====

<code python>
import numpy as np

x = [6, 2]
h = [1, 2, 5, 4]

# "full" method: zero padding on both ends, all five steps from above
np.convolve(x, h, "full")    # array([ 6, 14, 34, 34,  8])

# "same" method: output trimmed to the length of the longer input
# (here the last element is dropped)
np.convolve(x, h, "same")    # array([ 6, 14, 34, 34])

# "valid" method: no zero padding, only positions with full overlap
np.convolve(x, h, "valid")   # array([14, 34, 34])
</code>
==== In TensorFlow ====

  * 3x3 filter (4-D tensor [3, 3, 1, 1] = [filter height, filter width, input channels, number of filters])
  * 10x10 image (4-D tensor [1, 10, 10, 1] = [batch size, height, width, number of channels])
  * With zero padding ('SAME' mode) the output size is the same as the input: 10x10
  * Without zero padding ('VALID' mode): input size - kernel size + 1 = 10 - 3 + 1 = 8, i.e. 8x8

<code python>
import tensorflow as tf

# Building the graph (TensorFlow 1.x API)
input = tf.Variable(tf.random_normal([1, 10, 10, 1]))
filter = tf.Variable(tf.random_normal([3, 3, 1, 1]))
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

# Initialization and session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    print("Input \n")
    print('{0} \n'.format(input.eval()))
    print("Filter/Kernel \n")
    print('{0} \n'.format(filter.eval()))

    print("Result/Feature Map with valid positions \n")
    result = sess.run(op)
    print(result)
    print('\n')

    print("Result/Feature Map with padding \n")
    result2 = sess.run(op2)
    print(result2)
</code>
===== Max Pooling =====

Fixed hyperparameters:
  * Filter size $f$
  * Stride $s$

Typical values: $f = 2$, $s = 2$

Usually no padding is used. The number of channels (depth) stays the same. A minimal sketch follows.
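A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single channel; the reshape trick assumes the side lengths are divisible by $f$, and max_pool2d is our own name.

<code python>
import numpy as np

def max_pool2d(x, f=2, s=2):
    """Max pooling for the typical f == s case, single channel.
    Assumes the side lengths are divisible by f."""
    assert f == s, "this sketch only covers the common f == s case"
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 0],
              [7, 2, 9, 8],
              [1, 0, 3, 4]])
print(max_pool2d(x))
# [[6 5]
#  [7 9]]
</code>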
===== Average Pooling =====

Used in deep networks to collapse the spatial dimensions, e.g. 7x7x1000 => 1x1x1000 (global average pooling); a one-line sketch follows.
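The 7x7x1000 => 1x1x1000 collapse above is just a mean over the two spatial axes; a minimal NumPy sketch (the array is random, for shape-checking only):

<code python>
import numpy as np

x = np.random.rand(7, 7, 1000)                  # H x W x channels
pooled = x.mean(axis=(0, 1), keepdims=True)     # average over the 7x7 window
print(pooled.shape)                             # (1, 1, 1000)
</code>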
===== Winning competitions =====

  * Ensembling of outputs: train several networks independently and average their predictions
  * Multi-crop at test time: run the classifier on multiple versions of the test images (cropped, mirrored, ...) and average the results (see the sketch below)
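A schematic sketch of both tricks, assuming some predict(model, image) function that returns class probabilities; the helper names and the choice of views are ours, not from the original note. A real multi-crop pipeline would also take several crops and resize them to the network's input size.

<code python>
import numpy as np

def augmented_views(image):
    """Test-time versions of the image: original plus horizontal mirror."""
    return [image, np.fliplr(image)]

def test_time_prediction(models, image, predict):
    """Average class probabilities over an ensemble and over multiple views."""
    preds = [predict(m, view)
             for m in models
             for view in augmented_views(image)]
    return np.mean(preds, axis=0)
</code>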