
Convolutional neural network

Author: Yann LeCun

  • Multiple copies of the same neuron: shared activation function, weights, and biases
  • Each neuron is connected only to a local region of the input, instead of being fully connected
  • Automatically performs feature engineering (features are learned from the data)
  • Kernel: e.g. element-wise multiplication followed by summation
  • Architecture (a minimal sketch follows after this list):
    • Input → Convolutional layer → ReLU → Pooling (dimensionality reduction) → Fully connected layer
  • Drawback: a large data set is needed
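
A minimal sketch of this layer stack using the Keras API (the layer sizes and the use of Keras are assumptions, not from the original notes):

import tensorflow as tf

# Hypothetical sizes; only the order Input -> Conv -> ReLU -> Pooling -> FC
# comes from the notes above.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu',   # convolution + ReLU
                           input_shape=(28, 28, 1)),  # input image
    tf.keras.layers.MaxPooling2D(2),                  # pooling (dim. reduction)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),  # fully connected layer
])
model.summary()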

Applications:

  • Photo tagging

Example with zero padding:

Given the filter x and the input signal h:

x[i] = [6,2]
h[i] = [1,2,5,4]

The input h is zero-padded at the borders, and the filter x is flipped to [2,6] (otherwise the operation would be cross-correlation instead of convolution). In the diagrams below, the arrows show which input elements the kernel overlaps at each step.

First step:

[2  6]
 |  |
 V  V
 0 [1 2 5 4]
= 2 * 0 + 6 * 1 = 6

Second step:

  [2  6]  
   |  |  
   V  V  
0 [1  2  5  4]  
= 2 * 1 + 6 * 2 = 14

Third step:

     [2  6]  
      |  |  
      V  V  
0 [1  2  5  4]  
= 2 * 2 + 6 * 5 = 34

Fourth step:

        [2  6]
         |  |
         V  V
0 [1  2  5  4]  
= 2 * 5 + 6 * 4 = 34

Fifth step:

           [2  6]
            |  |
            V  V
0 [1  2  5  4] 0  
= 2 * 4 + 6 * 0 = 8

Collecting all the steps, the result of the convolution is: Y = [6 14 34 34 8]
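
A small pure-Python check of the steps above (the helper conv1d_full is made up for illustration):

def conv1d_full(x, h):
    """Full 1-D convolution: flip x, zero-pad h, slide and sum."""
    n, f = len(h), len(x)
    x_flipped = x[::-1]                           # [6, 2] -> [2, 6]
    padded = [0] * (f - 1) + h + [0] * (f - 1)    # zero padding on both borders
    return [sum(x_flipped[j] * padded[i + j] for j in range(f))
            for i in range(n + f - 1)]

print(conv1d_full([6, 2], [1, 2, 5, 4]))  # [6, 14, 34, 34, 8]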

$n \times n$ image, $f \times f$ kernel, $(n-f+1) \times (n-f+1)$ result (valid padding = no padding)

With padding:

For padding $p$: $(n+2p-f+1) \times (n+2p-f+1)$

Same padding: $p=(f-1)/2$, so that the output size equals the input size

With strides:

For stride $s$: $\left(\lfloor (n+2p-f)/s \rfloor + 1\right) \times \left(\lfloor (n+2p-f)/s \rfloor + 1\right)$ (rounded down when the filter does not fit evenly)
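
The size formulas above as a quick Python check (the helper out_size is made up for illustration):

from math import floor

def out_size(n, f, p=0, s=1):
    """Output side length for an n x n input, f x f kernel, padding p, stride s."""
    return floor((n + 2 * p - f) / s) + 1

print(out_size(10, 3))        # valid, no padding: 10 - 3 + 1 = 8
print(out_size(10, 3, p=1))   # same padding p = (f-1)/2 = 1: stays 10
print(out_size(10, 3, s=2))   # stride 2: floor(7/2) + 1 = 4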

With volumes:

$6 \times 6 \times 3 \; * \; 3 \times 3 \times 3 = 4 \times 4$ (a sketch follows below)

The filter depth must match the input depth; each 3-D filter produces one 2-D feature map. With $n_c'$ filters, the output is $(n-f+1) \times (n-f+1) \times n_c'$.

Use many filters to detect multiple features.
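
A minimal numpy sketch of the volume convolution above (as in deep-learning libraries, the filter is not flipped here, so strictly this is cross-correlation):

import numpy as np

n, f, n_c = 6, 3, 3                 # 6x6x3 input, 3x3x3 filter
image = np.random.randn(n, n, n_c)
filt = np.random.randn(f, f, n_c)   # filter depth matches the input depth

out = np.zeros((n - f + 1, n - f + 1))
for i in range(n - f + 1):
    for j in range(n - f + 1):
        # element-wise multiplication and summation over the 3x3x3 patch
        out[i, j] = np.sum(image[i:i+f, j:j+f, :] * filt)

print(out.shape)  # (4, 4)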

The 1-D example from above, checked with numpy's three padding modes:

import numpy as np

x = np.array([6, 2])
h = np.array([1, 2, 5, 4])

# full: zero padding on both borders
print(np.convolve(x, h, "full"))   # [ 6 14 34 34  8]
# same: output cropped to the length of the longer input
print(np.convolve(x, h, "same"))   # [ 6 14 34 34]
# valid: no zero padding
print(np.convolve(x, h, "valid"))  # [14 34 34]
  • 3×3 filter (4-D tensor [3,3,1,1] = [filter height, filter width, input channels, number of filters])
  • 10×10 image (4-D tensor [1,10,10,1] = [batch size, height, width, number of channels])
  • The output size with zero padding ('SAME' mode) is the same as the input: 10×10
  • The output size without zero padding ('VALID' mode): input size - kernel size + 1 = 10 - 3 + 1 = 8, i.e. 8×8
TensorFlow 1.x example:

import tensorflow as tf

# Build the graph
input = tf.Variable(tf.random_normal([1, 10, 10, 1]))  # [batch, height, width, channels]
filter = tf.Variable(tf.random_normal([3, 3, 1, 1]))   # [height, width, in channels, out channels]
op = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='VALID')
op2 = tf.nn.conv2d(input, filter, strides=[1, 1, 1, 1], padding='SAME')

# Initialization and session
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    print("Input \n")
    print('{0} \n'.format(input.eval()))
    print("Filter/Kernel \n")
    print('{0} \n'.format(filter.eval()))
    print("Result/Feature Map with valid positions \n")
    result = sess.run(op)
    print(result)
    print('\n')
    print("Result/Feature Map with padding \n")
    result2 = sess.run(op2)
    print(result2)

Pooling layer:

Fixed hyper-parameters:

  • Filter size $f$
  • Stride $s$

Typical values: $f=2, s=2$

Usually no padding is used.

The number of channels (depth) stays the same.

In deep networks: 7×7×1000 ⇒ 1×1×1000 (a sketch follows below)
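
A minimal numpy sketch of max pooling with the typical values $f=2, s=2$, plus an average-pooling check of the 7×7×1000 ⇒ 1×1×1000 example (the helper max_pool2d is made up for illustration):

import numpy as np

def max_pool2d(x, f=2, s=2):
    """Max pooling over an (H, W, C) volume; the channel depth stays the same."""
    h, w, c = x.shape
    out_h, out_w = (h - f) // s + 1, (w - f) // s + 1
    out = np.zeros((out_h, out_w, c))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*s:i*s+f, j*s:j*s+f, :]
            out[i, j, :] = patch.max(axis=(0, 1))  # max per channel
    return out

print(max_pool2d(np.random.randn(4, 4, 3)).shape)  # (2, 2, 3): depth unchanged

# 7x7x1000 => 1x1x1000, e.g. via average pooling over the whole 7x7 grid
deep = np.random.randn(7, 7, 1000)
print(deep.mean(axis=(0, 1), keepdims=True).shape)  # (1, 1, 1000)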

  • Ensembling of outputs: train several networks independently and average their predictions
  • Multi-crop at test time: run the classifier on multiple versions of each test image (cropped, mirrored, …) and average the results (see the sketch below)
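
A sketch of test-time averaging over mirrored versions (model is a hypothetical function mapping an image to class probabilities):

import numpy as np

def predict_averaged(model, image):
    """Average a hypothetical model's predictions over image versions."""
    versions = [image, np.fliplr(image)]   # e.g. original and mirrored
    probs = [model(v) for v in versions]   # class probabilities per version
    return np.mean(probs, axis=0)          # average the results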