Faster R-CNN

  • First: Use a pretrained CNN to create a feature map.
  • Region Proposal Network (RPN): fully convolutional network (FCN) that proposes regions
    • Outputs a set of rectangular object proposals, each with an objectness score
  • RoI pooling of proposals
  • Fast R-CNN detector
    • Classify content of bounding box
    • Adjust bounding box coordinates (to better fit the object)

The RPN is fully convolutional so that its convolutional layers can be shared with the Fast R-CNN object detection network.

  • Input: Image of any size
  • Output: rectangular object proposals with objectness scores
  • Fully convolutional network
  • Sharing convolutional layers with Fast R-CNN

Slide a small network over the convolutional feature map output by the last shared convolutional layer.

Input: an n x n spatial window of the feature map (e.g. n = 3), mapped to a lower-dimensional feature (e.g. 256-d).

This feature is fed into two sibling fully connected layers, implemented as 1 x 1 convolutions:

  1. box-regression layer (reg)
  2. box-classification layer (cls)

At each sliding-window location: predict up to k region proposals simultaneously.

Output of RPN

  • reg layer: 4k outputs (4 box coordinates per proposal)
  • cls layer: 2k outputs (probability of object and of not-object per proposal)

The k proposals are parameterized relative to k reference boxes, called anchors.

Each anchor is centered at the sliding window and associated with a scale and an aspect ratio. With 3 scales and 3 aspect ratios ⇒ k = 9 anchors at each sliding position, i.e. W x H x k anchors in total for a W x H feature map.
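The anchor layout above can be sketched as follows. This is a minimal illustration, assuming the defaults reported in the Faster R-CNN paper (anchor areas 128², 256², 512²; aspect ratios 1:2, 1:1, 2:1; feature stride 16 for VGG-16). The helper names `base_anchors` and `all_anchors` and the exact cell-to-pixel center mapping are hypothetical simplifications.

```python
import math

def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) anchor sizes (w, h).

    Each anchor keeps area scale**2 while its aspect ratio h / w equals `ratio`.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((w, h))
    return anchors

def all_anchors(feat_w, feat_h, stride=16, **kwargs):
    """Place every base anchor at every feature-map position -> W * H * k boxes.

    Boxes are (x1, y1, x2, y2) in image pixels. Assumption: the center of
    feature cell (i, j) maps to pixel ((j + 0.5) * stride, (i + 0.5) * stride).
    """
    base = base_anchors(**kwargs)
    boxes = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for w, h in base:
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

anchors = all_anchors(feat_w=4, feat_h=3)
print(len(anchors))  # 4 * 3 * 9 = 108
```

The anchors depend only on the feature-map position, not on the image content; the network learns how to move and resize them.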

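Translation-invariant anchors: the anchors and the functions that compute proposals relative to them are the same at every position, so a translated object yields a correspondingly translated proposal.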

Pyramid of anchors

  • Classifies and regresses bounding boxes with reference to anchor boxes of multiple scales and aspect ratios
  • Only needs images and feature maps of a single scale (no image pyramid or filter pyramid)

The features used for regression have the same spatial size (3 x 3) on the feature map for all anchors. To account for varying object sizes, k bounding-box regressors are learned, one per scale/aspect-ratio combination, and they do not share weights.
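The notes do not spell out how the 4 reg outputs relate to an anchor. The Faster R-CNN paper uses the parameterization below: center offsets scaled by the anchor size and log-space size ratios. This is a minimal sketch; the function names `encode`/`decode` and the (cx, cy, w, h) box convention are assumptions for illustration.

```python
import math

def encode(anchor, box):
    """Regression targets (tx, ty, tw, th) of `box` relative to `anchor`.

    Both boxes are (cx, cy, w, h).
    """
    xa, ya, wa, ha = anchor
    x, y, w, h = box
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(anchor, deltas):
    """Inverse of encode: apply predicted deltas to an anchor."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))

anchor = (100.0, 100.0, 128.0, 128.0)
gt = (110.0, 90.0, 150.0, 100.0)
t = encode(anchor, gt)
print(decode(anchor, t))  # recovers gt up to floating-point error
```

Because targets are normalized by the anchor size, each of the k regressors learns offsets on a comparable scale regardless of how large its anchor is.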

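For training: anchors with IoU > 0.7 with a ground-truth box (or the highest IoU for a given ground-truth box) ⇒ foreground; anchors with IoU < 0.3 with all ground-truth boxes ⇒ background; the rest are ignored.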

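  • Each mini-batch comes from a single image that contains many positive and negative anchors
    • Negatives dominate, so training on all anchors would bias the loss toward background
      • Instead, randomly sample 256 anchors with a positive:negative ratio of up to 1:1 (padded with negatives if there are fewer than 128 positives)
  • New layers are initialized from a zero-mean Gaussian distribution with standard deviation $\sigma=0.01$
  • Shared convolutional layers are initialized with weights pretrained on ImageNet classification.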
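The anchor labeling and mini-batch sampling can be sketched as below. This is a minimal sketch, assuming the thresholds from the Faster R-CNN paper (positive IoU > 0.7, negative IoU < 0.3) as default parameters; the helper names `iou`, `label_anchors` and `sample_minibatch` are hypothetical, and boxes are (x1, y1, x2, y2).

```python
import random

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor 1 (foreground), 0 (background) or -1 (ignored)."""
    labels = []
    for a in anchors:
        best = max(iou(a, g) for g in gt_boxes)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # The anchor with the highest IoU for each ground-truth box is also positive,
    # so every object gets at least one positive anchor.
    for g in gt_boxes:
        best_idx = max(range(len(anchors)), key=lambda i: iou(anchors[i], g))
        labels[best_idx] = 1
    return labels

def sample_minibatch(labels, batch_size=256, seed=0):
    """Pick up to batch_size / 2 positives, then fill the rest with negatives."""
    rng = random.Random(seed)
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    pos = rng.sample(pos, min(len(pos), batch_size // 2))
    neg = rng.sample(neg, min(len(neg), batch_size - len(pos)))
    return pos, neg

anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (4, 4, 14, 14)]
labels = label_anchors(anchors, [(0, 0, 10, 10)])
print(labels)  # [1, -1, 0, 0]
```

Ignored anchors (label -1) contribute neither to the classification loss nor to sampling.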

Non-Maximum Suppression (NMS):

Anchors overlap ⇒ proposals overlap. NMS sorts proposals by score and discards every proposal whose IoU with a higher-scoring kept proposal exceeds a threshold.
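Greedy NMS as described above can be sketched like this. The default threshold of 0.7 is the value the Faster R-CNN paper reports for RPN proposals; the function names are hypothetical and boxes are (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.7):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box kept so far.
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, thresh=0.5))  # [0, 2]
```

Boxes 0 and 1 have IoU 81/119 ≈ 0.68, so the lower-scoring one is suppressed at threshold 0.5 but kept at 0.7.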

For binary (object vs. background) detection, one could already stop here.

The RPN proposes RoIs of different sizes ⇒ feature-map crops of different sizes.

Region of Interest (RoI) pooling simplifies this by reducing every crop to the same fixed size.

It splits the RoI's feature map into a fixed grid of roughly equal regions and applies max-pooling to every region, so the output size is always the same regardless of the RoI's size.
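A minimal single-channel sketch of RoI max pooling follows. It assumes the RoI is already given in integer feature-map coordinates (the projection from image coordinates is omitted), uses floor/ceil bin boundaries so every cell is covered, and uses a 2 x 2 output for readability (Fast R-CNN uses 7 x 7 with VGG-16). The function name `roi_max_pool` is hypothetical.

```python
import math

def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool region roi = (x1, y1, x2, y2) of a 2-D feature map to out_h x out_w.

    x1/y1 are inclusive, x2/y2 exclusive, in feature-map cells (non-empty RoI assumed).
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    out = []
    for i in range(out_h):
        # Bin rows: floor of the start, ceil of the end, so bins are never empty.
        ys = y1 + math.floor(i * h / out_h)
        ye = y1 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            xs = x1 + math.floor(j * w / out_w)
            xe = x1 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out

feature = [[4 * y + x for x in range(4)] for y in range(4)]  # values 0..15
print(roi_max_pool(feature, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]
print(roi_max_pool(feature, (0, 0, 3, 3)))  # [[5, 6], [9, 10]]
```

A 4 x 4 and a 3 x 3 RoI both produce a 2 x 2 output, which is what lets fully connected layers follow.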

These fixed-size features can now be used for classification.

Two tasks:

  • Classify each proposal into m classes (plus a background class, to remove bad proposals)
  • Refine the proposal's bounding box according to the predicted class (class-specific regression)
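How the two outputs of the detector combine per proposal can be sketched as follows. This assumes, for illustration only, that class index 0 is background and that the regressor emits one (tx, ty, tw, th) group per class including the background slot; `pick_detection` is a hypothetical helper, not a library API.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_detection(cls_logits, box_deltas):
    """cls_logits: m + 1 scores; box_deltas: 4 * (m + 1) values, 4 per class.

    Returns (class index, probability, deltas of that class), or None when the
    background class wins and the proposal is discarded.
    """
    probs = softmax(cls_logits)
    c = max(range(len(probs)), key=probs.__getitem__)
    if c == 0:
        return None
    return c, probs[c], tuple(box_deltas[4 * c: 4 * c + 4])

result = pick_detection([0.1, 2.0, 0.5], list(range(12)))
print(result[0], result[2])  # 1 (4, 5, 6, 7)
```

The selected deltas would then be applied to the proposal box in the same way as the RPN's anchor-relative regression.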

If trained independently, Fast R-CNN and the RPN would modify their convolutional layers differently, so the layers could not be shared.

Alternating training:

First train the RPN, then use its proposals to train Fast R-CNN. The network tuned by Fast R-CNN is then used to initialize the RPN, and the process is iterated.

  • Last modified: 2019/10/26 12:04
  • by phreazer