Mask R-CNN
Extends Faster R-CNN
- Adds a branch that predicts a segmentation mask for each Region of Interest (RoI)
- The mask branch is a small fully convolutional network (FCN) applied to each RoI
Problem: Faster R-CNN was not designed for pixel-to-pixel alignment between network input and output.
- RoIPool, the operation that extracts features for each instance, coarsely quantizes RoI coordinates to the feature-map grid
Solution: RoIAlign, a quantization-free layer that preserves exact spatial locations
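The difference between the two operations can be sketched in plain Python. This is a minimal illustration, not the reference implementation: the function names, the 2x2 output size, the 2x2 sampling grid per bin, and the convention that feature cell (y, x) sits at continuous coordinate (y, x) are all assumptions made for the example. RoIs are given directly in feature-map coordinates and are assumed to lie inside the map.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D list `feat` at a continuous point (y, x)."""
    n_rows, n_cols = len(feat), len(feat[0])
    y = min(max(y, 0.0), n_rows - 1.0)
    x = min(max(x, 0.0), n_cols - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, n_rows - 1), min(x0 + 1, n_cols - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feat, roi, out_size=2, samples=2):
    """RoIAlign-style pooling: no rounding, average of bilinear samples per bin."""
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    acc += bilinear(feat, y, x)
            row.append(acc / (samples * samples))
        out.append(row)
    return out

def roi_pool(feat, roi, out_size=2):
    """RoIPool-style pooling: round the RoI to whole cells, then max per bin."""
    n_rows, n_cols = len(feat), len(feat[0])
    # First quantization: snap RoI corners to integer cell indices.
    x1, y1, x2, y2 = (int(math.floor(c + 0.5)) for c in roi)
    h, w = max(y2 - y1 + 1, 1), max(x2 - x1 + 1, 1)
    out = []
    for i in range(out_size):
        # Second quantization: bin boundaries are rounded to whole cells too.
        ys = min(y1 + (i * h) // out_size, n_rows - 1)
        ye = min(max(y1 + math.ceil((i + 1) * h / out_size), ys + 1), n_rows)
        row = []
        for j in range(out_size):
            xs = min(x1 + (j * w) // out_size, n_cols - 1)
            xe = min(max(x1 + math.ceil((j + 1) * w / out_size), xs + 1), n_cols)
            row.append(max(feat[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out

# Toy feature map whose value encodes position: feat[y][x] = 10 * y + x.
feat = [[10 * y + x for x in range(5)] for y in range(5)]
roi_a = (0.5, 0.5, 2.5, 2.5)  # (x1, y1, x2, y2)
roi_b = (0.6, 0.6, 2.6, 2.6)  # same box shifted by 0.1 cells

pooled_a, pooled_b = roi_pool(feat, roi_a), roi_pool(feat, roi_b)
aligned_a, aligned_b = roi_align(feat, roi_a), roi_align(feat, roi_b)
```

On this toy map the two shifted RoIs round to the same cells, so `pooled_a` and `pooled_b` are identical, while `aligned_a` and `aligned_b` differ and track the sub-cell shift. That lost sub-cell offset is the misalignment that matters for pixel-accurate masks.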
Architecture
- Convolutional backbone used for feature extraction
  - Alternative 1: ResNet-50-C4 (features taken from the final convolutional layer of the 4th stage, C4)
  - Alternative 2: Feature Pyramid Network (FPN), which extracts features at multiple scales
- Network head for bounding-box recognition (classification and regression)
  - Extends the Faster R-CNN box heads from the ResNet and FPN papers with a mask prediction branch
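A rough sketch of what the three parallel head branches produce per RoI, assuming K object classes and an m x m mask resolution. The helper name and the concrete values (100 RoIs, K = 80, m = 28) are illustrative assumptions, not taken from these notes; the layout assumes class-specific box regression and one binary mask per class.

```python
from dataclasses import dataclass

@dataclass
class HeadOutputShapes:
    class_scores: tuple  # one score per class plus background
    box_deltas: tuple    # 4 box offsets per class (class-specific regression)
    masks: tuple         # one m x m binary mask per class

def head_output_shapes(num_rois, num_classes, mask_size):
    """Hypothetical helper: output tensor shapes of the box and mask branches."""
    return HeadOutputShapes(
        class_scores=(num_rois, num_classes + 1),
        box_deltas=(num_rois, 4 * num_classes),
        masks=(num_rois, num_classes, mask_size, mask_size),
    )

shapes = head_output_shapes(num_rois=100, num_classes=80, mask_size=28)
```

The box branches output a fixed-length vector per RoI, whereas the mask branch keeps an explicit spatial layout, which is why it is built as an FCN and depends on well-aligned RoI features.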