====== Mask R-CNN ======

Extends Faster R-CNN:
  * Adds a branch that predicts a segmentation mask for each Region of Interest (RoI)
  * The mask branch is a small fully convolutional network (FCN)

Problem: Faster R-CNN is not designed for pixel-to-pixel alignment between input and output.
  * The RoIPool operation used to attend to instances performs coarse spatial quantization during feature extraction

Solution: A quantization-free layer that preserves exact spatial locations (RoIAlign)

===== Architecture =====

  * Convolutional backbone used for feature extraction
    * Alternative 1: ResNet-50-C4 (features from the final convolutional layer of the 4th stage, C4)
    * Alternative 2: Feature Pyramid Network (FPN), which extracts features at different scales
  * Network head for bounding-box recognition (classification and regression)
    * Extends the Faster R-CNN box heads from the ResNet and FPN papers with a mask prediction branch
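
The difference between RoIPool's quantization and RoIAlign's quantization-free sampling can be illustrated with a minimal pure-Python sketch. This is not the reference implementation: the function names, the nested-list feature map, the default of 2x2 sample points per bin with average pooling, and the convention that integer coordinates index pixel centers are all assumptions made for illustration.

```python
import math

def roi_pool_sample(feature, x, y):
    """RoIPool-style lookup: snap the continuous coordinate to the grid."""
    return feature[int(math.floor(y))][int(math.floor(x))]

def bilinear_sample(feature, x, y):
    """RoIAlign-style lookup: bilinear interpolation at a continuous (x, y)."""
    h, w = len(feature), len(feature[0])
    # Clamp to the valid range so the four neighbours always exist.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size=2, samples=2):
    """Pool a RoI into an out_size x out_size grid without rounding.

    roi = (x_start, y_start, x_end, y_end) as floats in feature-map
    coordinates (hypothetical layout chosen for this sketch).
    """
    x_start, y_start, x_end, y_end = roi
    bin_w = (x_end - x_start) / out_size
    bin_h = (y_end - y_start) / out_size
    out = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            vals = []
            # Regularly spaced sample points inside the bin.
            for sy in range(samples):
                for sx in range(samples):
                    px = x_start + (bx + (sx + 0.5) / samples) * bin_w
                    py = y_start + (by + (sy + 0.5) / samples) * bin_h
                    vals.append(bilinear_sample(feature, px, py))
            row.append(sum(vals) / len(vals))  # average over the samples
        out.append(row)
    return out

# Toy feature map: a linear ramp, value = x + 10 * y.
ramp = [[x + 10 * y for x in range(4)] for y in range(4)]
quantized = roi_pool_sample(ramp, 1.5, 2.25)   # 21, the offset is lost
aligned = bilinear_sample(ramp, 1.5, 2.25)     # 24.0, the offset is kept
pooled = roi_align(ramp, (0.5, 0.5, 2.5, 2.5))
```

On the ramp, the quantized lookup returns the value of the cell at (1, 2) regardless of the fractional offset, while the interpolated lookup follows the continuous coordinate. That sub-pixel shift is the misalignment that matters for pixel-accurate masks.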