
The Bottleneck Transformer is a conceptually simple yet powerful backbone architecture that incorporates self-attention for multiple computer vision tasks, including image classification, object detection, and instance segmentation. This implementation is based on the paper *Bottleneck Transformers for Visual Recognition* and aims to be a simple, clean codebase that is easy to follow and use. It also incorporates PyTorch's distributed training module to speed up multi-GPU training.
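The core idea of the paper is to replace the 3×3 spatial convolution in a ResNet bottleneck block with multi-head self-attention over the feature map. The sketch below illustrates that idea in plain PyTorch; it is not this repository's code, and the class names (`MHSA`, `BoTBlock`) and default sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MHSA(nn.Module):
    """Multi-head self-attention over a 2-D feature map with learned
    relative position embeddings (illustrative sketch, not the repo's code)."""

    def __init__(self, dim, heads=4, h=14, w=14):
        super().__init__()
        self.heads = heads
        # 1x1 conv producing queries, keys, and values in one pass
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        # factorized relative position embeddings along height and width
        self.rel_h = nn.Parameter(torch.randn(1, heads, dim // heads, h, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, dim // heads, 1, w))

    def forward(self, x):
        b, c, h, w = x.shape
        d = c // self.heads
        qkv = self.qkv(x).view(b, 3, self.heads, d, h * w)
        q, k, v = qkv.unbind(dim=1)  # each: (b, heads, d, h*w)
        # content-position term: broadcast-sum the two factorized embeddings
        pos = (self.rel_h + self.rel_w).view(1, self.heads, d, h * w)
        # q^T k + q^T r, scaled; softmax over key positions
        attn = (q.transpose(-2, -1) @ (k + pos)) * d ** -0.5
        attn = attn.softmax(dim=-1)  # (b, heads, h*w, h*w)
        out = v @ attn.transpose(-2, -1)  # weighted sum of values
        return out.reshape(b, c, h, w)


class BoTBlock(nn.Module):
    """ResNet-style bottleneck block whose 3x3 conv is replaced by MHSA."""

    def __init__(self, dim, mid_dim, h=14, w=14):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim, mid_dim, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_dim),
            nn.ReLU(inplace=True),
            MHSA(mid_dim, h=h, w=w),  # self-attention instead of a 3x3 conv
            nn.BatchNorm2d(mid_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_dim, dim, kernel_size=1, bias=False),
            nn.BatchNorm2d(dim),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual connection around the attention bottleneck
        return self.relu(self.net(x) + x)
```

For example, a `BoTBlock(256, 64)` applied to a `(2, 256, 14, 14)` feature map returns a tensor of the same shape, so the block can drop into the final stage of a ResNet backbone.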