In this post, we review the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", available at https://openreview.net/pdf?id=YicbFdNTTy

  1. Abstract

Abstract

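In computer vision (CV), attention is usually either applied alongside a convolutional neural network (CNN) or used to replace some components of a CNN while keeping the overall CNN structure in place. This work instead applies a pure transformer directly to a sequence of image patches for the classification task.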
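To make "a sequence of image patches" concrete, here is a minimal sketch of the patch-splitting step in plain Python. The function name `image_to_patches`, the nested-list image layout (height x width x channels), and the 224x224 RGB input are assumptions chosen for illustration, not the authors' code. The paper's model also applies a learned linear projection and adds position embeddings and a class token before the transformer; none of that is shown here.

```python
def image_to_patches(image, patch_size):
    """Split an image into a sequence of flattened, non-overlapping patches.

    `image` is assumed to be a nested list indexed as image[row][col][channel].
    Patches are emitted in row-major order (left to right, then top to bottom).
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size != 0 or width % patch_size != 0:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size x channels block into a vector.
            patch = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    patch.extend(image[row][col])
            patches.append(patch)
    return patches


# Hypothetical input: an all-zero 224x224 RGB image.
image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
patches = image_to_patches(image, 16)
print(len(patches), len(patches[0]))  # 196 768
```

With 16x16 patches, a 224x224 RGB image becomes 196 "words" of 768 values each (16 x 16 x 3), which is where the paper's title comes from. Each flattened patch plays the role that a token plays in a text transformer.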