An Image is Worth 16x16 Words

In this blog post, we review the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", available at https://openreview.net/pdf?id=YicbFdNTTy

Abstract

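In computer vision, attention is typically either applied alongside a CNN or used to replace certain components of a CNN while keeping its overall structure in place. This work instead applies a pure Transformer directly to a sequence of image patches for the image classification task.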
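
To make the idea of "a sequence of image patches" concrete, here is a minimal sketch of the patching step only. The function name `image_to_patches`, the nested-list image representation, and the 224 x 224 RGB input size are assumptions chosen for illustration; the 16 x 16 patch size comes from the paper's title. The learned linear projection of each patch and the Transformer itself are deliberately left out.

```python
def image_to_patches(image, patch_size=16):
    """Split an H x W x C image (nested lists) into a sequence of flattened patches.

    Illustrative helper, not the paper's code.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    patches = []
    # Walk the patch grid in raster order: top to bottom, left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for y in range(top, top + patch_size):
                for x in range(left, left + patch_size):
                    # Append every channel value of this pixel.
                    patch.extend(image[y][x])
            patches.append(patch)
    return patches


# Hypothetical 224 x 224 RGB image filled with zeros (assumed input size).
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
patches = image_to_patches(image)

# 224 / 16 = 14 patches per side, so 14 * 14 = 196 patches,
# each flattened to 16 * 16 * 3 = 768 values.
print(len(patches), len(patches[0]))  # 196 768
```

Each flattened patch plays the role that a word token plays in a text Transformer, which is where the "16x16 words" in the title comes from. In practice this step would usually be done with an array library's reshape operations rather than Python loops; plain lists are used here only to keep the sketch dependency-free.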