A Sparse Transformer-Based Approach for Image Captioning