Text this: A deep learning method for video‐based action recognition