Statistical and Visual Analysis of Audio, Text, and Image Features for Multi-Modal Music Genre Recognition