Feature fusion and clustering for key frame extraction