Describe: Feature fusion and clustering for key frame extraction