Video summarisation by deep visual and categorical diversity