The way video is presented affects the usefulness and accessibility of a video collection. A system and method for presenting video search results in a way that maximizes the relevant information shown to the user is therefore desirable. Searches in video collections can return a large number of relevant results. It is important to present those results in a form that enables users to quickly decide which results best satisfy their original information need. A video shot is an uninterrupted, visually coherent sequence of frames, usually recorded by a single camera without turning it off. A story is a semantically related group of shots, where the semantic information comes from a time-aligned text transcript. This transcript may come from automatic speech recognition or closed captions. Story boundaries can come from annotations in the text, e.g., section markers in closed captions, or they can be determined with a variety of automatic text processing methods, including self-similarity, vocabulary innovation, and others.
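
By way of illustration only, one possible reading of the vocabulary innovation approach to story boundary detection can be sketched as follows. The text above does not specify an algorithm, so everything in this sketch is an assumption: the function names (`tokenize`, `innovation_scores`, `story_boundaries`), the representation of the transcript as `(start_time, text)` pairs, the `history` window, and the `threshold` value are all hypothetical. The idea is that a transcript segment whose words are mostly absent from the immediately preceding segments likely begins a new story.

```python
import re
from typing import List, Tuple

# A transcript segment: (start time in seconds, transcript text).
# This representation is an assumption made for the sketch.
Segment = Tuple[float, str]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def innovation_scores(segments: List[Segment], history: int = 2) -> List[float]:
    """For each segment, return the fraction of its distinct words that
    do not occur in the preceding `history` segments."""
    scores = []
    for i, (_, text) in enumerate(segments):
        words = set(tokenize(text))
        seen = set()
        for _, prev_text in segments[max(0, i - history):i]:
            seen.update(tokenize(prev_text))
        if i == 0 or not words:
            # The first segment has no history; score it 0 so that it
            # is not reported as a boundary.
            scores.append(0.0)
        else:
            scores.append(len(words - seen) / len(words))
    return scores


def story_boundaries(segments: List[Segment],
                     history: int = 2,
                     threshold: float = 0.8) -> List[float]:
    """Return start times of segments whose innovation score meets the
    (hypothetical) threshold, i.e. candidate story starts."""
    scores = innovation_scores(segments, history)
    return [segments[i][0] for i, s in enumerate(scores) if s >= threshold]


# Made-up example transcript: three segments about a budget vote,
# followed by two segments about the weather.
segments = [
    (0.0, "the city council voted on the budget"),
    (5.0, "the budget vote passed in the council"),
    (10.0, "council members praised the city budget"),
    (15.0, "heavy rain and wind expected tomorrow"),
    (20.0, "rain and wind will ease by evening"),
]
print(story_boundaries(segments))  # [15.0]
```

In this made-up example, only the segment starting at 15.0 seconds consists entirely of words unseen in the two preceding segments, so it is the only candidate boundary. A practical system would likely also filter stop words, smooth the scores over time, and align the detected boundaries with shot boundaries; those steps are omitted here.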