The rapid growth of video data may lead to a demand for content-based retrieval systems that use one or a combination of the image, audio and textual information in the videos. Several text detection approaches for videos have been proposed in the art. For example, text boxes may be retrieved from video frames based upon the corner points and region edges of candidate text areas. The retrieved text boxes may contain both captions and non-captions. Captions (e.g., subtitles) may provide important information for video indexing, retrieval, mining and understanding.