There is a voice search technique for locating the part of stored speech data, such as a video, a voice mail, or an answering machine recording, in which a keyword is pronounced. In this technology, even when the stored speech data is long, it is important to retrieve the part of interest quickly and accurately. JP 2002-221984 discusses a method for detecting speech data corresponding to a keyword. In this method, the speech data serving as the retrieval target is converted in advance into a phoneme series using an acoustic model. Then, when a keyword is to be detected by speech, the keyword is likewise converted into a phoneme series, and the phoneme series of the keyword is compared with the phoneme series of the retrieval target by dynamic programming (DP) matching.
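For illustration only, the comparison of two phoneme series by DP matching can be sketched as follows. The cited publication's exact formulation is not reproduced here; this sketch assumes unit costs for substitution, insertion, and deletion, and assumes the keyword may begin and end anywhere in the retrieval target. The function name `dp_match` and the phoneme symbols are hypothetical.

```python
from typing import Sequence, Tuple


def dp_match(keyword: Sequence[str], target: Sequence[str]) -> Tuple[int, int, int]:
    """Return (cost, start, end) for the span target[start:end] that best matches keyword."""
    n, m = len(keyword), len(target)
    # cost[i][j]: minimum edit cost of aligning keyword[:i] with a span of target ending at j.
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    # start[i][j]: index in target at which that best span begins.
    start = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        start[0][j] = j  # assumption: a match may begin at any position at no cost
    for i in range(1, n + 1):
        cost[i][0] = i  # keyword phonemes with nothing in the target to align against
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (keyword[i - 1] != target[j - 1])
            dele = cost[i - 1][j] + 1  # keyword phoneme missing from the target
            ins = cost[i][j - 1] + 1   # extra phoneme present in the target
            best = min(sub, dele, ins)
            cost[i][j] = best
            if best == sub:
                start[i][j] = start[i - 1][j - 1]
            elif best == dele:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
    # The match may also end anywhere: take the cheapest end position.
    end = min(range(m + 1), key=lambda j: cost[n][j])
    return cost[n][end], start[n][end], end


# Illustrative phoneme series (not taken from the cited publication).
print(dp_match(["k", "a", "t"], ["s", "o", "k", "a", "t", "o"]))  # (0, 2, 5)
```

A cost of zero indicates an exact phoneme match. A small nonzero cost indicates a near match, which a retrieval system might accept or reject against a threshold chosen for the application.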