Various devices exist for optically capturing visual objects, such as text on a page, in electronic form (e.g., as digital data), and such devices may further process (e.g., manipulate) the captured information in some manner. Specific examples of such devices include optical scanners and digital cameras. Moreover, various devices exist for capturing text from a page and converting the captured text to speech. Such text-to-speech converters are relatively well known, and they are generally operable to capture text in a relatively “closed environment” and convert the captured text to speech. That is, such text-to-speech converters typically operate to capture text from a page as specified by a user. Thus, the environment in which such text-to-speech converters generally capture text is relatively closed, as a user typically specifies a defined page (or portion thereof) on which the text to be captured is included. Further, the user typically specifies or controls the specific text to be converted to speech. That is, the user typically dictates to the text-to-speech converter the exact text to be captured for processing (i.e., for converting to speech).
As a further example, various devices exist for capturing text from a page and translating the captured text from one language to another. Such language translation devices are relatively well known, and, as with existing text-to-speech converters, they are generally operable to capture text in a relatively “closed environment.” That is, such devices typically operate to capture text from a page as specified by a user. Thus, the environment in which such language translation devices generally capture text is relatively closed, as a user typically specifies a defined page (or portion thereof) on which the text to be captured is included. Further, the user typically specifies or controls the specific text to be translated to a different language. That is, the user typically dictates to the language translation device the exact text to be captured for processing (i.e., for translating to a different language).
Generally, language translation devices translate text from one language to another and output the translated text to a user in textual format in the desired language. Thus, for example, a user may utilize such a language translation device to scan an item from a restaurant's menu that is written in a language that the user does not understand. The translation device may translate the scanned menu item to a language that the user understands, and output text presenting the menu item to the user in that language. Such language translation devices do not perform a text-to-speech conversion to output the translated text in audible form.
As described above, various optical capture devices of the prior art for capturing information, such as textual information, are typically implemented to capture such information within a user-defined, closed environment. As used herein, “closed environment” is intended to encompass an environment specifically dictated and/or controlled by a user of an optical capture device as containing a visual object to be captured by the optical capture device for processing. For instance, prior art optical capture devices typically capture text from a page (or portion thereof) as specified by a user. Further, a user typically presents the page (or portion thereof) that includes the text to be captured to the capture device. Thus, the user takes an active part in controlling the environment (e.g., the page) from which the device is to capture text. More specifically, a user controls or specifies the specific page (or portion thereof) from which a device is to capture text for processing.