With the advent of the Internet, the amount of data accessible to users is far greater than what any person or entity could possibly identify or categorize through manual means. However, identification and categorization are needed to render the information usable. Because manual means are limited in scope and costly, automated systems and methods are needed to identify and process the vast quantities of available data.
Electronic text can be identified through automated means, for example by word searches in text-based documents such as the .html files that predominate on the Internet. Indeed, the search engines that enable users to find data on the Internet typically use a word search. However, for electronic documents that are not in a text-based format, content identification and categorization become substantially more difficult. Optical character recognition (OCR) technologies can identify text in electronic documents that are not natively in a text format, such as image-based .pdf files. Other imaging processes have been employed to electronically process either an image of a document or the electronic version of a document to identify the content of images. For instance, some software programs can identify the presence of flesh tones in an image and have reasonable success in separating pornographic images from images appropriate for all ages. However, such processes can be inaccurate, and they typically require substantial processing power. Further, such processes are entirely computer-based and therefore lack the pattern recognition capabilities, contextual knowledge, and judgment of the human brain.
These and other drawbacks exist with current systems and methods.