The present invention relates to systems and methods for identifying and cataloging objects within a content stream, and more particularly to a novel approach for locating non-textual objects within a content stream and obtaining for each object a meaningful semantic value.
New information is being stored at an ever-increasing rate on the Internet, on proprietary networks, in medical databases, in clip-art databases, in video databases, in art galleries, in civilian and military satellites, and on networks, to mention just a few locations. More and more content is being made available at these locations as they become bigger and more sophisticated. Much of the new content is non-textual, and non-textual content such as images, motion video, animation, simulation, audio, and the like will continue to be stored and used on networks. Thus, tools and techniques for identifying, measuring, filtering, monitoring, and otherwise using non-textual content are needed.
Current methods of searching, browsing, and retrieving images rely heavily on associated textual information. A picture, for example, will often have one or more words associated with it in textual form, as keywords and/or a description of the picture. When someone wishes to find a particular image, a database search is performed with keywords pertaining to the desired image. If the image is not associated with the appropriate textual search terms, it might not be found. This is a problem, because similar images are often indexed under different search terms, even when one person does all the indexing for a given set of images.
Pictures with no explicit associated textual information are difficult for automated methods to index, sort, and filter, even though these operations are widely desired. For example, many people do not wish to have pornographic pictures displayed on their personal computers, so software has been developed that blocks certain Internet sites. As there are now millions of pages on the web, any blocking software must rely, at least in part, on automated methods. The sites are generally blocked by some combination of human-created filters and keyword searches, so sites that contain words considered objectionable are sometimes blocked inappropriately (e.g., medical or therapeutic sites) and some objectionable sites are not automatically filtered out.
The need to search image databases based on audio or visual content, as opposed to text labels associated with an image or recording, has been recognized for some time. Existing content-based search tools and techniques include, without limitation, template matching (a pixel-level technique); texture comparison; average color comparison; color histogram analysis; shape comparison; image segmentation, which interprets an image as a collection of items; characterizations based on bending energy, ellipticity, and/or eccentricity; and combinations of the foregoing. One known combination uses a “probability density function” which characterizes an image using a combination of local color, texture, and shape.
Different approaches to content-based searches have different strengths. The usefulness of a given tool or technique for searching by non-textual content depends on many factors, three of which are rotational invariance, scale invariance, and reliability. Computational efficiency is also important, but it tends to become less of a limiting factor as computation devices grow increasingly powerful and less expensive.
Although some tools and techniques exist, it would be an advancement in the art to provide additional ways to search images according to their content without relying solely on keywords.
It would also be an advance to provide new search tools and techniques that are invariant as to scale and/or rotation.
It would also be an advance to provide a novel identification and cataloging method, which extends existing identification and cataloging methods and can be used together with existing identification and cataloging methods.
In short, it would also be an advance to provide new content-based search tools and techniques for use with images and/or other non-textual content, such as digitized sounds.
Such tools and techniques are disclosed and claimed herein.
The present invention provides methods and systems for identifying and cataloging objects within a digital content stream according to recognizable features of the objects. The invention is versatile, in that it may be used on audio and video content streams, as well as non-textual digital data sets of other types. Within a visual content stream, many different image formats may be used, such as GIF, TIFF, RGB, grayscale, and others.
To characterize content in an image file or other data set, a series of similarly-shaped but different-sized contours are placed concentrically or otherwise nested around an “area” of interest. Conventional filters can be used to locate “areas” of interest. Then the “areas” under the nested contours are transformed using transformations which produce one or more semantic values. Some instances of the invention use a ratio of intermediate transformed values to arrive at the final semantic value. The semantic value may be expressed as a single number, a vector, a series of numbers, or as some other meaningful set of values which characterizes the content according to the contours and transformations used.
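As a rough illustration of the nested-contour idea described above, the following Python sketch makes several assumptions that are not taken from this description: the data set is a two-dimensional grayscale image stored as a list of rows, the contours are concentric circles, the transformation is the mean intensity under each contour, and the semantic value is the vector of ratios between successive means. The function names `disc_mean` and `semantic_value` are hypothetical.

```python
def disc_mean(image, center, radius):
    """Mean intensity of the pixels lying within `radius` of `center`.

    Assumption: a circular contour and a simple mean stand in for the
    contour shape and transformation, which the text leaves open.
    """
    cy, cx = center
    total, count = 0.0, 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                total += value
                count += 1
    return total / count if count else 0.0


def semantic_value(image, center, radii):
    """Ratios of intermediate values from successively larger nested contours.

    `radii` must be listed from smallest to largest. A ratio is reported
    as 0.0 when the outer mean is zero (an arbitrary illustrative choice).
    """
    means = [disc_mean(image, center, r) for r in radii]
    return [
        inner / outer if outer else 0.0
        for inner, outer in zip(means, means[1:])
    ]


if __name__ == "__main__":
    # A 5x5 image with a single bright pixel at the area of interest.
    image = [[0] * 5 for _ in range(5)]
    image[2][2] = 9
    print(semantic_value(image, (2, 2), [0, 1, 2]))
```

Because circular contours centered on the area of interest cover the same pixels when the image is rotated by a quarter turn about that center, this particular sketch yields the same ratios for such rotations. Other contour shapes and transformations would behave differently.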
Note that an “area” may be a two-dimensional area because a data set may be a two-dimensional image, but in general the “areas” from which semantic values are derived may have any finite dimensionality. Likewise, contours are not necessarily one-dimensional, since they are grounded in mathematical relationships that may be multi-dimensional. That being understood, for convenience the quotation marks around the word area will be omitted from now on, both in describing the present invention and in claiming it.
The semantic value(s) provided by contour transformations are used to position the data set area within a dictionary of archetypes. These archetypal semantic values may have textual or database labels, such as “nose”, “Upper-Case A”, or “snail”, assigned to them. Semantic values which characterize one or more archetypes are compared with the semantic values derived from the new data set, to assign the data set to an archetype. If none of the archetypes fit the new data set within specified tolerances, a new archetype may be created with assistance from the user.
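The comparison against a dictionary of archetypes might be sketched as follows. This is only an illustration under stated assumptions: semantic values are equal-length numeric vectors, closeness is measured by Euclidean distance, and a single scalar tolerance is used. The function name `match_archetype` and the numeric values attached to the labels are hypothetical.

```python
import math


def match_archetype(value, archetypes, tolerance):
    """Return the label of the closest archetype, or None if none fits.

    Assumption: Euclidean distance between semantic-value vectors is the
    measure of fit; the text does not prescribe a particular measure.
    """
    best_label, best_distance = None, math.inf
    for label, reference in archetypes.items():
        distance = math.dist(value, reference)
        if distance < best_distance:
            best_label, best_distance = label, distance
    # None signals that no archetype fits within tolerance, at which point
    # a new archetype could be created with assistance from the user.
    return best_label if best_distance <= tolerance else None


if __name__ == "__main__":
    # Hypothetical archetype dictionary; the vectors are made-up examples.
    archetypes = {"nose": [5.0, 2.6], "snail": [1.0, 1.0]}
    print(match_archetype([4.9, 2.5], archetypes, tolerance=0.5))
    print(match_archetype([3.0, 3.0], archetypes, tolerance=0.5))
```

Returning `None` rather than forcing the nearest label keeps the "no archetype fits" case explicit, so that the caller can decide whether to create a new archetype.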
A “user” may be an administrator, or a non-administrative “regular” user. In either case, a user may be a person or it may be a software task or agent or other computer process acting legitimately on behalf of a person or a group of people.
Different content streams may be in different metric spaces, so a metric manager uses a series of metric definitions to characterize the metric space of a given content stream. The archetypes within the dictionary of archetypes are translated into the same metric space as the content stream using an archetype dictionary conversion means based on reverse contour transformation.
Once the content stream and the dictionary of archetypes are in the same metric space, an object finder is used to locate interesting objects (data set feature(s)) within the content stream. When something of interest is located, an object transformer transforms the data set within the content stream and assigns it a semantically meaningful value (or values). The values are then used to determine the object's identity relative to a dictionary of archetypes. Further refinement of the dictionary of archetypes and of the objects can be done using an object qualifier, which itself contains qualifier characteristics. Other features and advantages of the present invention will become more fully apparent through the following description.