Generally, the Internet is a global network of computers that includes websites, servers and data stores. The Domain Name System (DNS) directs Internet traffic to the appropriate websites, email servers and related machines. DNS is a hierarchical, distributed database that maps domain names, such as the host name in a website's Uniform Resource Locator (URL), to their corresponding Internet Protocol (IP) addresses. The World Wide Web (WWW) enables billions of commercial, social and intellectual transactions per day. The Hidden Web represents the data and websites that exist outside the reach or interest of conventional search engines. The Internet stack is a layered protocol model. The Document Object Model (DOM) is the hierarchical model of the structure of a web page. Hypertext Markup Language (HTML) is the language, or code, used to design and create webpages, including the placement of images, videos, documents and links on a particular webpage. Meta-data (e.g., an HTML meta tag) is HTML code that is quite limited and does not provide a comprehensive description of the content, images and videos of a web page.
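The relationship between the DOM and a page's meta-data can be sketched in Java. This is a minimal illustration, assuming the page is well-formed XHTML so that the JDK's built-in XML DOM parser (`javax.xml.parsers`) can read it; ordinary HTML would need a dedicated HTML parser. The class name `MetaTagReader` and the sample page are hypothetical. The sketch parses the page into a DOM tree and returns the content of any `<meta name="keywords">` element, which is typically a short phrase rather than a description of the page's text, images and videos.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads keyword meta tags from a well-formed XHTML page.
public class MetaTagReader {

    // Parses the page into a DOM tree, then collects the content attribute of
    // every <meta name="keywords"> element anywhere in that tree.
    static List<String> keywords(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            NodeList metas = dom.getElementsByTagName("meta");
            for (int i = 0; i < metas.getLength(); i++) {
                Element meta = (Element) metas.item(i);
                if ("keywords".equalsIgnoreCase(meta.getAttribute("name"))) {
                    result.add(meta.getAttribute("content"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("page is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head>"
                + "<meta name=\"keywords\" content=\"golf, clubs\"/>"
                + "</head><body><p>Full article text, images and videos...</p></body></html>";
        System.out.println(keywords(page)); // [golf, clubs]
    }
}
```

Everything in the body, however long, is invisible to this lookup; only the two keywords in the head are returned.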
Conventional search engines are used to search for information on the WWW, including digital media files stored throughout the WWW (e.g., on webpages). Conventional search engines operate by crawling the HTML markup of webpages, indexing the data gathered by the crawl (e.g., meta-data), and searching the indexed data (e.g., using a combination of search algorithms). Prior to the present invention, searching for digital media files was limited to these conventional search engines because no Internet classification system exists for Internet-based information such as websites, video files, audio files, and the digital media files within websites.
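The crawl, index and search steps described above can be illustrated with a small Java sketch. It is not how any particular search engine works: the "web" is a hypothetical in-memory map from URL to HTML, links and meta keywords are extracted with simplistic regular expressions (an assumption that only holds for tidy, double-quoted markup), and the index covers meta-data only, which mirrors the limitation this section describes. `MiniSearchEngine` and the example URLs are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy search engine: crawl -> index meta keywords -> search.
public class MiniSearchEngine {
    // Simplistic patterns for double-quoted markup only.
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern META =
            Pattern.compile("<meta name=\"keywords\" content=\"([^\"]*)\"");

    // Inverted index: keyword -> URLs whose meta keywords contain it.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Crawl: breadth-first traversal from a seed URL over `web` (URL -> HTML),
    // indexing each page's meta keywords as it is visited.
    void crawl(Map<String, String> web, String seed) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            String html = web.get(url);
            if (html == null) continue; // dead link
            Matcher meta = META.matcher(html);
            if (meta.find()) {
                for (String term : meta.group(1).toLowerCase(Locale.ROOT).split("[\\s,]+")) {
                    if (!term.isEmpty()) {
                        index.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
                    }
                }
            }
            Matcher link = LINK.matcher(html);
            while (link.find()) {
                if (seen.add(link.group(1))) frontier.add(link.group(1));
            }
        }
    }

    // Search: URLs indexed under every query term (a simple AND query).
    Set<String> search(String query) {
        Set<String> hits = null;
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            Set<String> urls = index.getOrDefault(term, Set.of());
            if (hits == null) hits = new TreeSet<>(urls);
            else hits.retainAll(urls);
        }
        return hits; // split() always yields at least one term, so never null
    }

    public static void main(String[] args) {
        Map<String, String> web = Map.of(
                "a.example/", "<meta name=\"keywords\" content=\"golf, clubs\">"
                        + "<a href=\"b.example/\">Lessons</a>",
                "b.example/", "<meta name=\"keywords\" content=\"golf, lessons\">");
        MiniSearchEngine engine = new MiniSearchEngine();
        engine.crawl(web, "a.example/");
        System.out.println(engine.search("golf"));         // [a.example/, b.example/]
        System.out.println(engine.search("golf lessons")); // [b.example/]
    }
}
```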
In the context of the Internet, digital media files (e.g., images, videos, audio, etc.) are positioned on a webpage using HTML tags (e.g., <img src="image_file1.jpg"> and, in some cases, <embed src="audio_file1.mp3">). Accordingly, digital media files do not reside on a webpage; instead, the page contains a hyperlink to the binary image, video or document stored on the server's file system. For example, in HTML and website design, digital media files are external to the web pages that display them (e.g., stored on a web host's server). When a web user accesses these digital media files by visiting a webpage, the image, movie and/or audio files are transferred from the external source to the user's computer and stored in the web browser's cache folder. Therefore, digital media files exist independently of their respective websites. In the context of computer devices, such as desktop computers, servers, and mobile devices, digital media files are binary and inert. Accordingly, digital media files become useful only when opened and edited within their associated applications, e.g., Microsoft Word (DOC), Adobe Acrobat (PDF) or Adobe Photoshop (JPG and PNG, among others).
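The point that a page only references its media files can be seen by listing the `src` attributes of its `<img>` and `<embed>` tags. This is a hedged sketch: the class name `MediaReferences` is hypothetical, and the regular expression handles only double-quoted attributes, so a real tool would use an HTML parser. What comes back are file names or URLs, not image or audio data; the browser must fetch those files separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the media files a page merely points to.
public class MediaReferences {
    // src attribute of <img> or <embed>, double-quoted values only.
    private static final Pattern SRC = Pattern.compile(
            "<(?:img|embed)\\b[^>]*?\\bsrc=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the referenced file names/URLs in page order. The page holds
    // only these references; the browser fetches the files separately.
    static List<String> mediaSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        String page = "<p>Intro</p><img src=\"image_file1.jpg\">"
                + "<embed src=\"audio_file1.mp3\">";
        System.out.println(mediaSources(page)); // [image_file1.jpg, audio_file1.mp3]
    }
}
```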
Digital media files can include, but are not limited to, Hypertext Markup Language (HTML) web pages (.html), Portable Document Format (PDF) files, Adobe Illustrator (.AI) files, Adobe Photoshop (.PSD) files, Word documents (.DOC), text files (.TXT, .RTF), and Computer Aided Design (CAD) files; image files (JPEG/JPG, PNG, GIF, BMP, TIFF); video files (MOV, AVI, MPEG, MP4); and audio files (MP3, AU, AIFF and WAV). Digital media files include meta-data that provides additional information to search engines and other applications about the content of the digital media files. Meta-data is information about a file that is stored within the file. The form, content and extent of meta-data are determined by the file format, e.g., PDF, JPG or DOC. Meta-data does not contain “instructions” related to Internet search, web advertising, pricing, commercial licensing, copyright revocation settings, image recognition, file networking, broadcasting or self-organization. The meta-data of a given digital media file is not networked with other files or with a global (meta-data) database, Internet database, Internet classification system, Internet ontology, or phylogenetic structure. Classification of files and file meta-data is commercially desirable for search engines, information designers and advertisers: it would be useful in organizing the digital media files on the Internet, together with their copyright ownership, subject matter, copyright licensing and advertising, in relation to all other files on the Internet. However, meta-data is static and inert. Meta-data does not automatically enrich itself heuristically based on information or knowledge gained from additional Internet content, searches of the Internet or interaction with an Internet classification system.
For example, JPG files provide meta-data in the form of Exchangeable image file format (EXIF) data. EXIF meta-data provides information related to the geographic location (GPS coordinates) of the photo, shutter speed, ISO settings, and related technical specifications. EXIF does not contain information about the subject of the photo unless the user explicitly enters it. EXIF does not place the file within a global ontology that relates it, in an organized format, to all other files on the Internet, and it does not allow the JPG to network with other JPGs (e.g., with images or documents of similar subject matter). Additionally, EXIF does not contain instructions for search engines (“search”), advertising and copyright. Similarly, the meta-data for other digital media files (e.g., PDFs, MP3s, MOVs, etc.) is limited to technical specifications of the file and does not contain instructions for search, advertising, and copyright. For example, digital media files do not offer their copyright owners the ability to license or revoke content globally, across the Internet, websites or users' desktop computers. Moreover, digital media files on a desktop computer, server or website do not independently network, broadcast, or communicate with other files, servers, computers or mobile devices. Additionally, files cannot self-organize into an ontology, taxonomy, classification system, phylogenetic tree, or other organizational structure. Accordingly, in terms of the structure, point of view and processing of digital media files, the current programming dogma treats digital media files as external to and separate from business logic, algorithms, applications, programs and webpages. This point of view dictates that a digital media file be controlled, manipulated and rendered as an inert object by code that is not contained within the file format itself but resides in an external program, executable, method or module.
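To make the EXIF discussion concrete, the following sketch checks whether a JPEG byte stream carries an EXIF block at all. It relies on the standard JPEG container layout (a 0xFFD8 start-of-image marker, followed by marker segments whose two-byte big-endian length includes the length field itself) and on EXIF data being stored in an APP1 (0xFFE1) segment that begins with the ASCII bytes "Exif" and two zero bytes. The class name `ExifProbe` is hypothetical, and the sketch does not decode the EXIF fields (camera settings, GPS coordinates and so on); it only shows that this meta-data lives inside the file in a fixed technical structure.

```java
import java.util.Arrays;

// Hypothetical probe: does a JPEG byte stream contain an EXIF APP1 segment?
public class ExifProbe {
    // APP1 payloads holding EXIF start with "Exif" followed by two zero bytes.
    private static final byte[] EXIF_HEADER = {'E', 'x', 'i', 'f', 0, 0};

    // Walks the marker segments before the image data. Only the container is
    // inspected; the TIFF-structured EXIF fields are not decoded.
    static boolean hasExif(byte[] jpeg) {
        // A JPEG file begins with the start-of-image (SOI) marker 0xFF 0xD8.
        if (jpeg.length < 4 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return false;
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length && (jpeg[pos] & 0xFF) == 0xFF) {
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA) break; // start of scan: compressed pixels follow
            // Big-endian segment length, counting the two length bytes themselves.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            int payload = pos + 4;
            if (marker == 0xE1 && payload + EXIF_HEADER.length <= jpeg.length
                    && Arrays.equals(Arrays.copyOfRange(jpeg, payload,
                            payload + EXIF_HEADER.length), EXIF_HEADER)) {
                return true;
            }
            pos += 2 + length; // skip marker (2 bytes) plus the whole segment
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] withExif = {
            (byte) 0xFF, (byte) 0xD8,             // SOI
            (byte) 0xFF, (byte) 0xE1, 0x00, 0x08, // APP1 marker, length 8
            'E', 'x', 'i', 'f', 0x00, 0x00,       // EXIF identifier
            (byte) 0xFF, (byte) 0xD9 };           // EOI
        byte[] withoutExif = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xD9};
        System.out.println(hasExif(withExif));    // true
        System.out.println(hasExif(withoutExif)); // false
    }
}
```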
In response to the inability of search engines to identify information about the content of digital media files, web developers often write algorithms intended to make digital media files appear more frequently and higher in web search results, generating more exposure for these files. These algorithms can be simple, complex, or based on artificial intelligence. The input/output stream (reading from and writing to a file) and the manipulation of a file are executed by separate code not contained within the file itself. For example, in the C language an engineer can write a text string to a file using the following code:
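FILE *f = fopen("file_a.txt", "a");  /* needs <stdio.h>; mode "a" opens the file for appending */
if (f != NULL) { fputs("Write the word Apple to the file.\n", f); fclose(f); }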
Using C++, an engineer can write a text string as follows:
#include <fstream>
std::ofstream myfile;
myfile.open("file_a.txt");
myfile << "Write the word Apple to the file.\n";
myfile.close();
Using Java, an engineer can write a text string as follows:
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
public class WriteFile {
    public static void main(String[] args) throws IOException {
        File file = new File("file_a.txt");
        PrintWriter printWriter = new PrintWriter(file);
        printWriter.println("hello");
        printWriter.close();
    }
}
Writing code like the above for “file_a.txt”, and similar code for every other digital media file on a website, can be labor intensive. Additionally, the file (file_a.txt) does not independently search the Internet for content similar to its own in order to enrich that content. Furthermore, JPG, PNG, MP3 and MP4 files do not contain inherent logic or instructions that govern their own behavior, their role, or their classification in relation to other files on the Internet.
Existing web searching methodologies have additional shortcomings. Specifically, in the majority of cases, meta-data does not contain information that is substantively useful to search engines, web users, web advertisers, or copyright holders. In particular, meta-data does not define or express the subject matter contained within media files, e.g., words in a song (e.g., an MP3 or WAV), people or places in a photo (e.g., a JPG/PNG), scenes in a film (e.g., an MP4, MPEG or AVI), words of a poem (e.g., a DOC), or technical terms within a scientific publication (e.g., a PDF), in a form that search engines can efficiently locate.
Traditionally, code is required for a search engine to find a digital media file on the WWW. Similarly, code is required to display advertising and to provide meta-data about a web page. In aggregate, a significant amount of code is required for a website or image to be successfully accessed on the Internet. This need for code makes getting digital media files found by search engines a labor-intensive process. Additionally, depending on the code and the received search terms, the search results returned by the search engines may not be all-inclusive and/or accurate. As an example, a website containing Martin Luther King's “I Have a Dream” speech, which has a complex and inspirational narrative, may be summarized in an HTML meta tag as follows: <meta name="keywords" content="MLK I have a dream speech">. If a web user searches for a particular phrase within the speech, the meta tag (i.e., the meta-data) is of no use for that search.
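The speech example can be expressed as a phrase lookup. In this sketch the class and method names are hypothetical, the meta keywords string is the one from the example, and `pageText` is a short excerpt standing in for the speech text on the page. A phrase query matches the page text but not the meta tag. Matching is a simple case-insensitive comparison on whole words, an illustrative choice rather than any search engine's actual algorithm.

```java
import java.util.Locale;

// Hypothetical helper comparing a meta-keyword string with full page text.
public class PhraseLookup {

    // True if the words of `phrase` appear consecutively in `text`, ignoring
    // case and punctuation (whole-word match, not substring match).
    static boolean containsPhrase(String text, String phrase) {
        String t = normalize(text);
        String p = normalize(phrase);
        return !p.isEmpty() && (" " + t + " ").contains(" " + p + " ");
    }

    // Lower-cases, then collapses every run of characters other than ASCII
    // letters, digits and apostrophes into a single space.
    private static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9']+", " ").trim();
    }

    public static void main(String[] args) {
        String metaKeywords = "MLK I have a dream speech";
        // Short excerpt standing in for the full speech text on the page.
        String pageText = "... From every mountainside, let freedom ring ...";
        String query = "let freedom ring";
        System.out.println(containsPhrase(metaKeywords, query)); // false
        System.out.println(containsPhrase(pageText, query));     // true
    }
}
```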
Search engines address this issue by using a string tokenizer provided by programming languages such as Java. A string tokenizer is a class, application programming interface (API), algorithm or method that splits text into individual words (tokens) so that key words or phrases on a web page can be indexed. Search engines rely upon HTML meta-data to determine the basic premise of a given website when creating a search index. Meta-data is the fundamental hindrance preventing search engines from generating rich search results. Search engines require enormous computing power, in the form of hardware and algorithms, to index the content of a given webpage beyond its meta-data. Specifically, search engines begin by interacting with the Domain Name System (DNS) to locate websites by Internet Protocol (IP) address. Once a website is found, its content (web pages, images, videos) is copied to the search engine's servers. The computer program that performs this task is referred to as a web spider or web bot.
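A string tokenizer of the kind described above is available in Java as `java.util.StringTokenizer`. The sketch below uses it to split page text into words and count how often each occurs, which is one possible input to a keyword index. The delimiter set and the class name `KeywordCounter` are illustrative assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper: word counts for a page's text via StringTokenizer.
public class KeywordCounter {
    // Delimiters: whitespace plus common punctuation (an illustrative choice).
    private static final String DELIMS = " \t\n\r\f.,;:!?\"()[]";

    // Splits the text into tokens and counts each lower-cased token.
    static Map<String, Integer> termFrequencies(String pageText) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(pageText, DELIMS);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("Golf clubs. Golf lessons, golf!"));
        // {clubs=1, golf=3, lessons=1}
    }
}
```

The JDK documentation describes `StringTokenizer` as a legacy class and recommends `String.split` or the `java.util.regex` package for new code; it is used here only because the text names it.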
The newly found website files (web pages, images, videos) exist on the search engine's servers with very little meta-data describing the retrieved set of files (e.g., the meta tag of the webpage and the EXIF meta-data within the images). With so little information available, the search engine uses a string tokenizer to parse the text of the related webpage and compare it with the existing HTML meta-data and file meta-data (e.g., EXIF). The files are processed by hundreds of specialized algorithms that attempt to collect information useful in classifying the website for web searches and search results. For example, some search engine algorithms address specific variables such as geographic location, rank, weight and context. Within these algorithms, artificial intelligence (AI) plays an important role in attempting to create relationships among disparate forms of Internet information. As part of this technical movement, the Semantic Web extends the World Wide Web toward machine-readable data. In a specific example, algorithms are used to process images and identify people within a photo or group of photos. Once the search engine has created a search index, a given website has its place in the index but no relationship to websites globally. Where the website does have related results within the index, those relationships were determined by many optimized algorithms designed using data from past user searches. Search engines also provide features on top of, or while accessing, the search index via a search page. Such features include autocomplete and spell correction.
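Autocomplete, one of the features layered on top of the search index, can be sketched with a sorted map: because keys are kept in order, every term that starts with a given prefix lies in one contiguous range beginning at the prefix. This is a minimal illustration under stated assumptions: the class name `Autocomplete`, the popularity scores and the terms are hypothetical, and real systems combine many more signals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical autocomplete over indexed terms and popularity scores.
public class Autocomplete {
    // Sorted term -> score map; sorting is what makes prefix lookup cheap.
    private final NavigableMap<String, Integer> terms = new TreeMap<>();

    // Adds a term, summing scores if it was already present.
    void add(String term, int score) {
        terms.merge(term.toLowerCase(Locale.ROOT), score, Integer::sum);
    }

    // Returns up to `limit` terms starting with `prefix`, highest score first.
    List<String> complete(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        // Terms sharing the prefix form one contiguous run starting at p.
        for (Map.Entry<String, Integer> e : terms.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) break;
            matches.add(Map.entry(e.getKey(), e.getValue()));
        }
        // Stable sort: ties keep alphabetical order.
        matches.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, matches.size()); i++) {
            result.add(matches.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Autocomplete ac = new Autocomplete();
        ac.add("golf clubs", 40);
        ac.add("golf lessons", 75);
        ac.add("gold price", 90);
        System.out.println(ac.complete("golf", 5)); // [golf lessons, golf clubs]
    }
}
```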
The central revenue stream of a search engine is keyword advertising, which has its own set of shortcomings. Keyword advertising can involve the selection of keywords by an advertiser, bid/ask fees paid by the advertiser for each keyword, meta-tags with keywords on the advertiser's website, advertising-specific programming code (HTML/JavaScript) on the advertiser's website, and cookies. Advertising customers select keywords in the search engine's advertising tool to design their marketing campaigns. In effect, the advertisers are guessing what a web user will type into a search engine. For example, a clothing store may select the keywords “women's dresses.” The clothing store pays a fee for using the keywords “women's dresses,” which is determined by a bid/ask system similar to a stock exchange. Given the demand, the fees can be high in relation to their performance. The higher the bid, the more likely an advertisement will reach the top of the search results. After purchasing the keywords, the clothing store must add advertising-specific computer code to its website and update its meta-tag with the keywords “women's dresses.” The clothing store's ultimate goal is that a user who types in “women's dresses” is driven to the clothing store's website. This model is commonly referred to as the Pay-Per-Click (PPC) model; however, keyword advertising and PPC have many shortcomings. An advertiser's budget is reduced by the price of each click. Not every click drives web traffic to a website such as the clothing store's, and not every click results in a product sale for the clothing store. Click fraud is also present and can further cost the advertising customer without resulting in sales. The advertising customers must also add meta-tags and advertising code to their websites.
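The pay-per-click arithmetic above can be made concrete. This sketch assumes, for simplicity, that ads for a keyword are ordered purely by bid and that the advertiser pays its own bid on every click; real ad auctions also weigh other factors and may charge a different price per click. All names and figures (a $500 budget, a $2.50 bid, 120 clicks, 3 sales) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical pay-per-click model: rank by bid, pay the bid per click.
public class PayPerClick {
    static final class Ad {
        final String advertiser;
        final long bidCents; // bid per click for one keyword, in cents

        Ad(String advertiser, long bidCents) {
            this.advertiser = advertiser;
            this.bidCents = bidCents;
        }
    }

    // Highest bid first, so a higher bid means a higher placement.
    static List<Ad> rank(List<Ad> ads) {
        List<Ad> ranked = new ArrayList<>(ads);
        ranked.sort(Comparator.comparingLong((Ad a) -> a.bidCents).reversed());
        return ranked;
    }

    // Budget left after paying the bid on every click, never below zero.
    static long remainingBudget(long budgetCents, long bidCents, int clicks) {
        return Math.max(0, budgetCents - bidCents * clicks);
    }

    // Advertising spend per sale, in dollars, when only some clicks convert.
    static double costPerSale(long bidCents, int clicks, int sales) {
        if (sales == 0) return Double.POSITIVE_INFINITY; // spend with no sales
        return bidCents * clicks / 100.0 / sales;
    }

    public static void main(String[] args) {
        List<Ad> ads = List.of(new Ad("Store A", 180),
                new Ad("Clothing store", 250), new Ad("Store B", 95));
        System.out.println(rank(ads).get(0).advertiser);       // Clothing store
        System.out.println(remainingBudget(50_000, 250, 120)); // 20000 (cents)
        System.out.println(costPerSale(250, 120, 3));          // 100.0
    }
}
```

With those figures, 120 clicks consume $300 of the $500 budget, and because only 3 of the clicks led to a sale, each sale cost $100 in advertising.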
Most importantly, advertising customers must rely on users visiting the particular search engine that provides the keyword-to-advertisement relationship. The broader the keywords, such as “golf” or “insurance,” the greater the audience reach. The narrower or more brand-specific the keywords, the lower the audience reach. At present, search engines do not offer advertisers the ability to broadcast media (videos or graphical ads) to an arbitrary number (n) of websites or customers without using the keyword model.
Moreover, the cookie is an important tool within the keyword advertising model. The cookie file is associated with the web browser and is stored on the web user's local machine. As keywords are processed and a user visits websites, cookie data is added to the cookie file. At best, the cookie file stores the websites users have visited or the products they have searched for; this is referred to as user profiling. However, if the cookie has been deleted, or if two different users have been using the same browser, the cookie can provide an inaccurate customer advertising profile. Advertisers would greatly benefit, in time and resources, from avoiding keyword selection, meta-data and website programming while obtaining a richer user profile. Additionally, advertising customers should not be limited to a user visiting a specific search engine, typing in keywords and having the related data stored in a cookie file.
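The two cookie failure modes described above can be sketched as follows. The sketch assumes a common arrangement in which the cookie holds only an identifier and the visit history is kept on a server under that identifier; whether the history sits in the cookie itself or behind such an identifier, the effect is the same. The class and method names are hypothetical. Two people sharing one browser send the same identifier and are merged into one profile, and a deleted cookie causes a new identifier to be issued with an empty profile.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical tracker: the browser cookie holds only an ID; the visit
// history (the "profile") is kept server-side under that ID.
public class CookieProfiles {
    private final Map<String, List<String>> profiles = new HashMap<>();

    // Records a page visit. A null cookieId means the browser sent no cookie
    // (first visit, or the cookie was deleted), so a fresh ID is issued.
    // Returns the ID the browser should store.
    String recordVisit(String cookieId, String page) {
        String id = (cookieId == null) ? UUID.randomUUID().toString() : cookieId;
        profiles.computeIfAbsent(id, k -> new ArrayList<>()).add(page);
        return id;
    }

    // Everything recorded under this cookie ID, oldest first (a copy).
    List<String> profile(String cookieId) {
        return List.copyOf(profiles.getOrDefault(cookieId, List.of()));
    }

    public static void main(String[] args) {
        CookieProfiles tracker = new CookieProfiles();
        // Two people share one browser, so both histories merge into one profile.
        String shared = tracker.recordVisit(null, "golf clubs");
        tracker.recordVisit(shared, "women's dresses");
        System.out.println(tracker.profile(shared)); // [golf clubs, women's dresses]
        // After the cookie is deleted, the same person starts from an empty profile.
        String fresh = tracker.recordVisit(null, "insurance");
        System.out.println(tracker.profile(fresh));  // [insurance]
    }
}
```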
Advertising and digital media files can also be provided by other service providers (e.g., social media, shopping, entertainment, etc.). These other service providers have shortcomings similar to those of search engines. For example, social media technologies are a form of group email corresponding to 1:1 or 1:n relationships that are finite and do not involve heuristics. Social media are effectively HTML-based group email and are subject to the same shortcomings associated with coding for search, advertising, copyright and classification.
Like web search engines, shopping, entertainment, and multimedia websites provide search engines of a certain class. For example, these websites can provide search services for customers searching for books, films, songs and products. These websites typically suffer from the same meta-data, classification, HTML, coding and advertising deficiencies that challenge search engines. In particular, their ability to recommend similar or relevant books and films is limited by meta-data and user data collection. Shopping, entertainment, and multimedia websites commonly make the interpretation of user behavior a priority in their research and development efforts.
In summary, digital media files, images, videos, webpages, and documents cannot broadcast, network, communicate, or self-organize into an Internet ontology. The commonly relied-upon meta-data does not provide information that is substantively useful to search engines, advertisers or copyright holders, due to inherent inefficiencies and inaccuracies in how the meta-data is selected and established. Additionally, similar images, videos, and websites cannot be networked to form a pre-assembled search engine (set of search results), a wiki, or a social network. Furthermore, similar images, videos, and websites cannot be networked to form recommendations for users of various social, shopping, and entertainment websites. Likewise, similar images, videos, and websites cannot be networked for broadcast advertising.