The analysis of textual documents to ascertain which documents are the closest matches is a recognized objective in computer science. A basic approach to accomplishing this objective is to count the occurrences of each word (i.e., a word count) in a textual document and then identify other documents with the same or similar word counts. While this approach may be relatively easy to perform, it has numerous drawbacks.
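By way of illustration only, the basic “word count” approach may be sketched as follows. The sketch assumes whitespace tokenization and uses cosine similarity to compare word counts; neither choice is specified above, and the function names (`word_counts`, `count_similarity`, `closest_match`) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def word_counts(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    return Counter(text.lower().split())


def count_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_match(query, documents):
    """Return the index of the document whose word counts best match the query."""
    q = word_counts(query)
    scores = [count_similarity(q, word_counts(d)) for d in documents]
    return max(range(len(documents)), key=scores.__getitem__)


# Hypothetical sample documents, for illustration only.
documents = [
    "server outage after software upgrade",
    "quarterly billing report for customers",
    "data feed from vendor delayed",
]
best = closest_match("outage following a server upgrade", documents)
print(documents[best])
```

Because the comparison relies only on exact word overlap, two documents that describe the same subject using different words receive a similarity of zero under this sketch.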
One variation on the basic “word count” approach is the family of TF-IDF techniques. WIKIPEDIA explains that the “tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. One of the simplest ranking functions is computed by summing the tf-idf for each query term; many more sophisticated ranking functions are variants of this simple model.”
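The simple ranking function described in the quoted passage, which sums the tf-idf weight of each query term, may be sketched as follows. The sketch assumes one common variant of the weighting: term frequency is the relative frequency of the term in the document, and inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the term. The function names and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def tf_idf(term, doc_tokens, corpus_tokens):
    """Weight of a term in one document, offset by its frequency in the corpus."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    if df == 0:
        return 0.0  # the term appears nowhere in the corpus
    idf = math.log(len(corpus_tokens) / df)
    return tf * idf


def rank(query, corpus):
    """Score each document by summing tf-idf over the query terms.

    Returns (document indices ordered best first, list of scores).
    """
    corpus_tokens = [tokenize(d) for d in corpus]
    query_terms = tokenize(query)
    scores = [
        sum(tf_idf(t, tokens, corpus_tokens) for t in query_terms)
        for tokens in corpus_tokens
    ]
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical sample corpus, for illustration only.
corpus = [
    "the database outage affected the billing service",
    "the service was restored after the upgrade",
    "the weather was pleasant today",
]
order, scores = rank("the database outage", corpus)
print(order[0], scores)
```

In this sample corpus the word “the” appears in every document, so its inverse document frequency is zero and it contributes nothing to any score; only the rarer terms “database” and “outage” influence the ranking.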
A more sophisticated approach in the art is latent semantic indexing (LSI) or latent semantic analysis (LSA). Many Internet search sites reprioritize their result rankings based on LSI/LSA. LSI/LSA enables a search engine to determine what a document is about without requiring that the search query text match the document text exactly. LSI/LSA uses natural language processing and vectorial semantics to achieve enhanced search rankings. LSI/LSA models the context within which words or phrases are used in order to recommend other documents with similar words or phrases. LSI/LSA offers better performance than a “word count” approach.
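The following is a minimal sketch of the idea behind LSI/LSA and is not a description of any particular search engine's implementation. It assumes raw word counts in the term-document matrix and obtains a rank-k latent space from the leading eigenvectors of the document-by-document Gram matrix, using power iteration with deflation in place of a library singular value decomposition. All function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Rows are vocabulary terms, columns are documents, entries are raw counts."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    return [[c[t] for c in counts] for t in vocab]


def gram(matrix):
    """Document-by-document matrix of dot products (A^T A)."""
    n = len(matrix[0])
    return [[sum(r[i] * r[j] for r in matrix) for j in range(n)] for i in range(n)]


def top_eigenpairs(sym, k, iters=500):
    """Leading k eigenpairs of a symmetric PSD matrix via power iteration.

    Assumption: the fixed all-ones start vector is not orthogonal to the
    wanted eigenvectors; a production implementation would not rely on this.
    """
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Coordinates of each document in a k-dimensional latent space."""
    pairs = top_eigenpairs(gram(term_document_matrix(docs)), k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs]
            for d in range(len(docs))]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)


# Hypothetical sample documents; the first two share no words at all.
docs = [
    "car engine",
    "automobile motor",
    "car automobile engine motor",
    "fruit salad",
    "fruit banana salad",
]
vectors = lsa_document_vectors(docs, k=2)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[3]))
```

In this sample, the first two documents have no words in common, yet they lie close together in the latent space because a third document uses their words in the same context. This illustrates matching without requiring exact agreement of the text.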
The Massachusetts Institute of Technology Media Lab has developed numerous publicly available products for performing sophisticated semantic analysis of documents. According to their “Common Sense Computing Initiative” website, their current research in that area addresses “[c]reating systems that understand the connections between everyday events and objects, people's beliefs, [and] the way they express them in language, [u]sing this understanding to make computers more ‘people-friendly’, [d]eveloping representations for different varieties of common sense knowledge, [d]eveloping methods for acquiring common sense knowledge from people, corpora, and the web, developing architectures that let us fuse these diverse techniques into flexible and resourceful systems.” That lab has applications and/or concepts such as ConceptNet, Divisi, Luminoso, CrossBridge, AnalogySpace, PerspectiveSpace, Blending, and Open Mind Common Sense that are readily available to select members of the public under particular licensing agreements. Various levels of information about one or more of these applications/concepts are publicly available via the lab's Internet website and in the information disclosure statement accompanying this filing; the information disclosure statement and accompanying copies of cited references are herein incorporated by reference in their entirety.
Meanwhile, on an unrelated topic, technicians, engineers, and managers working for a company may respond to an outage of customer-facing services by first trying to restore service before developing a more permanent solution. Service outages may be due to one or more of various problems, including software or hardware that has been incorrectly installed, modified, or upgraded, and problems with data feeds from service providers or vendors. Over the course of service restoral, the people responsible for troubleshooting may coordinate to gather information about the current problems and malfunctioning systems and record it in an incident ticket. As they work on restoral, they may refer to several information resources, such as “playbooks” (i.e., documents that provide a sequence of steps or a flow chart detailing the steps to restoral based on the problem description), “maps” (i.e., physical, logical, and transactional maps of the systems involved), and/or “flows” (i.e., high-level diagrams of key applications and processes). Finally, they may also peruse historical information, including previous, similar incidents, and investigate previous changes that may have caused the current incident. Current tools for assisting in troubleshooting techniques are deficient.