Reading literature is a popular pastime for many people. Some may read epics and tragedies while others may prefer comedies. Still others may enjoy novels, short stories or creative nonfiction. Regardless of genre, people tend to develop preferences for particular writing styles and content. Naturally, most readers want recommendations for the next piece of literature they will read, in the hope that it will satisfy their literary tastes. Unfortunately, the prior art has been woefully deficient in providing accurate recommendations to people.
Typically, a recommendation comes from a person who reads a piece of literature and writes, or otherwise conveys, a review of the book that others can read. Other readers must then decide whether to buy or read the book based on the reviewer's recommendation. However, the opinion of a human reviewer is subjective, and the accuracy of the recommendation depends on the reviewer's individual knowledge and preferences. If the user's preferences differ from the reviewer's, the recommendation is less valuable to the user. Additionally, even a single reviewer may vary their judging criteria across different books over time, making direct comparison between recommendations difficult.
The ability of a human reviewer to recommend a book depends on the reviewer's knowledge of other texts. A reviewer who has read more books will be better able to judge a book's comparative value than a reviewer who has read fewer books. No matter how well-read the human reviewer, they are incapable of consuming the entire catalog of written texts currently available to readers; doing so would take several lifetimes. Even if it were possible, accurately recalling and comparing the contents of every book would be beyond the capability of a human reviewer.
It is also very difficult for a human reviewer to communicate all of a book's stylistic elements to the user. A human reviewer may be able to write a review of a book's characters and storyline, but unless the reviewer and the user share a common vocabulary of literary terms, it is difficult for the reviewer to effectively describe the details of the text's language to the user.
Social networks have also provided various literature recommendation methodologies. Social network based recommendations rely on the buying and reading habits of other people whose preferences resemble the user's in order to recommend texts that the user is likely to enjoy. This includes recommendation systems that recommend texts based on the buying habits of people who have purchased a book or text that the user is considering buying or reading. It also includes systems that make recommendations to users based on how well their profiles match those of other users who have expressed similar opinions about a given text or texts. Unfortunately, social network based recommendation systems, by definition, base their recommendations on how well the user matches the preferences of other users. As a consequence, these systems are less reliable if the user base is too small; to make an accurate recommendation, the user's preferences must be matched by a sizable number of other users.
Social network based systems identify a user's preferences and use that identification to group the user with other users of similar preferences. They do not match a user to a book that fits the user's preferences, but instead to a book that has been recommended by users with matching preferences. Consequently, these systems match users based on the characteristics that make them similar to other readers, and they are not particularly good at matching users to texts when the user's preferences diverge from the community's preferences. As a result, the system is less able to recommend books to users with highly unusual tastes, or to recommend books that are highly unusual.
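The grouping behavior described above is commonly implemented as user-based collaborative filtering. The Python sketch below is one minimal, hypothetical form of it, assuming cosine similarity between users' rating vectors and a similarity-weighted average of ratings for unread books; the function names and sample ratings are illustrative, not any particular system's method. It shows the limitation in miniature: Book4, whose only reader shares no books with Alice, can never be recommended to her, however well it might suit her.

```python
import math

# Hypothetical data layout: user -> {book title -> rating}.
Ratings = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two users' rating vectors (missing books count as 0)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(target: str, ratings: Ratings, top_n: int = 3) -> list[str]:
    """Rank books the target has not rated by similarity-weighted ratings of other users."""
    mine = ratings[target]
    weighted_sum: dict[str, float] = {}
    similarity_sum: dict[str, float] = {}
    for user, theirs in ratings.items():
        if user == target:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # users with no overlap contribute nothing at all
        for book, rating in theirs.items():
            if book not in mine:
                weighted_sum[book] = weighted_sum.get(book, 0.0) + sim * rating
                similarity_sum[book] = similarity_sum.get(book, 0.0) + sim
    ranked = sorted(weighted_sum,
                    key=lambda b: weighted_sum[b] / similarity_sum[b],
                    reverse=True)
    return ranked[:top_n]


ratings: Ratings = {
    "alice": {"Book1": 5, "Book2": 4},
    "bob": {"Book1": 5, "Book2": 5, "Book3": 4},
    "carol": {"Book1": 4, "Book3": 5},
    "dave": {"Book4": 5},  # Book4's only reader shares no books with alice
}
print(recommend("alice", ratings))
```

Here only Book3 is returned, because Bob and Carol overlap with Alice; Book4 never enters the candidate set, since the only path to a book runs through similar users.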
Social network based systems are not objective, and they depend on the texts that have been read by the community. Recommendations can be influenced by advertising campaigns, the popularity of a specific book or author, the number of users in the system who have read that specific book, etc. As a consequence, social network based systems are better at recommending popular books than books that are not popular but may still appeal to the preferences of the individual user. For example, a social networking system is more likely to recommend Stephen King than an unknown author who has just completed a first book, because more people have read and provided input on books by Stephen King. Books with a small reader base are not treated on an equal footing with books that have been read by a large user base, or that have an artificially inflated rating due to advertising or other factors that influence social groups.
The Flesch-Kincaid readability tests and the Lexile scoring system were designed to measure a text's difficulty level. The Flesch-Kincaid readability tests are designed to determine how difficult a particular passage is to understand. There are two tests: the Flesch Reading Ease test and the Flesch-Kincaid Grade Level test. Although the tests use the same core measures (average word length in syllables and average sentence length in words), they weight those measures differently, so the results of the two tests do not always correlate. A higher Reading Ease score indicates an easier text, whereas a higher Grade Level indicates a harder one; nevertheless, a first text may score higher (easier) than a second text on the Reading Ease test while also receiving a higher (harder) result than the second text on the Grade Level test.
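The divergence between the two tests can be checked directly from their standard published formulas, which differ mainly in how heavily syllables per word are weighted relative to words per sentence. The Python sketch below deliberately omits syllable counting (the caller supplies word, sentence, and syllable counts), and the counts for texts A and B are hypothetical, chosen only to illustrate the effect.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    asl = words / sentences   # average sentence length (words per sentence)
    asw = syllables / words   # average word length (syllables per word)
    return 206.835 - 1.015 * asl - 84.6 * asw


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: higher scores mean harder text."""
    asl = words / sentences
    asw = syllables / words
    return 0.39 * asl + 11.8 * asw - 15.59


# Hypothetical counts: text A uses long sentences of short words,
# text B uses short sentences of longer words.
text_a = dict(words=200, sentences=10, syllables=260)  # 20 words/sentence, 1.3 syllables/word
text_b = dict(words=200, sentences=25, syllables=300)  # 8 words/sentence, 1.5 syllables/word

for name, counts in (("A", text_a), ("B", text_b)):
    print(name,
          round(flesch_reading_ease(**counts), 2),
          round(flesch_kincaid_grade(**counts), 2))
```

With these counts, text A is rated easier than text B by Reading Ease (about 76.6 versus 71.8) yet harder by Grade Level (about 7.6 versus 5.2), because Reading Ease penalizes syllables per word far more heavily, relative to sentence length, than Grade Level does.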
The Lexile Framework provides a common, developmental scale that attempts to match reader abilities with text difficulty. Lexile aims to enable individuals to select targeted materials that can improve reading skills and to monitor reading growth across the curriculum.
Both the Flesch-Kincaid system and the Lexile Framework are designed to identify how difficult a text is to read, and do not attempt to make any predictions beyond that. For example, the Flesch-Kincaid formulas fundamentally measure only two quantities: the average number of words per sentence and the average number of syllables per word. This approach is extremely simplistic compared to our approach and is incapable of identifying higher-level stylistic elements. It also has no mechanism for comparing variations between scenes within a text. The Lexile Framework does not publicly release the details of how its scores are calculated. Moreover, like Flesch-Kincaid, the Lexile Framework targets only the difficulty level of a text.
Methods of analyzing literature have also been used to help writers identify and target their writing to specific stylistic trends. Such methods have been carried out only by human reviewers. The writer, publisher, agent, or other interested party first identifies the commercial success of different books in different genres. That party then attempts to manually analyze each book to find the stylistic trends that distinguish successful books from unsuccessful ones. However, relying on a human reviewer has proven slow and limits the number of books that can be analyzed. Additionally, because the review is performed by a human, the error rate is likely to be high even if the reviewer is highly trained; accurately measuring many stylistic elements for every scene in a book would be difficult.
Literary analysis has also been used to help agents, writers, publishers, or other interested parties identify stylistic elements in manuscripts before they are published and compare them to texts that have already been published. The goal of such a system has been to connect writers, agents, and publishers based on the stylistic match between the writer's manuscript and the preferences of the agent or publisher. However, such methods have so far fallen short of their intended results. In one aspect, prior methods have relied on the author to identify an agent or publisher likely to want to represent or publish the manuscript by manually analyzing the agent's or publisher's past texts. Other methodologies have relied on the agent or publisher to manually acquire and identify a manuscript that matches their individual preferences. Neither approach provides satisfactory results.
In a system where a writer identifies a potential agent or publisher, the system relies on a time-intensive process that requires a prospective writer to manually review and identify the stylistic trends of publishers or agents based on the publishers' past texts or the texts that the agents have represented, respectively. The most common way to do this has been to first purchase a subscription to a compilation of literary publishers and agents (such as WritersMarket.com), which lists publishers and agents according to search criteria, such as genre or whether the publisher or agent is currently accepting submissions. The writer then selects a number of publishers that publish the genre of the writer's manuscript and reviews a number of those publishers' published texts in an attempt to identify stylistic similarities to their own manuscript. The writer then sends the manuscript to the publisher or agent they judge most likely to accept it. This methodology has a number of limitations.
It is, for all practical purposes, impossible for a single person to adequately analyze all the past texts of all prospective publishers or agents. Because of this, the writer is unlikely to identify the publisher or agent most likely to accept the manuscript. This mismatch often results in the manuscript being rejected. If a publisher or agent rejects a writer's manuscript, the writer must resubmit it to a new publisher or agent for review. A publisher typically takes one to two months to review a manuscript, and current publishing standards require that the writer submit the manuscript to only a limited number of publishers or agents at a time. Accordingly, correctly targeting and submitting a manuscript for publication manually takes a very long time.
In a system where a publisher or agent identifies potential manuscripts, the system has relied on writers to submit their manuscripts to a publisher or agent, and then on human reviewers to identify texts worth publishing or representing. Typically, this type of system inherits all the drawbacks present on the writer's side of the submission process. If a writer fails to correctly identify the publisher or agent, the ideal publisher or agent may never have an opportunity to review the manuscript. Even when a manuscript is received by the agent or publisher, it must be reviewed by human reviewers, traditionally a series of them, before it reaches a person who can decide whether it should be represented or published. This means that a manuscript has to survive an elimination process that depends on people whose preferences may or may not match those of the person who ultimately decides whether it is published or represented. Additionally, such a methodology is slow, and it does not allow the publisher or agent to specify the stylistic characteristics they want in a manuscript before one is submitted to them.
Prior text analytical systems have also been used as components of e-mail spam filter tools. Such systems compare an incoming e-mail with statistical profiles of messages previously identified as “spam” and as “not spam”. The systems look at the frequency of certain words and phrases and then determine whether the incoming e-mail is more likely to be spam or not spam. For example, if an e-mail contains the word “Viagra” more often than is typical of e-mail that the filter has identified as not spam, it is likely to be classified as spam.
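The word-frequency comparison described above is commonly realized as a naive Bayes classifier. The Python sketch below assumes a multinomial model with add-one (Laplace) smoothing, whitespace tokenization, and a tiny hypothetical training set; production filters use far larger corpora and richer features, so this illustrates the technique rather than any particular filter.

```python
import math
from collections import Counter

LABELS = ("spam", "not spam")


def train(examples: list[tuple[str, str]]):
    """Count word occurrences per label from (text, label) training pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def classify(text: str, model) -> str:
    """Return the label with the highest naive Bayes log-posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training e-mails that carry this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out the product.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


examples = [
    ("buy viagra now cheap viagra", "spam"),
    ("cheap pills buy now", "spam"),
    ("meeting notes for the project", "not spam"),
    ("lunch with the project team", "not spam"),
]
model = train(examples)
print(classify("cheap viagra offer", model))
print(classify("notes for the project meeting", model))
```

The first message is labeled spam because “cheap” and “viagra” occur far more often in the spam profile; the second is labeled not spam. Note that the model sees only word frequencies, not style or meaning.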
Prior text and literature analyzing systems have not been successfully incorporated into alert systems, such as medical and stress alert systems. Studies indicate that signs of certain health conditions (such as Alzheimer's disease), for example a decrease in vocabulary use, are detectable in a person's writing, often before the condition is evident to that person.
Similarly, prior literature and text analyzing systems have not been adequately used to improve search engines. Generally speaking, prior search engine systems have relied on the frequency and placement of keywords and related keywords within a website. The systems have then combined that information with other metrics, such as how many other websites on a certain topic link to that specific search result.
Literature and text analyzing systems are not typically used as tools for improving targeted ad placement. Generally speaking, targeted advertising attempts to identify the subject matter of the content of a website, and display advertisements that offer products that are relevant to the subject matter.
Prior literature and text analyzing systems and methods have not been successfully applied as tools for judging the general mood on the Internet toward a specific subject matter. Prior methods have involved websites that gather information from many sources, such as review sites, and combine this data to produce a general score for the item. For example, GameRankings.com and RottenTomatoes.com gather 1-10 review ratings for movies and video games from different review sites and combine them into a single number or data point that represents the general opinion of those review sites. Another method of judging general opinion uses user polling, which allows users to vote on whether a product or item is good or bad, and displays the aggregate score to the user. Many prior systems require individual review sites to present their data in a form that is easy for the system to interpret, for example by providing ratings as a single score on a 1-10 scale that can be easily combined with other 1-10 scores. These systems have also relied on the various sites providing RSS feeds that the system can automatically interpret. However, such systems are not typically capable of identifying the general attitude of websites or content (such as blog posts) that are not explicitly formatted to be parsed by the system. Consequently, they do not measure the attitudes expressed in text written on general blogs about a subject. Additionally, these sites tend to be specific to one subject matter.
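The score-combining step such aggregators perform can be sketched as follows, assuming each source reports a numeric score with a known scale maximum and that the combined value is a plain unweighted mean on a 0-100 scale; actual aggregators may weight sources differently, and the function name and inputs are hypothetical. The sketch makes the limitation concrete: a blog post with no numeric score simply has nothing to contribute to the input list.

```python
def aggregate(reviews: list[tuple[float, float]]) -> float:
    """Combine (score, scale_max) pairs into one unweighted 0-100 average."""
    if not reviews:
        raise ValueError("at least one numeric review is required")
    # Rescale each score to 0-100 so that sources on different scales are comparable.
    normalized = [100.0 * score / scale_max for score, scale_max in reviews]
    return sum(normalized) / len(normalized)


# Hypothetical inputs: 8/10, 4/5 and 90/100 from three review sites.
print(aggregate([(8, 10), (4, 5), (90, 100)]))
```

The three inputs normalize to 80, 80, and 90, giving a combined score of about 83.3.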
Video analysis and recommendation systems have not previously incorporated text analyzing methodologies. Until now, human reviewers and social network based recommendations alone have provided the basis for video analysis and recommendation systems. Such systems are similar to prior book recommendation systems and, as such, inherit many of the drawbacks described herein above.