As is well known, understanding social concerns and opinions is of high importance in several scenarios and in many decision-making processes. Nowadays, society's thoughts are mainly investigated by posing a series of questions to a selected sample of the population. Questions such as “What do you think about that brand?”, “Did you like the commercial aired during the Super Bowl?”, “Do you watch that TV show?”, and “Do you buy that product?” are commonly used by polling organizations to figure out society's opinions.
Recently, the widespread use of the Web as a way of conveying personal opinions has prompted researchers to propose methods that aim at understanding society through Webspace analysis.
The rationale behind using Web data to find society's interests is that, thanks to the set of Internet technologies grouped under the label Web 2.0, the Web is increasingly a significant representation of our society: it is a modern version of the Ancient Greek Agora, where people gathered to conduct commercial and administrative activities, to discuss politics and philosophy, to participate in social and religious events, and to understand and influence society.
Web 2.0 technologies like blogs, podcasts, and wikis are so important in today's society that they are affecting its morphology by creating new spaces of freedom, giving voice to any opinion, easing interpersonal relationships, and encouraging the creation of collaborative communities.
The revolution of Web 2.0 is that it potentially transforms every user from a mere passive reader into an active modern citizen, regardless of ethnicity, gender, or walk of life. Using Web 2.0 technologies, people can meet virtually to share knowledge, conduct business, discuss different topics, socialize, and even influence society. Society and the Web are so strongly linked that they affect each other. When something happens in society, it is very likely that a few seconds later someone writes about it in the Webspace. For example, more and more people consider the Web the first place to look for news, and when a product is released, the Blogosphere, which is made up of all the blogs and their interconnections, is the place where it is discussed. Conversely, the Web can influence society by providing several communication tools and easy access to information. For instance, in May 2007 a blog post reported that Apple was delaying the “iPhone” and “Leopard OS”. Although this post turned out to be a false alarm, Apple's stock was negatively affected during the period in which the news was believed to be true.
In the literature, different proposals exploit the society-Web relation to find out society's interests, such as people's concerns, Hollywood stars' notoriety, politicians' popularity, or consumers' opinions. These proposals are based on the idea that when you see something interesting, e.g., on TV, on the Web, or at the movie theater, you usually talk about it with friends, and if people talk about it and spread the word, there will be several ongoing conversations about the topic. The more people converse about the same subject, the more attention the topic receives in society. By treating the Blogosphere as the place where modern conversations happen, these methods compute the number of ongoing discussions about a specific topic and use this number as an indication of the importance of the topic in society.
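The volume-based idea described above can be illustrated with a minimal sketch. It is not the implementation of any of the cited proposals: the function name `topic_volume`, the whole-word matching rule, and the sample posts are all assumptions made for illustration only.

```python
import re
from collections import Counter

def topic_volume(posts, topics):
    """Count, for each topic, how many posts mention it at least once."""
    counts = Counter({topic: 0 for topic in topics})
    for text in posts:
        lowered = text.lower()
        for topic in topics:
            # Whole-word, case-insensitive match; a post counts once per topic.
            if re.search(r"\b" + re.escape(topic.lower()) + r"\b", lowered):
                counts[topic] += 1
    return counts

# Hypothetical blog posts, invented for this example.
posts = [
    "The new iPhone looks great",
    "Leopard is delayed again?",
    "I want an iPhone, not a new laptop",
]

volume = topic_volume(posts, ["iphone", "leopard"])
# Topics ordered from most to least discussed.
ranking = [topic for topic, _ in volume.most_common()]
```

Under this scheme the topic mentioned in the most posts is simply taken to be the most important one, which is exactly the assumption the rest of this section questions.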
Commercial products like Google Trends, BlogPulse, Trendpedia, and Blogmeter, to name a few, also exploit the Webspace to analyze human society. These tools assume that the more people use Web search engines to look for a particular topic, or the more people discuss a particular topic in the Blogosphere, the more popular, important, or simply discussed the topic is in society.
A criticism of these approaches is that they may help in understanding what is going on in the Web, but they might be misleading or might even present a distorted view of society. There are two main concerns about these methods.
First, the results represent a part of society rather than society as a whole. Being based on the Blogosphere, these methods analyze a portion of society composed of tens of millions of users who share information and exchange personal opinions, a portion usually described as technologically advanced people. Without doubt, the Blogosphere offers great commercial value and provides new business opportunities in areas such as product surveys, customer relationships, and marketing, but compared to the 700 million Web users, the Blogosphere represents a very small portion of the Web, and therefore of society.
The second concern relates to relying solely on the volume of searches in Web search engines, or of keyword occurrences in the Blogosphere: it is easy to maliciously alter the results, as one can write software that automatically and periodically issues Web searches or posts blog messages so as to make a brand, a website, or a politician appear more popular than they really are.
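A toy example makes this second concern concrete. The sketch below is illustrative only: the posts, the author names, and the helper functions `mention_volume` and `distinct_authors` are hypothetical, and counting distinct authors is shown purely as a contrast, not as a method proposed in the text.

```python
def mention_volume(posts, keyword):
    """Raw number of posts whose text contains the keyword."""
    return sum(keyword in text.lower() for _, text in posts)

def distinct_authors(posts, keyword):
    """Number of different authors who mention the keyword."""
    return len({author for author, text in posts if keyword in text.lower()})

# Hypothetical organic posts as (author, text) pairs.
organic = [
    ("alice", "Brand A is fine"),
    ("bob", "I prefer brand B"),
    ("carol", "Brand B works well"),
]
# A single automated account posting the same message 50 times.
spam = [("bot", f"Buy brand A now! #{i}") for i in range(50)]

before = mention_volume(organic, "brand a")
after = mention_volume(organic + spam, "brand a")
rival = mention_volume(organic + spam, "brand b")
authors_after = distinct_authors(organic + spam, "brand a")
```

In this invented data set a single script moves brand A from the least to the most mentioned brand by raw volume, although only two distinct authors ever mention it.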
Recently, many proposals in the literature have focused on using Web data to understand social opinions and concerns, and many commercial blog sites and Web search engines have introduced services that try to give an indication of public opinion.
In the literature, much research work is being conducted on the Blogosphere, as blogs are much more dynamic than traditional Web pages:
Chi et al. analyze the Blogosphere and propose a trend analysis technique based on the singular value decomposition.
Ni et al. propose a machine learning method for classifying informative and affective articles inside the Blogosphere.
Liu et al. study the predictive power of opinions and sentiments expressed in blogs, in order to predict product sales performance.
Fukuhara et al. describe a system that counts the number of blog articles containing a specific word so as to understand concerns of people.
Glance et al. propose a mechanism to discover trends inside the Blogosphere by using data mining techniques.
Gruhl et al. use the volume of blogs or link structures to predict the trend of product sales.
Morinaga et al. present an approach that automatically mines consumer opinions with respect to given products, in order to facilitate customer relationship management.
Agrawal et al. and Gamon et al. have also conducted research in opinion mining for marketing purposes.
Commercial blog sites and Web search engines also offer services that aim at understanding society through Web data analysis.
The Webfountain project uses Web mining techniques for market intelligence and is based on massive server clusters; Google Trends charts how often a particular search term is entered relative to the total search volume across various regions of the world and in various languages.
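The notion of a search term's frequency relative to total search volume can be sketched as a simple per-region ratio. This is an assumption-laden illustration, not Google Trends' actual normalization: the function `relative_search_share`, the region codes, and all counts are hypothetical.

```python
def relative_search_share(term_counts, total_counts):
    """Per-region share of all searches that were for the given term."""
    shares = {}
    for region, total in total_counts.items():
        if total > 0:  # skip regions with no recorded searches
            shares[region] = term_counts.get(region, 0) / total
    return shares

# Hypothetical counts: searches for one term, and all searches, per region.
term_counts = {"IT": 50, "US": 200}
total_counts = {"IT": 1000, "US": 10000}

shares = relative_search_share(term_counts, total_counts)
# Region where the term takes the largest share of local searches.
top_region = max(shares, key=shares.get)
```

Dividing by the regional total means that a region with fewer absolute searches for the term can still rank first, which is why relative and absolute volumes may suggest different pictures of the same topic.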
All the known methods based on the simple magnitude of the results, whether in the Blogosphere, in Web search engines, or in the entire Webspace, are misleading and provide different, and sometimes controversial, understandings of society.