The Internet is a worldwide system of interconnected computer networks that transmits data in packets. Various information and services are carried over the Internet, such as electronic mail (e-mail), online chat rooms, and the World Wide Web (the Web). In particular, the Web is an information space in which online documents called web pages are stored and published for the entire computing world to access. Anyone connected to the Internet can view the myriad of web pages available online by accessing global identifiers called Uniform Resource Identifiers (URIs).
A web page is a simple file containing, for example, text and a set of Hypertext Markup Language (HTML) tags that describe how the text should be formatted on a screen. HTML tags are simple instructions that tell web browsers how a web page should look when it is displayed. For example, HTML tags may describe a web page's fonts, colors, and title. Furthermore, web pages may be accessed via the Hypertext Transfer Protocol (HTTP) and may be displayed according to their HTML tags by a software package called a web browser. Web browsers identify web pages on web servers by their URIs. Examples of web browsers include Microsoft® Internet Explorer, Opera, Netscape Navigator, and Firefox. Once a web page is retrieved, the web browser interprets the page's HTML tags and displays the page accordingly on a screen.
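As a rough illustration of how software might interpret the HTML tags of a retrieved page, the following minimal sketch uses the HTML parser from Python's standard library. The sample page, the class name TagCollector, and the function summarize_page are hypothetical names introduced only for this example. A real web browser does far more, including layout and rendering.

```python
from html.parser import HTMLParser

# Hypothetical sample page, used only for illustration.
SAMPLE_PAGE = """<html>
<head><title>Example Page</title></head>
<body><h1>Hello</h1><p>This is <b>bold</b> text.</p></body>
</html>"""


class TagCollector(HTMLParser):
    """Records the page title, the opening tags seen, and the text content."""

    def __init__(self):
        super().__init__()
        self.tags = []          # opening tags in document order
        self.title = ""         # text found inside <title>
        self.text_parts = []    # non-empty text found outside <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def summarize_page(html):
    """Return (title, list of opening tags, text joined by spaces)."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.title, parser.tags, " ".join(parser.text_parts)


if __name__ == "__main__":
    title, tags, text = summarize_page(SAMPLE_PAGE)
    print(title)
    print(tags)
    print(text)
```

The sketch separates the formatting instructions (the tags) from the content (the text), which is the same distinction a browser relies on when it decides how to display a page.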
A web site is a collection of individual related web pages. Examples of different types of web sites include archive, business, database, and news sites. One specific type of web site gaining popularity today is the “weblog,” also known as a “blog.” A blog is a web site containing periodic articles and posts, usually presented in reverse chronological order. Generally, blogs are structurally much simpler than other web sites. Rather than being composed of many individual pages connected by hyperlinks, blogs are composed of a few templates (usually Main Page, Archive Page, and Individual Article/Item Page), into which content is fed from a database. This allows for easy creation of new pages: new data is entered into a simple template and then submitted, which effectively adds the article to the blog.
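The template-driven structure described above can be sketched as follows. This is a minimal illustration only, assuming an in-memory SQLite database and two simplified templates. The table layout, the template text, and the function names create_store, add_post, and render_main_page are hypothetical and do not describe any particular blogging product.

```python
import html
import sqlite3
from string import Template

# Hypothetical templates standing in for an item template and a Main Page.
ITEM_TEMPLATE = Template(
    "<article><h2>$title</h2><p>$posted</p><p>$body</p></article>"
)
MAIN_TEMPLATE = Template(
    "<html><body><h1>$blog_name</h1>\n$items\n</body></html>"
)


def create_store():
    """Create an in-memory database holding the blog's posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (title TEXT, posted TEXT, body TEXT)")
    return conn


def add_post(conn, title, posted, body):
    """Submit a new article; 'posted' is assumed to be an ISO date string."""
    conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (title, posted, body))


def render_main_page(conn, blog_name):
    """Feed stored posts into the templates, newest first."""
    rows = conn.execute(
        "SELECT title, posted, body FROM posts ORDER BY posted DESC"
    ).fetchall()
    items = "\n".join(
        ITEM_TEMPLATE.substitute(
            title=html.escape(title),
            posted=html.escape(posted),
            body=html.escape(body),
        )
        for title, posted, body in rows
    )
    return MAIN_TEMPLATE.substitute(blog_name=html.escape(blog_name), items=items)


if __name__ == "__main__":
    store = create_store()
    add_post(store, "First post", "2005-01-01", "Hello, world.")
    add_post(store, "Second post", "2005-02-01", "More news.")
    print(render_main_page(store, "Example Blog"))
```

Adding an article requires only a new database row. The page itself is regenerated from the same templates, which is why new pages are easy to create in this model.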
All language is inherently biased. An author's wording reflects his or her individual history, opinions, context, ethics, experiences, and belief structures. Consequently, web pages and blog posts are typically slanted toward the author's point of view. There are many instances where users may wish to substitute their own preferred biases for those of a web page or blog author.
Since its inception, the Web has rapidly expanded to include a vast and diverse amount of online information and to provide a global forum for unregulated public speech. With the advent of new web-building software, such as Microsoft® FrontPage®, Macromedia Dreamweaver, Mozilla Composer, Blogger, Xanga, and Typepad, it has become much easier to create and publish information online. As a result, a plethora of web pages, blogs, and other online sources that describe and discuss nearly every aspect of life are readily available on the Web. Internet search engines like Google and Yahoo! search online documents using keyword-driven search technology. However, these services merely direct a user to web pages. They do not synopsize information, alleviate author bias, or allow the user to interpret the information with their own particular bias. Also, as the number of online documents keeps increasing, keyword-driven searches will return ever larger sets of results for a user to navigate through in search of information. Therefore, a need exists to assimilate blogs and web pages by specific topic, analyze them, and summarize their underlying objective content.