1. Field of the Invention
This invention relates generally to information organization and understanding, and more particularly to the organization and understanding of machine data.
2. Description of the Related Art
Information systems invariably generate vast amounts and wide varieties of machine data (e.g., activity logs, configuration files, messages, database records) whose value is widespread. Troubleshooting systems, detecting operational trends, catching security problems and measuring business performance, for example, typically require the organization and understanding of machine data. But the overwhelming volume, different and changing formats, and overall complexity of machine data create substantial difficulty for software developers, system administrators and business people who want to make sense of it and gain insight into information system behavior. The problem is compounded by the fact that information systems, and the machine data they generate, continue to grow in complexity and size.
Consider for example an information system environment for web-based applications consisting of web servers, application servers, databases and networks. Each information system component is constantly logging its own machine data documenting its activities. System administrators need to access and comprehend the machine data from one or more components to find and fix problems during operations. Security analysts want to understand patterns of machine data behavior from network devices to identify potential security threats. Business people are interested in tracing the machine data across components to follow the paths and activities customers perform when purchasing products or services.
Today, people generally attempt to comprehend information system behavior by manually looking at and trying to piece together machine data using the knowledge from one or more individuals about one or more systems. Individuals typically have specific technology domain expertise like networking, operating systems, databases, web servers or security. This expertise can also be in specific application domains like finance, healthcare, or communications. Manual approaches can be effective when considering small amounts of machine data in a single domain, but humans are easily overwhelmed as the size, variety and dynamic nature of the machine data grow.
Automated approaches, like homegrown scripts, data analysis programs, and data warehousing software, by contrast, can work with large amounts of machine data. But organizing different types of frequently changing data and formats can be troublesome, generally requiring specific methods for each type of data and necessitating modification of methods when the data formats change or new types of data are encountered. Automated approaches to building understanding from machine data are typically limited to finding simple, predefined relationships between known data elements.
Generally, machine data is organized today by relying on predefined data schemas and predetermined algorithms for parsing and categorizing data. In current approaches, what data elements exist in a machine data set and how the data elements are classified generally must be known ahead of time. How the data is cleansed, parsed and categorized is defined algorithmically in advance for different types of data formats, resulting in systems that are brittle, expensive to implement, and have numerous functional shortcomings. For example, unexpected types of data are typically ignored. As a result, data categorization usefulness degrades quickly and unexpected data and behaviors are not observed or recorded. Given the inherent dynamic nature of information systems and the machine data they generate, current organization methods have limited applicability.
Building understanding from machine data is inherently subjective and depends on the task, scope of data and skill level of people using a solution. Deriving specific, useful meanings from large quantities of machine data can require expertise in one or more domains and knowledge of how data from one domain relates to data from another domain. Current methods of deriving meaning from machine data are generally based on building simple pair-wise relationships (A→B) between predetermined data elements using data values. More advanced techniques may be able to find predetermined multi-data element relationships (A→B→C), provided the data elements are described in advance, requiring the availability of multiple domain experts to configure and continuously manage a solution.
Conventional methods, whether human or automated, of organizing and understanding machine data across multiple information systems and domains suffer from an inability to effectively keep up with changing machine data and are constrained by limited data relationships, making these methods difficult, time consuming, expensive and often ineffective.
There exists, therefore, a need to develop other techniques for organizing and deriving understanding from machine data.