1. Field
The present invention relates to a technology for collecting big data for analysis and, more particularly, to a technology for collecting big sensor data generated on a sensor network for analysis.
2. Description of the Related Art
With the rapid spread of the digital economy, we increasingly live in a ‘Big Data’ environment in which extremely large amounts of information and data are produced. In the ‘Big Data’ environment, big data analytics is becoming important for processing large volumes of unstructured data as well as structured data. A typical big data analytics platform is Hadoop, an open-source software framework that supports data-intensive distributed applications and the running of applications on large clusters of commodity hardware. In general, a big data service is provided based on Hadoop. Hadoop collects both structured data and unstructured data, processes the collected data set in parallel on a distributed network cluster, and extracts valuable information from the processed data set within a short time. The Hadoop Distributed File System (HDFS) is an open-source file system that stores big data in a distributed manner, that is, a technology for storing collected data reliably. The most important part of big data analytics is collecting the data before storing it, and many Hadoop-based data collection tools support collecting data into HDFS.
Generally, each sensor node in a sensor network collects data, and the sensor network provides an application service using the collected data. The sensor network is located on an area network within a large infrastructure network. Thus, the collected data must cross the area network before it can be used by a server for an application service. In most cases, data generated in a sensor network is transferred to a server of an area network, and the server uses the data for a desired service. As such, sensor data is generated in a sensor network that is merely one area network within an infrastructure network, but is collected by a server located at the center of the infrastructure network, which makes it challenging to provide a service by processing the big data generated by sensors.
Each server of an area network collects sensor data transferred from the area network, and thus the server needs to be associated with Hadoop in order to store the sensor data in HDFS. Yet, in most cases, a server is associated with Hadoop in order to transfer collected data to a collector for HDFS. Thus, when the server transfers sensor data collected from an area network to the collector, the sensor data needs to be converted into a form suitable for the collector, which may impose a heavy load and cause memory usage to increase continuously.
Accordingly, if there are many logs to be processed, the system may shut down, resulting in delayed data. Consequently, it would be difficult to provide a big data service using sensors. In order to overcome these drawbacks, it is necessary to reduce the number of files to be monitored and to apply both a deferred processing method and an asynchronous processing method. In other words, what is needed is a method for transferring big sensor data generated on a sensor network to a Hadoop-based collector so as to provide a highly reliable big data service in real time.
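By way of illustration only, the combination of deferred processing and asynchronous processing mentioned above may be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed method: the names forward_batches, run_demo, send, and STOP are hypothetical, the in-memory list stands in for a Hadoop-based collector, and the batch size and queue bound are arbitrary example values. Readings are placed on a bounded queue so that memory usage cannot grow without limit, and they are forwarded in batches rather than one at a time.

```python
import asyncio

STOP = object()  # hypothetical sentinel marking the end of the sensor stream


async def forward_batches(queue, send, batch_size):
    """Deferred processing: hold readings until a full batch is available."""
    batch = []
    while True:
        reading = await queue.get()
        if reading is STOP:
            break
        batch.append(reading)
        if len(batch) >= batch_size:
            await send(batch)  # one transfer per batch instead of per reading
            batch = []
    if batch:
        await send(batch)  # flush the final, partially filled batch


async def run_demo(readings, batch_size=3):
    """Asynchronous processing: the producer and the forwarder run concurrently."""
    queue = asyncio.Queue(maxsize=100)  # bounded queue caps memory usage
    sent = []

    async def send(batch):
        # Assumption: stands in for a transfer to a Hadoop-based collector.
        sent.append(list(batch))

    consumer = asyncio.create_task(forward_batches(queue, send, batch_size))
    for reading in readings:
        await queue.put(reading)  # waits here if the queue is full
    await queue.put(STOP)
    await consumer
    return sent


if __name__ == "__main__":
    print(asyncio.run(run_demo(list(range(7)))))
```

In this sketch, a producer that outpaces the forwarder is paused at the bounded queue instead of accumulating readings in memory, which corresponds to the load and memory concerns described above.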