Hadoop is a basic framework for building distributed systems. With it, a user can develop distributed programs without understanding the underlying details of the distributed system, and can achieve high-speed computation and storage by fully utilizing the power of a cluster.
The Hadoop Distributed File System (HDFS) is the distributed file system used by Hadoop. It is suitable for storing and processing big data, and offers high fault tolerance and high throughput.
In products that involve serving and processing big data, HDFS provides an efficient, fast, and mature solution: its characteristics are exploited to store massive amounts of data and to provide external services based on that data.
Conventionally, there are two ways of accessing the HDFS:
1) Accessing the HDFS by calling a Hadoop client, where the Hadoop client is a control tool provided by Hadoop for reading from and writing to the HDFS, and the call is made through command-line input; and
2) Accessing the HDFS programmatically using the library functions in Libhdfs, where the underlying execution of the Libhdfs functions still relies on a Hadoop client.
Thus, a Hadoop client must be installed and running on the accessing apparatus if the HDFS is to be accessed in either manner 1) or manner 2) above. FIG. 1 is a schematic diagram showing the relationship between an accessing apparatus and the HDFS in a conventional configuration. As shown in FIG. 1, the Hadoop client must be installed and running on each accessing apparatus that needs to access the HDFS.
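By way of illustration only, manner 1) may be sketched as follows. The sketch assumes that the Hadoop client is exposed as a `hadoop` executable on the accessing apparatus and that access is made through its `hadoop fs` file system shell. The helper names `build_hadoop_fs_command` and `run_hadoop_fs` and the path `/data/example.txt` are hypothetical and are not taken from any particular implementation.

```python
import shutil
import subprocess
from typing import List, Optional


def build_hadoop_fs_command(action: str, *paths: str) -> List[str]:
    """Build the argument list for one `hadoop fs` call.

    Hypothetical helper: only a few common shell actions are accepted here.
    """
    allowed = {"cat", "ls", "put", "get"}
    if action not in allowed:
        raise ValueError(f"unsupported action: {action}")
    if not paths:
        raise ValueError("at least one path is required")
    return ["hadoop", "fs", f"-{action}", *paths]


def run_hadoop_fs(action: str, *paths: str) -> Optional[bytes]:
    """Run the command through the locally installed Hadoop client.

    Returns the command's standard output, or None when no `hadoop`
    executable is found on PATH, i.e. when no client is installed on
    this accessing apparatus.
    """
    if shutil.which("hadoop") is None:
        return None
    result = subprocess.run(
        build_hadoop_fs_command(action, *paths),
        check=True,            # raise if the client reports a failure
        capture_output=True,   # collect stdout/stderr instead of printing
    )
    return result.stdout


if __name__ == "__main__":
    # Each access starts a separate client process on this apparatus.
    print(build_hadoop_fs_command("cat", "/data/example.txt"))
    print(run_hadoop_fs("cat", "/data/example.txt"))
```

In this sketch every access launches a new client process on the accessing apparatus, and the call returns nothing useful on an apparatus where the client is absent, which reflects the dependency shown in FIG. 1.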
Consequently, in practical applications involving access to the HDFS, upgrading the Hadoop client on each accessing apparatus incurs a high implementation cost when the number of accessing apparatuses is large, for example hundreds or even thousands.
Furthermore, invoking the Hadoop client for each access is relatively slow and inefficient, while programs that use the Libhdfs library are difficult to create and maintain.