The present AppAccel invention disclosure is an expression of, derivation of, and application of technologies described in my prior Provisional Patent Application and later F+ and Customizable Storage Controller (CSC) patents identified above.
This application relates to the cloud and Enterprise datacenter as a thinking machine, or rather as the infrastructure necessary for hosting processes (applications) that provide functionality that mimics thinking and that solve real-world problems (Cognitive Computing). Some of this problem solving will be real-time reaction to changing real-world conditions for smarter factories, air, rail, and road traffic systems, medical and other healthcare, weather & atmospheric patterns (tornados, tropical storm systems, etc.), etc. Others may include ‘batch’ analysis of deep problems that have eluded successful analysis before now. Some will be decision support systems that collaborate with human decision-making and analysis in real time for opportunities such as market trading.
But the scope of this evolution of the datacenter is to create a foundation for thinking machines that solve problems that have high social and economic value.
The datacenter infrastructure currently includes the servers, storage resources, and the network that harnesses these together. ‘Cognitive Computing As A Service’ (CCaaS) is the next evolutionary step, benefiting from both the CSC storage improvements and the Application Acceleration (AppAccel) invention disclosed herein.
This application discloses an enabling technology, an approach to positioning storage-network-latency-throttled executables closer to the relevant storage contents. The example used in the preferred embodiment is “search”, which is foundational to analytics and other algorithm types. This technology is useful no matter what network topologies or technologies are considered, and no matter what server or storage technologies are employed, so long as the network connected storage devices can host user designated runtime executables.
This application disclosure is for the system, methods, and/or mechanisms used to run search and other software on a storage device, especially search software tailored to the contents of the storage device, where the software is part of a larger application or application framework such as a Big Data Analytics application.
The objectives include accelerating datacenter applications, especially Big Data Analytics and Cognitive Computing, by reducing network traffic, reducing the number of server to storage round-trips, positioning search proximate to data, and enabling “deeper” search and constant re-search.
This disclosed technology also supports the emerging datacenter storage architectures and Software Defined Data Center (SDDC) paradigm intended to deal with the enormous scaling of storage resources. It is not limited to the use of Ethernet (TCP, IP), or some other connection mechanism such as FC (iSCSI), etc.; all network (or network-like channel) connected storage interfaces and storage device types can be supported by this invention, including those not yet invented.
This disclosure is an expression of, derivation of, and application of the Bubble technology described in my Provisional Application No. 60747536, filed May 17, 2006, and subsequently granted F+ patents, and applies my F+ Storage Firewall technology and Customizable Storage Controller (CSC) technology disclosed in my above identified U.S. Patents.
Cloud datacenters are evolving and growing in importance. There are several complementary technology families interacting to create this success. The present AppAccel invention affects several of these with complementary improvements.
The novel, unobvious, and useful technology disclosed in this application relates to the acceleration of analytics algorithms running in data-centers against very large datasets.
The problem being solved is the slow execution of important software due to latency caused by large numbers of storage access requests across data-center networks (or other communication links).
This invention is derived from that subject matter disclosed in part in my above identified Provisional and Utility patent application disclosures of FlashApp Bubbles on USB Flash Drives, as well as Storage Firewall and CSC technologies, and all first-to-invent rights thereto are claimed herein.
The CSC technology provides a way to store ready for execution (‘host’) and execute application software on CSC-equipped storage devices. There is a use for this in accelerating Big Data analytics algorithms. The portions of the application software run (hosted) on the storage devices' improved storage controllers are those that otherwise would make the largest number of storage accesses, therefore the largest number of over-the-network storage transactions. The result is that these portions of software run faster, and there are a smaller number of network transactions, albeit perhaps larger network messages.
The best candidate algorithms for running on the storage devices (close to the data) include search. All Big Data Analytics employ some type of search, so these are immensely accelerated. It does not matter whether the connection is Ethernet (TCP, IP) or some other connection mechanism such as FC (iSCSI), etc.; all network (or network-like channel) connected storage interfaces and storage device types can be supported by the improved storage controller of this application acceleration invention, including future extensions thereof.
My prior patents introduced my Customizable Storage Controller (CSC) technology as an application of, and set of improvements to, my F+ Storage Firewall and Support System platform technologies, for the purposes of supporting multiple storage controller software packages, updating these as needed, and securely running applet portions of Enterprise and Cloud application software close to the corresponding stored (and protected) data. The CSC device and data security technologies relate to how the CSC protects itself (threat model, protection mechanisms).
The ability to host and support application software on compatible storage devices enables an important innovation, resulting in this present AppAccel invention, the acceleration of Big Data Analytics applications, and other important datacenter application software. This acceleration effect is partly provided by dramatically reducing the number of round trips between datacenter application servers and storage, by running storage-facing portions of the application software on the storage devices. Each storage access network round-trip has storage and network stack latency costs; many of these round trips can be eliminated by the invention disclosed below.
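By way of illustration only, the round-trip savings described above can be sketched with a simple cost model. The function name elapsed_seconds and all figures below (access count, network round-trip time, media access time) are hypothetical assumptions chosen for the example; they are not measurements of any embodiment.

```python
def elapsed_seconds(round_trips, round_trip_s, accesses, access_s):
    """Elapsed time = network round-trip cost + local media access cost."""
    return round_trips * round_trip_s + accesses * access_s


# All values are assumed for illustration only.
ACCESSES = 1_000_000      # storage accesses needed by one search
ROUND_TRIP_S = 500e-6     # network + storage stack latency per round trip
ACCESS_S = 100e-6         # media access time once the request is at the device

# Server-hosted search: every storage access crosses the network.
server_hosted = elapsed_seconds(ACCESSES, ROUND_TRIP_S, ACCESSES, ACCESS_S)

# Storage-hosted applet: one request out and one result back;
# the individual accesses stay local to the storage device.
storage_hosted = elapsed_seconds(1, ROUND_TRIP_S, ACCESSES, ACCESS_S)

print(f"server-hosted:  {server_hosted:.1f} s")
print(f"storage-hosted: {storage_hosted:.1f} s")
print(f"speedup:        {server_hosted / storage_hosted:.1f}x")
```

Under these assumed figures the server-hosted case takes about 600 seconds and the storage-hosted case about 100 seconds. The actual ratio depends entirely on the real latencies and on how many accesses the applet can keep local.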
This disclosure describes how to accelerate Big Data Analytics and other applications that work across large datasets. The focus is on data centric (really dataset centric) algorithms, enhanced storage and application execution services, with much shorter elapsed time required to complete or achieve useful results on very large and exabyte scale datasets.
AppAccel Achieves the Performance Goal
The objective of the datacenter architect and others concerned with application performance should be to reduce the actual elapsed time required to search a possibly large number of possibly very large datasets, looking for correspondence with results of other searches.
The latency imposed by accessing storage over network links has to be reduced, and the cost of sequentially searching a very large number of storage devices has to be eliminated. The CSC permits the storage devices to be searched in parallel, with many fewer network round trips from the servers.
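The parallel search described above can be sketched as follows. This is a minimal simulation under stated assumptions: the DEVICES dictionary, the device names, and the functions search_on_device and parallel_search are hypothetical stand-ins for CSC-equipped storage devices and their hosted search applets, and a thread pool stands in for the devices working concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for storage devices: device name -> stored records.
DEVICES = {
    "dev-a": ["alpha", "needle-1", "beta"],
    "dev-b": ["gamma", "delta"],
    "dev-c": ["needle-2", "epsilon", "needle-3"],
}


def search_on_device(name, term):
    """Stand-in for a search applet hosted on one device.

    Only the matches are returned, so a single request and a single
    (possibly larger) reply replace many per-record round trips.
    """
    return [(name, record) for record in DEVICES[name] if term in record]


def parallel_search(term):
    """Send one search request per device and merge the replies."""
    with ThreadPoolExecutor(max_workers=len(DEVICES)) as pool:
        per_device = pool.map(lambda name: search_on_device(name, term), DEVICES)
        return sorted(hit for hits in per_device for hit in hits)


if __name__ == "__main__":
    for device, record in parallel_search("needle"):
        print(device, record)
```

In this sketch the server issues one request per device rather than one request per record, and the devices are searched concurrently rather than sequentially.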
This speeds up application software in proportion to the size (the magnitude) of the data and the extent to which the application's algorithm matches the paradigmatic analytics algorithm dissected in this disclosure specification.
So, the benefits include:
Reduced network traffic, with operations running next to the data;
Object model for applets, so applets can be implemented as Bubbles; and
Distributable among storage devices.
An object model for data does not by itself improve the performance of applications, but a model that provides for distributed data sets can grow (scale) better than approaches that bring data adjacent to server blades (Clusters, in-Memory, etc.).
The present AppAccel invention reduces the effect of network congestion and latency by running query & search algorithms as close as possible to storage media. This permits Big Data Analytics algorithms, as provisioned applets (SCAPs), to run at local storage access performance, without network latency slowing the data access. This supports running server-hosted application software with high performance access to data.
Consider what a big data analytics algorithm is, under the covers; it is an analysis (compare and rank) algorithm acting on the results of non-SQL “joins” across heterogeneous data sets. Why non-SQL? Because these data sets don't fit into neat tables in a relational database.
These non-SQL joins are really the joins of the results of searches of these data sets, which may extend over many storage devices in possibly more than one storage pool.
Each of these non-SQL joins, therefore, requires a great many searches of the respective datasets; without the present AppAccel invention, these searches have a large number of storage access round-trips across network links.
In addition, the present AppAccel invention provides the facility for optimized search applets that are application aware and data set aware. This means that applets can be built so as to conform to the requirements and design of server-hosted applications, and also to be performance optimized for particular datasets.
If the structure of key analytics algorithms could be distilled into their essence, it might look something like this:
a) Rank or compare the elements in a set (list, array, table) of tuples (1 to N components per tuple, no requirement the tuples be of equal length);
b) Where these tuples are the result of searches of one or more datasets, or of non-SQL joins, or of SQL joins;
c) Where non-SQL joins are tuples or tuple sets (aka tables) put together as synthetic, perhaps temporary, objects, as a result of searches of one or more datasets.
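The structure in items a) through c) can be sketched in a few lines. This is an illustrative sketch only: the SENSORS and LOGS datasets, the field names, and the functions search, non_sql_join, and rank are hypothetical and are not taken from the disclosure.

```python
# Hypothetical heterogeneous datasets that do not share a schema.
SENSORS = [
    {"machine": "m1", "temp": 91},
    {"machine": "m2", "temp": 70},
    {"machine": "m3", "temp": 88},
]
LOGS = [
    {"machine": "m1", "note": "bearing wear", "severity": 3},
    {"machine": "m3", "note": "filter replaced", "severity": 1},
    {"machine": "m3", "note": "vibration alarm", "severity": 5},
]


def search(dataset, predicate):
    """b) Search one dataset and return the matching records."""
    return [record for record in dataset if predicate(record)]


def non_sql_join(left, right, key):
    """c) Combine two search results into temporary tuples on a shared key."""
    by_key = {}
    for record in right:
        by_key.setdefault(record[key], []).append(record)
    return [
        (l[key], l["temp"], r["note"], r["severity"])
        for l in left
        for r in by_key.get(l[key], [])
    ]


def rank(tuples, score):
    """a) Rank the tuples, highest score first."""
    return sorted(tuples, key=score, reverse=True)


hot = search(SENSORS, lambda r: r["temp"] > 85)
issues = search(LOGS, lambda r: r["severity"] >= 1)
ranked = rank(non_sql_join(hot, issues, "machine"), score=lambda t: t[3])

for row in ranked:
    print(row)
```

Each call to search in this sketch is the step that, in a server-hosted design, would generate storage access round trips; it is therefore the natural candidate for hosting on the storage device.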
The key to all of this is the speed with which a possibly large number of possibly very large datasets can be searched, looking for correspondence with results of other searches.
Therefore, to accelerate Big Data Analytics, put storage close to, next to, or interleaved with processing (reduced network latency, etc.). Since another objective is to scale out the storage to accommodate the exa-scale and yotta-scale data deluge, the processor to storage relationships have to change.
The present AppAccel invention sees large performance improvements in areas that have significance such as search, big data analytics, and other server-hosted applications, by applying AppAccel to needle-in-haystack and other solutions whose underlying structure includes some sort of search, non-SQL join, rank or compare.
This is enabled by the benefits of the present AppAccel invention:
Eliminating a significant percentage of server-hosted application storage access requests to & from storage devices, therefore reducing a significant number of network round-trips, network & storage stack latency, etc.;
Reducing network congestion, consequently speeding up other server-hosted applications' access to data and other resources;
Improving SDS storage pool average response time as well as storage pool throughput;
Operations next to Storage (Reduced Network Latency);
Parallelism;
Dataset aware, storage volume aware, aware of the larger application, efficient access protocol (ex: optimized search applet);
Support for In-Situ Processing and Rack-Scale Flash; and
Providing a significant server-hosted application performance improvement; depending on the design of the server-hosted application, application framework, algorithm, and data set, this performance gain may be as much as 100 times to 1000 times or more.
It is therefore an objective of the present AppAccel invention to provide a high performance solution to datacenter performance difficulties with Big Data Analytics and exa-scale datasets.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, so as to search the contents of that storage device in much less time.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, where the software is part of a larger application or application framework such as a Big Data Analytics application, so as to run data center applications in much less time.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, especially search software tailored to the contents of the storage device.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, especially search software tailored to the contents of the storage device, so as to search the contents of that storage device in even less time.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, especially search software tailored to the contents of the storage device, so as to discover results that might not have been discovered in any other way.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, especially search software tailored to the contents of the storage device, where the software is part of a larger application or application framework such as a Big Data Analytics application, so as to run data center applications in much less time.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, in order to run a continual re-search of the storage device as data streams on to it and as data on it is modified by other means, so as to discover results that might not have been discovered in any other way.
Another objective of the present invention is to provide a system, method and apparatus to run search and other data access operations on a storage device, especially search software tailored to the contents of the storage device, in order to run a continual re-search of the storage device as data streams on to it and as data on it is modified by other means, so as to discover results that might not have been discovered in any other way.
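The continual re-search named in the last two objectives can be illustrated with a minimal sketch. It assumes, purely for the example, that each stored record carries a version number that changes whenever the record is modified; the class ContinualSearch, its rescan method, and the dictionary used as a store are hypothetical and do not describe any particular embodiment.

```python
class ContinualSearch:
    """Re-search a store repeatedly, examining only new or modified records."""

    def __init__(self, term):
        self.term = term
        self.searched = {}   # record id -> version already searched
        self.hits = {}       # record id -> text that currently matches

    def rescan(self, store):
        """store maps record id -> (version, text). Returns newly matching ids."""
        new_hits = []
        for record_id, (version, text) in store.items():
            if self.searched.get(record_id) == version:
                continue                      # unchanged since the last pass
            self.searched[record_id] = version
            if self.term in text:
                self.hits[record_id] = text
                new_hits.append(record_id)
            else:
                self.hits.pop(record_id, None)  # no longer matches
        return new_hits


if __name__ == "__main__":
    store = {1: (1, "routine entry")}
    watcher = ContinualSearch("needle")
    print(watcher.rescan(store))          # nothing matches yet
    store[2] = (1, "needle arrives")      # data streams on to the device
    print(watcher.rescan(store))
    store[1] = (2, "needle after edit")   # existing data is modified
    print(watcher.rescan(store))
```

Because each pass runs next to the data, it can be repeated as often as the device allows without adding network round trips; only newly found matches need to be reported to the server-hosted application.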
This solution consists of an improved execution environment for Datacenter server-hosted applications with high storage access requirements.