The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.
Our world today is composed of the 1s and 0s that make up the binary code created by the streams of data flowing through every sector of the global economy. How much data is that? According to IBM™, 2.5 exabytes of data were created every day in 2012. That is 2.5 billion gigabytes of data in a single day. Facebook™ alone was responsible for 500,000 gigabytes a day in the same year. The importance of data is becoming so big, even the U.S. Government has launched an initiative, Data.gov, to help access and analyze it. The good news is that data processing and storage costs have decreased by a factor of more than 1,000 over the past decade. But once that data is stored, it is difficult to retrieve and use.
According to The Boston Consulting Group, one third of all bank data is never used. A big part of this is the fact that 75% of the data we generate is unstructured. It is randomly organized, difficult to index, and therefore difficult to retrieve. Where is all of this data coming from? An obvious source is the data that is being generated from legacy systems of record. It is data from cloud software as witnessed by the rapid adoption of Software as a Service (SaaS) as the new business application model. It is data being created every second from mobile phones, devices, and sensors that are being placed on just about everything that can be monitored in the physical world. And social media represents the largest data streams, which are being created in astronomical volumes.
Forget about texts, and think of all the photos and videos being uploaded via smartphones to popular services like YouTube™, Facebook™, Instagram™ and Twitter™. The smartphone is currently the major enabler of this data tsunami. PCs and feature phones (mobile phones that are not smartphones) are both in decline while smartphones are growing in the opposite direction, even in regions such as sub-Saharan Africa. And where there is a smartphone, there is an application for practically every human endeavor. Applications are the smartphone control point for all of the real-time data streams being created by our fingers, the camera, the motion sensor, GPS antenna, Bluetooth antenna, and gyroscope. Smartphone manufacturers continue to jam more sensors and capabilities into these devices while developers continue to build applications that delight us all.
According to The Economist, 50% of the adult population in 2015 owns a smartphone. That will grow to 80% in 2020. But as impressive as smartphones are, the biggest ripple is just forming. To use a term coined by Andreessen Horowitz, it is the “sensorification” of the physical world. The combination of cheap, connected, miniaturized computers and sensors will create a world of smart, connected products and industrial equipment. This new technology category is often called the “Internet of Things” (IoT). General Electric goes one step further, with the term “industrial internet”, to include things like jet engines, locomotives and MRI machines. The Internet of Things represents a major and transformational wave of IT innovation. The Harvard Business Review calls this the third wave of IT-driven competition, with the first two waves brought by mainframes and minicomputers, and the rise of the Internet. Needless to say, harnessing and analyzing these data streams will represent the biggest challenge IT and businesses will face over the next decade.
The apt term used to describe this massive volume of data is “Big Data.” For Big Data, traditional data storage technology is inadequate to deal with these large, high-speed volumes. And the challenges do not end there. Enterprises will also need to figure out how to not only capture this data, but how to search, analyze and visualize it as well as connect it with their business and customer data. The ultimate goal is the ability to perform predictive analytics and real-time intelligent decision-making. This is going to require an IT transformation from systems of record to systems of intelligence.
Before the advent of big data, the concept of business intelligence (BI) had already become a commonly used phrase back in the 1990s. A number of newly formed BI software vendors also entered the market at that time. BI provided the methods and tools required for the transformation of data into meaningful and useful information for the business. The functions of BI during this period were fairly basic, namely, to collect and organize the data and visualize it in a presentable way. Innovations continued and the introduction of data warehouses drastically reduced the time it took to access enterprise data from systems of record. Despite these innovations, a core challenge remains. Setting up these data warehouses requires deep expertise and using BI tools requires significant training. The mere mortals in the line of business still cannot use these tools in an accessible way. Most BI tools are pretty good at getting answers when you know ahead of time the questions you are asking. Sometimes you simply do not know what questions to ask. In short, these tools do not enable business users to obtain the insights when, how, and where they need them.
Fortunately, this is all changing. For the first time, data analytics tools are being built that are entirely designed and run in the cloud. There is no need for IT to provision hardware or install and configure the data platform. Performing all the associated integration and schema development has gone from months to days. This newfound agility has allowed innovation in technology to eliminate the traditional two-step service bureau model where every request from the line of business required IT's involvement. These innovations are paving the way for a democratization of data so that business users can not only get access to data but also participate in its analysis. This means a self-service model with direct access to answers without the need for analysts, data scientists or IT. Business users can find and share answers almost instantly. There is no hard requirement of needing to know ahead of time what questions to ask of the data. Business users can quickly bang out questions that allow them to explore and gain insights into the data sets. Furthermore, this democratization is powered by mobile. Using their smartphone, tablets, or wearables, workers can now gain access to data and answers to pressing business questions whenever and wherever they are. The democratization of data has become a necessary phase in the journey toward building systems of intelligence.
While the fruits of data democratization are plenty, the process itself mostly deals with empowering business users with access to and analysis of data from legacy systems of record and cloud-based business applications. At best, some of these new BI tools can provide near real-time access and analysis of data. But they are not engineered for capturing and analyzing actual real-time streams of data emanating from smartphones, wearables and the coming explosion of sensors in the physical world.
Real-time data streams deliver information that is quite different from the backward-looking, historical data most BI tools and platforms harness. Real-time data is perishable. That means it not only needs to be detected, it needs to be acted upon. The concept of “time to insight” emerges as one of the key performance indicators for systems of intelligence. These insights are going to require a whole new of level packaging and consumption. The information needs to be delivered in context, at the right time, and in a way that cuts through the cacophony of data we are exposed to in our daily work lives.
Systems of intelligence require knowing what to do with the data insights and how they should be delivered to the appropriate worker based on their job function and role inside the organization. These systems are every bit as democratic as modern BI tools in that they are easy to configure and get up and running. They are also designed to deal with the daily deluge of data we are confronted with every day at work. Consumer applications such as social media, traffic, and news aggregating applications help us more intelligently deal with the things that matter to us most.
The bar for applications connected to our systems of intelligence is as high as for consumer applications. This means one click installation, a lovely and simple user interface and accessibility via the mobile device of your choosing. The harnessing and analysis of real-time data streams begins to open up not only action in real time, but the ability to anticipate what is going to happen. This has traditionally been the realm of data scientists who handle everything from statistics and computational modeling to visualization and reporting. Models created by data scientists mostly look at past historical trends and use the data to predict patterns and future trends. Trying to build computational models that look at large volumes of real-time data streams presents a significant human resource challenge for enterprises.
The next step beyond this are the systems of intelligence that start to tell customers what questions they need to be asking. Getting there will require a blueprint for systems of intelligence. The source of data streams are the signals emanating in real-time from mobile devices such as smartphones and consumer wearables like the Fitbit™ and Apple Watch™. The control point for these signals is the application. The application is what puts context behind the raw data that gets created by human inputs and the sensors embedded in these devices. According to Wikipedia™, a sensor is a transducer whose purpose is to sense or detect some characteristic of its environs. It detects events or changes in quantities and provides a corresponding output, generally as an electrical or optical signal. Tying all of this together is the digital plumbing, or application programming interfaces (APIs). Along every critical element of the data stream flow represented in this schematic, APIs will enable this end to end transport of high speed and high volume data in the system. Although the term, API, may not be in the common vernacular outside of IT, it will be, much in the same way that terms of art to describe the web and internet are common language in business communication today.
The major gushers of data streams will be the connected consumer products and industrial equipment and machines. These real-time signals will emanate from product sensors inside our automobiles, inside our homes, on our valuables, our security systems, and anywhere in our physical environment that matters. Signals from the industrial Internet will emanate from sensors on any piece of equipment or machine that requires monitoring, maintenance and repair. Anything than can be digitally monitored with sensors in the physical environment will be. Systems of intelligence must be able to identify these signals and harness them.
In order to capture the high-volume and high-speed data signals, a “digital watchdog” is needed to monitor these signal inputs. If anything significant happens with these digital signals, an event is registered. A very simple example of an event is when a temperature sensor goes off in your automobile to warn you of freezing conditions outside. Systems of intelligence will require the technology to ingest and monitor these data streams. The events created by the digital signals get broadcasted via messages and moved through the system so that the digestion process can proceed as planned. This is where filters can begin their job of further analyzing these data streams. For the system to function properly, it must be able to handle growing volumes and increased speeds of data flow and must not be lost if there is a breakdown or crash in that system.
Once data is captured and processed, it moves along into the digestion phase. This is where some of the magic starts to happen. This includes the monitoring and analytical processing of real-time data streams. Once the data is analyzed and processed, it needs to be put somewhere. The data streams flowing in are not suitable for traditional database storage such as relational databases using structured query language. This requires specialized technology that can handle and store very large data sets, an essential element of systems of intelligence.
Currently, stream processing systems that desire to service billions of users are limited by the amount of scalability they can achieve. This limitation is caused primarily by the exhaustion of components that make up the stream processing systems. As a result, it is desired to identify techniques that inject efficiency in the operation of stream processing systems so that their components are not over-used and can deliver the level of super scalability expected from such systems.
Therefore, an opportunity arises to provide systems and methods that process big data for a super scaled stream processing systems in an efficient manner. Increased revenue, higher user retention, improved user engagement and experience may result.