Model-based techniques are an important category of techniques proposed in recent literature for efficiently acquiring and managing data from sensors and other internet sources, and they have received significant attention. These techniques use mathematical models to solve various problems pertaining to data acquisition and management. Obtaining values from a sensor consumes a significant amount of energy, and since most sensors are battery-powered, their energy resources are limited. To minimize the number of samples obtained from the sensors, models are used to select sensors such that user queries can be answered with reasonable accuracy using only the data acquired from the selected sensors. Another energy-intensive task is communicating the sensed values to the base station. Several model-based techniques exist for reducing the communication cost while maintaining the accuracy of the sensed values.
Machine Learning plays an important role in a wide range of critical applications with large volumes of data, such as data mining, natural language processing, image recognition, voice recognition and many other intelligent systems. There are some common threads among the definitions of Machine Learning. Machine Learning is commonly defined as the field of study that gives computers the ability to learn without being explicitly programmed. For example, to predict traffic patterns at a busy intersection, a machine learning algorithm can be run on data about past traffic patterns. The program can correctly predict future traffic patterns if it has learned correctly from the past patterns.
There are different ways an algorithm can model a problem based on its interaction with the experience, environment or input data. Machine learning algorithms are categorized in a way that helps one think about the roles of the input data and the model preparation process, so that the most appropriate category can be selected for a problem to get the best result. Known categories are supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning.
(a) In the supervised learning category, input data is called training data and has a known label or result. A model is prepared through a training process in which it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data. Example problems are classification and regression.
(b) In the unsupervised learning category, input data is not labeled and does not have a known result. A model is prepared by deducing structures present in the input data. Example problems are association rule learning and clustering. An example algorithm is k-means clustering.
(c) Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, may produce considerable improvement in learning accuracy.
(d) Reinforcement learning is another category which differs from standard supervised learning in that correct input/output pairs are never presented. Further, there is a focus on on-line performance, which involves finding a balance between exploration for new knowledge and exploitation of current knowledge already discovered.
Machine learning and statistics are closely related. The ideas of machine learning have roots in statistics, from the theoretical foundations down to the methods of implementation. Some researchers have adopted the term statistical learning to describe machine learning more accurately.
Certain machine learning techniques are widely used and are as follows: (1) Decision tree learning, (2) Association rule learning, (3) Artificial neural networks, (4) Inductive logic programming, (5) Support vector machines, (6) Clustering, (7) Bayesian networks, (8) Reinforcement learning, (9) Representation learning, and (10) Genetic algorithms.
The learning processes in machine learning algorithms are generalizations from past experiences. Generalization is the ability of a machine learning algorithm to perform accurately on new examples and tasks after having experienced a learning data set. The learner needs to build a general model of a problem space that enables the machine learning algorithm to produce sufficiently accurate predictions in future cases. The training examples come from some generally unknown probability distribution.
In theoretical computer science, computational learning theory performs computational analysis of machine learning algorithms and their performance. The training data set is limited in size and may not capture all forms of distributions in future data sets. The performance is therefore represented by probabilistic bounds. Errors in generalization are quantified by bias-variance decompositions. With respect to time complexity, computational learning theory considers a computation to be feasible if it can be done in polynomial time. A positive result shows that a certain class of functions can be learned in polynomial time, whereas a negative result shows that a class cannot be learned in polynomial time.
PAC (Probably Approximately Correct) learning is a framework for mathematical analysis of machine learning theory. The basic idea of PAC learning is that a really bad hypothesis can be easy to identify. A bad hypothesis will err on one of the training examples with high probability. A consistent hypothesis will be probably approximately correct. If there are more training examples, then the probability of “approximately correct” becomes much higher. The theory investigates questions about (a) sample complexity: how many training examples are needed to learn a successful hypothesis, (b) computational complexity: how much computational effort is needed to learn a successful hypothesis, and finally (c) bounds for mistakes: how many training examples will the learner misclassify before converging to a successful hypothesis.
Mathematically, let (1) X be the set of all possible examples, (2) D be the probability distribution over X from which observed instances are drawn, (3) C be the set of all possible concepts c, where c: X → {0, 1}, and (4) H be the set of possible hypotheses considered by a learner, H ⊆ C. The true error of a hypothesis h, with respect to the target concept c and the observation distribution D, is the probability that h will misclassify an instance drawn according to D:
$$\mathrm{error}_D(h) \equiv \Pr_{x \in D}\left[c(x) \neq h(x)\right]$$
The error should be zero in the ideal case. A concept class C is “PAC learnable” by a hypothesis class H if and only if there exists a learning algorithm L such that, given any target concept c in C, any target distribution D over the possible examples X, and any pair of real numbers ε and δ with 0 < ε, δ < 1, L takes as input a training set of m examples drawn according to D, where m is bounded above by a polynomial in 1/ε and 1/δ, and outputs a hypothesis h in H such that, with confidence (probability over all possible choices of the training set) greater than 1 − δ, the error of the hypothesis is at most ε:
$$\mathrm{error}_D(h) \equiv \Pr_{x \in D}\left[c(x) \neq h(x)\right] \leq \varepsilon$$
A hypothesis is consistent with the training data if it returns the correct classification for every example presented to it. A consistent learner returns only hypotheses that are consistent with the training data. Given a consistent learner, the number of examples sufficient to assure that any hypothesis will be probably (with probability 1 − δ) approximately (within error ε) correct is
$$m \geq \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).$$
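As a worked illustration of this bound, the following minimal sketch computes the smallest integer m that satisfies the inequality for a finite hypothesis class. The function name `pac_sample_bound` and the example hypothesis class (conjunctions of literals over 10 boolean features, giving |H| = 3^10) are assumptions chosen for illustration and are not part of the original text.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta)).

    hypothesis_count is |H|, the size of a finite hypothesis class.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    bound = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)


# Hypothetical example: each of 10 boolean features is either required true,
# required false, or ignored, so |H| = 3**10.
example_m = pac_sample_bound(3 ** 10, epsilon=0.1, delta=0.05)
```

Under these assumed values the bound evaluates to 140 examples. Because m grows only logarithmically in |H| and 1/δ but linearly in 1/ε, tightening the error tolerance is the dominant cost.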
Calculus is an important branch of mathematics that has not been considered so far as one of the building blocks of machine learning techniques. Calculus is used in every branch of physical science, actuarial science, computer science, statistics, engineering, economics, business, medicine, demography, meteorology, epidemiology and in other fields wherever there is a need to mathematically model a problem to derive an optimal solution. It allows one to go from (non-constant) rates of change to the total change or vice versa. A calculus-based mathematical model of a large data set can represent a hypothesis with very low error (ε) in machine learning. A complex hypothesis can be constructed with one or more parts represented by calculus-based models. Building complex hypotheses for machine learning in this way can lead to powerful techniques for probably approximately correct (PAC) learning with very low error bounds for the hypotheses.
The fundamental theorem of calculus states that differentiation and integration are inverse operations. More precisely, it relates the values of anti-derivatives to definite integrals. Because it is usually easier to compute an anti-derivative than to apply the definition of a definite integral, the fundamental theorem of calculus provides a practical way of computing definite integrals. It can also be interpreted as a precise statement of the fact that differentiation is the inverse of integration. In machine learning, if a hypothesis involves model(s) represented in calculus then there must be complementing processes of differentiation and integration involved in the overall learning processes.
Calculus-based mathematical models can be used as part of a hypothesis for machine learning over a wide variety of data sets derived from devices such as heart monitoring implants, biochip transponders on farm animals, electric clams in coastal waters, automobiles with built-in sensors, smart homes, smart cities or airplanes with sensors. These devices or sensors, used inside physical, biological or environmental systems, collect large volumes of data that follow mathematical models based on both calculus and statistics. Efficient machine learning algorithms for such data sets can use hypotheses based on mathematical models involving both calculus and statistics.
When a calculus-based computational model is used as part of a hypothesis in machine learning, the bounds for error and computational complexity are reduced by many orders of magnitude in “PAC learnable” classes of problems.
One important example domain of application for such machine learning algorithms is the Smart Home, where data is collected from sensors for analysis and automation. Smart Home automation applications automatically control temperature, humidity, light and many other things for homes with sensors. The heat equation is a parabolic partial differential equation that describes the distribution of heat (or variation in temperature) in a given region over time. The heat equation can be the starting point for characterizing the rise and fall of temperature at various locations in a Smart Home. Temperature is modeled as a function u(x,y,z,t) of three spatial variables (x,y,z) and the time variable t. The heat equation relates the partial derivative of temperature with respect to time to its second partial derivatives with respect to the spatial variables:
$$\frac{\partial u}{\partial t} = \alpha\left[\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right],$$
where u(x,y,z,t) is the temperature function and α is a constant of proportionality.
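To make the role of the heat equation concrete, the following minimal sketch steps a temperature profile forward in time with an explicit finite-difference scheme. It is an illustration under stated assumptions, not a method described in the original text: the problem is reduced to one spatial dimension for brevity, the two end points are held at fixed temperatures, and the names `heat_step` and `simulate` as well as all numeric values are hypothetical.

```python
from typing import List, Sequence


def heat_step(u: Sequence[float], alpha: float, dx: float, dt: float) -> List[float]:
    """One explicit step of du/dt = alpha * d2u/dx2 on a 1-D grid.

    The first and last grid points are boundary values and stay fixed.
    """
    r = alpha * dt / (dx * dx)
    if r > 0.5:
        # The explicit scheme is only stable when alpha*dt/dx^2 <= 1/2.
        raise ValueError("time step too large for a stable explicit scheme")
    new_u = list(u)
    for i in range(1, len(u) - 1):
        # Centered second difference approximates d2u/dx2 at grid point i.
        new_u[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new_u


def simulate(u0: Sequence[float], alpha: float, dx: float, dt: float, steps: int) -> List[float]:
    """Apply heat_step repeatedly and return the final temperature profile."""
    u = list(u0)
    for _ in range(steps):
        u = heat_step(u, alpha, dx, dt)
    return u


# Hypothetical hallway: 20 degrees everywhere except a 30 degree hot spot.
initial = [20.0] * 5 + [30.0] + [20.0] * 5
after_one_step = heat_step(initial, alpha=1.0, dx=1.0, dt=0.25)
```

With these assumed values the hot spot spreads to its neighbors after one step and the profile relaxes toward the boundary temperature over many steps. A learner could compare such a predicted profile with observed sensor readings to estimate α.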
An example statistical machine learning algorithm for analyzing sensor data sets collected from Smart Homes is k-means clustering, a popular method for cluster analysis in statistical machine learning. This statistical technique can be used efficiently in Smart Home applications to partition sensor data based on location, time or any other dimension. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. Given a set of observations $(x_1, x_2, \ldots, x_n)$, where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-cluster sum of squares (WCSS).
Its objective is to find
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \left\| x - \mu_i \right\|^2$$
where $\mu_i$ is the mean of the points in $S_i$.
The K-means clustering algorithm uses an iterative refinement technique. Starting with an initial set of k mean values $m_1^{(1)}, \ldots, m_k^{(1)}$, the algorithm proceeds by alternating between two steps:
(1) Assignment step: Assign each observation to the cluster whose mean yields the least within-cluster sum of squares (WCSS). Since the sum of squares is the squared Euclidean distance, this is intuitively the “nearest” mean. In mathematics, this means partitioning the observations according to a Voronoi diagram generated by the means, where a Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane.
$$S_i^{(t)} = \left\{ x_p : \left\| x_p - m_i^{(t)} \right\|^2 \leq \left\| x_p - m_j^{(t)} \right\|^2 \ \forall j,\ 1 \leq j \leq k \right\},$$
where each $x_p$ is assigned to exactly one $S_i^{(t)}$, even if it could be assigned to two or more of them.
(2) Update step: Calculate the new means to be the centroids of the observations in the new clusters.
$$m_i^{(t+1)} = \frac{1}{\left| S_i^{(t)} \right|} \sum_{x_j \in S_i^{(t)}} x_j$$
Since the arithmetic mean is a least-squares estimator, this also minimizes the within-cluster sum of squares (WCSS) objective.
The algorithm converges when the assignments no longer change. Since both steps optimize the WCSS objective, and there exist only a finite number of such partitionings, the algorithm must converge to a (local) optimum. There is no guarantee that the global optimum is found using this algorithm.
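The two alternating steps and the stopping rule can be sketched as follows. This is a minimal illustration in plain Python, assuming that the caller supplies the initial means and that an empty cluster simply keeps its previous mean; the function names and the sample readings are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points: Sequence[Point], means: Sequence[Point]) -> List[int]:
    """Assignment step: index of the nearest mean for every point.

    Ties go to the lowest index, so each point lands in exactly one cluster.
    """
    return [
        min(range(len(means)), key=lambda i: squared_distance(p, means[i]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int], means: Sequence[Point]) -> List[Point]:
    """Update step: replace each mean with the centroid of its cluster."""
    new_means: List[Point] = []
    for i, old_mean in enumerate(means):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            new_means.append(tuple(old_mean))  # assumption: keep the old mean
            continue
        dim = len(members[0])
        new_means.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_means


def k_means(points: Sequence[Point], initial_means: Sequence[Point], max_iter: int = 100):
    """Alternate the two steps until the assignments stop changing."""
    means = [tuple(m) for m in initial_means]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = assign(points, means)
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        means = update(points, labels, means)
    return labels, means


# Hypothetical (temperature, floor) readings from four sensors.
readings = [(20.0, 1.0), (21.0, 1.0), (30.0, 5.0), (31.0, 5.0)]
labels, means = k_means(readings, initial_means=[(20.0, 1.0), (31.0, 5.0)])
```

Because the result depends on the initial means, a different starting set may reach a different local optimum, which is consistent with the absence of a global guarantee.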
The algorithm is often presented as assigning objects to the nearest cluster by distance. The standard algorithm aims at minimizing the WCSS objective, and thus assigns by “least sum of squares”, which is exactly equivalent to assigning by the smallest Euclidean distance. Using a distance function other than (squared) Euclidean distance may stop the algorithm from converging.
The k-means clustering problem is computationally hard (NP-hard in general). If the number of clusters k and the dimension d are fixed, the problem can be solved exactly in $O(n^{dk+1} \log n)$ time, where n is the number of entities to be clustered. For a large number n of sensor data records with d attributes in each record, the time taken to compute k clusters will be very large if k or d or both are large. The time complexity can be reduced if data sets can be normalized and partitioned for parallel executions of the K-means clustering algorithm.
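The effect of partitioning on this growth term can be illustrated with simple arithmetic. The sketch below only evaluates the expression n^(dk+1) log n with constants ignored. It assumes equal-sized partitions that are clustered independently, which is not the same problem as clustering the whole data set at once. The function names are hypothetical.

```python
import math


def exact_kmeans_cost(n: int, d: int, k: int) -> float:
    """Growth term n**(d*k + 1) * log(n) for exact k-means (constants ignored)."""
    return n ** (d * k + 1) * math.log(n)


def partitioned_cost(n: int, d: int, k: int, parts: int) -> float:
    """Sum of the same term over `parts` equal partitions of the n records."""
    size = n // parts
    return parts * exact_kmeans_cost(size, d, k)


# Hypothetical workload: 1000 records, 2 attributes, 3 clusters, 10 partitions.
whole = exact_kmeans_cost(1000, d=2, k=3)
split = partitioned_cost(1000, d=2, k=3, parts=10)
speedup = whole / split
```

Because the exponent dk + 1 is large, dividing n by even a small factor shrinks the term by many orders of magnitude, before any gain from running the partitions in parallel.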
Database normalization is the process of organizing the fields and tables of a record oriented relational database to minimize redundancy. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database using the defined relationships. De-normalization is also used to improve performance. De-normalization may also be used when no changes are to be made to the data and a swift response is crucial.
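A minimal sketch of normalizing sensor records is shown below, using Python's standard `sqlite3` module with an in-memory database. The table names `sensor` and `reading`, the column layout, and the function name `build_normalized_db` are assumptions made for illustration. The sketch moves the attributes that repeat on every row (location and unit) into their own table.

```python
import sqlite3
from typing import Iterable, Tuple

# Denormalized input row: (sensor_id, location, unit, timestamp, value).
RawRow = Tuple[str, str, str, str, float]


def build_normalized_db(rows: Iterable[RawRow]) -> sqlite3.Connection:
    """Split flat sensor rows into a `sensor` table and a `reading` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sensor ("
        " sensor_id TEXT PRIMARY KEY,"
        " location TEXT NOT NULL,"
        " unit TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE reading ("
        " sensor_id TEXT NOT NULL REFERENCES sensor(sensor_id),"
        " ts TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )
    for sensor_id, location, unit, ts, value in rows:
        # Sensor attributes are stored once; repeated rows are ignored.
        conn.execute(
            "INSERT OR IGNORE INTO sensor VALUES (?, ?, ?)",
            (sensor_id, location, unit),
        )
        conn.execute("INSERT INTO reading VALUES (?, ?, ?)", (sensor_id, ts, value))
    conn.commit()
    return conn


def readings_with_location(conn: sqlite3.Connection, sensor_id: str):
    """Join the two tables back together for one sensor."""
    query = (
        "SELECT s.location, r.ts, r.value FROM reading AS r"
        " JOIN sensor AS s ON r.sensor_id = s.sensor_id"
        " WHERE r.sensor_id = ? ORDER BY r.ts"
    )
    return conn.execute(query, (sensor_id,)).fetchall()
```

With this layout, changing a sensor's location is a single-row update in `sensor`, and every joined reading reflects it. A de-normalized copy of the join result could still be materialized where read speed matters more than update cost.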
The present invention is fundamentally different from current practices in its application of an iterative process of machine learning and normalization to data sets, which reduces redundancy and increases performance by querying normalized data tables. Data normalizations are done after successive steps of computations based on calculus and statistics respectively. This invention uniquely addresses the need for industrially viable machine learning technology and analytical systems for extremely large data sets from sensors and other internet data sources. It does so by combining calculus-based and statistics-based mathematical models with parallel computations on query results from normalized data sets in relational databases, in order to reduce error and to improve performance by many orders of magnitude.