The chemical content and composition of a substance can be analyzed by obtaining its spectral signature using spectroscopy, such as visible and infrared spectroscopy. Although spectral analysis has been widely deployed in a variety of industrial settings, consumer-level usage remains rare, mostly due to the lack of low-cost portable spectrometer devices and the lack of reliable data processing and modeling systems. In recent years, miniature, low-cost spectrometers have been made possible by advances in chipset technology. However, it remains a leading challenge to build a solution and model that provide fast, reliable analysis of the spectral data obtained by these spectrometer devices. In addition, it is necessary to overcome the challenges posed by the heterogeneity of the samples to be analyzed. In particular, three major problems remain to be resolved before a complete solution becomes available to consumers: 1. A well-calibrated spectrometer device that provides reliable, reproducible spectral data of the object of interest with low deviation; 2. A system to reliably record the objects to be analyzed and the area/location on each object to be analyzed, since many objects in real life are heterogeneous in chemical content and composition at different parts of the object; 3. A model that can handle all available information, including spectral analysis data, environmental data (temperature, humidity, etc.), the type of object analyzed, and the location on the object analyzed, to establish a fast yet reliable prediction model that provides real-time information and recommendations based on the user-end analysis.
Computer vision technology has matured and performs well in many visual perception tasks such as facial expression recognition and human motion recognition. Before the emergence of deep networks, people relied on hand-crafted visual features fed into a traditional machine learning classifier, such as a random forest or support vector machine, to predict object class or to perform scene understanding. A variety of methods for finding image features and descriptors were proposed that pushed the field of computer vision forward, such as the scale-invariant feature transform (SIFT) and the histogram of oriented gradients (HOG). However, traditional computer vision methods are trained using predefined features, which might not be the optimal features for the prediction task. Also, with these traditional methods, it is not always possible to select optimal image processing parameters for all lighting and shadow conditions. To solve the above problems, others have proposed different variants of deep convolutional neural networks, from the earliest LeNet to the recent ResNet. In general, most of these architectures are composed of three core building blocks for image feature extraction: a convolutional layer, a ReLU layer, and a max-pooling layer. Each filter in the convolutional layer outputs a local spatial feature map of the image. It is common to periodically insert a pooling layer between successive convolutional layers, which progressively reduces the spatial size of the representation and thus the number of parameters. Along with the output layer (e.g., a fully connected feed-forward or softmax layer), the entire convolutional network performs well on a broad range of image classification and understanding problems.
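As an illustrative sketch only (not part of the claimed invention), the three building blocks described above, convolution, ReLU activation, and max pooling, can be expressed in plain NumPy; the toy image and kernel below are hypothetical examples:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output pixel is a dot product of the kernel with a local patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # element-wise rectified linear unit: negative responses are zeroed
    return np.maximum(x, 0.0)

def maxpool2d(x, size=2):
    # non-overlapping max pooling: halves each spatial dimension when size=2
    h, w = x.shape
    h2, w2 = h // size, w // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

# toy 6x6 input with a horizontal intensity gradient, and a 3x3 edge kernel
img = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

# conv (6x6 -> 4x4), ReLU, then 2x2 max pool (4x4 -> 2x2)
feat = maxpool2d(relu(conv2d(img, kernel)))
```

Stacking such blocks, each pooling stage shrinking the feature map, is what progressively reduces the number of parameters before the final fully connected or softmax output layer.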
Visible and infrared spectroscopy is a powerful technique for obtaining information about the chemical content and composition of a substance by accessing its molecular vibrations. Infrared absorbance/reflectance spectroscopy and Raman spectroscopy can also be used to collect valuable data about objects. Upon being exposed to light (electromagnetic radiation), molecules within the object of interest interact with the electromagnetic fields, leaving a molecular signature in the reflected or transmitted light. The interaction is selective for different molecules and different energies (i.e., wavelengths) of the electromagnetic radiation; hence the molecular signature of the object of interest can be mapped as a spectrum, in which the response of the object to the electromagnetic radiation is recorded as a function of the wavelength or wavenumber of the radiation. Many spectrometers are available on the market for a variety of industrial applications, and a few have recently been introduced to the consumer market for daily applications. However, it is critical to have a working model for interpreting or analyzing the spectral data, since spectral data without a reliable model are of limited value.
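As a minimal, hypothetical illustration of recording a response as a function of wavelength, the sketch below converts a measured transmittance spectrum into absorbance via the standard Beer-Lambert relation A = -log10(I / I0); all wavelengths and intensity values are invented for the example and are not measured data:

```python
import numpy as np

# hypothetical wavelength axis spanning visible through near-infrared (nm)
wavelengths_nm = np.linspace(400, 2500, 5)

# I0: source/reference intensity; I: intensity transmitted through the sample
reference = np.array([100.0, 100.0, 100.0, 100.0, 100.0])
transmitted = np.array([90.0, 50.0, 10.0, 50.0, 90.0])

# Beer-Lambert absorbance: strong absorption -> low transmittance -> high A
absorbance = -np.log10(transmitted / reference)

# the wavelength of peak absorbance marks where the molecules interact most
peak_nm = wavelengths_nm[np.argmax(absorbance)]
```

A prediction model would then take such absorbance spectra, rather than raw intensities, as its input features.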
Current state-of-the-art artificial intelligence techniques (deep neural networks) scale up the learning capacity of traditional machine learning techniques, which allows one to find hidden and useful patterns in massive datasets. These achievements stem mainly from advances in feature representation learning and in the design of neural network components and structures. In terms of representation learning, in the present invention we have trained an end-to-end system and pushed the model to learn the most informative features for the prediction tasks instead of extracting predefined handcrafted features. Computer vision is one of the most salient and successful use cases. In terms of neural network component and structure design, a variety of networks have been proposed to solve different problems. For instance, variants of convolutional neural networks are used to learn the complex spatial structure of data, while variants of recurrent neural networks are used to model the long-term temporal dependency of data. With better feature representation and customized learning capacity, these models outperform traditional methods in most prediction and regression tasks. Moreover, combined with classic learning paradigms such as active learning and reinforcement learning, deep learning models also show capabilities for adaptation and personalization in a variety of AI-level tasks. With massive datasets, however, there can be an explosive number of potential states in reinforcement learning.
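For concreteness, the temporal modeling mentioned above can be sketched as a vanilla recurrent cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b), where the hidden state h carries context forward across timesteps; this is a generic textbook formulation with invented weights, not the invention's specific network:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights (hypothetical)
W_h = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden (recurrent) weights
b = np.zeros(4)                       # bias

def rnn_step(x_t, h_prev):
    # one recurrent update: mix the current input with the previous state
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# a toy sequence of 5 timesteps, each with 3 features
sequence = rng.normal(size=(5, 3))
h = np.zeros(4)                       # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)              # h now summarizes the whole sequence
```

The final hidden state can be fed to an output layer, analogous to the fully connected head of a convolutional network.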