A known type of recognition system that performs recognition processing on the basis of a sensor value acquires a feature from the sensor value with a machine learning model, such as a neural network, and performs recognition using the feature (e.g., Iasonas Kokkinos, “UberNet: Training a ‘Universal’ Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory”, arXiv:1609.02132v1 [cs.CV], 7 Sep. 2016). Such a recognition system is applied to vehicle driving control, such as automated driving or driver assistance.
The vehicle driving control includes: acquiring, as a sensor value, for example, an image from a camera or a detected value from a millimeter wave radar; inputting the sensor value into a machine learning model, such as a neural network, to acquire a recognition result such as sign identification, pedestrian detection, or white-line detection; and controlling the vehicle on the basis of the recognition result (namely, the output of the machine learning model). Examples of the control of the vehicle include speed limiting based on the result of the sign identification, autonomous emergency braking based on the result of the pedestrian detection, and lane keeping based on the result of the white-line detection.
For example, recognition processing is performed on an image (shot image) acquired by the camera, for each of the sign identification, the pedestrian detection, and the white-line detection. Inputting the shot image into a learned neural network configured for the sign identification yields the result of the sign identification; inputting the shot image into a learned neural network configured for the pedestrian detection yields the result of the pedestrian detection; and inputting the shot image into a learned neural network configured for the white-line detection yields the result of the white-line detection.
A large amount of learning data is required in order to acquire a high-precision neural network, and additionally a long computing time is required for the neural network to learn from the large amount of prepared learning data. Conventionally, it is necessary to prepare a separate neural network of this kind for each recognition task (e.g., sign identification, pedestrian detection, and white-line detection), and thus development costs (including monetary costs, time costs, and work burden costs) are large, and the costs of updating the recognition system are also large.
In particular, in a system that performs a large number of recognition tasks to acquire a vehicular control value, such as a system for a vehicle that performs automated driving or driver assistance, the same shot image is used for the plurality of recognition tasks and similar feature extraction computing is performed on the shot image in each of the recognition tasks, yet an independent neural network is prepared for each of the recognition tasks.
An object of the present disclosure is to reduce development costs in a recognition system that performs recognition with a neural network that receives a sensor value as an input.
A recognition system according to one aspect of the present disclosure, includes: a sensing unit configured to perform sensing to output a sensor value; a task-specific unit including a first recognition processing part that performs a first recognition task based on the sensor value and a second recognition processing part that performs a second recognition task based on the sensor value; and a generic-feature extraction unit including a generic neural network disposed between the sensing unit and the task-specific unit, the generic neural network being configured to receive the sensor value as an input to extract a generic feature to be input in common into the first recognition processing part and the second recognition processing part.
With this configuration, the extraction of the generic feature to be used in common between the first recognition task and the second recognition task can be performed with the single generic neural network, instead of being developed separately for each recognition task, and thus development costs can be reduced.
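The configuration described above can be illustrated with a minimal sketch, given here only as an aid to understanding. The class names (GenericFeatureExtractor, TaskHead), the function recognize_all, the layer sizes, and all weight values are hypothetical and do not appear in the disclosure; a single fully connected layer stands in for each neural network. The point shown is that the generic feature is computed once per sensor value and then input in common into both recognition processing parts.

```python
# Illustrative sketch only: all names, sizes, and weights are hypothetical.

def dense(weights, bias, v):
    """One fully connected layer; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, x) for x in v]

class GenericFeatureExtractor:
    """Stands in for the generic neural network shared by all tasks."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias
        self.calls = 0  # counts how often the shared computation runs

    def extract(self, sensor_value):
        self.calls += 1
        return relu(dense(self.weights, self.bias, sensor_value))

class TaskHead:
    """Stands in for one recognition processing part."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def recognize(self, generic_feature):
        return dense(self.weights, self.bias, generic_feature)

def recognize_all(extractor, heads, sensor_value):
    # The generic feature is extracted once and shared by every head.
    feature = extractor.extract(sensor_value)
    return {name: head.recognize(feature) for name, head in heads.items()}

extractor = GenericFeatureExtractor(
    weights=[[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], bias=[0.0, 0.0])
heads = {
    "first_task": TaskHead(weights=[[1.0, 1.0]], bias=[0.0]),
    "second_task": TaskHead(weights=[[2.0, -1.0]], bias=[0.5]),
}
results = recognize_all(extractor, heads, [1.0, 2.0, 3.0])
```

In this sketch, adding a third recognition task would only require adding another entry to heads; the shared extractor would remain unchanged.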
The generic-feature extraction unit may be provided on a semiconductor chip different from a semiconductor chip on which the task-specific unit is provided.
With this configuration, the generic-feature extraction unit and the task-specific unit can be separately developed and thus development management costs can be reduced.
The generic neural network in the generic-feature extraction unit may include hardware on the semiconductor chip.
With this configuration, the extraction of the generic feature in the generic-feature extraction unit can be performed at high speed with low power consumption and low heat generation. Note that, when the generic neural network includes the hardware, the cost of updating the generic neural network increases. However, implementing each of the parts in the task-specific unit with software and updating those parts allows the recognition system to be updated at a suppressed cost.
The first recognition processing part may include a neural network for the first recognition task that receives, as an input, the generic feature output from the generic-feature extraction unit to output a result of the first recognition task.
With this configuration, the recognition result of the first recognition task can be acquired from the sensor value with the generic neural network and the neural network for the first recognition task in series.
The sensing unit may include a sensor that acquires the sensor value and a preprocessing part that performs preprocessing to the sensor value.
With this configuration, the preprocessing part can produce a sensor value that is suitable for input into the generic neural network.
The generic-feature extraction unit may include a discrete device that resolves the input to each layer of the generic neural network, into integer bases.
With this configuration, the extraction of the generic feature can be performed at high speed.
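One possible reading of resolving a layer input into integer bases is sketched below, purely as an assumption: the input is quantized to small non-negative integers and split into bit planes, so that a product with integer weights reduces to additions and shifts. The function names to_bit_planes and dot_via_planes, the parameter scale, and the choice of a bit-plane decomposition are all hypothetical; the disclosure does not specify the decomposition used by the discrete device.

```python
# Illustrative sketch only: bit-plane decomposition is an assumed example
# of an integer-basis decomposition, not the method of the disclosure.

def to_bit_planes(values, num_bits, scale):
    """Quantize non-negative values to num_bits-bit integers and split
    each integer into binary planes (plane k holds bit k of every value)."""
    max_q = (1 << num_bits) - 1
    quantized = [min(max_q, max(0, round(v / scale))) for v in values]
    planes = [[(q >> k) & 1 for q in quantized] for k in range(num_bits)]
    return quantized, planes

def dot_via_planes(planes, int_weights, scale):
    """Dot product computed plane by plane using only additions of
    integer weights and a power-of-two factor per plane."""
    total = 0
    for k, plane in enumerate(planes):
        partial = sum(w for w, bit in zip(int_weights, plane) if bit)
        total += partial * (1 << k)
    return total * scale

scale = 0.25
weights = [1, -2, 3, 4]
quantized, planes = to_bit_planes([0.5, 1.25, 0.0, 0.75], num_bits=3, scale=scale)
direct = sum(q * w for q, w in zip(quantized, weights)) * scale
via_planes = dot_via_planes(planes, weights, scale)
```

Here direct and via_planes are the same value computed two ways; the plane-wise form avoids floating-point multiplications inside the inner loop, which is the kind of property that can make such a decomposition fast on dedicated hardware.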
The generic neural network may have an integer weight.
With this configuration, the extraction of the generic feature can also be performed at high speed.
The generic-feature extraction unit may include a discrete device that resolves the input to each layer of the generic neural network into integer bases, and the generic neural network may retain a weight discretized into binary or ternary values, the generic neural network being configured to perform the entirety or part of internal computing with a logic operation, to transform a result of the logic operation with a non-linear activation function, and to give a result of the transformation to a next layer.
With this configuration, the extraction of the generic feature can also be performed at high speed.
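As a hedged illustration of how internal computing can be replaced with a logic operation, the sketch below uses the widely known XNOR-and-popcount formulation for vectors whose elements are +1 or -1. It assumes that both the layer input and the weights are binary, which is a simplification of the configuration described above; the function names pack_bits, binary_dot, and binary_layer and the use of a sign function as the non-linear activation function are hypothetical choices made for the example.

```python
# Illustrative sketch only: assumes +1/-1 inputs and weights and a sign
# activation; none of these specifics are taken from the disclosure.

def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit i is 1 when signs[i] is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def binary_dot(x_word, w_word, n):
    """Dot product of two +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agreements = bin(~(x_word ^ w_word) & mask).count("1")
    # Each agreeing position contributes +1, each disagreeing one -1.
    return 2 * agreements - n

def binary_layer(x_signs, weight_rows):
    """One layer: logic-operation dot products, then a sign activation,
    producing +1/-1 values that can be given to a next layer."""
    n = len(x_signs)
    x_word = pack_bits(x_signs)
    return [1 if binary_dot(x_word, pack_bits(row), n) >= 0 else -1
            for row in weight_rows]

x = [1, -1, 1, 1]
rows = [[1, 1, 1, -1], [-1, -1, 1, 1], [-1, 1, -1, -1]]
logic_dots = [binary_dot(pack_bits(x), pack_bits(r), len(x)) for r in rows]
reference_dots = [sum(a * b for a, b in zip(x, r)) for r in rows]
layer_output = binary_layer(x, rows)
```

The list reference_dots is computed with ordinary multiplications only to check that the logic-operation path gives the same values.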
The generic-feature extraction unit may include a communication module or may be connected to the communication module, the generic-feature extraction unit being configured to update the weight of the generic neural network, based on information received by the communication module.
With this configuration, the weight of the generic neural network in the generic-feature extraction unit can be updated remotely via communication.
A generic-feature extraction unit according to one aspect of the present disclosure, includes: a generic neural network disposed between a sensing unit and a task-specific unit, the sensing unit being configured to perform sensing to output a sensor value, the task-specific unit including a first recognition processing part that performs a first recognition task based on the sensor value and a second recognition processing part that performs a second recognition task based on the sensor value, the generic neural network being configured to receive the sensor value as an input to extract a generic feature to be used in common between the first recognition processing part and the second recognition processing part.
With this configuration, the generic feature to be used for the first recognition task and the second recognition task can be calculated with the generic neural network, and thus the scale of the computing device of the entire system can be reduced and development costs can be reduced, in comparison to a configuration in which the first recognition task and the second recognition task are performed without sharing a generic feature between the tasks.
A recognition system configuration method of configuring the recognition system, according to one aspect of the present disclosure, includes: causing the generic neural network to learn with, as learning data sets, input data and output data of a learned recognition device that performs the first recognition task and input data and output data of a learned recognition device that performs the second recognition task.
A large learning data set is required for the generic neural network to learn (specifically, to determine its weight parameters). In the present recognition system, the generic-feature extraction unit performs at least part of the feature extraction, and the task-specific unit outputs the respective recognition results of the first recognition task and the second recognition task. Thus, a learning data set for the first recognition task and a learning data set for the second recognition task are required in order to cause the generic neural network to learn. Depending on the type of recognition task, it may be difficult to prepare a learning data set. According to this configuration, however, the input data and the output data of the learned recognition device that performs the first recognition task and the input data and the output data of the learned recognition device that performs the second recognition task are used as the learning data sets. Such learned recognition devices are readily available, so that a learning data set for causing the generic neural network to learn can be easily acquired. In this manner, the use of the learning data set for the first recognition task and the learning data set for the second recognition task allows end-to-end learning, including the generic neural network, for the recognition tasks, so that the generic neural network can learn to adapt to both the first recognition task and the second recognition task.
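The learning described above can be sketched in miniature as follows, under strong simplifying assumptions. The functions teacher_first and teacher_second are hypothetical stand-ins for the two learned recognition devices, the generic neural network is reduced to a single linear unit with weights w, each task-specific network is reduced to a single weight in u, and plain full-batch gradient descent on a squared error is used. None of these specifics come from the disclosure; the sketch only shows that pairs of teacher input data and teacher output data can serve as the learning data sets for end-to-end learning through the shared part.

```python
# Illustrative sketch only: teachers, model sizes, and the training
# procedure are hypothetical simplifications.

def teacher_first(x):   # stand-in for a learned recognition device
    return 2.0 * x[0] + x[1]

def teacher_second(x):  # stand-in for another learned recognition device
    return -1.0 * x[0] - 0.5 * x[1]

inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (2.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
# Learning data sets: teacher input data paired with teacher output data.
dataset = [(x, teacher_first(x), teacher_second(x)) for x in inputs]

def mean_loss(w, u, data):
    total = 0.0
    for x, t1, t2 in data:
        h = w[0] * x[0] + w[1] * x[1]  # generic feature (shared)
        total += (u[0] * h - t1) ** 2 + (u[1] * h - t2) ** 2
    return total / len(data)

def train(data, steps=500, lr=0.02):
    w = [0.1, 0.1]  # generic network weights
    u = [0.1, 0.1]  # weights of the first-task and second-task heads
    n = len(data)
    for _ in range(steps):
        gw, gu = [0.0, 0.0], [0.0, 0.0]
        for x, t1, t2 in data:
            h = w[0] * x[0] + w[1] * x[1]
            e1, e2 = u[0] * h - t1, u[1] * h - t2
            gu[0] += 2 * e1 * h / n
            gu[1] += 2 * e2 * h / n
            # Errors of both tasks flow back into the shared weights.
            dh = 2 * (e1 * u[0] + e2 * u[1]) / n
            gw[0] += dh * x[0]
            gw[1] += dh * x[1]
        w = [w[i] - lr * gw[i] for i in range(2)]
        u = [u[i] - lr * gu[i] for i in range(2)]
    return w, u

initial_loss = mean_loss([0.1, 0.1], [0.1, 0.1], dataset)
w, u = train(dataset)
final_loss = mean_loss(w, u, dataset)
```

Because the gradient with respect to w sums the errors of both heads, the shared weights are adjusted to serve the first recognition task and the second recognition task at the same time.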
An ensemble recognition device that unifies recognition results of a plurality of recognition devices to acquire the output data, may be used as each of the recognition devices.
With this arrangement, a higher-precision learning data set can be acquired.
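A minimal sketch of such an ensemble recognition device is given below, assuming that the recognition results are score vectors and that they are unified by simple averaging. The averaging rule, the function name ensemble_output, and the three fixed-output recognizers are hypothetical; the disclosure does not specify how the results are unified.

```python
# Illustrative sketch only: averaging is an assumed unification rule.

def ensemble_output(recognizers, x):
    """Unify the score vectors of several recognition devices by averaging."""
    outputs = [recognize(x) for recognize in recognizers]
    n = len(outputs)
    return [sum(o[i] for o in outputs) / n for i in range(len(outputs[0]))]

# Hypothetical recognition devices that each return two class scores.
recognizers = [
    lambda x: [0.75, 0.25],
    lambda x: [0.5, 0.5],
    lambda x: [1.0, 0.0],
]
unified = ensemble_output(recognizers, x=None)
```

The unified output would then be paired with the corresponding input data to form one entry of the learning data set.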
A recognition system configuration method of configuring the recognition system, according to one aspect of the present disclosure, includes: causing the neural network for the first recognition task to learn with, as a learning data set, input data and output data of a learned recognition device that performs the first recognition task.
With this configuration, the learning data set for the neural network for the first recognition task is easily acquired.
An ensemble recognition device that unifies recognition results of a plurality of recognition devices to acquire the output data, may be used as the recognition device.
With this configuration, a higher-precision learning data set can be acquired.
A recognition system configuration method of configuring the recognition system, according to one aspect of the present disclosure, includes: changing a structure of the generic neural network to cause a relationship between the input into the generic neural network and the output from the neural network for the first recognition task, to further approximate to a relationship between an input and an output of a learned recognition device that performs the first recognition task, and to cause a relationship between the input into the generic neural network and an output from a neural network for the second recognition task, to further approximate to a relationship between an input and an output of a learned recognition device that performs the second recognition task.
With this configuration, the structure of the generic neural network is changed so that the input-output relationships of the recognition system more closely approximate those of the existing learned recognition devices.
A recognition system configuration method of configuring the recognition system, according to one aspect of the present disclosure, includes: changing a structure of the neural network for the first recognition task, to cause a relationship between the input into the generic neural network and the output from the neural network for the first recognition task, to further approximate to a relationship between an input and an output of a learned recognition device that performs the first recognition task.
With this configuration, the structure of the neural network for the first recognition task is changed so that the input-output relationship of the recognition system more closely approximates that of the existing learned recognition device.
According to the present disclosure, the extraction of the generic feature to be used in common for the first recognition task and the second recognition task can be performed with the generic neural network, so that the development costs can be reduced.
The foregoing and other objects, features, aspects and advantages of the exemplary embodiments will become more apparent from the following detailed description of the exemplary embodiments when taken in conjunction with the accompanying drawings.