Currently, robot-based object recognition and manipulation in indoor scenes are mainly based on vision methods. With the rapid development of 3D sensing techniques and 3D model databases, autonomous robot recognition systems are advancing quickly, and 3D models are playing an increasingly important role in machine vision.
Normally, for coarse object recognition, such as telling a desk from a chair, a single viewpoint is enough. In a huge 3D model database, however, fine-grained object recognition is challenging, and a single viewpoint is no longer sufficient.
For effective autonomous robot-based recognition, computing a sequence of viewpoints is critical. This problem is called next-best-view (NBV) estimation. NBV estimation and object classification are inseparable: the aim is to minimize the uncertainty of the classification while minimizing the energy spent on viewing. Fine-grained object recognition therefore needs both to classify the object and to compute the NBV. One straightforward method is to train an instance-level classifier for the whole database. However, fine-grained classification methods do not work well when the number of classes is too large. As a result, static information prediction based on all the views taken together, and determining the order in which the views are computed, both require extra work.
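The trade-off between classification uncertainty and viewing energy can be illustrated with a minimal sketch. It is not the method of any particular system: the scoring rule (entropy of a predicted class posterior plus a weighted movement cost), the function names `entropy` and `next_best_view`, the `trade_off` weight, and the candidate data are all assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def next_best_view(candidates, trade_off=0.1):
    """Return the name of the candidate view with the lowest combined score.

    candidates maps a view name to (predicted_posterior, move_cost), where
    predicted_posterior is the class distribution expected after observing
    from that view (hypothetical input) and move_cost is the energy needed
    to reach it. The score is entropy + trade_off * move_cost.
    """
    def score(name):
        posterior, cost = candidates[name]
        return entropy(posterior) + trade_off * cost

    return min(candidates, key=score)


# Hypothetical candidates: (predicted class posterior, movement cost).
candidates = {
    "front": ([0.5, 0.5], 0.0),    # free to reach, but uninformative
    "side": ([0.9, 0.1], 1.0),     # cheap and fairly informative
    "top": ([0.99, 0.01], 20.0),   # very informative, but expensive
}
best = next_best_view(candidates)
```

With these made-up numbers the selected view is "side"; setting `trade_off=0` ignores the movement cost and selects "top" instead, which shows how the weight shifts the balance between the two objectives.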
To solve the above problems, existing technology usually uses a volumetric representation of the 3D model and trains a Convolutional Deep Belief Network (CDBN) to model the spatial information, which is then used as a classifier. By sampling from the learned distribution, the shape can be completed from a depth image, and the information gain can be estimated by virtual scanning. However, this method cannot predict the NBV, and because it classifies shapes using a volumetric representation, it cannot handle classification at different levels of granularity.
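As a rough illustration of what a volumetric representation means, the sketch below converts a set of 3D points into occupied voxel indices on a regular grid. The function name `voxelize`, the grid resolution, and the unit bounding cube are assumptions for illustration only; the existing technology described above may build its volumes differently.

```python
def voxelize(points, resolution=4, lo=0.0, hi=1.0):
    """Map 3D points inside the cube [lo, hi]^3 to a set of occupied voxel indices."""
    size = (hi - lo) / resolution  # edge length of one voxel
    occupied = set()
    for x, y, z in points:
        idx = []
        for c in (x, y, z):
            if not lo <= c <= hi:
                break  # skip points outside the bounding cube
            # Clamp so a coordinate exactly equal to hi falls in the last voxel.
            idx.append(min(int((c - lo) / size), resolution - 1))
        else:
            # Reached only when no coordinate was out of bounds.
            occupied.add(tuple(idx))
    return occupied


# Hypothetical points: two share a voxel, one lies on the far corner,
# and one lies outside the cube and is ignored.
points = [(0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (1.0, 1.0, 1.0), (2.0, 0.0, 0.0)]
grid = voxelize(points)
```

The two nearby points collapse into the same voxel, which hints at why a fixed-resolution volume can lose the fine detail that fine-grained classification depends on.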