US 12,169,900 B2
Method and apparatus for three-dimensional reconstruction of a human head for rendering a human image
Taras Andreevich Khakhulin, Pyatigorsk (RU); Vanessa Valerievna Sklyarova, Moscow (RU); Victor Sergeevich Lempitsky, Moscow (RU); and Egor Olegovich Zakharov, Krasnogorsk (RU)
Assigned to SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed by SAMSUNG ELECTRONICS CO., LTD., Suwon-si (KR)
Filed on Nov. 15, 2022, as Appl. No. 17/987,586.
Application 17/987,586 is a continuation of application No. PCT/KR2022/015750, filed on Oct. 17, 2022.
Claims priority of application No. RU2021133083 (RU), filed on Nov. 15, 2021; and application No. RU2022107822 (RU), filed on Mar. 24, 2022.
Prior Publication US 2023/0154111 A1, May 18, 2023
Int. Cl. G06T 15/00 (2011.01); G06T 7/70 (2017.01); G06T 9/00 (2006.01); G06T 17/20 (2006.01); G06V 10/74 (2022.01); G06V 10/82 (2022.01); G06V 40/16 (2022.01)
CPC G06T 17/205 (2013.01) [G06T 7/70 (2017.01); G06T 9/00 (2013.01); G06V 10/761 (2022.01); G06V 10/82 (2022.01); G06V 40/174 (2022.01); G06T 2207/30201 (2013.01)]
11 Claims
 
1. A method for three-dimensional (3D)-reconstruction of a human head for rendering a human image, the method being performed by a device including at least one processor and at least one memory, the method comprising:
a) encoding, by using a first convolutional neural network, a single source image into a neural texture, the neural texture having a same spatial size as the single source image and a larger number of channels than the single source image, the neural texture containing local person-specific details;
b) estimating, by a pre-trained detailed expression capture and animation (DECA) system, a face shape, a facial expression, and a head pose by using the single source image and a target image, and providing an initial mesh as a set of faces and a set of initial vertices based on a result of the estimating;
c) providing a predicted mesh of a head mesh based on the initial mesh and the neural texture; and
d) rasterizing 3D reconstruction of a human head based on the predicted mesh, and rendering a human image based on a result of the rasterizing,
wherein the providing the predicted mesh comprises:
rendering the initial mesh into an xyz-coordinate texture;
concatenating the xyz-coordinate texture and the neural texture;
processing, by using a second neural network, a result of the concatenating into a latent geometry map;
bilinear sampling the latent geometry map by using texture coordinates to obtain a vertex-specific feature;
decoding the vertex-specific feature by a multi-layer perceptron for predicting a 3D offset for each vertex; and
adding the predicted 3D offset to the initial vertices to obtain the predicted mesh.
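
The mesh-prediction steps recited above (concatenation, bilinear sampling at texture coordinates, MLP decoding, and offset addition) can be illustrated with a minimal sketch in plain Python. This is not the patented implementation. All function names are hypothetical, the second neural network is replaced by an identity stand-in (the concatenated texture is used directly as the latent geometry map), the multi-layer perceptron is assumed to have one hidden ReLU layer with hand-picked weights, and texture coordinates are assumed to lie in [0, 1].

```python
def concat_channels(xyz_texture, neural_texture):
    """Concatenate two H x W x C textures (nested lists) along the channel axis."""
    return [
        [xyz_px + tex_px for xyz_px, tex_px in zip(xyz_row, tex_row)]
        for xyz_row, tex_row in zip(xyz_texture, neural_texture)
    ]


def bilinear_sample(feature_map, u, v):
    """Bilinearly sample an H x W x C map at texture coordinates (u, v) in [0, 1]."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp coordinates, then map them to continuous pixel positions.
    x = min(max(u, 0.0), 1.0) * (w - 1)
    y = min(max(v, 0.0), 1.0) * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    channels = len(feature_map[0][0])
    return [
        (1 - fx) * (1 - fy) * feature_map[y0][x0][c]
        + fx * (1 - fy) * feature_map[y0][x1][c]
        + (1 - fx) * fy * feature_map[y1][x0][c]
        + fx * fy * feature_map[y1][x1][c]
        for c in range(channels)
    ]


def mlp_decode(feature, weights1, bias1, weights2, bias2):
    """Assumed two-layer MLP: linear, ReLU, linear. Returns a 3D offset."""
    hidden = [
        max(0.0, sum(w * f for w, f in zip(row, feature)) + b)
        for row, b in zip(weights1, bias1)
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights2, bias2)
    ]


def predict_mesh(initial_vertices, uvs, latent_map, mlp_params):
    """Add a per-vertex predicted 3D offset to each initial vertex."""
    predicted = []
    for vertex, (u, v) in zip(initial_vertices, uvs):
        feature = bilinear_sample(latent_map, u, v)  # vertex-specific feature
        offset = mlp_decode(feature, *mlp_params)    # predicted 3D offset
        predicted.append([p + d for p, d in zip(vertex, offset)])
    return predicted


if __name__ == "__main__":
    # Toy 2 x 2 textures: 3 xyz channels and 2 neural-texture channels.
    xyz_texture = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                   [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]]
    neural_texture = [[[0.5, 0.1], [0.5, 0.2]],
                      [[0.5, 0.3], [0.5, 0.4]]]
    # Identity stand-in for the second neural network (an assumption).
    latent_map = concat_channels(xyz_texture, neural_texture)
    mlp_params = (
        [[0.1] * 5, [-0.1] * 5], [0.0, 0.0],              # hidden layer
        [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]], [0.0] * 3,  # output layer
    )
    vertices = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
    uvs = [(0.0, 0.0), (0.5, 0.5)]
    print(predict_mesh(vertices, uvs, latent_map, mlp_params))
```

In a practical system the per-vertex loop would typically be replaced by a batched sampling operation on a GPU tensor, and the identity stand-in by a trained network. The sketch only shows how the data flows from the textures to the displaced vertices.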