Automated Generation of Animated 3D Facial Meshes
A Photogrammetry and Deformation Transfer-Based Model

Abstract

Virtual reality (VR)'s recent, long-expected breakthrough to the consumer market has enabled the industry to deliver highly immersive experiences to consumers. However, creating photorealistic assets for such experiences is time-consuming and requires a high level of expertise. To remedy this burden, this thesis presents a fully automated and validated model for the generation of animated 3D facial meshes, using photogrammetry and deformation transfer. From a set of multi-camera photographs of a neutral face, we acquire facial geometry and appearance. Subsequent landmarking and ICP registration provide a direct correspondence between the acquired facial mesh and an existing template. Then, using deformation transfer, we transfer existing blendshapes from the template to the obtained facial mesh. To assess the pipeline's effectiveness, we conducted a set of user experiments.
The results show that the automatically produced facial meshes faithfully represent the subjects and can convey highly believable emotions. Compared to manual modeling and animation, the pipeline is more than 100x faster and produces results compatible with industry-standard game engines, without any manual post-processing. As such, this thesis introduces a unique, fully automated model, based on photogrammetry and deformation transfer, for the generation of animated 3D facial meshes. This model can be expected to have significant commercial as well as scientific impact, while still leaving ample room for future improvement and tailoring.

Overview

(a) Mesh acquisition. We obtain multi-camera views of a person with a camera rig. From the resulting photos, we first create a dense point cloud, and then the triangle mesh.
(b) Mesh cleaning. We simplify and clean the mesh, and compute a UV mapping.
(c) Mesh texturing. We obtain the color of the surface by blending intensities from each photograph.
(d) Correspondence construction. We compute the correspondence between the acquired mesh and the existing, fully animated template mesh using automatic landmarking and the ICP registration algorithm (a minimal sketch of this step follows the list).
(e) Blendshape transfer. Finally, we transfer blendshape animations from the template to the newly created mesh, based on the constructed correspondence.
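The thesis does not prescribe an implementation for step (d); the following is only a minimal sketch of landmark-initialized rigid registration, assuming the open-source Open3D library and hypothetical inputs `template_pts`, `scan_pts`, and matched landmark arrays produced by an automatic landmark detector:

```python
import numpy as np
import open3d as o3d  # assumption: the thesis does not name a registration library

def align_template(template_pts, scan_pts, template_lms, scan_lms):
    """Rigidly align the animated template to the scanned face.

    template_lms / scan_lms are (K, 3) arrays of matching facial landmarks
    (eye corners, nose tip, ...) from an automatic landmark detector; the
    landmark fit gives a coarse pose, which ICP then refines.
    """
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(template_pts))
    dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scan_pts))
    est = o3d.pipelines.registration.TransformationEstimationPointToPoint()

    # Coarse alignment from the K landmark pairs (i-th matches i-th).
    lm_src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(template_lms))
    lm_dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scan_lms))
    k = len(template_lms)
    corres = o3d.utility.Vector2iVector(
        np.column_stack([np.arange(k), np.arange(k)]).astype(np.int32))
    init = est.compute_transformation(lm_src, lm_dst, corres)

    # Local refinement with point-to-point ICP; the threshold (in scan units)
    # is a placeholder that would need tuning for real data.
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_correspondence_distance=5.0,
        init=init, estimation_method=est)
    return result.transformation  # 4x4 rigid transform: template -> scan
```

The landmark fit removes the large initial pose difference, so ICP only performs a local refinement; once template and scan share a common frame, per-vertex correspondences can be read off by nearest-neighbor lookup.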

This paper makes a systems contribution by establishing and describing a complete pipeline for the creation of a digital face model suitable for VR using photogrammetry, and for the automatic animation of this model by deformation transfer. Photogrammetry allows us to achieve photorealistic quality, while the transferred animations provide the full spectrum of emotional facial expressions. The paper also makes a scientific contribution by conducting user experiments that evaluate the proposed method.

Photogrammetry

Using photogrammetry, we acquire facial geometry and appearance.
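The thesis does not name the photogrammetry software used. Purely as an illustration, the sketch below assumes that an external multi-view stereo tool (COLMAP is one common choice) has already turned the rig photos into a dense point cloud, and shows how such a cloud could be meshed and cleaned with the open-source Open3D library; the file names are placeholders, and UV unwrapping and texturing are outside its scope.

```python
import open3d as o3d  # assumption: the thesis does not name its geometry library

# Placeholder input: a dense point cloud produced by a multi-view stereo tool
# (e.g., COLMAP) from the camera-rig photographs.
pcd = o3d.io.read_point_cloud("face_dense.ply")
pcd.estimate_normals()                            # Poisson needs oriented points
pcd.orient_normals_consistent_tangent_plane(30)   # make orientations consistent

# Surface reconstruction: turn the oriented point cloud into a triangle mesh.
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)

# Cleaning and simplification to a triangle budget suitable for real-time use.
mesh.remove_degenerate_triangles()
mesh.remove_duplicated_vertices()
mesh.remove_non_manifold_edges()
mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=20000)

o3d.io.write_triangle_mesh("face_clean.obj", mesh)
```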

Deformation Transfer

Using automatic landmarking and the ICP registration algorithm, we obtain a direct correspondence between the acquired facial mesh and the existing template. Then, using deformation transfer, we transfer existing blendshapes from the template to our new mesh. Note: the bottom-row mesh in this image was not obtained with our setup.
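Deformation transfer follows the gradient-based formulation of Sumner and Popović. The sketch below is a simplified NumPy/SciPy illustration of that formulation, under the assumption that the correspondence step has already brought source template and target mesh into a shared topology (identical vertex count and face array); all function and variable names are ours, not the thesis's.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def edge_frame(v1, v2, v3):
    """Triangle edge vectors plus a scaled normal (Sumner & Popovic, 2004)."""
    e1, e2 = v2 - v1, v3 - v1
    n = np.cross(e1, e2)
    n = n / np.sqrt(np.linalg.norm(n))  # scaling keeps the frame well-conditioned
    return np.column_stack([e1, e2, n])

def transfer_blendshape(src_neutral, src_shape, tgt_neutral, faces):
    """Transfer one blendshape from a source mesh to a target mesh.

    Inputs are (V, 3) vertex arrays and a shared (F, 3) face array.
    Unknowns are the V deformed target vertices plus one auxiliary
    'normal tip' point per face; each face contributes nine linear
    equations asking its deformation gradient to match the source one.
    """
    faces = np.asarray(faces)
    V, F = len(tgt_neutral), len(faces)
    A = sp.lil_matrix((3 * F, V + F))
    B = np.zeros((3 * F, 3))
    for fi, (i, j, k) in enumerate(faces):
        S = edge_frame(*src_shape[faces[fi]]) @ np.linalg.inv(
            edge_frame(*src_neutral[faces[fi]]))         # source deformation gradient
        D = S @ edge_frame(*tgt_neutral[faces[fi]])      # desired target edge frame
        for c, b in enumerate((j, k, V + fi)):           # edges x_b - x_i
            r = 3 * fi + c
            A[r, b], A[r, i] = 1.0, -1.0
            B[r] = D[:, c]
    # Pin one vertex to its neutral position to fix the global translation.
    A = sp.vstack([A.tocsr(), sp.csr_matrix(([1.0], ([0], [0])), shape=(1, V + F))])
    B = np.vstack([B, tgt_neutral[:1]])
    # Normal equations; x, y and z decouple into three solves with one matrix.
    solve = spla.factorized((A.T @ A).tocsc())
    X = np.column_stack([solve(A.T @ B[:, d]) for d in range(3)])
    return X[:V]  # drop the auxiliary normal-tip points
```

Calling `transfer_blendshape` once per expression in the template's blendshape set yields a full set of blendshapes for the new face.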

Results

Synthesized views and untextured models (every second row), for viewpoints different from the original camera images, across subjects of varying appearance. The first column shows the original photogrammetrically obtained meshes (neutral face), while the other three columns show synthesized poses. The blendshape weights for poses of the same emotion are identical across subjects.

Manual Posing

Future Work

The model allows sufficient space for future improvement and tailoring:

Conclusions

We have presented a new technique for the automatic creation of rigged facial meshes from a set of multi-camera photos of a neutral face. First, using photogrammetry, we acquire facial geometry and appearance. Next, using automatic landmarking and the ICP registration algorithm, we obtain a direct correspondence between the acquired facial mesh and the existing template. Then, using deformation transfer, we transfer existing blendshapes from the template to our new mesh.

Creating a rigged facial mesh ready for game engines is a highly manual process that requires the work of a team with a dedicated skill set on the scale of man-weeks. Areas that demand even higher fidelity, such as visual effects production, require effort on the scale of man-months. The processing time of our workflow implementation, from image input to output of a rigged 3D model, is around 10-20 minutes (depending on the number of transferred animations) on a high-end computer (Intel i7 @ 3.1 GHz, 64 GB RAM, GeForce GTX 1080 Ti). This efficiency amounts to a speed improvement over manual work of roughly two orders of magnitude. It is important to note that the current implementation includes no additional optimizations or parallelization, and we believe the compute time can be reduced even further.

The conducted user experiments suggest that the obtained rigged facial meshes:

To our knowledge, this is the first fully automatic method for the generation of rigged facial meshes (without deep learning) described in the literature. A system of this type would be invaluable for the generation of rigged facial datasets. One such database, collected with the workflow described in this study, is expected to be released with a separate publication later this year. We believe this database will be helpful to the scientific community, especially in the area of machine learning research.

Supervisors

Project supervisor: dr. dr. E. L. van den Broek, Utrecht University
Second examiner: prof. dr. R. C. Veltkamp, Utrecht University
Daily supervisor: dr. Quentin Avril, Technicolor R&D France
Associate daily supervisor: dr. Fabien Danieau, Technicolor R&D France