Automated Generation of Animated 3D Facial Meshes
A Photogrammetry and Deformation Transfer-Based Model
The recent, long-expected breakthrough of virtual reality (VR) to the consumer market has enabled industry to deliver highly immersive experiences to consumers. However, creating photorealistic assets for such experiences is time-consuming and requires a high level of expertise. To remedy this burden, this thesis presents a fully automated and validated model for the generation of animated 3D facial meshes, using photogrammetry and deformation transfer. From a set of multi-camera photographs of a neutral face, we acquire facial geometry and appearance. Subsequent landmarking and ICP registration establish a direct correspondence between the acquired facial mesh and an existing template. Then, using deformation transfer, we transfer existing blendshapes from the template source to the obtained facial mesh. To assess the pipeline's effectiveness, we conducted a set of user experiments.
The results show that the automatically produced facial meshes faithfully represent subjects and can convey highly believable emotions. Compared to manual modeling and animation, the pipeline is more than 100x faster and produces results compatible with industry-standard game engines, without any manual post-processing. As such, this thesis introduces a unique, fully automated, photogrammetry and deformation transfer-based model for the generation of animated 3D facial meshes. This model can be expected to have significant commercial and scientific impact, while leaving sufficient space for future improvement and tailoring.
(a) Mesh acquisition. We obtain a multi-camera view of a person with a camera rig. From the resulting photos, we first create a dense point cloud, and then a triangle mesh.
(b) Mesh cleaning. We simplify and clean the mesh, and compute a UV-mapping.
(c) Mesh texturing. We obtain the color of the surface by blending intensities from each photograph.
(d) Correspondence construction. We calculate the correspondence between the acquired mesh and an existing, fully animated template mesh using automatic landmarking and an ICP registration algorithm.
(e) Blendshape transfer. Finally, we transfer blendshape animations from the template to the newly created mesh based on the constructed correspondence.
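To illustrate the registration underlying step (d), the rigid part of an ICP iteration can be sketched in a few lines of NumPy. This is a minimal sketch under our own simplifications (brute-force nearest-neighbour matching, no outlier rejection, single rigid step), not the thesis implementation; the function names are ours.

```python
import numpy as np

def kabsch_align(src, dst):
    """Best-fit rigid transform (R, t) mapping src onto dst, given
    point-to-point correspondence, via the SVD-based Kabsch algorithm."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed, so R is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp_step(src, dst):
    """One ICP iteration: match each source vertex to its nearest
    target vertex, then apply the resulting rigid transform."""
    # Naive O(n*m) nearest-neighbour search; real pipelines use a k-d tree.
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matches = dst[d2.argmin(axis=1)]
    R, t = kabsch_align(src, matches)
    return src @ R.T + t
```

Iterating `icp_step` until the alignment stops improving yields the rigid registration; the landmarking step serves to provide a good initial pose so that the nearest-neighbour matches are meaningful.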
This paper makes a systematic contribution by establishing and describing a complete pipeline for the creation of a digital face model suitable for VR using photogrammetry, and for the automatic animation of this model by deformation transfer. Photogrammetry allows us to achieve photorealistic quality, while the transferred animations provide the full spectrum of emotional facial expressions. The paper also makes a scientific contribution through user experiments evaluating the proposed method.
Using photogrammetry, we acquire facial geometry and appearance.
Using automatic landmarking and an ICP registration algorithm, we obtain a direct correspondence between the acquired facial mesh and the existing template. Then, using deformation transfer, we transfer existing blendshapes from the template to our new mesh. Note: the bottom-row mesh in this image was not obtained with our setup.
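Deformation transfer proper (Sumner and Popović 2004) transfers per-triangle deformation gradients by solving a linear system. As a much simpler illustration of the idea, once a dense vertex correspondence exists, one can copy per-vertex displacement deltas from the template's blendshapes onto the new mesh. The function below is our own illustrative simplification, not the method used in the pipeline, and is only a reasonable approximation when the two meshes are closely registered.

```python
import numpy as np

def transfer_blendshape_deltas(src_neutral, src_shape, tgt_neutral, corr):
    """Naive blendshape transfer: copy per-vertex displacement deltas
    from the source template onto the target mesh through a vertex
    correspondence map. Simplification of deformation transfer, which
    instead transfers per-triangle deformation gradients.

    corr[i] = index of the source vertex matched to target vertex i.
    """
    deltas = src_shape - src_neutral   # (n_src, 3) per-vertex offsets
    return tgt_neutral + deltas[corr]  # pull offsets through correspondence
```

The gradient-based formulation avoids the artifacts this naive version produces when source and target differ in shape or scale, which is why the pipeline relies on it.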
Synthesized views and untextured models (every second row), for viewpoints different from the original camera images, across subjects of varying appearance. The first column shows the original photogrammetrically obtained meshes (neutral face), while the other three columns show synthesized poses. The blendshape weights for poses of the same emotion are identical across subjects.
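The posed meshes in the figure follow the standard linear blendshape model: a pose is the neutral mesh plus a weighted sum of per-blendshape vertex offsets, which is why reusing the same weight vector across subjects reproduces the same expression. A minimal sketch (our own function name):

```python
import numpy as np

def synthesize_pose(neutral, blendshapes, weights):
    """Linear blendshape model.

    neutral:     (n, 3) neutral-face vertices
    blendshapes: (k, n, 3) vertex positions of the k blendshapes
    weights:     (k,) blend weights, typically in [0, 1]
    """
    deltas = blendshapes - neutral[None]            # (k, n, 3) offsets
    return neutral + np.tensordot(weights, deltas,  # weighted offset sum
                                  axes=1)
```

Because the transferred blendshapes share a common semantic layout inherited from the template, a single weight vector per emotion suffices for all subjects.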
The model allows sufficient space for future improvement and tailoring:
- The quality of the photogrammetry result directly depends on the quality of the captured photos. However, it is not clear exactly how: there are no strictly defined rules, only a number of "guidelines". Every type of photogrammetry subject has a different optimal capture setup, and finding the optimal way to capture a human face remains an open problem.
- Instead of 2.5D landmarking, one obvious improvement is 3D landmarking, for example with the methods mentioned in Section 3.7. Such methods can potentially be more precise, as they use not just intensity (color) but also shape information. 3D landmarks would also allow universal, landmark-based normalization of the facial mesh, instead of relying on assumptions about camera positions or the subject's pose.
- The current result of the pipeline is a single mesh, which includes the eyes and hair (including facial hair) as part of the surface. Extracting the eyes and hair, and being able to control them separately, would significantly improve the quality of the end result. Extensive research has been done on capturing the eyes by Berard et al. 2014 and the hair by Beeler et al. 2012. Both methods require manual supervision, and automating such separation would be interesting future work.
- The face is one of the most crucial parts of a person's identity, if not the most crucial. However, a personalized face by itself is not enough to believably represent a real person in virtual space. Capturing and faithfully representing a person's body is the next logical step toward bringing humans into virtual space.
We have presented a new technique for the automatic creation of rigged facial meshes from a set of multi-camera photos of a neutral face. First, using photogrammetry, we acquire facial geometry and appearance. Then, using automatic landmarking and an ICP registration algorithm, we obtain a direct correspondence between the acquired facial mesh and the existing template. Finally, using deformation transfer, we transfer existing blendshapes from the template to our new mesh.
Creating a rigged facial mesh ready for game engines is a highly manual process that requires the work of a team with a dedicated skillset on the scale of man-weeks. Areas that demand even higher fidelity, such as visual-effects production, require effort on the scale of man-months. The processing time of our workflow implementation, from image input to output of a rigged 3D model, is around 10-20 minutes (depending on the number of transferred animations) on a high-end computer (Intel i7 3.1 GHz, 64 GB RAM, GeForce GTX 1080 Ti). This efficiency improves on manual work by roughly two orders of magnitude. It is important to note that the current implementation includes no additional optimizations or parallelization, and we believe the compute time can be reduced even further.
The conducted user experiments suggest that the obtained rigged facial meshes:
- do not lose unique surface details of the subjects,
- are suitable out of the box for secondary characters without close-up shots,
- represent the subjects' faces fairly accurately,
- can be used to confidently convey emotions.
Project supervisor: dr. dr. E. L. van den Broek, Utrecht University
Second examiner: prof. dr. R. C. Veltkamp, Utrecht University
Daily supervisor: dr. Quentin Avril, Technicolor R&D France
Associate daily supervisor: dr. Fabien Danieau, Technicolor R&D France