Although stereophotogrammetry is increasingly popular for 3-dimensional face scanning, commercial solutions remain quite expensive, limiting its accessibility. We propose a more affordable, custom-built photogrammetry setup (Stereo-Face 3D, SF3D) and evaluate its variability within and between systems.
Twenty-nine subjects and a mannequin head were imaged 3 times using SF3D and a commercially available system. An anthropometric mask was mapped viscoelastically onto the reconstructed meshes using MeshMonk ( github.com/TheWebMonks/meshmonk ). Within systems, shape variability was determined by calculating the root-mean-square error (RMSE) of the Procrustes distance between each of the subject’s 3 scans and the subject’s ground truth (calculated by averaging the mappings after a nonscaled generalized Procrustes superimposition). Intersystem variability was determined by similarly comparing the ground truth mappings of both systems. Two-factor Procrustes analysis of variance was used to partition the intersystem shape variability to understand the source of the discrepancies between the facial shapes acquired by both systems.
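The within-system metric described above can be sketched in Python. This is a minimal illustration only, not the MeshMonk pipeline: the use of a rigid Kabsch alignment, the fixed iteration count, and the exact definition of the per-scan Procrustes distance are assumptions based on one plausible reading of the text.

```python
import numpy as np

def kabsch_align(src, dst):
    """Rigidly align src to dst (rotation + translation, no scaling)."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return src_c @ R.T + dst.mean(axis=0)

def nonscaled_gpa(scans, iterations=10):
    """Nonscaled generalized Procrustes superimposition of k scans,
    each an (n, 3) array of homologous landmarks."""
    aligned = [s - s.mean(axis=0) for s in scans]
    for _ in range(iterations):
        consensus = np.mean(aligned, axis=0)
        aligned = [kabsch_align(s, consensus) for s in aligned]
    return np.array(aligned), np.mean(aligned, axis=0)

def within_system_rmse(scans):
    """RMSE of the per-scan Procrustes distances to the consensus
    ('ground truth') shape."""
    aligned, ground_truth = nonscaled_gpa(scans)
    dists = [np.sqrt(np.mean(np.sum((a - ground_truth) ** 2, axis=1)))
             for a in aligned]
    return float(np.sqrt(np.mean(np.square(dists))))
```

For scans that differ only by a rigid motion (no shape change), the superimposition collapses them onto the consensus and the RMSE approaches zero, which is the sanity check one would expect of the metric.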
The RMSEs of the within-system shape variability for 3dMDFace and SF3D were 0.52 ± 0.07 mm and 0.44 ± 0.16 mm, respectively. The corresponding values for the mannequin head were 0.42 ± 0.02 mm and 0.29 ± 0.03 mm, respectively. The between-systems RMSE was 1.6 ± 0.34 mm for the study group and 1.38 mm for the mannequin head. A 2-factor analysis indicated that variability attributable to the system was expressed mainly at the upper eyelids, nasal tip and alae, and chin areas.
The variability values of the custom-built setup presented here were competitive to a state-of-the-art commercial system at a more affordable level of investment.
A custom-built photogrammetry system for 3-dimensional facial capture is presented.
SF3D’s vertex count was 10-fold higher than that of 3dMDFace.
SF3D was slightly more precise on average, but also slightly more variable across subjects, than 3dMDFace.
SF3D’s overall performance matched and, at times, surpassed that of 3dMDFace.
In orthodontics, facial esthetics has traditionally been scrutinized using the lateral cephalogram’s profile outline combined with classic 2-dimensional (2D) facial photographs. With the introduction of cutting-edge 3-dimensional (3D) imaging techniques into the orthodontic and/or craniofacial diagnostic toolset, (structured-light) photogrammetry setups and 3D facial images derived from full-size cone-beam computed tomography (CBCT) exposures have increasingly been adopted for this purpose. Three-dimensional CBCT does carry an increased radiation burden compared with traditional 2D radiology, especially if image quality is of primary concern. Combined with the ALARA principle, this has so far precluded its use as a de facto imaging solution for orthodontic diagnosis, at least in Europe. In addition, the restraints required to immobilize the patient’s head during image capture using CBCT may obscure facial regions of interest, such as the forehead and/or chin area. Combined with the ethical objections associated with repeatedly exposing patients to ionizing radiation for growth monitoring, treatment follow-up, or outcome assessment, these considerations suggest a bright future for nonionizing methods of diagnosing facial esthetics, growth, or treatment change, such as photogrammetry.
Several studies report on the accuracy and reliability of various commercially available photogrammetry solutions applied in an orthodontic and/or craniofacial setting. These include both active stereophotogrammetry solutions (which illuminate the patient’s face with invisible structured-light patterns to provide the features required for interpreting and reconstructing the face’s 3D geometry) from manufacturers such as AxisThree, and passive ones (which reconstruct the scene directly from visual cues present in the acquired images) from Canfield Imaging Systems (eg, VECTRA; Fairfield, NJ) and Dimensional Imaging (DI3D). Hybrid stereophotogrammetry solutions, which combine both techniques to achieve an optimal result, have also been presented by 3dMD. Even low-cost solutions, such as David’s SLS-2, Fuel3D’s Scanify, and Microsoft’s Kinect, have been investigated.
Interestingly, some of the aforementioned studies use direct anthropometry (ie, caliper and measuring tape) as the “gold standard,” notwithstanding the notable variability of the latter approach. Aside from the general sparsity of clearly definable landmarks on the human face, direct anthropometry is additionally hampered by skin compressibility and slight changes in facial expression. Some studies attempt to minimize the effects of landmark identification error by prelabeling the facial surfaces. Both tissue compressibility and facial pose variation can be circumvented by performing measurements on a mannequin head or on plaster casts. Other studies replace direct anthropometric measurements with (repeated) digital ones, or make use of electromagnetic digitizers, coordinate-measuring machines, or other stereophotogrammetry devices, which then serve as the “gold standard” for comparison.
A problem not adequately addressed by these studies is the very feature-sparse nature of the human face, which leaves broad regions with few landmarks at which accuracy and reliability can be gauged. One relatively straightforward solution is to deform a standard anthropometric mask elastically onto each face, thereby densely sampling and modeling the entire facial surface with a very large number of landmarks that, by virtue of the elastic deformation, are effectively homologous.
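Once every face carries the same densely mapped mask, the vertices of any two mapped meshes correspond one to one, so systems can be compared at every vertex rather than at a handful of manually placed landmarks. A minimal sketch of such a per-vertex comparison (the array layout is an assumption for illustration):

```python
import numpy as np

def per_vertex_error(mesh_a, mesh_b):
    """Euclidean error at every homologous vertex of two mapped meshes,
    each an (n, 3) array expressed in the same coordinate frame."""
    return np.linalg.norm(mesh_a - mesh_b, axis=1)
```

The resulting error map can be summarized as a single RMSE or rendered as a color map over the face to localize where two systems disagree.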
In general, cutting-edge technologies such as 3D facial capture typically command a significant premium over traditional 2D methods. Budgetary constraints often deprive orthodontic departments and private practices alike of access to these technologies. Aside from the cost, 1 of the major impediments to the democratization of this technology has traditionally been the imposing complexity of the photogrammetric algorithms involved in reconstructing the 3D scene. This situation has changed somewhat with the introduction of relatively affordable, high-quality multibase photogrammetry software, in which image reconstruction proceeds from a relatively large set of images taken from multiple, slightly different viewpoints. This approach was made possible by recent, fast-paced innovations in the field combined with the ever-increasing availability of low-cost computational power. Taken together, these developments prompt the question of whether it is possible to design, build, and test a custom-built photogrammetry-based setup for 3D facial capture. The current study aimed to assess and report the accuracy of this technology.
Material and methods
The Stereo-Face 3D (SF3D) system, custom-built by the first author (H.L.L.W.), consists of 14 Canon EOS 1200D digital single-lens reflex cameras with Canon 18-55 mm EF-S lenses (Canon, Tokyo, Japan) mounted on a square aluminum frame (measuring 1 × 1 m), assembled from industry-standard system profiles (45 × 45 mm cross-section with a 10 mm slot) (Motedis, Ensdorf, Germany) ( Fig 1 ). The frame is attached to a similarly constructed, wheel-mounted support assembly (dimensions, 1.12 m [width] × 1.7 m [height] × 0.7 m [length]), which provides a working surface for the control switches and laptop, and houses the power supplies, universal serial bus (USB) hubs, and electronics ( Figs 1 and 2 ). Two height-adjustable Bosch Rexroth lifts (Bosch Rexroth, Lohr am Main, Germany) allow the frame’s height to be adjusted over a distance of 0.4 m.
The cameras are mounted in a hemispherical arrangement to better accommodate the human facial form ( Fig 1 , A ). This arrangement is accomplished both by applying a mild inset to the 4 central cameras (ie, positioning them slightly out of the plane and further away from the patient) ( Fig 2 ; blue arrows ) and by positioning the outermost cameras further forward as a result of their inward rotation around the 28-mm round aluminum struts to which they are connected ( Fig 1 , A ). Adjustable camera mounts (Multi-mount 6; Vanguard, Guangdong, China) allow precise control over the camera positions and angles.
Aside from the cameras, the frame also supports 3 remote-controlled, high-CRI, 5600K light-emitting diode (LED) panels (Godox, Shenzhen, China), which provide uniform, shadow-free illumination. These are located at the frame’s upper left and right corners (2 × Godox LED500LW) and on its lower-middle section (1 × Godox LED308W) ( Figs 1 , A and 2 ). These panels are lightweight, dimmable, flicker-free, and do not generate heat. In addition, the panels are powered by a separate Mean Well HRPG-150-15 enclosed power supply unit (15 V, 10 A) (Mean Well, Guangzhou, China). Patient positioning is facilitated by 2 diode line lasers (Picotronic, Koblenz, Germany) on either side of a 20 × 20 cm mirror, positioned approximately in the center of the frame ( Figs 1 , A and 2 ). Furthermore, 3 eye-safe 660-nm random dot pattern lasers (SL-660-S-C; Osela, Lachine, Canada), mounted slightly obliquely at the upper left and right corners and centrally from below, project additional texture onto the relatively feature-sparse human face ( Figs 1 , A and 2 ).
Removing cameras from the setup to replace depleted batteries is highly undesirable in a carefully calibrated setup. This issue is avoided by using Canon DC-10 DC couplers (Canon, Tokyo, Japan), which, in turn, are fed directly from a power supply (SP-320-7.5 V power supply unit [40 A]; Mean Well, Guangzhou, China). Apart from the power cable, each camera requires 1 USB cable for image transfer and one 2.5-mm jack cable for camera focusing and triggering. The USB cables are connected to two 7-port industrial USB hubs, which are powered by the same power supply unit. The same power supply also feeds the electronics (after appropriate downregulation of the voltage), which consists of an Arduino Uno microcontroller ( www.arduino.cc/ ) along with pushbuttons and relays for controlling the positioning of cameras and random pattern lasers, and 8-bit shift registers combined with optocouplers for focusing and triggering the cameras ( Fig 1 , B ). The electronic components were soldered on 3 Adafruit Perma-Proto full-size printed circuit boards (Adafruit, New York).
Camera settings and the image retrieval process are controlled using Smart Shooter 3 GRID software ( kuvacode.com/smart-shooter ), whereas the imported images are reconstructed using 3DFlow’s 3DF Zephyr PRO (multibase) stereophotogrammetry software ( www.3dflow.net/3df-zephyr-pro-3d-models-from-photos/ ).
Preparing for image capture typically involves removing the lens covers from the cameras, powering up the setup and laptop, and flipping the camera reset switch on the working surface (thus providing power to the cameras as well). The LED panels light up when the setup is powered on, after which their brightness can be adjusted using a remote control. The whole startup takes between 1 and 2 minutes and does not have to be repeated when capturing multiple subjects sequentially.
After seating the subject centered in front of the setup, a headband displaying 4 machine-vision markers is loosely fitted. This headband, which serves to scale the reconstructed facial mesh to life-size, is positioned such that as much of the forehead as possible remains exposed, while also ensuring that the markers are visible to a sufficient number of cameras (at least 3, but preferably more). Any loose hair is tucked away behind it in the process. The positioning lasers are then activated, and the subject is instructed to look into the centrally placed mirror with the nose tilted slightly upwards. The frame’s height and anteroposterior position are subsequently adjusted to align the projections of both laser crosses on the subject’s facial midline subnasally, which provides an easy visual cue for both patient and practitioner to confirm proper positioning ( Fig 3 ). After the subject is instructed to maintain a neutral (ie, relaxed) facial expression, the image acquisition button is pressed, upon which all cameras automatically and simultaneously focus and trigger. An example of the images acquired by each camera is presented in Figure 4 . The process is then continued on a computer fitted with a sufficiently powerful Nvidia graphics card (Nvidia, Santa Clara, Calif). After loading the images into 3DF Zephyr Pro, the reconstruction into a 3D mesh takes about 5-15 minutes to complete, depending on the desired mesh resolution and system specifications (at which point the presence of the imaging subject is no longer required).
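The scaling step enabled by the headband can be illustrated as follows. This is a hypothetical sketch, not the actual software step: the function name, the assumption that a single global scale factor is derived from known physical separations between consecutive markers, and the marker geometry are all illustrative.

```python
import numpy as np

def life_size_scale(marker_xyz, known_mm):
    """Estimate one global scale factor for a reconstructed mesh.
    marker_xyz: (k, 3) marker centers in arbitrary reconstruction units;
    known_mm: (k-1,) physical distances between consecutive markers."""
    measured = np.linalg.norm(np.diff(marker_xyz, axis=0), axis=1)
    return float(np.mean(np.asarray(known_mm) / measured))

# Hypothetical usage: multiply every mesh vertex by the factor.
# vertices_mm = vertices * life_size_scale(markers, distances_mm)
```

Averaging over several marker pairs, rather than relying on a single pair, dampens the effect of marker-detection noise on the final scale.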
Assessing the accuracy of the SF3D setup ideally requires both intra- and intersystem evaluations, for which we had access to the frequently used 3dMDFace and Vectra H1 systems. Because the latter requires 3 acquisitions from different angles to perform 1 facial reconstruction, whereas both SF3D and 3dMDFace are stationary, single-shot systems, we opted to use the 3dMDFace system for comparison. In brief, the 3dMD system is a hybrid structured-light stereophotogrammetry system consisting of 3 stereo pairs of 2 cameras each, with 1 pair positioned centrally in front of the patient and the other 2 placed on either side.
A study group of 30 volunteers of diverse ethnicity was recruited from the Medical Imaging Research Lab at the University Hospital Gasthuisberg in Leuven, Belgium, excluding anyone who had undergone facial surgical interventions or had dense facial hair, such as a mustache and/or beard. The age and sex distribution of the sample, calculated using Microsoft Office Excel (Microsoft, Redmond, Wash), is shown in Table I . To account for the highly variable nature of human facial expression, we repeated image acquisitions 3 times in a row for each individual, using the methodology presented earlier. Furthermore, the technical baseline performance of both systems, defined as the performance in the absence of biologic variability (ie, facial pose), was assessed and compared by scanning a mannequin head 3 times consecutively with each system.