The use of three-dimensional (3D) optical instruments to measure soft tissue facial characteristics is increasing, but systematic assessments of their reliability, practical use in research and clinics, outcome measurements, and advantages and limitations are not fully established. Therefore, a review of the current literature was performed on the reliability of facial anthropometric measurements obtained by 3D optical facial reproductions as compared to conventional anthropometry or other optical devices. The systematic literature search was conducted in electronic databases following the PRISMA guidelines (PROSPERO registration: CRD42018085473). Overall, 815 studies were identified, with 27 final papers included. Two meta-analyses were conducted. Tested devices included conventional cameras, laser scanning, stereophotogrammetry, and structured light. Studies measured living people or inanimate objects. Overall, the optical devices were considered reliable for the measurement of linear distances. Some caution is needed for surface assessments. All instruments are suitable for the analysis of inanimate objects, but fast scan devices should be preferred for living subjects to avoid motion artefacts in the orbital and nasolabial areas. Prior facial landmarking is suggested to improve measurement accuracy. Practical needs and economic means should direct the choice of the most appropriate instrument. Considering the increasing interest in surface-to-surface measurements, fast scan devices should be preferred, and dedicated protocols devised.
The qualitative and quantitative description of facial size and shape plays a key role in several biological and clinical fields, ranging from the definition of sex, age, and ethnic group, to the assessment of diseases, trauma, and malformations, and to treatment planning and associated follow-up evaluations. In the past, facial anthropometry was classically performed in the three spatial dimensions using a caliper, steel tapes, and protractors. However, between the end of the 19th century and the beginning of the 20th century, the advent of craniofacial imaging such as conventional radiography and photography resulted in clinicians and researchers using simplified, two-dimensional (2D) representations of anatomical characteristics. More recently, three-dimensional (3D) techniques have regained their pivotal role in depicting the complex morphology of the head and face.
Alongside volumetric 3D imaging devices, which are mostly used for clinical purposes (computed tomography (CT), cone beam computed tomography (CBCT), magnetic resonance imaging (MRI)), there is increasing interest in optical surface systems (structured light including Moiré stripes, laser scanners, stereophotogrammetry). Optical devices are contactless, safe, and radiation-free, and provide a detailed 3D quantitative representation of the external (cutaneous) facial surface, often coupled with a textured picture that makes the image realistic and well suited for patient communication.
The usefulness of 3D optical devices is increasingly being reported, not only for research and educational purposes, but also for clinical needs, and the use of 3D facial models has been proposed both for the diagnostic assessment of facial dysmorphism and for the outcome of functional, orthodontic, and surgical interventions in the orofacial region. On some occasions, they have also been used for the analysis of inanimate specimens such as cadaver heads and replicas of the face or of its parts.
The various optical instruments have different characteristics, with variations in the underlying technology, number of cameras, 2D to 3D conversion algorithms, resolution, dimensions, portability, and cost. Technical aspects of these instruments have recently been reviewed and summarized, but it appears that no investigation has systematically analysed their reliability as a class of instrument, with focus on their use in research and clinical settings, the outcome measurements, their specific advantages and limitations, and whether the latter can be reduced or eliminated by the definition of a proper protocol.
Some reviews have focused on specific applications of optical systems, for instance methods for the longitudinal assessment and quantification of soft tissue facial morphology in preschool children or in patients with cleft lip and palate. Other studies have collected data on adult subjects using different imaging systems and on some occasions have compared the results to those provided by conventional anthropometry, but the sample size has always been limited (range 2–15). Furthermore, linear distances, which currently represent the most used measurements, have been assessed in only two studies.
The aim of this review was to summarize the current evidence on the various optical instruments that can depict and measure soft tissue facial characteristics of living humans and inanimate objects in their 3D aspects, with a special focus on the practical applications of measurement protocols.
This review focused on soft tissue facial data collected from humans (participants) or objects reproducing the human oral and maxillofacial anatomy, using (intervention) digital 3D optical instruments and comparing (comparison) data with those provided by conventional anthropometry or other optical instruments to quantify (outcomes) the reliability of facial anthropometric measurements.
Materials and methods
A research protocol was designed and registered in the International Prospective Register of Systematic Reviews (PROSPERO: CRD42018085473, February 6, 2018), aiming to answer the following questions: (1) Which are the most reliable optical devices for the assessment of 3D facial anatomy in living humans and inanimate objects? (2) What are the advantages and limitations of each device in the different clinical and surgical fields? (3) Which metric measurements are most reliable (accurate and repeatable) for each device?
The relevant PICO criteria were as follows: ‘P’ (population): humans of any age and inanimate objects mimicking the oral and maxillofacial structures; ‘I’ (intervention): 3D optical reproduction of the face (or of its inanimate replica); ‘C’ (comparison): conventional anthropometry, other optical devices (stereophotogrammetry, laser scanner, structured light including Moiré stripes); ‘O’ (outcomes): reliability of facial anthropometric measurements.
The investigation followed the guidelines illustrated by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement.
Information sources and search
Two complementary searches were performed: an electronic search in five databases (PubMed, www.ncbi.nlm.nih.gov/PubMed; Scopus, www.scopus.com/search/form.uri?display=basic; Web of Science, apps.webofknowledge.com/WOS_GeneralSearch_input.do?product=WOS; SciELO, SciELO.br/; The Cochrane Library, www.cochranelibrary.com/) and a hand-search of cross-references and the authors’ personal collections. Taking into account technological progress and recent reports, focus was placed on the last 15 years, and articles published from June 1, 2003 to June 1, 2018 were considered. The search was limited to papers published in English, French, Italian, Spanish, and Portuguese.
The following search strategy was employed: (photogrammetry OR stereophoto OR laser scanner OR optical scanner) AND (face OR facial OR head) AND (3D OR three-dimensional OR three dimensional) AND (validation OR accuracy OR repeatability OR agreement OR concordance OR reproducibility OR reliability OR comparison).
Inclusion criteria
Study types and participants: all types of study reporting quantitative assessments, including research studies, clinical studies, technical notes, original studies, reviews, multicentre studies, randomized clinical trials, and controlled clinical trials, performed on humans of any age and inanimate objects reproducing the facial structures. Types of measurement instrument: optical devices (stereophotogrammetry, laser scanner, structured light, etc.) producing a 3D digital model of the face. Types of intervention: metric measurements of anthropometric features obtained by 3D optical reproductions of the face or of its inanimate replica. Types of outcome measure: reliability (accuracy and repeatability) of facial measurements obtained by optical devices versus conventional anthropometry or other 3D optical devices.
Exclusion criteria
Study types: reviews not reporting original data, editorials, opinion letters, case reports, case series, and congress abstracts. Types of measurement instrument: studies performed with instruments other than 3D optical scanners. Types of intervention: articles describing specific populations without a quantitative comparison of the same measurements between different methods of acquisition. Types of outcome measure: studies not reporting the reliability of facial measurements.
According to the inclusion and exclusion criteria listed above and after preliminary training and calibration, two reviewers independently and in a blinded manner screened the titles and abstracts of the identified records and decided whether to retain each item. Any disagreement between the examiners was resolved by technical discussion involving a third reviewer. Articles were excluded on the basis of title and abstract if they did not fulfil the inclusion criteria. If the required information was not clearly reported in the abstract, the paper was included in the full-text analysis, thus reducing the risk of excluding potentially relevant articles.
The full texts of potentially relevant investigations were obtained and independently evaluated by the same reviewers. For each paper, an analysis of the inclusion criteria was performed and a final decision on eligibility was made. The agreement between the reviewers for the screening process was calculated (percentage agreement and kappa). Any disagreement was resolved by discussion.
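As an illustration of the agreement statistics mentioned above, the following minimal sketch computes percentage agreement and Cohen's kappa for two reviewers' screening decisions. The function name and the example labels are hypothetical and are not taken from the review; the review does not state how the values were computed.

```python
import math
from collections import Counter


def agreement_and_kappa(rater_a, rater_b):
    """Return (percentage agreement, Cohen's kappa) for two raters.

    Hypothetical helper for illustration only; each list holds one
    decision per screened record, in the same order for both raters.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)
    # Observed proportion of records on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label: kappa is undefined.
        return 100.0 * observed, math.nan
    kappa = (observed - expected) / (1.0 - expected)
    return 100.0 * observed, kappa
```

Kappa corrects the raw agreement for the agreement expected by chance, which is why both figures are usually reported together.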
The two reviewers independently extracted the data of interest from the full-text papers and recorded them on an electronic spreadsheet. The following data were collected: basic information on the investigation including the date, title, and authors; the type of optical device, commercial name, and reference standard for validation (direct anthropometry or another 3D image acquisition device); participant information including whether they were living or inanimate (mannequins mimicking facial morphology), age, and number; the types of measurement performed (linear distances, angles, surfaces, volumes, surface-to-surface distances), number of landmarks used, and number of measurements; the type of statistical analysis, estimation of errors, and their practical/clinical significance; the authors’ conclusions and disclosure of conflicts of interest.
Analysis of subgroups
The articles were classified according to the type of device, and a further division was made between studies on living people and studies on inanimate objects, also including cadavers.
Risk of bias (quality) assessment
The quality assessment (risk of bias in individual studies) was performed according to a modified version of the QUADAS-2 (QUality Assessment tool for Diagnostic Accuracy Studies) guidelines. The tested instrument and the reference instrument were considered as the ‘index test’ and ‘reference standard’, respectively. The question “Were the reference standard results interpreted without knowledge of the results of the index test?” in the QUADAS-2 checklist was not used, because this was not applicable to the current analysis. Two independent reviewers evaluated the final 27 studies; κ was 0.864. Any disagreement between the examiners was resolved by technical discussion.
To obtain a general view about measurement accuracy, two meta-analyses were performed, one for papers reporting data collected from living subjects and one for papers dealing with inanimate objects. In both analyses, the reference standard was direct anthropometry. Mean differences between data (linear distances) obtained using direct anthropometry and the tested instrument, the relevant standard deviations, sample size, and the correlation coefficients were obtained from the included studies. The random-effects model was used in both analyses. In addition, subgroup meta-analyses were run to compare the various tested instruments. The effect size for each study, as well as the global estimate (Cohen’s d), were summarized using forest plots.
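The random-effects pooling described above can be sketched as follows. The review does not state which estimator the software applied, so the DerSimonian-Laird method used here is an assumption, and the function name and inputs (per-study effect sizes and their variances) are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Pool per-study effect sizes with a random-effects model.

    Assumption: DerSimonian-Laird estimator of the between-study
    variance; the estimator actually used in the review is not stated.
    Returns (pooled effect, standard error, between-study variance).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(ws * yi for ws, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

When the between-study variance is estimated as zero, the result coincides with the fixed-effect estimate; otherwise small studies receive relatively more weight than under a fixed-effect model.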
Heterogeneity among the selected papers was assessed using the Cochran Q test based on χ² statistics. T² and I² indices were obtained. T² indicates the between-study variance, while I² estimates the percentage of variation in the global estimate that could be attributed to heterogeneity (25% = low; 50% = moderate; 75% = high). A sensitivity analysis was performed by removing one study at a time and recalculating the effect size.
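A minimal sketch of these heterogeneity indices and of the leave-one-out sensitivity analysis is given below. It assumes a DerSimonian-Laird estimate of the between-study variance (T² in the text, `tau2` in the code), which the review does not confirm; all function names are hypothetical. The p-value of the Q test is omitted because it requires a χ² distribution function that is not in the Python standard library.

```python
def _pool(effects, variances):
    """Return (random-effects estimate, Q, degrees of freedom, tau2)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    # Assumed DerSimonian-Laird estimator, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(ws * yi for ws, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, q, df, tau2


def heterogeneity(effects, variances):
    """Return (Q, degrees of freedom, tau2, I2 in percent)."""
    _, q, df, tau2 = _pool(effects, variances)
    # I2: share of total variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, tau2, i2


def leave_one_out(effects, variances):
    """Recompute the pooled estimate with each study removed in turn."""
    pooled_without = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        pooled_without.append(_pool(rest_e, rest_v)[0])
    return pooled_without
```

A study whose removal shifts the pooled estimate markedly is a candidate driver of the overall result.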
To evaluate publication bias, the trim and fill procedure was applied; the Egger linear regression test and Begg and Mazumdar rank correlation test were also computed.
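For orientation, the regression underlying the Egger test can be sketched as below: the standardized effect (effect divided by its standard error) is regressed on precision (the reciprocal of the standard error), and an intercept far from zero suggests funnel-plot asymmetry. This is a simplified illustration with a hypothetical function name, not the procedure as implemented in the software used for the review; the significance test on the intercept, which needs a t distribution, is left out.

```python
import math


def egger_regression(effects, standard_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Illustrative sketch only. Returns (intercept, slope, intercept SE);
    the intercept is the quantity examined by the Egger test.
    """
    n = len(effects)
    if n < 3 or n != len(standard_errors):
        raise ValueError("need at least three studies with matching standard errors")
    x = [1.0 / se for se in standard_errors]                   # precision
    z = [y / se for y, se in zip(effects, standard_errors)]    # standardized effect
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("standard errors must not all be equal")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = sse / (n - 2)
    intercept_se = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, intercept_se
```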
All calculations were performed using ProMeta software (version 3), with the significance level set at 5% for all analyses except the Cochran Q test (α < 0.10).
Results
The first search retrieved 815 articles: 322 from Web of Science, 285 from PubMed, 200 from Scopus, seven from the Cochrane Library, and one from SciELO; no additional articles were found through the manual search (Fig. 1). Unsuitable and duplicate papers were excluded through the title and abstract review, and 113 records were retrieved for more detailed evaluation. After reading the full texts, 69 articles were excluded (non-relevant papers, duplicate data, conference abstract only, or full text unavailable), leaving 44 papers as eligible for the review analysis.
Finally, according to the inclusion/exclusion criteria, 27 articles were included (final agreement = 91.2%; κ = 0.90). Reasons for exclusion were: ‘unsuitable instrument, no optical scanner’, ‘no facial measurements performed’, ‘unsuitable study design (no quantitative data or no assessment of reliability of an instrument)’, and ‘no statistical analysis described’.
Risk of bias (quality) assessment
Twenty papers had a low risk of bias, five a high risk, and two an unclear risk (Fig. 2). Most of the concerns in both domains were about the index test (risk of bias and applicability, three papers out of 27 with a high risk in both categories). The main concern related to the threshold for clinical/practical use, which was not declared in the Materials and methods section. In the subject selection category, two papers apparently did not use a consecutive or random selection of volunteers or of inanimate objects.
Among the final 27 studies, 14 were included in the two meta-analyses. Inclusion criteria for meta-analysis were: direct anthropometry as the reference standard; measurement of linear distances; sufficient data for the estimation of Cohen’s d.
Description of the included studies
According to the reference technique and type of optical instrument device used in the study, several categories were created (Table 1), and the articles were classified accordingly. Direct anthropometry, laser scanning, stereophotogrammetry, and structured light were all used as reference techniques, with the largest number of studies using direct anthropometry (columns in Table 1). Conventional cameras, laser scanners, stereophotogrammetry, and structured light (including Moiré profilometry) were the tested devices; most of the studies tested stereophotogrammetry devices (rows in Table 1). Eleven studies used two or three different optical devices; they were included in all relevant categories: therefore, the total number of categorized studies (n = 34) exceeds the number of included studies (n = 27).
|Tested system||Reference technique: direct anthropometry (n = 18)||Reference technique: laser scanner (n = 2)||Reference technique: stereophotogrammetry (n = 7)||Reference technique: structured light (n = 1)|
|Cameras (n = 1)||-||*Deli et al. (2010)||-||-|
|Laser scanner (n = 8)||*Fourie et al. (2011); *Germec-Cakan et al. (2010); Lippold et al. (2014); Joe et al. (2012); *Kook et al. (2014); Kovacs et al. (2006); Kovacs et al. (2006)||-||*Gibelli et al. (2018)||-|
|Stereophotogrammetry (n = 16)||Asi et al. (2012); Dindaroğlu et al. (2016); *Fourie et al. (2011); *Germec-Cakan et al. (2010); Ghoddousi et al. (2007); *Kook et al. (2014); Lübbers et al. (2010); Metzler et al. (2014); Park et al. (2012); *Weinberg et al. (2006); Winder et al. (2008); Wong et al. (2008); *Ye et al. (2016)||*Zhao et al. (2017)||Camison et al. (2018); Gibelli et al. (2018)||-|
|Structured light (n = 8)||Weinberg et al. (2004); *Weinberg et al. (2006); *Ye et al. (2016)||*Zhao et al. (2017)||*Artopoulos et al. (2014); *Knoops et al. (2017); *Secher et al. (2017); *Zhao et al. (2017)||Bakirman et al. (2017)|
Papers using the same devices were analysed together. Within each category, where appropriate a further division was made between studies on living people and inanimate objects. Table 2 summarizes the main characteristics of the reviewed studies, including the results and authors’ conclusions.
|Authors||Reference method||Tested method/s||No. living/inanimate||No. landmarks||Type, No. variables||Results and authors’ conclusions|
|Artopoulos et al. (2014)||SP||SL||22 mid-face photopolymer models of CT scans||3D point clouds||Surface-to-surface distance maps||No systematic deviation on bias and good agreement between techniques (mean difference −0.01 mm, SD 0.038, 95% CI −0.084 to 0.064). SL 97%, SP 99% of points ±1 mm. Good inter-method agreement; clinically acceptable differences.|
|Asi et al. (2012)||DA||SP||20 adults||17||18 linear distances||No significant differences between the two systems; the tested method is highly reliable and valid (error <1 mm) except for the eye region, which requires caution (mean difference 1.25 mm).|
|Bakirman et al. (2017)||High-resolution SL||2 low-cost SL scanners||2 adults||3D point clouds||Surface-to-surface distance maps||RMSD errors between the reference and the tested facial reconstructions <1 mm; absolute mean differences 0.4–0.8 mm. Both tested systems can be potentially used for 3D face modelling; motion artefacts should be considered.|
|Camison et al. (2018)||SP||Handheld portable SP||26 adults/1 MH||17; 3D point clouds||136 linear distances/surface-to-surface distance maps||Linear distances: mean TEM 0.84 mm, range 0.19–1.54. All errors <2 mm; mean rTEM 1.13%, range 0.44–2.48%. Surfaces: RMSD difference living <1 mm, inanimate object <0.1 mm, range 0.44–2.48; mean signed difference 0.89 mm, range −0.35 to 2.07. The portable device is highly repeatable, reliable and accurate. It can be used in the clinical setting but motion artefacts should be considered (eyes, mouth).|
|Deli et al. (2010)||LS||Custom-made 3D cameras SP||1 MH||30||56 linear distances||Linear distances: mean difference between methods 0.037 mm, SD 0.006. The precision requirement is fully satisfied. Distances are highly reliable proving the validity of the system.|
|Dindaroğlu et al. (2016)||DA||SP||80 adults||11||10 linear distances/ 6 angles||Very high level of agreement between the methods. The system is reliable and accurate; errors always <2 mm; validated for practical purposes.|
|Fourie et al. (2011)||DA||LS/ SP/ CBCT||7 cadaver heads||15||21 linear distances||Absolute errors up to 0.89 mm; percentage errors up to 1.64%; one distance by SP: mean absolute error >1.5 mm. All the tested methods are similarly accurate and reliable and highly precise when compared with direct measurements.|
|Germec-Cakan et al. (2010)||DA||Portable LS/LS/ SP||15 adults/plaster casts||15||11 linear distances||Significant differences among the four methods but SP was the most promising method.|
|Ghoddousi et al. (2007)||DA||SP||6 adults||14||15 linear distances||The tested device is sufficiently accurate and reliable for clinical use, but in the eye and nasolabial areas 3/15 measurements had clinically important differences.|
|Gibelli et al. (2018)||SP||Low-cost handheld LS||50 adults/ 1 MH||17||14 linear distances/ 12 angles/volumes/surfaces||Inter-instrument comparison: 17/26 measurements had ‘very good’ or ‘good’ TEMs. Intra-instrument repeatability was moderate or poor for most measurements. The tested system should be limited to inanimate objects due to its moderate and poor reliability and repeatability on living subjects.|
|Gibelli et al. (2018)||SP||Handheld portable SP||50 adults/1 MH||12; 3D point cloud surfaces||15 linear distances/12 angles/surface-to-surface distance maps/surface areas/volumes||Within-device mean TEMs: 1.29 mm, 1.19°; repeatability for most of linear and angular distances 82.2–98.7%; rTEMs: 0.2–3.1%. The system is reliable for linear, angular and surface measurements in both inanimate and living. Volume measurements and RMSD in living subjects are more affected by involuntary motion and should be considered with caution.|
|Joe et al. (2012)||DA||Fixed LS||9 adults||14||10 linear distances||The digital system is a valid method with results as precise as DA and accurate (only three distances differed >3 mm).|
|Knoops et al. (2017)||SP||Handheld infrared SL/handheld LED SL/ MR scanner||8 adults||4; 3D point clouds||Surface-to-surface distance maps||Accuracy: higher for the white SL scanner (RMSD 0.71 mm; 94% of data <2 mm threshold for clinical use) than for the MR scanner (1.11 mm; 86% of data) and the infrared SL scanner (1.33 mm; 80% of data). Precision: similar for both SL scanners (0.51 mm). Both SL scanners are suitable for practical use; the LED light scanner is the most accurate system.|
|Kook et al. (2014)||DA||SP/fixed LS/CT/ electro-mechanical digitizer||12 MHs||15||10 linear distances||All coefficients of reliability >0.92, TEMs up to 0.9 mm; no differences among methods except bi-orbital and intercanthal widths. All the tested methods are highly reliable and valid for both clinical and research purposes. Motion in living could affect results.|
|Kovacs et al. (2006)||DA||Fixed LS||1 MH||48||>680 linear distances and angles||By optimizing recording conditions, the reliability and precision is highly enhanced (93% of distances had differences <2 mm).|
|Kovacs et al. (2006)||DA||Fixed LS||5 adults||48||>680 linear distances and angles||On a subset of 560 distances: mean difference 1.32 mm, SD 5.67; >50% of the variables do not satisfy the reliability tolerance threshold for practical applications (>2 mm). Precision may be improved by optimizing recording procedures.|
|Lippold et al. (2014)||DA||Handheld LS||15 adults||12||7 linear distances||Most of the distances differ <1 mm from the reference standard. The system is validated and considered clinically useful due to its high inter-method agreement except for mouth width and nasion–subnasale distance. The quality of the scan can still be improved.|
|Lübbers et al. (2010)||DA||SP||1 MH||41||201 linear distances||Errors always <1 mm (on average, −0.01 mm). The system provides a good digital representation of reality under clinical circumstances with high agreement with DA. Its clinical use is proposed.|
|Metzler et al. (2014)||DA||SP||1 MH||52||410 linear distances||Mean difference 7.96 mm; after elimination of 10% of the most unreliable data: 1.33 mm. The system has high reliability, accuracy and excellent repeatability. It is suitable for anthropometric and clinical studies but with caution in living.|
|Park et al. (2012)||DA||SP||20 adults||7||5 linear distances||Mean difference 0.73 mm; range 0.13–1.53 mm. No significant differences between the two methods. The device is sufficiently accurate and reliable for clinical use (error <1 mm).|
|Secher et al. (2017)||SP||SL||10 adults/1 MH||4; 3D point clouds||Surface-to-surface distance maps||Random error 0.1 mm, below a 1 mm threshold for practical clinical applications; the most complex parts of the face had errors up to 4 mm. The tested method is very accurate in inanimate objects but with lower reproducibility in living, affected by motion.|
|Weinberg et al. (2004)||DA||SL||20 adults||17||19 linear distances||About one third of the linear distances differed between techniques, but in only 3/19 was the mean difference >2 mm. The tested system is highly precise and shows high agreement with DA. Potential for clinical and research applications.|
|Weinberg et al. (2006)||DA||SP/SL||18 MHs||17||12 linear distances||The tested methods show significant differences between each other and with DA yet within the limits considered as clinically acceptable, ranging around ± 0.1 mm.|
|Winder et al. (2008)||DA||SP||1 MH||18||20 linear distances||Mean difference between measurement sets: 0.62 mm (range 0.06–1.43); delta RMS mean: 0.057 mm, max 1.06; mean variance 0.003 mm. The system is suitable for clinical use thanks to its high repeatability.|
|Wong et al. (2008)||DA||SP||20 adults||19||18 linear distances||The tested device is highly reliable, precise and accurate (error <1 mm); its use in clinics and research is encouraged.|
|Ye et al. (2016)||DA||SP/SL||10 adults||16||21 linear distances||No differences between instruments; absolute errors: SL 0.58 mm, SP 0.62 mm. Both optical systems are reliable and accurate with errors below the clinically acceptable threshold of 1 mm.|
|Zhao et al. (2017)||LS/SP||SP/SL||10 adults||9/ iterative closest points||Surface-to-surface distance maps||RMSD between the facial maps: approx. 0.5–0.7 mm; the two tested systems are both accurate and applicable for clinical purposes.|
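Several studies in Table 2 report the technical error of measurement (TEM) and its relative form (rTEM). As a reading aid, the sketch below assumes the common two-session formulation, in which TEM is the square root of the sum of squared differences between paired measurements divided by twice the number of pairs, and rTEM expresses TEM as a percentage of the grand mean. The individual studies may have used other variants, and the function name and example values are hypothetical.

```python
import math


def tem_and_rtem(first, second):
    """Return (TEM, rTEM in percent) for two sets of paired measurements.

    Assumed formulation: TEM = sqrt(sum(d^2) / (2n)), where d is the
    difference within each pair; rTEM = 100 * TEM / grand mean.
    """
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two non-empty measurement lists of equal length")
    sum_sq = sum((a - b) ** 2 for a, b in zip(first, second))
    tem = math.sqrt(sum_sq / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    if grand_mean == 0:
        raise ValueError("rTEM is undefined when the grand mean is zero")
    rtem = 100.0 * tem / grand_mean
    return tem, rtem
```

TEM is expressed in the unit of the measurement (mm or degrees), whereas rTEM allows distances of different lengths to be compared on the same scale.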