The aim of this study was to validate the automatic tracking of facial landmarks in 3D image sequences. Thirty-two subjects (16 males and 16 females) aged 18–35 years were recruited. Twenty-three anthropometric landmarks were marked on the face of each subject with non-permanent ink using a 0.5 mm pen. The subjects were asked to perform three facial animations (maximal smile, lip purse and cheek puff) from the rest position. Each animation was captured by the 3D imaging system. A single operator manually digitised the landmarks on the 3D facial models, and their locations were compared with those of the automatically tracked landmarks. To assess the reproducibility of manual digitisation, the operator re-digitised the same set of 3D images of 10 subjects (5 male and 5 female) at a 1-month interval. The discrepancies in the x, y and z coordinates between the 3D positions of the manually digitised landmarks and those of the automatically tracked landmarks were within 0.17 mm. The mean distance between the manually digitised and the automatically tracked landmarks was within 0.55 mm. The automatic tracking of facial landmarks demonstrated satisfactory accuracy, which should facilitate the analysis of dynamic motion during facial animations.
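The validation metrics reported above, per-axis coordinate discrepancies and the mean 3D distance between paired landmarks, can be sketched in a few lines. This is an illustrative computation only, not the study's software; the function name and the example coordinates are hypothetical.

```python
import numpy as np

def landmark_discrepancy(manual, tracked):
    """Compare two (N, 3) arrays of landmark coordinates (in mm).

    Returns the mean absolute discrepancy per axis (x, y, z) and the
    mean 3D Euclidean distance between corresponding landmarks.
    """
    diff = manual - tracked
    per_axis = np.abs(diff).mean(axis=0)             # mean |dx|, |dy|, |dz|
    mean_dist = np.linalg.norm(diff, axis=1).mean()  # mean point-to-point distance
    return per_axis, mean_dist

# Illustrative coordinates only (not data from the study)
manual = np.array([[10.0, 5.0, 2.0], [12.0, 6.0, 3.0]])
tracked = np.array([[10.1, 5.0, 2.1], [11.9, 6.1, 3.0]])
per_axis, mean_dist = landmark_discrepancy(manual, tracked)
```

Reporting both quantities is useful because small per-axis discrepancies can still combine into a larger 3D distance, as in the study's 0.17 mm versus 0.55 mm figures.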
Facial appearance has a major impact on how we are perceived in society. Functional impairment may be caused by facial nerve paralysis, cleft lip and palate, facial trauma and facial scarring. Many patients consider surgical reconstruction to correct their facial functional impairments. The evaluation and quantification of facial movement is becoming particularly important in diagnosis, in treatment planning and in improving the outcome of the surgical correction of facial functional impairment. There is a need for a reliable method to record facial morphology and accurately measure facial animations.
Callipers have been used to measure facial soft tissue movements based on landmark distances on the face. These methods measure only the magnitude, rather than the dynamics, of facial animation. Farkas et al. applied direct anthropometry and concluded that errors were introduced by incorrect head positioning in both the vertical and horizontal planes and by measuring landmarks on photographs that had not been labelled directly on the face. Abnormal facial movements have also been analysed using photographic data of facial animations, including maximal brow lift, maximal smile and maximal whistle. The photographs were taken at rest and at maximal animation. The study was limited to measuring changes in the position of anatomical landmarks between two facial expressions and did not describe the dynamic nature of facial animations. Videos have also been used to study ‘normal’ smiles; the videos were captured from various view angles, and eight landmarks were digitised along the vermilion border of the lips. The study could not provide a three-dimensional (3D) representation of lip movements. The two-dimensional (2D) measurement systems failed to deliver information on the movements of facial landmarks in the antero-posterior direction and were limited to medio-lateral and vertical displacements of these landmarks.
Static 3D imaging techniques have been applied to the analysis of facial shape and have demonstrated accuracy within 0.5 mm. Weinberg et al. assessed the precision and accuracy of facial measurements obtained from digital 3D images using the Genex imaging system and concluded that digital 3D photogrammetry was sufficiently precise and accurate for facial analysis. The precision and repeatability of facial landmarks derived from 3D images of the faces of 15 cases recorded by the 3dMDFace system have also been investigated; 14 of the 20 facial landmarks showed a high degree of precision, while the remaining 6 had errors greater than 1 mm. Various investigations have confirmed the validity and reliability of 3D digital photogrammetry. 3D laser scanners have also been used to capture facial morphology; however, laser scanning is limited to static 3D capture and does not allow real-time recording, so the assessment of facial animation in dynamic motion is not possible, which is one of the main drawbacks of the method.
Recently, facial animation has been quantified using 3D motion capture systems. A video-based motion capture system was used to analyse lip movement, with an accuracy in the range of 0.53–0.73 mm. Popat et al. conducted a cross-sectional study to construct 3D templates of average lip movement during speech using the 3dMDFace Dynamic system. The results showed a statistically significant difference in lip movement between genders for the visemes /pu/ and /ppy/, although the differences were not clinically significant.
Despite advanced techniques for measuring facial animation, there is still insufficient information on the dynamics of facial movement. One of the major obstacles in studying facial animations is the number of anatomical landmarks that must be digitised to follow the shape changes throughout the course of a facial movement. Direct labelling of anatomical landmarks on the face before capture has been tried, but the method is impractical for daily use and inconvenient for patients. Digitising facial landmarks on 3D virtual models is time consuming and unsuitable for routine clinical use, especially with real-time recording in which 60 3D frames are captured per second: an animation lasting 3 s generates 180 frames, and it would be almost impossible to digitise all facial landmarks on every frame of the sequence. Automatic tracking software has long been sought to overcome the technical difficulty of manually digitising thousands of anatomical landmarks to track facial animation. Dimensional Imaging has developed software that combines passive stereo photogrammetry, which recovers a sequence of 3D models from a stereo pair of synchronised video streams, with dense optical flow tracking, which follows every pixel from frame to frame through the video streams with sub-pixel precision. This allows a landmark to be placed on the surface of the first 3D model in the sequence and projected to an image location in the first stereo pair of images. The optical flow information is then used to locate the same point automatically in the second stereo pair of images, which is in turn projected onto the surface of the second 3D model, and so on for subsequent frames. This software promises to be a reliable and fast method for tracking anatomical facial landmarks.
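The frame-to-frame propagation step described above can be illustrated with a minimal 2D sketch. This is a simplified assumption-laden toy, not the DI4D implementation: it uses nearest-pixel flow lookup in a single image stream, whereas the actual software tracks with sub-pixel precision in stereo pairs and projects each tracked point back onto the corresponding 3D surface.

```python
import numpy as np

def propagate_landmark(p0, flows):
    """Propagate a 2D image point through a sequence of dense flow fields.

    p0    : (x, y) landmark location in the first frame.
    flows : list of (H, W, 2) arrays; flows[t][y, x] holds the (dx, dy)
            displacement of pixel (x, y) from frame t to frame t + 1.
    Returns the landmark's location in every frame of the sequence.
    """
    pts = [np.asarray(p0, dtype=float)]
    for flow in flows:
        x, y = pts[-1]
        # Nearest-pixel lookup; a real tracker interpolates for sub-pixel precision.
        dx, dy = flow[int(round(y)), int(round(x))]
        pts.append(pts[-1] + np.array([dx, dy]))
    return pts

# Toy example: uniform rightward motion of 1 px per frame over 3 transitions
flows = [np.tile(np.array([1.0, 0.0]), (10, 10, 1)) for _ in range(3)]
track = propagate_landmark((2.0, 5.0), flows)
```

The key property this sketch shares with the described pipeline is that only the first frame needs a manually placed landmark; every subsequent position is inherited through the flow fields.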
The aim of this study was to evaluate automatic tracking of facial landmarks in image sequences captured using a four dimensional (4D) capture system (DI4D system, Dimensional Imaging Ltd., Glasgow, UK).
Materials and methods
After obtaining the appropriate approval from the local ethics committee, 32 subjects were recruited to the study (16 females and 16 males) ranging in age from 18 to 35 years. Subjects were given verbal and written explanations of the purpose of the project and the details of their involvement. The subjects had no history of facial deformity, orthognathic surgery or facial scarring.
Each subject was imaged using the DI4D system, which consisted of two greyscale cameras (model avA1600-65km/kc, resolution 1600 × 1200 pixels, Kodak sensor model KAI-02050; Basler, Germany), one colour camera operating at 60 frames per second, and a lighting system (model DIV-401-DIVALITE; KINO FLO Corporation, USA). The 4D capture system was connected to a personal computer (Windows 7 Professional, Intel Core i7 CPU, 3.07 GHz). Twenty-three facial landmarks ( Table 1 ) were marked on each subject’s face by the same operator with non-permanent coloured ink using a 0.5 mm pen (Staedtler, Germany) before his/her facial animations were captured.
| Landmark number | Landmark name | Definition |
| --- | --- | --- |
| 1 and 2 | Superciliary points | The points located above the most superior aspects of the eyebrows |
| 3 | Glabella | The most prominent midline point between the eyebrows, identical to bony glabella on the frontal bone |
| 4 and 7 | Exocanthion | The point at the outer commissure of the eye fissure, located slightly medial to bony exocanthion |
| 5 and 6 | Endocanthion | The point at the inner commissure of the eye fissure, located lateral to the bony landmark |
| 8 | Nasion | The point in the midline of both the nasal root and the nasofrontal suture, always above the line that connects the two inner canthi, identical to bony nasion |
| 9 and 10 | Zygion | The most prominent point on the cheek area beneath the outer canthus and slightly medial to the vertical line passing through it; different from bony zygion |
| 11 | Pronasale | The most protruded point of the nose, identified in lateral view with the head in the rest position |
| 12 and 13 | Alar curvature | The most lateral point on the curved base line of each ala, indicating the facial insertion of the nasal wing base |
| 14 and 15 | Subalare | The point on the margin of the base of the nasal ala where the ala disappears into the upper lip skin |
| 16 | Subnasale | The midpoint of the angle at the columella base where the lower border of the nasal septum and the surface of the upper lip meet |
| 17 and 18 | Cheilion | The point located at the corner of each labial commissure |
| 19 and 20 | Crista philtri | The peak of Cupid’s bow of the upper lip |
| 21 | Labiale superius | The point of maximum convexity of the muco-cutaneous junction of the upper lip and philtrum |
| 22 | Labiale inferius | The point of maximum convexity of the muco-cutaneous border of the lower lip |
| 23 | Pogonion | The most anterior midpoint of the chin |
The operator demonstrated the three facial animations and the rest position to each subject and trained each participant for 5 min before image capture began. The subject sat upright in a comfortable position. Subjects were shown photographic cue cards of an individual demonstrating each of the expressions, and prior to each capture session each expression was practised with the operator to ensure that the subjects had fully understood the instructions. Subjects were asked to keep their eyes open and remain still during image capture. A distance of 95 cm from the cameras to the subject’s cheek was measured using a measuring tape, and a second operator checked the focal length before each capture. The lighting system was set to maximum power before image capture began. The subjects were asked to assume the rest position following a standardised protocol. For all image-sequence captures the subjects were seated on a chair directly in front of the camera system and asked to perform the following three facial animations: maximal smile, biting the back teeth tightly together and smiling as widely as possible while saying ‘cheese’; maximal lip purse, pursing the lips together and whistling or pretending to whistle; maximal cheek puff, biting together on the back teeth and holding the lips together while puffing the cheeks maximally.
It took about 3 s to capture each facial animation at a rate of 60 frames per second using the DI4D capture software ( Fig. 1 ). Each capture began at the rest position, proceeded to the maximal animation and then returned to the rest position. The images were reviewed immediately after capture using DI4D View (Dimensional Imaging Ltd, Glasgow, UK) to ensure the absence of acquisition errors, including image blurring and artefacts, and were then saved for further processing. This was repeated for all three facial animations.