13: Anatomy and Physiology of Speech Production

Janet E. Rovalino

Otolaryngology – Head and Neck Surgery, School of Medicine, University of Connecticut

Phonation and vocal tract modulations

There are multiple points of clinical intersection between the fields of dentistry and speech pathology. Interdisciplinary treatment is often required in the management of disorders of articulation, resonance, and swallowing caused by congenital cleft of the lip and palate, in speech disorders due to morphologic variations in the facial skeleton and its oral tissue, and in supplementation of dysarthric speech resulting from neuromuscular dysfunction using adaptive technology (e.g., palatal lifts or obturators). It is also important to consider oral health initiatives in patients living in long-term care facilities who have speech and swallowing difficulties, feeding problems in newborns with dentofacial disorders due to fetal alcohol syndrome, and dental and speech/swallowing needs in irradiated patients following treatment of oral and laryngeal cancer. The opportunity for collaboration between speech and dental personnel is vast.

To provide optimum treatment in such cases of speech and language disorders, the dental professional needs a working knowledge of speech production patterns, swallowing dynamics, and vocal tract differences, together with an understanding of the neural basis of language comprehension and production. It is important to appreciate the function of the various parts of the speech system as it generates an acoustic output, in both quantitative and qualitative terms. Inadequate velopharyngeal closure, dental consonant misarticulations, and weak articulatory contacts can now be measured and defined more accurately by the speech pathologist or dental professional. More specifically, the speech professional seeks to characterize, both objectively and perceptually, the manner in which aberrant function impacts speech and swallowing physiology. Thereafter, the speech pathologist’s role is to provide therapeutic management to modify function within the framework of the physiologic and physical abnormalities, or while the system undergoes surgical and/or clinical management.

Within the field of speech pathology, there are multiple areas of specialty and subspecialty. Likewise, there are multiple clinical disorders requiring concentrated clinical knowledge, including swallowing disorders, resonance disorders, voice disorders, vocal tract disorders (i.e., disorders of articulation) and language-based disorders. This chapter will focus on speech and voice/vocal tract disorders, with initial discussion of the normally functioning speech system.

Normal speech production is a highly regulated and dynamic process requiring interchange among the pulmonary, laryngeal, and vocal tract structures of the human body. The speech production mechanism depends on the respiratory system, laryngeal vibration, a functioning set of resonating cavities (larynx, pharynx, oral cavity, and nasal cavity), and rapid movement of coarticulating organs or articulators (tongue, lips, teeth, alveolar ridge, hard palate, velum, and pharynx). Any change that affects the size, shape, movement, or timing of these organs will alter the acoustic output, and the alteration can result in speech variation or disorder. These areas will be examined individually.


Human sound production begins at the larynx. The vocal folds, formed by the paired thyroarytenoid muscles, obstruct the airflow generated upwards from the lungs. These vocal folds are made of muscle and ligament, covered with mucosa, and stretch from the front of the larynx to the back of the larynx (Fig. 13.1). The space between the vocal folds is called the glottis; peak glottal area during voicing is usually between 0.05 and 0.2 cm² for adults. Vocal folds are typically 13 to 18 mm long in females and 17 to 23 mm long in males.


Figure 13.1 Schematic of the position of the vocal folds during phonation (vocal folds adducted) and during rest and breathing (vocal folds abducted).

(Artwork courtesy of Hannah Plishtin, BA.)

The vocal folds are attached posteriorly to two arytenoid cartilages, which sit atop the cricoid cartilage. The arytenoid cartilages move in a complex pattern (simplistically, rocking, rotating, and sliding). With neural input, the arytenoids change position to cause the vocal folds to adduct (close) or abduct (open). Anteriorly, the vocal folds are attached to a fixed point on the thyroid cartilage. The size of the open glottis is accordingly controlled by the arytenoids and by tension within the thyroarytenoid (vocal fold) muscles themselves. Contraction of the cricothyroid muscle (tilting the thyroid cartilage forward relative to the cricoid cartilage, which moves the anterior attachment of the vocal folds further from the arytenoids) stretches the vocal folds, leading to their elongation and serving as a pitch change mechanism for voicing.

The vocal folds regulate the inward–outward flow of air. With a wide glottis the vocal folds are relaxed and air flows freely with minimal hindrance. During voicing, however, the arytenoid cartilages move toward each other, causing the vocal folds to approximate or adduct. As the vocal folds approximate, they partially close the glottis and obstruct the airflow from the lungs. Tension from the thyroarytenoid muscles is added, and pressure below the adducted vocal folds increases. Eventually the compressed air beneath the vocal folds forces them apart. The vocal folds are then brought back together by two forces: the elasticity of the vocal folds and the Bernoulli effect (Fig. 13.2). Bernoulli’s principle indicates that as air accelerates through the narrowed glottis, the pressure between the vocal folds drops, drawing the folds inward toward each other. The entire process repeats itself for as long as the aerodynamic and muscular conditions for phonation are met. The resulting vibrations/oscillations generate quasi-periodic, broad-spectrum excitations (or puffs of vibrating air) whose quality is akin to a “buzz-like” sound. Sound generated by the larynx is referred to as the sound source and is visualized in a sound spectrum (Fig. 13.3a). These excitations of air are propelled into the vocal tract.
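The pressure drop behind the Bernoulli effect can be illustrated with a short calculation. This is a minimal sketch under simplifying assumptions that are not part of the chapter: steady, frictionless flow, a negligible air velocity below the glottis, an assumed air density of 1.2 kg/m³, and an assumed airflow of 200 cm³/s. The glottal areas span the 0.05 to 0.2 cm² range quoted earlier; the function names are illustrative only.

```python
# Illustrative sketch of the Bernoulli pressure drop in a narrowed glottis.
# All numeric inputs below are assumptions chosen for illustration.

AIR_DENSITY_KG_M3 = 1.2  # assumed approximate density of air


def air_velocity_m_s(flow_cm3_s, area_cm2):
    """Mean air velocity through an opening, from continuity: v = U / A."""
    velocity_cm_s = flow_cm3_s / area_cm2
    return velocity_cm_s / 100.0  # convert cm/s to m/s


def bernoulli_pressure_drop_pa(velocity_m_s):
    """Pressure drop relative to (nearly) still upstream air: 0.5 * rho * v^2."""
    return 0.5 * AIR_DENSITY_KG_M3 * velocity_m_s ** 2


ASSUMED_FLOW_CM3_S = 200.0  # assumed airflow, held constant for simplicity

for area_cm2 in (0.2, 0.1, 0.05):
    v = air_velocity_m_s(ASSUMED_FLOW_CM3_S, area_cm2)
    drop = bernoulli_pressure_drop_pa(v)
    print(f"area {area_cm2:.2f} cm^2: velocity {v:.1f} m/s, pressure drop {drop:.0f} Pa")
```

For a fixed flow, halving the glottal area doubles the velocity and quadruples the pressure drop, which is why the inward pull grows as the folds approach each other. In real phonation the flow itself changes over the cycle, so the numbers here indicate only the direction and rough scale of the effect.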


Figure 13.2 Schematic of the pattern of phonation: initial vocal fold adduction (1), column of transglottal airflow and air pressure opens bottom of vibrating vocal folds (2–3), column of air pressure moving upwards, opening the top of the vocal folds (4–6), the low pressure created by moving column of air produces Bernoulli effect initiating vocal fold closing (7), and vocal fold closing (8).

(Artwork courtesy of Hannah Plishtin, BA.)


Figure 13.3 Source-filter model of phonation. (a) Sound spectrum from vibrating vocal folds. (b) Filter (or transfer) function of vocal tract. (c) Output spectrum showing final acoustic product radiated from the lips.

(Artwork courtesy of Hannah Plishtin, BA.)

The sound produced by the vocal folds, then, is the result of careful and coordinated use of air pressure produced by the respiratory system. Lung pressure during voicing is released slowly. A steady subglottal pressure is maintained by activity of the respiratory muscles, including the intercostal, abdominal, and latissimus dorsi muscles. The speech pathologist is able to take measurements of air pressure across the vocal folds and throughout the vocal tract at areas of constriction. By studying patterns of air pressure in these regions and comparing them to available normative data, the degree of function or abnormality can be inferred. In a clinical speech laboratory, the easiest measurement to obtain and evaluate is intraoral pressure, using a small intraoral catheter. Measuring the pressure in the oral cavity during repeated productions of /pa/ provides an indirect estimate of the subglottal pressure, Psubg (Fig. 13.4). Abnormal findings may indicate a variety of issues, including velopharyngeal insufficiency and poor laryngeal closure, or may suggest errors in the degree, timing, or location of vocal tract constrictions.
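The indirect estimate works because, during the lip closure for /p/, the glottis is open and the oral pressure rises toward the pressure below the vocal folds. The sketch below shows one simple way such an estimate might be computed from a sampled intraoral pressure trace: find the pressure peak of each /p/ and average the peaks. The trace values, the 3 cmH2O threshold, and the function names are hypothetical, and the peak picker is deliberately naive; it is not the algorithm of any particular clinical system.

```python
# Illustrative sketch: estimate subglottal pressure from intraoral pressure
# peaks during repeated /pa/. Data and threshold are hypothetical.

def pressure_peaks(trace, threshold):
    """Return local maxima at or above `threshold` in a sampled pressure trace.

    A sample counts as a peak if it exceeds its left neighbour and is not
    smaller than its right neighbour, so a flat-topped peak is counted once.
    """
    peaks = []
    for i in range(1, len(trace) - 1):
        if (trace[i] >= threshold
                and trace[i] > trace[i - 1]
                and trace[i] >= trace[i + 1]):
            peaks.append(trace[i])
    return peaks


def estimate_subglottal_pressure(trace, threshold=3.0):
    """Average the /p/ pressure peaks as an indirect estimate of Psubg."""
    peaks = pressure_peaks(trace, threshold)
    if not peaks:
        raise ValueError("no pressure peaks above threshold")
    return sum(peaks) / len(peaks)


# Hypothetical intraoral pressure samples (cmH2O) covering three /pa/ syllables.
trace = [0.0, 2.0, 6.8, 7.0, 3.0, 0.5, 0.0, 2.5,
         7.2, 7.2, 2.0, 0.0, 1.0, 6.8, 3.0, 0.0]

print(f"peaks: {pressure_peaks(trace, 3.0)}")
print(f"estimated Psubg: {estimate_subglottal_pressure(trace):.1f} cmH2O")
```

A real recording would need smoothing and a more robust peak detector, and the estimate is only meaningful when lip and velopharyngeal closure are complete during /p/.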


Figure 13.4 Intraoral pressure traces and airflow traces during rapid repetition of /pa/. Bottom tracing represents air pressure pulses for /p/. Third tracing shows speech airflow immediately following each pressure pulse. First and second tracings represent fundamental frequency and sound pressure levels, respectively.

(Permission provided by Kay-Pentax Corporation for use of images from Phonatory Airflow System.)

The duration of one cycle of vocal fold vibration is called the pitch period. Its reciprocal, the number of cycles per second, is the fundamental frequency (measured in hertz). The rate of vocal fold vibration is influenced by many factors, including the tension of the vocal folds, the mass of the vocal folds, and the air pressure below the glottis. Vocal folds are capable of a variety of vibration rates ranging from 60 to 1000 Hz. Males have lower fundamental frequencies (typically phonating at 100–150 Hz) than females (who typically phonate at 180–220 Hz). The lower vibrating frequency of the vocal folds in males is due to the greater mass of male vocal folds.
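Pitch period and fundamental frequency are reciprocals of each other, so either can be computed from the other. A minimal sketch of the conversion follows; the function names are illustrative.

```python
# Illustrative conversion between pitch period and fundamental frequency.

def fundamental_frequency_hz(period_ms):
    """Fundamental frequency in Hz from the pitch period in milliseconds."""
    if period_ms <= 0:
        raise ValueError("pitch period must be positive")
    return 1000.0 / period_ms


def pitch_period_ms(f0_hz):
    """Pitch period in milliseconds from the fundamental frequency in Hz."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return 1000.0 / f0_hz


# A 10 ms cycle corresponds to 100 Hz; a 200 Hz voice has a 5 ms cycle.
print(fundamental_frequency_hz(10.0))
print(pitch_period_ms(200.0))
```

On this scale, the typical male range of 100–150 Hz corresponds to pitch periods of roughly 6.7 to 10 ms, and the typical female range of 180–220 Hz to roughly 4.5 to 5.6 ms.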

Vocal tract modulation

The vocal tract is defined structurally as the region extending from the superior surface of the vocal folds to the lips, including coupling of the nasal passages. The vocal tract has an average length of 17 cm in males and is shorter in females. Its shape is nonlinear. It consists of a set of cavities in which sounds are resonated. The articulators of the vocal tract include the tongue, lips, teeth, alveolar ridge, hard palate, soft palate, and pharynx (Fig. 13.5). The vocal tract is considered to be a filter of the sounds (or source) produced by the larynx. Another way to understand this is to consider the source as the energy of the system, which is modified by the filter of the supralaryngeal vocal tract; the filter can either suppress or amplify that energy. For vowels, the vocal folds produce a harmonic-rich signal. Harmonics that fall at or near resonance peaks of the vocal tract are enhanced or amplified, while other harmonics are attenuated because they fall away from the resonant peaks.
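A rough idea of where these resonance peaks fall can be obtained from a standard textbook idealization that is not part of this chapter: treating the vocal tract as a uniform tube, closed at the glottis and open at the lips, which resonates at odd multiples of c / (4L). The sketch below applies that quarter-wavelength formula to the 17 cm length given above; the speed of sound (350 m/s) and the function name are assumptions.

```python
# Illustrative sketch: resonances of an idealized uniform tube, closed at one
# end (glottis) and open at the other (lips). A real vocal tract is neither
# uniform nor straight, so these values are only a neutral-vowel approximation.

SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air


def tube_resonances_hz(length_m, count=3):
    """First `count` resonances of a quarter-wavelength tube: (2n - 1) * c / (4L)."""
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * SPEED_OF_SOUND_M_S / (4.0 * length_m)
            for n in range(1, count + 1)]


for label, length_m in (("17 cm tract", 0.17), ("15 cm tract", 0.15)):
    peaks = ", ".join(f"{f:.0f} Hz" for f in tube_resonances_hz(length_m))
    print(f"{label}: {peaks}")
```

Under these assumptions a 17 cm tract resonates near 500, 1500, and 2500 Hz, and a shorter tract shifts every resonance upward. The 15 cm length is a hypothetical value used only to show that trend.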


Figure 13.5 Schematic of larynx and vocal tract illustrating the larynx (sound source) and the vocal supralaryngeal structures (sound filter).

(Artwork courtesy of Hannah Plishtin, BA.)

The acoustic product of the filter is determined by the varying shape of its container(s). Any change in the acoustic output necessarily represents a change in the status of the speech organs. This can be visualized by use of a frequency-amplitude spectrum, also called a transfer function. The transfer function represents the acoustic response of the air in the vocal tract cavities. An amplitude spectrum, where the vertical axis represents amplitude and the horizontal axis represents frequency, is commonly used by speech pathologists (Fig. 13.3b).

The final output of the combined interaction of the source and the filter is called the output spectrum (Fig. 13.3c), and its change over time is visualized in spectrograms. The spectrogram allows the investigator to surmise considerable information about speech production. The frequency of the signal is drawn on the ordinate, and time is represented on the abscissa. The amplitude of the signal is indicated by the darkness of the trace. The closely spaced vertical striations represent the energy pattern resulting from the individual glottal pulses. There are frequencies at which the major excitation of the air in the supralaryngeal vocal tract occurs. These resonant frequencies are referred to in speech science as formant frequencies, or simply as formants. For vowels, changing the position of the articulators produces different spectrograms, which correspond to differences in the shape of the resonant cavities. The major contributor to the formation of these resonating cavities is the tongue. Thus each vocal tract shape is characterized by a collection of formants.
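The source-filter relationship in Fig. 13.3 can be sketched numerically: a harmonic source spectrum is combined with a filter made of formant resonances, and on a decibel scale the output is the sum of the two. The sketch below rests on assumptions that are not from this chapter: a source that falls 12 dB per octave, formants at 500, 1500, and 2500 Hz with 80 Hz bandwidths, and a simple second-order resonator for each formant. Lip radiation is ignored, and all function names are illustrative.

```python
# Illustrative source-filter sketch. All parameter values are assumptions.
import math


def source_level_db(harmonic_number, rolloff_db_per_octave=12.0):
    """Level of the n-th source harmonic relative to the first harmonic."""
    return -rolloff_db_per_octave * math.log2(harmonic_number)


def resonance_gain_db(freq_hz, center_hz, bandwidth_hz):
    """Gain of one second-order resonance, normalized to 0 dB at 0 Hz."""
    half_bw_sq = (bandwidth_hz / 2.0) ** 2
    numerator = center_hz ** 2 + half_bw_sq
    denominator = math.sqrt(
        ((freq_hz - center_hz) ** 2 + half_bw_sq)
        * ((freq_hz + center_hz) ** 2 + half_bw_sq)
    )
    return 20.0 * math.log10(numerator / denominator)


def output_spectrum_db(f0_hz, formants_hz, n_harmonics=30, bandwidth_hz=80.0):
    """Return (frequency, level in dB) for each harmonic after filtering."""
    spectrum = []
    for n in range(1, n_harmonics + 1):
        freq = n * f0_hz
        filter_db = sum(resonance_gain_db(freq, fc, bandwidth_hz)
                        for fc in formants_hz)
        spectrum.append((freq, source_level_db(n) + filter_db))
    return spectrum


# Print the first eight harmonics of a 100 Hz voice.
for freq, level in output_spectrum_db(100.0, (500.0, 1500.0, 2500.0))[:8]:
    print(f"{freq:6.0f} Hz  {level:6.1f} dB")
```

Although the source harmonics weaken steadily with frequency, the harmonic at 500 Hz emerges stronger than its neighbours because it coincides with the first assumed formant. The same thing happens near each formant, producing the peaks seen in an output spectrum.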


In vowel productions a near-periodic sound source is conveyed into the vocal tract. For articulation of vowels, the upper vocal tract is chiefly modified by tongue positioning, with additional influence from lip rounding.


Jan 4, 2015 | Posted in General Dentistry