Introduction
How does one take a fast-paced, multi-character action novel and turn it into an audiobook narrated by a single voice? In Dan Brown's Angels and Demons, the plot centers on the main characters Robert Langdon and Vittoria Vetra. Langdon is a Harvard professor from the northeast United States. Vetra, the only lead female character, is Italian. Other secondary but notable characters include: Maximilian Kohler, the Swiss director of an atomic research center; the Hassasin, a Middle Eastern hired murderer; Janus, the unknown Illuminati master from whom the Hassasin takes orders; Olivetti and Rocher, Swiss Guard officers; Gunther Glick, a British BBC reporter; Chinita Macri, an African American; and finally the Camerlengo, a member of the clergy. Given only a brief description of each of these characters, it is already clear that a diverse range of backgrounds and personages - American, Italian, Middle Eastern, Swiss, British - must be represented by narrator Richard Poe as he recites the novel. In this project, I will look at how Poe differentiates characters from one another when narrating dialogue, with particular focus on speech rate and pitch.

Method of Approach
I have selected 168 samples of conversation to analyze from the audiobook. [See charts 1A and 1B.] The sample clips span the entire length of the novel, with a slightly greater percentage drawn from earlier chapters because more dialogue occurs there. (Many chapters are quite short, and a considerable number of later chapters contain only narration and no character dialogue.) The samples represent approximately thirteen different characters, both female and male. They have been tagged, where applicable, with the following contextual categories: character introductions, explanation of facts, end of chapter lines, thoughts/flashbacks, and lines in Italian (versus English).
Various emotional contexts have also been tagged: angry/exasperated, confrontational, "emotional" (sad), urgent, confused, shocked, and happy/gleeful.

The speech rate was obtained by timing each sample and dividing the word count by the number of seconds. Although human error and measurement inaccuracy may apply, I tried to minimize their effects by predominantly choosing midsized samples, in which human reaction time would play a smaller role in altering the calculated speech rate. Altogether, the samples range from three words to 118 words, with the shortest sample taking ~0.82 seconds to complete and the longest taking ~44.87 seconds.

Pitch was analyzed by recording each sample in Audacity. The recorded file was then converted to .wav and transferred to Praat, which supplied the mean, maximum, and minimum pitches. One observation made here was that the presence of alveolar consonants, particularly /s/ or /t/ in short words or as a final consonant, caused the pitch to spike dramatically upwards at that point, potentially making the maximum and mean calculations less accurate. [See diagram 2.] This appeared to happen because /s/ and /t/ were more readily picked up by the program, owing to overemphasized pronunciation. The assumption is reasonable because, as a narrator, Richard Poe is careful to enunciate clearly, and so final or short /s/ and /t/ sounds are pronounced much more loudly and clearly for the benefit of the listener. Although /s/ and /t/ are voiceless and technically not pitched, the high frequencies present in these consonants, coupled with the extra speaking emphasis, probably caused this defect in the data collection.

Any numerical insets below (e.g. #23) refer to particular samples, labeled by identification number. The raw data may be found in the accompanying document. Words per second will subsequently be abbreviated as W/s.
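The speech-rate measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the study: the function names `speech_rate` and `timing_error_effect` are hypothetical, the 0.2-second timing error is an invented figure, and pairing the 3-word sample with 0.82 seconds and the 118-word sample with 44.87 seconds is an assumption based on how the extremes are reported.

```python
def speech_rate(word_count: int, duration_seconds: float) -> float:
    """Return the speech rate in words per second (W/s)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count / duration_seconds


def timing_error_effect(word_count: int, duration_seconds: float,
                        error_seconds: float = 0.2) -> float:
    """Fractional change in W/s if the stopwatch runs error_seconds too long.

    error_seconds is a hypothetical reaction-time error, not a measured value.
    """
    true_rate = speech_rate(word_count, duration_seconds)
    measured_rate = speech_rate(word_count, duration_seconds + error_seconds)
    return measured_rate / true_rate - 1.0


# Assumed pairing of the reported extremes (see the note above).
shortest_rate = speech_rate(3, 0.82)
longest_rate = speech_rate(118, 44.87)

# The same timing error distorts a very short sample far more than a long one,
# which is the motivation for preferring midsized samples.
short_distortion = timing_error_effect(3, 0.82)
long_distortion = timing_error_effect(118, 44.87)
```

Under these assumptions the same stopwatch error shifts the rate of the shortest sample by a much larger fraction than that of the longest one, which is why midsized samples are the safer choice.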
References to samples as "male" or "female" do not refer to the gender of a sample, but rather to the gender of the character whose dialogue it is.

Male versus Female: Speech Rate
[See charts 3A and 3B.] Of the 53 female samples, the average speech rate was 2.804 W/s, with a minimum of 1.717 (#56) and a maximum of 6.358 (#115). With standard rounding to the nearest whole number, a rate of 3 W/s was the mode at 30 counts, followed by 2 W/s at 19 counts. For males, there were 115 samples in total. The average speech rate was 2.648 W/s, with a minimum of 1.3015 (#40) and a maximum of 4.469 (#86). Standard rounding produced the same mode (3 W/s) as for the female samples; however, the frequency here was 51, followed extremely closely by 2 W/s at 50 counts. Therefore, for males, the most "frequent" speech rate was about halfway between 2 and 3 words per second, whereas the female speech rate was more frequently close to 3 words per second. This pattern agrees with the averages: 2.804 is close to 3, while 2.648 is closer to midway between 2 and 3.

The speech rates presented by the data show that Richard Poe tends to narrate female lines faster than he does male lines. In this Language Log post, Professor Brizendine is cited as writing that "Girls speak faster on average - 250 words per minute versus 125 for typical males." The results here aren't quite so dramatic: multiplied by 60, the two rates become roughly 159 words per minute for males and 168 words per minute for females, a difference of less than ten words per minute, and nowhere near the 250 rate that Brizendine claims. However, one must keep in mind that this is not a true case of males and females speaking; there is a single (male) narrator who is forced to voice the lines of both. This would explain why the two rates are so close to one another: Poe is not really a female, so the "base rate" is still present when he speaks.
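The two calculations used in this comparison, converting W/s to words per minute and finding the mode after rounding to the nearest whole number, can be checked with a short Python sketch. The helper names are hypothetical, and the list `example_rates` is invented purely to show the tally; it is not the study's raw data. Only the two averages (2.804 and 2.648 W/s) come from the text.

```python
import math
from collections import Counter


def to_words_per_minute(words_per_second: float) -> float:
    """Convert a rate in words per second to words per minute."""
    return words_per_second * 60.0


def round_half_up(value: float) -> int:
    """Round a non-negative value to the nearest integer, halves going up.

    Python's built-in round() sends exact halves to the nearest even integer,
    so this helper is used to match everyday "standard rounding".
    """
    return math.floor(value + 0.5)


def modal_rounded_rate(rates):
    """Return (rounded_rate, count) for the most frequent rounded rate."""
    counts = Counter(round_half_up(rate) for rate in rates)
    return counts.most_common(1)[0]


# Averages reported in the text.
female_wpm = to_words_per_minute(2.804)  # roughly 168.2
male_wpm = to_words_per_minute(2.648)    # roughly 158.9
difference_wpm = female_wpm - male_wpm

# Invented rates, for illustration only.
example_rates = [2.4, 2.6, 3.1, 2.5, 1.7]
example_mode = modal_rounded_rate(example_rates)
```

Rounding to whole numbers is a coarse summary: a mode of 3 W/s can hide the fact that many samples sit just above 2.5, which is why the averages are reported alongside it.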
Additionally, the nature of the recording as an audiobook constrains Poe to maintain a relatively steady and listener-friendly pace, eliminating what might otherwise have been a more dramatic variance in speech rate.

In Professor Liberman's discussion later on in the Language Log post, Brizendine's statistics are dismissed for lacking evidence, and interestingly, the opposite conclusion is drawn: "males tend to speak faster than females…[but] the difference between them is…very small." This is backed up by a study conducted by Jiahong Yuan, Chris Cieri and Professor Liberman, whose results run opposite to the results of this study. The difference may be explained by the fact that Poe is not speaking spontaneously, but reading aloud, or that he is mistakenly attempting to replicate female dialogue by talking faster, when it should actually be slower according to the Yuan/Cieri/Liberman study. Another explanation may be that the context of the female lines analyzed is interfering with what would otherwise be a slower speech rate. Perhaps the context of the lines examined (e.g. urgency) naturally dictates that they be spoken faster.

Male versus Female: Pitch
[See chart 3C.] The average pitch for female dialogue was 148.89 Hz, with an average range of 309.68 Hz. The lowest minimum pitch among the samples was 71.15 Hz, and the highest maximum was 523.82 Hz. For males, the average pitch was 143.12 Hz, with an average range of 328.75 Hz. The lowest minimum pitch attained was 69.36 Hz, and the highest maximum pitch was 524.46 Hz. These numbers show that, overall, the pitch level was fairly consistent between Poe's vocalizations of male and female characters. Male dialogue had a greater average range, suggesting that a given piece of male dialogue shows more variation between its lowest and highest pitches. Female dialogue, on the other hand, had an average range approximately 20 Hz lower, meaning it "wavered" less in pitch.
Twenty hertz is a mildly significant amount, considering that in one sample (#68), the entire range was only 32.5 Hz. The difference may, however, be skewed by the aforementioned alveolar consonants, which caused spikes in the maximum and average data.

As expected, the lowest pitch in the entire sample selection was 69.36 Hz and occurred in male dialogue. Surprisingly though, the highest pitch was also attained during a male (and not female) sample. This occurred in sample #99, spoken by the Hassasin. The "high point" in the pitch diagram occurs when the Hassasin says: "What do you think I intend?..." The extremely elevated pitch once again occurs where there is a /t/ consonant. That the highest pitch occurs in a male sample is not particularly noteworthy, because the next two highest pitches were attained by two female characters (Vittoria Vetra and the MSNBC reporter) at approximately 523.82 Hz, less than 1 Hz below the maximum pitch in sample #99.

In conclusion, Richard Poe seems to maintain a fairly consistent level of pitch, with only minor variances between male and female characters. The average pitches in his reading lie between the male and female averages labeled as F0 in this article, which are 112.01±8.11 Hz and 204.68±19.31 Hz respectively. The uniformity of pitch may again be because he wishes to stay near a general baseline to remain consistent for the listener. Another explanation could be that he fails to replicate female speech pitch because he is physically unable to produce acceptably audible speech in a (normal) female register.

Individual Character Analysis
[See charts 4A and 4B.] When individual characters are analyzed separately, the trend in speech rate is again broadly consistent: the female characters, in general, "spoke" faster than the male characters, although not without exception. Chinita Macri and Vittoria Vetra had average speech rates of 3.29 W/s and 2.81 W/s respectively.
The lead male characters followed, with Langdon (2.72), Kohler (2.67), Olivetti (2.67), the Camerlengo (2.588), Rocher (2.517), and Janus - who actually ends up being the Camerlengo - (2.384). Anomalies occurred for the MSNBC reporter, who is female but had a 2.3854 W/s average rate (lower than nearly all of the men), and Gunther Glick, who is male but had the highest average speech rate of all at 3.534 W/s, surpassing even the female characters. These last two cases may be explained by the fact that the samples from which their averages were taken were remarkably few - only three each - so their speech rates are not as well represented as those of the main characters, who had 25-40 samples each.
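The caution about small sample sizes can be made explicit by reporting each character's sample count alongside the average. The Python sketch below is only an illustration: the function name, the `min_samples` threshold of 5, the character labels, and every rate in `example_samples` are assumptions invented for the example, not values from the raw data.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_character(samples, min_samples=5):
    """Group (character, rate) pairs and report mean rate, count, and a flag.

    min_samples is an arbitrary, hypothetical cut-off below which an average
    is marked as resting on too few samples to be trusted.
    """
    grouped = defaultdict(list)
    for character, rate in samples:
        grouped[character].append(rate)

    summary = {}
    for character, rates in grouped.items():
        summary[character] = {
            "mean_rate": mean(rates),
            "n": len(rates),
            "small_sample": len(rates) < min_samples,
        }
    return summary


# Invented data: one character with three samples, one with six.
example_samples = [
    ("character_a", 3.9), ("character_a", 3.2), ("character_a", 3.5),
    ("character_b", 2.6), ("character_b", 2.8), ("character_b", 2.7),
    ("character_b", 2.5), ("character_b", 2.9), ("character_b", 2.7),
]
example_summary = summarize_by_character(example_samples)
```

Reading the average together with the count makes it easier to see when an apparent anomaly, such as a very fast or very slow character, may simply reflect three unrepresentative lines.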
When looking at the range of pitches, there is again no clear trend. One fact that might be noted is that Vittoria Vetra's average range is lower than that of most other main characters. Her average range in pitch is 298.79 Hz, whereas the main male characters' average ranges run from approximately 319.738 Hz to 395.62 Hz. This suggests that Vetra's dialogue wavers less in pitch and is more evenly toned, extending the inference drawn earlier that female characters, in general, have a lower pitch range and therefore waver less in pitch. Gunther Glick again provides an anomaly. Either he "speaks" very smoothly and evenly, or the small sample size is the cause of his very low average range of 270 Hz.

Contextual Factors
[See charts 5A, 5B, and data table.] In examining the pre-selected contextual and emotional tags attached to the samples, certain predictions are borne out. For example, some of the highest average speech rates among the categories occur in "angry," "confrontational" or "urgent" situations. These three categories returned rates of 2.759 W/s, 3.01 W/s and 2.84 W/s respectively. Dialogue labeled as "confused" also generated a particularly high average speech rate: 2.86 W/s. One can infer that Dan Brown's characters may "speak" more quickly when they are feeling confused, or that these feelings of confusion are often coupled with other emotions such as urgency or panic, which would also naturally raise the speech rate.

One interesting observation concerns the average pitch of angry, confrontational and urgent dialogue. As explained above, all three have elevated speech rates. For "confrontational" and "urgent" samples, the average pitch is around 144 Hz, in line with the overall male and female averages. For "angry" samples, though, the average pitch is much higher, at 162 Hz, a considerable increase.
This makes sense, as one thinks of someone who is angry as shouting and speaking at a higher pitch than normal.

The speech rate for "thoughts" was interestingly high, at 2.72 W/s. The transcripts for these samples reveal that a variety of emotions are represented during lines of thought, defined here as musings of a character which are not spoken aloud. If we further subdivide the category of thoughts into male and female samples, the female "thought" samples return a 2.853 W/s average speech rate (2.71 min, 3.125 max) and an average pitch of 141.68 Hz (range of 229.29). The male "thought" samples return a 2.698 W/s average speech rate (1.85 min, 3.98 max) and an average pitch of 139.9 Hz (range of 329.879). Comparing these numbers to those cited above in the Male versus Female analysis, the average speech rates are remarkably close. The average pitch is also similar, and the pitch range shows the same trend of being higher in male samples. In conclusion, the trends here show that Richard Poe voices thoughts no differently than normal speech; he narrates thoughts (italicized in the novel) almost exactly as he does spoken text.

Introductions and explanatory texts were slightly below the normal speech average, at 2.43 W/s and 2.5 W/s respectively. I was surprised that "end of chapter" texts were, on average, actually faster than introductions, at 2.53 W/s. However, the "end of chapter" category did contain the lowest minimum speech rate of all the samples, at 1.301 W/s, showing that at least one sample agreed with my hypothesis that Richard Poe would slow down end-of-chapter lines in order to create more drama and suspense. A further examination of the twenty "end of chapter" samples that generated this average showed that the majority of the samples did not lie in the 2.4-2.6 W/s range, as the average would suggest.
In fact, fourteen of the twenty samples have speech rates at opposite ends of the scale (which consequently average out to the middle). In this case, the average is slightly misleading, and it seems that end-of-chapter lines are treated differently: they are more likely to be either slow and drawn out, or fast-paced and frantic (such as in sample #60). Pitch-wise, the average for end-of-chapter lines was slightly higher, at 155.9 Hz (as compared to the female/male averages of ~145 Hz).

Lines in Italian showed an interesting trend: the average speech rate was significantly slower than for "normal" (English) lines. At a 1.86 W/s average, this is nearly one-third slower than the male and female English averages cited earlier. One explanation may be that Italian words are naturally longer, or that the Italian lines sampled in this audiobook (a total of nine samples) contained particularly long words. Producing the same amount of "sound" (syllabically) would then correspond to fewer words, and a smaller number of words divided by the same amount of time results in a lower words-per-second rate. The average pitch in Italian lines is higher (156.109 Hz) but the range is lower (262.09 Hz, as compared to the 300+ Hz ranges in English). Ultimately, these results suggest that producing Italian words elicits an overall higher base pitch, but that this pitch wavers less. [See diagram 6.]

Sad, shocked, happy, and "flashback" lines produced no particularly striking results. Sad, or emotional, samples were somewhat fast, to my surprise, at an average of 2.65 W/s. Shocked samples, on the other hand, averaged only 2.27 W/s, supporting the theory that Richard Poe prefers to draw out shocked lines in a wide-eyed, "Oh… dear… wow" sort of manner, one that is comparatively slow rather than fast or frantic. For sad samples, the pitch averaged 144.66 Hz, lower than the averages of shocked, happy, and flashback texts, which were around 155.4 Hz.
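The end-of-chapter observation in this section, an average that sits between two clusters of samples, can be illustrated with a short Python sketch. The twenty rates below are invented purely for illustration and are not the study's data; they are arranged so that fourteen values fall outside the 2.4-2.6 W/s band, mirroring the proportion described in the text. The function name `count_in_band` is likewise hypothetical.

```python
from statistics import mean


def count_in_band(rates, low, high):
    """Count how many rates fall within [low, high], inclusive."""
    return sum(1 for rate in rates if low <= rate <= high)


# Invented end-of-chapter rates (W/s): a slow cluster, a fast cluster,
# and a handful of samples near the middle.
slow_cluster = [1.3, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
fast_cluster = [3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6]
middle_group = [2.4, 2.45, 2.5, 2.5, 2.55, 2.6]
example_rates = slow_cluster + fast_cluster + middle_group

average_rate = mean(example_rates)
near_average = count_in_band(example_rates, 2.4, 2.6)
away_from_average = len(example_rates) - near_average
```

With data shaped like this, the mean lands inside the 2.4-2.6 band even though most samples lie outside it, so a histogram or a count per band describes the category better than the average alone.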
Final Thoughts
From the data gathered on speech rate and pitch, it seems that pitch plays an extremely small role in Richard Poe's method of varying character voices. This, however, may be due to experimental error and to interference in the pitch analysis from the technology available. Speech rate shows more fruitful results, pointing to a distinction between faster-"speaking" female characters and slower male characters. The conclusions gathered for contextual factors were more interesting, and supported several common-sense hypotheses that certain situations naturally dictate a faster or slower pace. The results could be generalized to situations in which a single person (in this case, the narrator) attempts to impersonate the speech of many other people, in particular those of the opposite gender.
Fall 2007, University of Pennsylvania