The Phonetic Analysis of Speech Corpora


Peters, B. (2006) Form und Funktion prosodischer Grenzen im Gespräch. PhD dissertation, Institute of Phonetics and digital Speech Processing, University of Kiel, Germany.

Peterson, G., and Barney, H. L. (1952) Control methods used in a study of the vowels, Journal of the Acoustical Society of America 24, 175–184.

Pierrehumbert, J. B. (1980) The Phonology and Phonetics of English Intonation. Ph.D. dissertation, MIT. [Published by Indiana University Linguistics Club, Bloomington].
Pierrehumbert, J. (2002). Word-specific phonetics. In C. Gussenhoven and N. Warner (eds.) Laboratory phonology 7. Mouton de Gruyter: Berlin and New York. (p. 101-140).
Pierrehumbert, J. (2003a). Probabilistic phonology: discrimination and robustness. In R. Bod, J. Hay, and S. Jannedy (Eds.) Probabilistic Linguistics. MIT Press: Cambridge, Mass. p. 177-228.
Pierrehumbert, J. (2003b). Phonetic diversity, statistical learning, and acquisition of phonology. Language & Speech, 46, 115-154.
Pierrehumbert, J. (2006). The next toolkit. Journal of Phonetics, 34, 516-530.
Pierrehumbert, J. and Talkin, D. (1990) Lenition of /h/ and glottal stop. In: Gerard J. Docherty and D. Robert Ladd (eds.), Gesture, Segment, Prosody (Papers in Laboratory Phonology 2). Cambridge: Cambridge University Press. p. 90-117.
Pitrelli, J., Beckman, M.E., and Hirschberg, J. (1994). Evaluation of prosodic transcription labeling reliability in the ToBI framework. Proceedings of the International Conference on Spoken Language Processing, p. 123-126.
Pitt, M., Johnson, K., Hume, E., Kiesling, S. and Raymond, W. (2005). The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability. Speech Communication, 45, 89–95.
Pols, L. (2001) The 10-million-words Spoken Dutch Corpus and its possible use in experimental phonetics. In Proceedings of 100 Years of Experimental Phonetics in Russia. St. Petersburg, Russia. p. 141-145.
Pols, L., Tromp, H., and Plomp, R., (1973). Frequency analysis of Dutch vowels from 50 male speakers. Journal of the Acoustical Society of America 53, 1093-1101.
Potter, R.K., Kopp, G. and Green, H., (1947) Visible Speech. Dover Publications, New York.
Potter, R. K., and Steinberg, J. C. (1950) Toward the specification of speech. Journal of the Acoustical Society of America 22, 807–820.
Quené, H. & van den Bergh, H. (2008) Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59, 413-425.

Rastle, K., Harrington, J., and Coltheart, M. (2002). 358,534 Nonwords: The ARC nonword database. Quarterly Journal of Experimental Psychology, 55A(4), 1339–1362.
Raymond, W., Dautricourt, R., and Hume, E. (2006). Word-medial /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors. Language Variation and Change, 18, 55-97.
Recasens, D. (2004). The effect of syllable position on consonant reduction: Evidence from Catalan consonant clusters. Journal of Phonetics 32, 435-453.
Recasens, D., Farnetani, E., Fontdevila, J. and Pallarès, M.D. (1993) An electropalatographic study of alveolar and palatal consonants in Catalan and Italian. Language and Speech, 36, 213-234.
Reed, M., DiPersio, D. and Cieri, C. (2008). The Linguistic Data Consortium member survey: purpose, execution and results. In Proceedings of the Sixth International Language Resources and Evaluation, 2969-2973.
Roach, P., Knowles, G., Varadi, T., and Arnfield, S. (1993) MARSEC: A Machine-Readable Spoken English Corpus. Journal of the International Phonetic Association, 23, 47-54.
Robson, C. (1994) Experiment, Design and Statistics in Psychology. Penguin Books.
Rose, P. (2002) Forensic Speaker Identification. Taylor and Francis: London.
Saltzman, E. L., & Munhall, K. G. (1989) A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1, 333-382.
Sankoff, G. (2005) Cross-Sectional and Longitudinal Studies. In U. Ammon, N. Dittmar, K. Mattheier, and P. Trudgill (Eds.) An International Handbook of the Science of Language and Society, Volume 2, 2. Berlin: de Gruyter, p. 1003-1013.
Schiel, F. (1999) Automatic phonetic transcription of non-prompted speech. Proceedings of the International Conference of Phonetic Sciences, 607-610.
Schiel, F. (2004). MAUS goes iterative. In Proc. of the IV. International Conference on Language Resources and Evaluation, p 1015-1018.
Schiel, F. & Draxler, C. (2004) The Production of Speech Corpora. Bavarian Archive for Speech Signals: Munich. Available from: http://www.phonetik.uni-muenchen.de/forschung/Bas/BasLiteratur.html.
Schouten, M. E. H., and Pols, L. C. W. (1979). CV- and VC-transitions: a spectral study of coarticulation, Part II, Journal of Phonetics, 7, 205-224.
Schafer, A., Speer, S., Warren, P. & White, S. (2000). Intonational disambiguation in sentence production and comprehension. Journal of Psycholinguistic Research, 29, 169-182.
Shearer, W. (1995) Experimental design and statistics in speech science. In W.J. Hardcastle & J. Laver (Eds.) The Handbook of Phonetic Sciences. Blackwell. p. 167-187.
Short, T. (2005) R/Rpad Reference Card. http://www.rpad.org/Rpad/Rpad-refcard.pdf
Shriberg, L. and Lof, G., (1991). Reliability studies in broad and narrow phonetic transcription. Clinical Linguistics and Phonetics, 5, 225–279.
Silverman, K., Beckman, M.E., Pitrelli, J., and Ostendorf, M. (1992). TOBI: A standard for labelling English prosody. Proceedings International Conference on Spoken Language Processing, Banff, p. 867-870.
Silverman, K. and J. Pierrehumbert (1990) The Timing of prenuclear high accents in English, Papers in Laboratory Phonology I, Cambridge University Press, Cambridge. p. 72-106.
Simpson, A. (2001) Does articulatory reduction miss more patterns than it accounts for? Journal of the International Phonetic Association, 31, 29-39.
Simpson, A. (2002) Gender-specific articulatory-acoustic relations in vowel sequences. Journal of Phonetics, 30, 417-435.
Simpson, A. (1998). Phonetische Datenbanken des Deutschen in der empirischen Sprachforschung und der phonologischen Theoriebildung. Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel. 33.
Simpson, A., Kohler, K., and Rettstadt, T. (1997). The Kiel Corpus of Read/Spontaneous Speech: acoustic data base, processing tools. Arbeitsberichte des Instituts für Phonetik und digitale Sprachverarbeitung der Universität Kiel, 32, 31-115.
Sjölander, K. and Beskow, J. (2006). Wavesurfer. http://www.speech.kth.se/wavesurfer/
Sjölander, K. (2002) Recent developments regarding the WaveSurfer speech tool. Dept. for Speech, Music and Hearing Quarterly Progress and Status Report, 44, 53-56.
Srivastava, S., Gupta, M., and Frigyik, B. (2007) Bayesian quadratic discriminant analysis. Journal of Machine Learning Research, 8, 1277-1305.
Stephenson, L.S. (2005) An electropalatographic and acoustic analysis of frequency effects in the lexicon. Unpublished PhD thesis, Macquarie Centre for Cognitive Science, Macquarie University, Sydney.
Stephenson, L.S. (2004). Lexical frequency and neighbourhood density effects on vowel production in words and nonwords. Proceedings of the 10th Australian International Conference on Speech Science and Technology, p 364-369.
Stephenson, L.S. (2003). An EPG study of repetition and lexical frequency effects in alveolar to velar assimilation. Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS-03), p. 1891-1894.
Stephenson, L. & Harrington, J., (2002). Assimilation of place of articulation: Evidence from English and Japanese. Proceedings of the 9th Australian International Conference on Speech Science and Technology, p. 592-597.
Stirling, L., Fletcher, J., Mushin, I. and Wales, R. (2001). Representational issues in annotation: Using the Australian map task corpus to relate prosody and discourse. Speech Communication, 33, 113-134.
Sussman, H.M., McCaffrey, H., and Matthews, S.A., (1991) An investigation of locus equations as a source of relational invariance for stop place categorization. Journal of the Acoustical Society of America, 90, 1309-1325.
Sussman, H.M., Fruchter, D., Cable, A., (1995) Locus equations derived from compensatory articulation. Journal of the Acoustical Society of America 97, 3112-3124.
Syrdal, A. K., and Gopal, H. S. (1986) A perceptual model of vowel recognition based on the auditory representation of American English vowels. Journal of the Acoustical Society of America 79, 1086–1100.
Syrdal, A., and McGory, A. (2000) Inter-transcriber reliability of ToBI prosodic labeling. Proceedings of the International Conference on Spoken Language Processing, Beijing: China, p. 235-238.
Taylor, P., Black, A. & Caley, R. (2001) Heterogeneous relation graphs as a formalism for representing linguistic information. Speech Communication, 33, 153-174.
Traunmüller, H. (1990) Analytical expressions for the tonotopic sensory scale. Journal of the Acoustical Society of America 88, 97-100.
Traunmüller, H. and Lacerda, F. (1987) Perceptual relativity in identification of two-formant vowels. Speech Communication, 6, 143-157.
Trochim, M. (2007). Research Methods Knowledge Base. Thomson: London. Also online at: http://www.socialresearchmethods.net/kb/index.php
Trudgill, P. (1988). Norwich revisited: Recent linguistic changes in an English urban dialect. English World Wide, 9, 33-49.

Vance, A. (2009) Data analysts captivated by R’s power. Article in the Business Computing section of the New York Times, Jan. 6th 2009.
Vasishth, S. (in press). The foundations of statistics: A simulation-based approach. http://www.ling.uni-potsdam.de/~vasishth/SFLS.html
van Bergem, D.R. (1993) Acoustic vowel reduction as a function of sentence accent, word stress, and word class, Speech Communication, 12, 1-23.
van Bergem, D.R. (1994). A model of coarticulatory effects on the schwa. Speech Communication, 14, 143-162.
van Son, R., and Pols, L. (1990). Formant frequencies of Dutch vowels in a text, read at normal and fast rate. Journal of the Acoustical Society of America 88, 1683-1693.
Verbrugge, R., Strange, W., Shankweiler, D.P. and Edman, T.R., (1976) What information enables a listener to map a talker's vowel space? Journal of the Acoustical Society of America, 60, 198-212.
Watson, C. I., and Harrington, J. (1999) Acoustic evidence for dynamic formant trajectories in Australian English vowels. Journal of the Acoustical Society of America 106, 458–468.
Wedel, A. (2006). Exemplar models, evolution and language change. The Linguistic Review, 23, 247–274.
Wedel, A. (2007) Feedback and regularity in the lexicon. Phonology, 24, 147-185.
Weenink, D. (2001) Vowel normalization with the TIMIT corpus. Proceedings of the Institute of Phonetic Sciences, University of Amsterdam, 24, 117–123.
Wells, J.C. (1997) SAMPA computer readable phonetic alphabet. In Gibbon, D., Moore, R. and Winski, R. (eds.), Handbook of Standards and Resources for Spoken Language Systems. Berlin and New York: Mouton de Gruyter. Part IV, section B.
Wesenick, M. & Kipp, A. (1996) Estimating the quality of phonetic transcriptions and segmentations of speech signals. Proceedings of the International Conference on Spoken Language Processing, 129-132.
Wesener, T. (2001) Some non-sequential phenomena in German function words. Journal of the International Phonetic Association, 31, 17-27.
Westbury, J. R. (1994) X-ray Microbeam Speech Production Database User’s Handbook, Version 1.0. Madison, WI.
Wiese, R. (1996) The Phonology of German. Clarendon Press: Oxford.
Wrench, A. and Hardcastle, W. (2000) A multichannel articulatory speech database and its application for automatic speech recognition. Proc. 5th seminar on speech production: models and data, 305-308.
Wright, R. (2003). Factors of lexical competition in vowel articulation. In J. Local, R. Ogden, and R. Temple (Eds.), Laboratory Phonology VI, p. 75-87. Cambridge University Press. Cambridge.
Wuensch, K. (2009). Karl Wuensch's statistics lessons http://core.ecu.edu/psyc/wuenschk/StatsLessons.htm
Yoon, T., Chavarria, S., Cole, J., & Hasegawa-Johnson, M. (2004) Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. Proceedings of the International Conference on Spoken Language Processing, Nara: Japan. p. 2729-2732.
Zierdt, A. (2007) Die Entwicklung der Messtechnik für ein fünfdimensionales elektromagnetisches Artikulographensystem. Ph.D diss. Institute of Phonetics and Speech Processing, University of Munich.
Zwicker, E. (1961). Subdivisions of the audible frequency range into critical bands. Journal of the Acoustical Society of America, 33.
Figure legends
Fig. 2.1. An overview of the relationship between the stages of creating, querying, and analysing speech corpora.
Fig. 2.2. The Emu Database Tool as it appears when you first start up Emu. The left pane is for showing the available databases, the right pane for the utterances that each database is composed of.
Fig. 2.3. The Emu DatabaseInstaller is accessible from Arrange Tools. To install any of the available databases, first specify a path to which you want to save the data from New Database Storage and then click on any of the zip files. You must have an internet connection for this to work.
Fig. 2.4. Following the procedure described in Fig. 2.3 gives access to the database first, which is made up of the five utterances shown on the right. The utterance names are displayed by selecting first in Databases on the left followed by Load Database. Double clicking any of the names in the Utterances pane on the right causes the utterance to be opened (Fig. 2.5).
Fig. 2.5. The display that appears when opening utterance gam001 showing four labelling tiers, a waveform, and a spectrogram. The two vertical lines show a selection. To make a selection, position the mouse in the waveform, hold down the left button and sweep without letting go of the left button to the desired position later in time, then release the button. To zoom in to the selection (Fig. 2.6), click the ↔ symbol in the top left of the display.
Fig. 2.6. The resulting display after zooming in to the segment marks in Fig. 2.5, with some further adjustments. To reproduce them, click the button inside the ellipse on the left to get the pull-out menu shown over the spectrogram, then adjust the contrast and brightness sliders and reset the maximum spectral range to 4000 Hz. You can also produce a narrow band spectrogram showing harmonics by resetting the bandwidth to e.g., 45 Hz. The waveform and spectrogram windows can be made bigger/smaller using the triangles shown inside the ellipse on the right.
Fig. 2.7. The steps for opening the utterance gam002 in Praat from Emu. Click gam002 in the Utterances pane once to select it, then Praat from the Open with... pull-down menu. Praat must already be running for this to work.
Fig. 2.8. The utterance gam002 opened in Praat and segmented and labelled at the Word tier.
Fig. 2.9. The corresponding display in Emu (obtained by double clicking gam002; see Fig. 2.7) after labelling the data with Praat in the manner of Fig. 2.8. The other labelling tiers have been removed from the display with Display → SignalViewLevels and then by de-selecting Phoneme, Phonetic, Target.
Fig. 2.10. Opening files in Praat. Open the utterance msajc023.wav with Read → Read from File in the left pane, then select to TextGrid from the Annotate pull-down menu to bring up the pane shown top right and enter Word as a segment tier. After clicking the OK button in the top right pane, the TextGrid object will appear in the Praat objects window as shown below right. Select both the sound file and this TextGrid object together to derive the initially unlabelled waveform and spectrogram in Fig. 2.11.

Fig. 2.11. The audio file msajc023.wav segmented and labelled into words. Save the TextGrid to the same directory where msajc023.wav is located with File → Write TextGrid to text file.

Fig. 2.12. The labConvert window for inter-converting between Emu and Praat label files. Click on Praat 2 Emu to bring up this window, and enter the full path and filename for msajc023.TextGrid under Input File as shown above. Then choose a directory into which the output of the conversion is to be written. Make sure you check the box templatefile as shown in order to create an Emu template during the conversion. Begin the conversion with Start.
Fig. 2.13. The files in the first directory after converting the Praat TextGrid. At this point, you should rename the template file p2epreparedtpl.tpl to something else, e.g., jec.tpl.
Fig. 2.14. The Emu Database Tool showing the new database whose template should be edited with Edit Template.
Fig. 2.15. The Tracks (above) and Variables (below) panes of the template file for the database jec. Specify the extension as wav and the path as x/first, where x is the directory in which msajc023.wav is stored. For the Variables pane, specify the primary extension as wav: the utterances of the database will then be defined to be all wav files that are found under Path of the Tracks pane.
Fig. 2.16. The Emu Database Tool showing the database jec. The utterances are accessible after editing the template file in the manner described in Fig. 2.15 and then selecting Load Database. Double clicking on the utterance name opens the utterance in Emu as shown on the right.
Fig. 2.17. The information to be entered in the Levels pane.
Fig. 2.18. The information to be entered in the Labfiles pane.
Fig. 2.19. The information to be entered in the Tracks pane.
Fig. 2.20. The information to be entered in the Variables pane.
Fig. 2.21. The Emu configuration editor showing the paths for the template files.
Fig. 2.22. The Emu Database Tool showing the myfirst database and associated utterances.
Fig. 2.23. The utterance gam007 showing a segmentation into words and a single i: segment at the Phoneme tier.
Fig. 2.24. The labConvert window to convert the Emu annotations into a Praat TextGrid. Select myfirst from the … pull-down menu at the top, then gam007 from the … menu in the middle, and then choose Automatic to save the TextGrid to the same directory in which the Emu annotations are stored. Finally select Start.
Fig. 2.25. The same utterance and annotations in Fig. 2.23 as a Praat TextGrid.
Fig. 3.1. A schematic view of the phonetic vowel quadrilateral and its relationship to the first two formant frequencies.
Fig. 3.2. Spectrogram and superimposed second formant frequency of a production by a male speaker of the German word drüben with phonetic segments and boundaries shown. From Harrington (2009).
Fig. 3.3. The Emu Database Tool after downloading the database second.zip. Enter gam* and confirm with the ENTER key to select all utterances beginning with gam (the male speaker) then select Send to tkassp from the Utterance List… menu to bring up the tkassp window in Fig. 3.4.
Fig. 3.4. Upon selecting Send to tkassp (Fig. 3.3) a window (shown in the middle of this figure) appears asking whether samples should be selected as the input track. Selecting OK causes the sampled speech data (audio files) of the utterances to appear in the pane on the left. Check the forest box as shown to calculate formants and choose the forest pane (at the top of the display) to see the default parameters. Leaving the default output as auto (top right) causes the formants to be stored in the same directory as the audio files from which the formants have been calculated. The calculation of formants is done with the default settings (shown on the right) which include a window size of 25 ms and a window shift of 5 ms. The formant files are created with an extension .fms. When you are ready to calculate the formants, select Perform Analysis.
Fig. 3.5. The additions to the Tracks (above) and View (below) panes that are needed for displaying the formants. Select Add New Track then enter fm under Track, fms for the extension and copy the path from the audio file (the path next to wav). In the View pane, check the fm box which will have the effect of overlaying formants on the spectrograms. Finally, save the template file.
Fig. 3.6. The utterance gam002 with overlaid formants and spectrogram parameters readjusted as shown in the Figure. The cursor is positioned close to an evident tracking error in F2, F3, and F4. The pen buttons on the left can be used for manual correction of the formants (Fig. 3.7).
Fig. 3.7. Manual correction (below) of the F2-formant tracking error (inside the ellipse). The spectrogram is from the same utterance as in Fig. 3.6 but with the frequency range set to 0 - 1500 Hz. Selecting the pen color corresponding to that of F2 has the effect of showing the F2 values on the track as points. In order to change the F2 values manually, hold down the left mouse button without letting go and sweep across the spectrogram, either from left to right or right to left slowly in order to reposition the point(s). When you are done, release the mouse and select the same pen color again. You will then be prompted to save the data. Choosing yes causes the formant file to be overwritten. Choosing no will still have the effect of redrawing the track according to your manual correction, but when you close the window, you will be asked again to save the data. If you choose no again, then the formant changes will not be saved.
Fig. 3.8. A flow diagram showing the relationship between signals, annotations, and the output, ellipses. Entries followed by () are functions in the Emu-R library. Remember to enter library(emu) at the R prompt to make use of any of these functions.
Fig. 3.9. A display of the first four formants in R (left) and the corresponding formant display in Emu (right) for an [i:] vowel from the same gam006 utterance. The vertical line in the display on the left marks the temporal midpoint of the vowel at 562.5 ms and can be marked with abline(v=562.5), once the formant data have been plotted.

Fig. 3.10. 95% confidence ellipses for five vowels from isolated words produced by a male speaker of Standard German.
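
As a rough illustration of what a 95% confidence ellipse of this kind represents, the following Python sketch derives the centre, semi-axes, and orientation of such an ellipse from paired measurements. It is not the Emu-R implementation. It assumes that the data are approximately bivariate normal and ignores the sampling error of the estimated mean and covariance; the function name confidence_ellipse and the formant values are hypothetical.

```python
import math

def confidence_ellipse(xs, ys, level=0.95):
    """Centre, semi-axes and orientation of a bivariate-normal ellipse.

    Assumption: the points are roughly bivariate normal, so that about
    `level` of the distribution lies inside the returned ellipse.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the 2 x 2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    scale = -2.0 * math.log(1.0 - level)
    # Angle of the major axis relative to the x-axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return {
        "centre": (mx, my),
        "semi_major": math.sqrt(scale * lam_major),
        "semi_minor": math.sqrt(scale * lam_minor),
        "angle_deg": math.degrees(angle),
    }

# Hypothetical F2 and F1 values (Hz) for a handful of vowel tokens.
f2 = [2100.0, 2250.0, 2180.0, 2320.0, 2050.0, 2210.0]
f1 = [290.0, 310.0, 300.0, 330.0, 280.0, 305.0]
print(confidence_ellipse(f2, f1))
```

The scale factor for the 95% level is -2 ln(0.05), roughly 5.99, so each semi-axis is about 2.45 standard deviations along the corresponding principal direction.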

Fig. 3.11. Vowels for the female speaker agr in the F2 × F1 plane (values extracted at the temporal midpoint of the vowel), after (left) and before (right) correction of an outlier (at F2 = 0 Hz) for [u:].
Fig. 3.12. The tkassp window for calculating intensity data for the aetobi database. Select the … button in the top right corner then choose manual and from that the directory into which you want to store the intensity data. Make a note of the directory (path) because it will need to be included in the template file to tell Emu where these intensity data are located on your system.
Fig. 3.13. The required modifications to the aetobi template file in order to display intensity data in Emu in the Tracks (top) and View (below) panes. The path entered in the Tracks pane is the one to which you wrote the intensity data in Fig. 3.12.
Fig. 3.14. The utterance bananas showing a spectrogram and intensity contour.
Fig. 3.15. The same utterance as in Fig. 3.14 showing only the Word tier and intensity contour.
Fig. 3.16. The defaults of the rmsana pane set to a window shift and size of 2 ms and 10 ms respectively and with an output extension rms2. These data will be saved to the same directory as the one selected in Fig. 3.13.
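
The window size and window shift settings mentioned in the legends for Fig. 3.4 and Fig. 3.16 determine how many analysis frames a signal yields. The Python sketch below shows the arithmetic under one simple assumption: a frame is only counted if the whole window fits inside the signal. How tkassp actually treats the edges of the signal is not stated here, so its frame counts may differ; the function name frame_starts and the 1000 ms duration are hypothetical.

```python
def frame_starts(duration_ms, window_ms, shift_ms):
    """Start times (ms) of analysis windows that fit entirely in the signal.

    Assumption: no padding at the signal edges; all arguments are whole
    milliseconds.
    """
    if window_ms <= 0 or shift_ms <= 0:
        raise ValueError("window size and shift must be positive")
    if duration_ms < window_ms:
        return []
    n_frames = (duration_ms - window_ms) // shift_ms + 1
    return [i * shift_ms for i in range(n_frames)]

# A hypothetical 1000 ms utterance with the two settings named in the legends.
formant_frames = frame_starts(1000, 25, 5)   # 25 ms window, 5 ms shift
rms_frames = frame_starts(1000, 10, 2)       # 10 ms window, 2 ms shift
print(len(formant_frames), len(rms_frames))
```

A smaller shift gives a denser track over the same stretch of signal, which is why the 2 ms setting produces more than twice as many values as the 5 ms setting.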
