The Phonetic Analysis of Speech Corpora


F2 locus, place of articulation and variability



Date: 29.01.2017

6.7 F2 locus, place of articulation and variability

In the production of an oral stop, the vocal tract is initially sealed during which time air-pressure builds up and is then released. At the point of release, the shape of the vocal tract has a marked influence on the acoustic signal and this influence extends at least to the beginning of the formant transitions, if the following segment is a vowel. Since the shape of the vocal tract is quite different for labial, alveolar, and velar places of articulation, the way in which the acoustic signal is influenced following the release differs correspondingly. If the influence extends to the onset of periodicity for the following vowel, then the formant onset frequencies of the same phonetic vowel should be different when it occurs after consonants at different places of articulation.

As a study by Potter, Kopp and Green (1947) had shown, different places of articulation have their greatest influence on the onset of the second formant frequency. But it was the famous perception experiments using hand-painted spectrograms at the Haskins Laboratories that gave rise to the concept of an F2-locus. In these experiments, Liberman and colleagues (Delattre, Liberman & Cooper, 1955; Liberman, Delattre, Cooper & Gerstman, 1954; Liberman et al., 1958) showed that the origin of the second formant frequency influenced the perception of the place of articulation of the consonant. More specifically, if the F2-onset was low and at around 700 Hz, listeners would predominantly hear /b/; if it was roughly in the 1800 Hz region, they would hear /d/; while if the F2-onset began near 3000 Hz, listeners would perceive /g/ before front vowels. The locus frequency was not at the acoustic vowel onset itself, but at some point prior to it during the consonantal closure. In synthesising a CV transition, formants would be painted from the F2-locus somewhere in the consonant closure to the F2-vowel target, and then the first part of the transition, up to where the voicing for the vowel began, would be erased. With these experiments, Liberman and colleagues also demonstrated that the perception of place of articulation was categorical and this finding was interpreted in favour of the famous motor theory of speech perception (see e.g. Hawkins, 1999 for a thorough review of these issues).

In the years following the Haskins Laboratories experiments, various large-scale acoustic studies were concerned with finding evidence for an F2-locus (including e.g., Fant, 1973; Kewley-Port, 1982; Lehiste & Peterson, 1961; Öhman, 1966). These studies showed that the most stable F2-locus was for alveolars (i.e., alveolars exhibit the least F2-onset variation of the three places of articulation) whereas velars, which exhibit a great deal of variability depending on the backness of the following vowel, showed no real evidence from acoustic data of a single F2-locus.



From the early 1990s, Sussman and colleagues applied the concept of a locus equation to suggest that F2 transitions might in some form provide invariant cues to the place of articulation (see e.g., Sussman et al., 1991; Sussman et al., 1995), a position that has also not been without its critics (e.g., Brancazio & Fowler, 1998; Löfqvist, 1999).

Locus equations were first investigated systematically by Krull (1988, 1989) and they provide a numerical index of the extent to which vowels exert a coarticulatory influence on a consonant's place of articulation. Thus, whereas in studies of vowel undershoot, the concern is with the way in which context influences vowel targets, here it is the other way round: that is, the aim is to find out the extent to which vowel targets exert a coarticulatory influence on the place of articulation of flanking consonants.
Fig. 6.22 about here
The basic idea behind a locus equation is illustrated with some made-up data in Fig. 6.22 showing two hypothetical F2-trajectories for [bɛb] and [bob]. In the trajectories on the left, the extent of influence of the vowel on the consonant is nil. This is because, in spite of very different F2 vowel targets, the F2 frequencies of [ɛ] and of [o] both converge to exactly the same F2-locus at 700 Hz for the bilabial – that is, the F2-onset frequency is determined entirely by the consonant with no influence from the vowel. The other (equally unlikely) extreme is shown on the right. In this case, the influence of the vowel on the consonant's place of articulation is maximal so that there is absolutely no convergence to any locus and therefore no transition.

A locus equation is computed by transforming F2 frequencies as a function of time (top row, Fig. 6.22) into the plane of F2-target × F2-onset (bottom row, Fig. 6.22) in order to estimate the extent of coarticulatory influence of the vowel on the consonant. Firstly, when there is no vowel-on-consonant coarticulation (left), the locus equation, which is the line that connects [bɛb] and [bob], is horizontal: this is because they both converge to the same locus frequency and so F2-onset is 700 Hz in both cases. Secondly, when vowel-on-consonant coarticulation is at a maximum (right), then the locus equation is a diagonal in this plane because the onset frequency is equal to the target frequency.

A fundamental difference between these two extreme cases of coarticulation is in the slope of the locus equation. On the left, the slope in this plane is zero (because the locus equation is horizontal) and on the right it is one (because F2Target = F2Onset). Therefore, for real speech data, the slope of the locus equation can be used as a measure of the extent of vowel-on-consonant coarticulation: the closer the slope is to one, the more the consonant's place of articulation is influenced by the vowel (and in theory the slope must lie between 0 and 1 since these are the two extreme cases of zero and maximal coarticulation). Finally, it can be shown (with some algebraic manipulation: see Harrington & Cassidy 1999, pp. 128-130) that the locus frequency itself, that is the frequency towards which transitions tend to converge, can be estimated by establishing where the locus equation and the line F2Target = F2Onset intersect. This is shown for the case of zero vowel-on-consonant coarticulation on the left in Fig. 6.22: this line intersects the locus equation at the frequency at which the transitions in the top left panel of Fig. 6.22 converge, i.e. at 700 Hz which is the locus frequency. On the right, the line F2Target = F2Onset cannot intersect the locus equation at a single point, because it is the same as the locus equation. Because the two lines have no unique point of intersection, there is no locus frequency, as is of course obvious from the 'transitions' in Fig. 6.22 top right that never converge.
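The two extreme cases, and one in between, can be sketched in base R with made-up numbers. The 700 Hz locus is taken from the hypothetical example in the text; the two F2 target values, the coarticulation factor k and the helper names onset_for() and slope_for() are assumptions introduced only for this illustration.

```r
# Hypothetical F2 vowel targets (Hz) for two vowels; values are made up
f2targ <- c(e = 1900, o = 900)
locus  <- 700   # assumed bilabial F2-locus (Hz), as in the made-up example

# k = 0: onset fixed at the locus (no vowel-on-consonant coarticulation)
# k = 1: onset equals the target (maximal coarticulation)
onset_for <- function(k) locus + k * (f2targ - locus)

# Slope of the line through the two points in the target x onset plane
slope_for <- function(k) {
  f2onset <- onset_for(k)
  unname(coef(lm(f2onset ~ f2targ))[2])
}

slope.none <- slope_for(0)    # horizontal locus equation
slope.half <- slope_for(0.5)  # intermediate case
slope.max  <- slope_for(1)    # locus equation is the diagonal
print(c(slope.none, slope.half, slope.max))

# In the intermediate case the line still crosses onset = target at the locus
f2onset.half <- onset_for(0.5)
fit.half     <- coef(lm(f2onset.half ~ f2targ))
locus.half   <- unname(fit.half[1] / (1 - fit.half[2]))
print(locus.half)
```

In this toy model the slope simply equals k, which is why the slope can serve as an index of vowel-on-consonant coarticulation.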

This theory can be applied to some simple isolated word data produced by the first author of Clark et al (2007). In 1991, John Clark produced a number of isolated /dVd/ words where the V is one of the 13 possible monophthongs of Australian English. The relevant dataset which is part of the Emu-R library includes a segment list of the vowel in these words (isol), a vector of vowel labels (isol.l) and a trackdata object of the first four formant frequencies between the vowel onset and offset (isol.fdat).

The task is to investigate whether the coarticulatory influence of the vowel is greater on the final, than on the initial, consonant. There are various reasons for expecting this to be so. Foremost are the arguments presented in Ohala (1990) and Ohala & Kawasaki (1984) that initial CV (consonant-vowel) transitions tend to be a good deal more salient than VC transitions: compatibly, there are many more sound changes in which the vowel and syllable-final consonant merge resulting in consonant loss (such as the nasalization of vowels and associated final nasal consonant deletion in French) than is the case for initial CV syllables. This would suggest that synchronically a consonant and vowel are more sharply delineated from each other in CV than in VC syllables and again there are numerous aerodynamic and acoustic experiments to support this view. Secondly, there are various investigations (Butcher, 1989; Hoole et al., 1990) which show that in V1CV2 sequences where C is an alveolar, the perseverative influence of V1 on V2 is greater than the anticipatory influence of V2 on V1. This suggests that the alveolar consonant resists coarticulatory influences of the following V2 (so the alveolar has a blocking effect on the coarticulatory influences of a following vowel) but is more transparent to the coarticulatory influences of the preceding V1 (so the alveolar does not block the coarticulatory influences of a preceding vowel to the same extent). Therefore, the vowel-on-consonant coarticulatory influences can be expected to be weaker when the vowel follows the alveolar in /dV/ than when it precedes it in /Vd/.
Fig. 6.23 about here
Before proceeding to the details of quantification, it will be helpful as always to plot the data. Fig. 6.23 shows the second formant frequency trajectories synchronised at the vowel onset on the left and at the vowel offset on the right. The trajectories are of F2 of the separate vowel categories and there is only one token per vowel category, as table(isol.l) shows. The plots can be obtained by setting the offset argument of dplot() to 0 (for alignment at the segment onset) and to 1 (for alignment at the segment offset).
par(mfrow=c(1,2))

dplot(isol.fdat[,2], offset=0, ylab = "F2 (Hz)", xlab="Time (ms)")

dplot(isol.fdat[,2], offset=1, xlab="Time (ms)")
It seems clear enough from Fig. 6.23 that there is greater convergence of the F2-transitions at the alignment point on the left (segment onset) than on the right (segment offset). So far, the hypothesis seems to be supported.

The task now is to switch to the plane of F2-target × F2-onset (F2-offset) and calculate locus equations in these two planes. The vowel target is taken to be at the temporal midpoint of the vowel. The following three vectors are F2 at the vowel onset, target, and offset respectively:


f2onset = dcut(isol.fdat[,2], 0, prop=T)

f2targ = dcut(isol.fdat[,2], .5, prop=T)

f2offset= dcut(isol.fdat[,2], 1, prop=T)
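The idea of reading a value off a trajectory at a proportional time can be illustrated in base R without the Emu-R objects. This is only a conceptual sketch: the trajectories are made up, value_at() is a hypothetical helper, and taking the nearest frame is an assumption that is not meant to describe how dcut() is implemented.

```r
# Hypothetical F2 trajectories (Hz), one vector of frames per vowel token
tracks <- list(
  c(1650, 1800, 1950, 2000, 1900, 1750),
  c(1600, 1400, 1200, 1100, 1150, 1300, 1450)
)

# Value at proportional time p (0 = onset, 0.5 = midpoint, 1 = offset),
# taken from the nearest frame (an assumption of this sketch)
value_at <- function(track, p) track[round(1 + p * (length(track) - 1))]

onset.demo  <- sapply(tracks, value_at, p = 0)
target.demo <- sapply(tracks, value_at, p = 0.5)
offset.demo <- sapply(tracks, value_at, p = 1)
print(cbind(onset.demo, target.demo, offset.demo))
```

Each resulting vector has one value per token, in the same way that f2onset, f2targ and f2offset have one value per vowel.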
The next step is to plot the data and then to calculate a straight line of best fit, known as a regression line, through the scatter of points: that is, there will not be two points that lie on the same line as in the theoretical example in Fig. 6.22, but several points that lie close to a line of best fit. The technique of linear regression, which for speech analysis in R is described in detail in Johnson (2008), is to calculate such a line, which is defined as the line that minimises the sum of the squared vertical distances of the points from it. The function lm() is used to do this. Thus for the vowel onset data, the second and third commands below are used to calculate and draw the straight line of best fit:
plot(f2targ, f2onset)

regr = lm(f2onset ~ f2targ)

abline(regr)
The information about the slope is stored in $coeff which also gives the intercept (the value at which the regression line cuts the y-axis, i.e. the F2-onset axis). The slope of the line is just over 0.27:
regr$coeff

(Intercept) f2targ

1217.8046955 0.2720668
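What lm() computes for a single predictor can be checked against the closed-form least-squares solution. The sketch below uses made-up F2 values (not the isol data), so the numbers it prints are not those of the text; the variable names are assumptions for illustration.

```r
# Hypothetical F2 values (Hz); not the isol data from the text
targ  <- c(2200, 1900, 1500, 1200, 900)
onset <- c(1850, 1750, 1600, 1550, 1450)

# Closed-form least-squares estimates for one predictor
slope     <- cov(targ, onset) / var(targ)
intercept <- mean(onset) - slope * mean(targ)

# The same estimates from lm()
fit <- coef(lm(onset ~ targ))
print(rbind(closed.form = c(intercept, slope), lm = fit))
```

The two rows should agree, which shows that the slope is the covariance of target and onset divided by the variance of the target.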
Recall that the best estimate of the locus is where the line F2Target = F2Onset cuts this regression line. Such a line can be superimposed on the plot as follows:
abline(0, 1, lty=2)
There is a function in the Emu-R library, locus(), that carries out all of these operations. The first two arguments are for the data to be plotted on the x- and y-axes (in this case the F2-target and the F2-onset respectively) and there is a third optional argument for superimposing a parallel set of labels. In this example, the vowel labels are superimposed on the scatter and the x- and y-ranges are set to be identical (Fig. 6.24):
xlim = c(500, 2500); ylim = xlim; par(mfrow=c(1,2))

xlab = "F2-target (Hz)"; ylab = "F2-onset (Hz)"

stats.on = locus(f2targ, f2onset, isol.l, xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab)

stats.off = locus(f2targ, f2offset, isol.l, xlim=xlim, ylim=ylim, xlab=xlab)


Fig. 6.24 about here
Fig. 6.24 shows that the regression line (locus equation) is not as steep for the F2-onset compared with the F2-offset data. The following two commands show the actual values of the slopes (and intercepts) firstly in the F2-target × F2-onset plane and secondly in the F2-target × F2-offset plane:
stats.on$coeff

(Intercept) target

1217.8046955 0.2720668
stats.off$coeff

(Intercept) target

850.4935279 0.4447689
The locus() function also calculates the point at which the regression line intersects the line F2Target = F2Onset (Fig. 6.24, left) or F2Target = F2Offset (Fig. 6.24, right), thus giving a best estimate of the locus frequency. These estimates are in $locus and the calculated values for the data in the left and right panels of Fig. 6.24 respectively are 1673 Hz and 1532 Hz. In fact, the locus frequency can also be derived from:
(1) L = c/(1 – α)
where L is the locus frequency and c and α are the intercept and slope of the locus equation respectively (Harrington, 2009). Thus the estimated locus frequency for the CV syllables is equivalently given by:
1217.8046955/(1- 0.2720668)

1672.962
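Equation (1) follows from the definition of the locus as the point where the regression line meets the line F2Target = F2Onset: setting both coordinates to L in F2Onset = c + α F2Target gives L = c + αL, and hence L = c/(1 – α). As a small sketch, the same calculation can be applied to both sets of coefficients reported above; locus_from_coef() is a hypothetical helper name, not a function of the Emu-R library.

```r
# Intersection of the regression line y = c + a*x with the line y = x
locus_from_coef <- function(intercept, slope) intercept / (1 - slope)

# Coefficients copied from stats.on$coeff and stats.off$coeff in the text
locus.on  <- locus_from_coef(1217.8046955, 0.2720668)  # initial /d/
locus.off <- locus_from_coef(850.4935279, 0.4447689)   # final /d/

print(round(c(onset = locus.on, offset = locus.off,
              difference = locus.on - locus.off)))
```

Rounded to the nearest Hz, these reproduce the 1673 Hz and 1532 Hz estimates quoted above.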
The reason why there is this 141 Hz difference in the calculation of the locus frequencies for initial as opposed to final /d/ is not immediately clear; but nevertheless as Fig. 6.23 shows, these are reasonable estimates of the point at which the F2 transitions tend, on average, to converge. Finally, statistical diagnostics on how well the regression line fits the points are given by the summary function:


summary(stats.on)

Call:
lm(formula = onset ~ target)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.218e+03  5.134e+01  23.721 8.51e-11 ***
target      2.721e-01  3.308e-02   8.224 5.02e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 53.12 on 11 degrees of freedom
Multiple R-squared: 0.8601, Adjusted R-squared: 0.8474
F-statistic: 67.64 on 1 and 11 DF, p-value: 5.018e-06
Whether the slope of the regression line differs reliably from zero is tested by the t-test in the line beginning target. The null hypothesis is that there is no relationship at all between the F2-onset and F2-target (which would mean that the slope of the regression line would be zero): the t-test gives the probability of obtaining a slope at least as far from zero as the one observed if the null hypothesis were true. This probability is shown to be 5.02e-06 or 0.00000502, i.e., very small. So the null hypothesis is rejected. The other important diagnostic is given under Multiple R-squared, which is a measure of how much of the variance is explained by the regression line. It shows that for these data, just over 86% of the variance is explained by the regression line (the adjusted R-squared, which corrects for the number of predictors, is slightly lower at just under 85%), and whether this proportion differs reliably from zero is tested by the F-test in the next line beginning 'F-statistic'.
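The way these diagnostics relate to one another can be sketched by recomputing them from the values printed by summary(). The numbers below are copied from that output; the variable names are assumptions for illustration, and small discrepancies can arise because the printed values are rounded.

```r
# Values copied from the summary() output in the text
est <- 0.2720668   # slope estimate
se  <- 0.03308     # its standard error
df  <- 11          # residual degrees of freedom (13 tokens - 2 coefficients)
r2  <- 0.8601      # Multiple R-squared

t.value <- est / se                      # t statistic for the slope
p.value <- 2 * pt(-abs(t.value), df)     # two-sided p-value for slope = 0
f.value <- t.value^2                     # with one predictor, F equals t squared
n       <- df + 2                        # number of tokens
adj.r2  <- 1 - (1 - r2) * (n - 1) / df   # adjusted R-squared

print(c(t = t.value, p = p.value, F = f.value, adj.R2 = adj.r2))
```

With a single predictor, the t-test on the slope and the F-test on the whole model are therefore the same test, which is why they report the same p-value.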

The analysis of these (very limited) data lends some support to the view that the coarticulatory influence of the vowel on an alveolar stop is greater when the consonant is in final than when it is in initial position. However, data from continuous speech and above all from other consonant categories will not always give such clear results, so the locus equation calculations must be used with care. In particular, an initial plot of the data to check the formant transitions for any outliers due to formant tracking errors is essential. If the slope of the line is negative or greater than 1, then it means either that the data could not be reliably modelled by any kind of regression line and/or that no sensible locus frequency could be computed (i.e., the formant transitions do not converge, and the resulting locus frequency calculation is likely to be absurd).
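One way of building this caution into an analysis is to check the slope before computing a locus frequency. The following is a minimal sketch: checked_locus() is a hypothetical helper and not part of the Emu-R library, and the second set of coefficients is made up to show what happens when the slope is out of range.

```r
# Hypothetical helper (not part of the Emu-R library): return the locus
# frequency only when the slope lies in the interpretable range [0, 1)
checked_locus <- function(intercept, slope) {
  if (slope < 0 || slope >= 1) {
    warning("slope outside [0, 1): no sensible locus frequency")
    return(NA_real_)
  }
  intercept / (1 - slope)
}

ok  <- checked_locus(1217.8046955, 0.2720668)      # coefficients from the text
bad <- suppressWarnings(checked_locus(1200, 1.2))  # made-up, out-of-range slope
raw <- 1200 / (1 - 1.2)                            # unguarded formula: negative Hz
print(c(ok = ok, bad = bad, raw = raw))
```

The unguarded calculation returns a negative frequency for the made-up case, which is the kind of absurd result that the check is meant to catch.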


6.8 Questions
These questions make use of existing objects in the Emu-R library.
A. Plotting vowel ellipses and removing outliers (sections 6.1 and 6.2)

A1. Extract the formant data at the temporal midpoint from the trackdata object vowlax.fdat and convert the Hz values to Bark.
A2. F3–F2 in Bark was suggested by Syrdal & Gopal (1986) as an alternative to F2 in Hz for representing vowel backness. Make a plot like the one in Fig. 6.25 of the female speaker's (speaker 68) E vowels in the plane of –F1 in Bark (y-axis) and F3 – F2 in Bark (x-axis) showing the data points.
A3. Using a logical vector, create a matrix that is the same as the one in A1 (of Bark values at the temporal midpoint) but which excludes the very obvious outlier shown in the upper left of Fig. 6.25.
Fig. 6.25 about here
A4. Use the same logical vector as in A3, make a vector of speaker labels from vowlax.spkr and a vector of vowel labels from vowlax.l that are each parallel to the matrix that you created in A3.
A5. Make two plots as shown in Fig. 6.26 of the male and female speakers' data in the plane scaled to the same axes as in Fig. 6.25.


Fig. 6.26 about here
B. Finding targets (section 6.3)

B1. The trackdata object vowlax.rms contains dB-RMS data that is parallel to the segment list vowlax and the other objects associated with it. Use peakfun() from 6.3 or otherwise to find for each segment the time within the vowel at which the dB-RMS reaches a maximum value.
B2. Extract the F1 and F2 data from the trackdata object vowlax.fdat for the male speaker 67 at the target time defined by B1 and make a plot of the vowels in the F2 x F1 plane.
B3. The objects dip (segment list), dip.fdat (trackdata objects of F1-F4), dip.l (vector of diphthong labels), and dip.spkr (vector of speaker labels) are available for three diphthongs in German [aɪ, aʊ, ɔʏ] for the same two speakers as for the lax vowel data presented in this Chapter. Make a plot of F1 as a function of time for the male speaker's [aʊ] thereby verifying that F1 seems to reach a target/plateau in the first half of the diphthong.
B4. Make a matrix of the F1 and F2 values at the time at which F1 reaches a maximum in the first half of [aʊ]. Create a vector of [aʊ] labels that is parallel to this matrix (i.e., a vector of the same length as there are rows in the matrix and consisting entirely of aU, aU, aU …).
B5. The task is to check how close the quality of the diphthong at the alignment point in B4 is to the same speaker's (67) lax [a, ɔ] (a, O) vowels extracted from F1 and F2 data at the temporal midpoint. Using the data from B4 above in combination with the R objects of these lax monophthongs in A1, make a plot of the kind shown in Fig. 6.27 for this purpose. What can you conclude about the quality of the first target of this diphthong? What phonetic factor might cause it to have lower F2 values than the lax monophthong [a]?
Fig. 6.27 about here
C. Formant curvature and undershoot (section 6.4)

C1. Which diphthong, [aɪ] or [ɔʏ], would you expect to have greater curvature in F1 and why?
C2. Produce a linearly time-normalized plot of F1 of the female speaker's [aɪ] and [ɔʏ] diphthongs. Does this match your predictions in C1?
C3. Fit parabolas to F1 of the diphthongs for both speakers (i.e., create a matrix of coefficients that is parallel to the R objects for the diphthong segments). Based on the evidence of the hyperarticulation differences between the speakers presented in this Chapter, which speaker is expected to show less formant F1-curvature in the diphthongs?
C4. Produce on one page four boxplots (analogous to Fig. 6.15) of the curvature-parameter for the label combinations aI.67 (the male speaker's [aɪ]), OY.67 (the male speaker's [ɔʏ]), aI.68 (the female speaker's [aɪ]), and OY.68 (the female speaker's [ɔʏ]).
D. F2-loci (section 6.7)

D1.

(a) How would you expect the slope of the locus equation to be different in continuous speech compared with the type of isolated, citation-form production examined in 6.7 above?


(b) If the female speaker 68 hyperarticulates compared with the male speaker 67, i.e., produces more distinctive consonant and vowel targets, how might their locus equation slopes differ for the same place of articulation?
(c) Check your predictions by calculating and plotting locus equations in the plane of F2-Onset x F2-Target for /d/ preceding the lax vowels in the segment list vowlax separately for both speakers (the left context labels are given in vowlax.left; take F2 at the temporal midpoint for F2-target values). Set the ranges for the x- and y-axes to be between 1000 Hz and 2500 Hz in both cases.
D2. How would you expect (a) the estimated F2-locus frequency and (b) the slope of the locus equation for initial [v] to compare with those of initial [d]? Check your hypotheses by making F2 locus equations based on vowels following initial [v] analogous to the ones above for initial [d] and compare the F2-loci and slopes between these places of articulation for both speakers.
D3. Make a time-aligned plot colour-coded for [v] or [d] of F2-trajectories for the male speaker following these two consonants (a plot like the one in Fig. 6.23 but in which the trajectories are of F2 for the vowels following [d] or [v]). Is the plot consistent with your results from D2 (i.e., locus differences and slope differences)?
6.9 Answers

A1.


mid = dcut(vowlax.fdat, .5, prop=T)

mid = bark(mid)


A2.

temp = vowlax.spkr=="68" & vowlax.l=="E"

eplot(cbind(mid[temp,3]-mid[temp,2], -mid[temp,1]), vowlax.l[temp], dopoints=T, ylab="- F1 (Bark)", xlab="F3 - F2 (Bark)")
A3.

temp = vowlax.spkr=="68" & vowlax.l=="E" & mid[,3]-mid[,2] < -12

mid = mid[!temp,]
A4.

mid.sp = vowlax.spkr[!temp]

mid.l = vowlax.l[!temp]
A5.

temp = mid.sp=="67"

par(mfrow=c(1,2)); xlim = c(0, 8); ylim = c(-9, -2)

ylab="- F1 (Bark)"; xlab="F3 - F2 (Bark)"

eplot(cbind(mid[temp,3]-mid[temp,2], -mid[temp,1]), mid.l[temp], dopoints=T, xlim=xlim, ylim=ylim, xlab=xlab, ylab=ylab)

eplot(cbind(mid[!temp,3]-mid[!temp,2], -mid[!temp,1]), mid.l[!temp], dopoints=T, xlim=xlim, ylim=ylim, xlab=xlab)


B1.

mtime = trapply(vowlax.rms, peakfun, simplify=T)


B2.

form = dcut(vowlax.fdat[,1:2], mtime)

temp = vowlax.spkr == "67"

eplot(form[temp,], vowlax.l[temp], centroid=T, form=T)


B3.

temp = dip.l == "aU"

dplot(dip.fdat[temp,1], ylab="F1 (Hz)")

B4.


maxf1 = trapply(dcut(dip.fdat[,1], 0, 0.5, prop=T), peakfun, simplify=T)

temp = dip.l == "aU"

formau = dcut(dip.fdat[temp,1:2], maxf1[temp])

labsau = dip.l[temp]


B5. [ʊ] has a backing effect on [a] in the diphthong [aʊ], thus lowering F2.
temp = vowlax.l %in% c("a", "O") & vowlax.spkr == "67"

mono.f.5 = dcut(vowlax.fdat[temp,1:2], .5, prop=T)

both = rbind(mono.f.5, formau)

both.l = c(vowlax.l[temp], labsau)

eplot(both, both.l, form=T, dopoints=T, xlab="F2 (Hz)", ylab="F1 (Hz)")
C1. [aɪ] because the change in openness of the vocal tract between the two component vowels is greater than for [ɔʏ].
C2.

temp = dip.l %in% c("aI", "OY") & dip.spkr == "68"

dplot(dip.fdat[temp,1], dip.l[temp], norm=T)
Yes.
C3. Speaker 67.

coeff = trapply(dip.fdat[,1], plafit, simplify=T)


C4.

temp = dip.l %in% c("aI", "OY")

boxplot(coeff[temp,3] ~ dip.l[temp] * dip.spkr[temp])
D1.

(a) The slope in continuous speech should be higher because of the greater coarticulatory effects.

(b) The slopes for speaker 68 should be lower.

(c)


temp = vowlax.left=="d" & vowlax.spkr == "67"

on.m = dcut(vowlax.fdat[temp,2], 0, prop=T)

mid.m = dcut(vowlax.fdat[temp,2], .5, prop=T)
temp = vowlax.left=="d" & vowlax.spkr == "68"

on.f = dcut(vowlax.fdat[temp,2], 0, prop=T)

mid.f = dcut(vowlax.fdat[temp,2], .5, prop=T)
par(mfrow=c(1,2)); xlim=ylim=c(1000, 2500)

l.m.d = locus(mid.m, on.m, xlim=xlim, ylim=ylim)

l.f.d = locus(mid.f, on.f, xlim=xlim, ylim=ylim)
D2.

(a) The locus frequency for [v] should be lower because labials according to the locus theory have a lower F2-locus than alveolars.


(b) The slope of the locus equation for [v] should be higher. This is because alveolars are supposed to have the most stable locus of labials, alveolars, and velars, i.e., the place of articulation for alveolars, and hence the F2-locus, shifts the least due to the effects of vowel context.
temp = vowlax.left=="v" & vowlax.spkr == "67"

on.m = dcut(vowlax.fdat[temp,2], 0, prop=T)

mid.m = dcut(vowlax.fdat[temp,2], .5, prop=T)
temp = vowlax.left=="v" & vowlax.spkr == "68"

on.f = dcut(vowlax.fdat[temp,2], 0, prop=T)

mid.f = dcut(vowlax.fdat[temp,2], .5, prop=T)
par(mfrow=c(1,2)); xlim=ylim=c(1000, 2500)

l.m.v = locus(mid.m, on.m, xlim=xlim, ylim=ylim)

l.f.v = locus(mid.f, on.f, xlim=xlim, ylim=ylim)
The F2 loci for [v] (given by l.m.v$locus and l.f.v$locus) for the male and female speakers respectively are, rounded to the nearest Hz, 749 Hz and 995 Hz, i.e. markedly lower than for alveolars. The slopes of the locus equations for [v] (given by entering l.m.v and l.f.v) are 0.794 and 0.729 for the male and female speakers respectively and these are both higher than for the alveolar.
D3.

temp = vowlax.left %in% c("v", "d") & vowlax.spkr=="67"

dplot(vowlax.fdat[temp,2], vowlax.left[temp], ylab="F2 (Hz)")
Fig. 6.28 about here
Yes. The [v] and [d] transitions point to different loci as Fig. 6.28 shows. Fig. 6.28 also shows that the locus equation slope for [d] is likely to be lower because, even though the F2 range at the vowel target is quite large (over 1000 Hz) at time point 50 ms, the F2-transitions for [d] nevertheless converge to about 1600-1700 Hz at F2-onset (t = 0 ms). There is much less evidence of any convergence for [v] and this is why the locus equation slope for [v] is higher than for [d].


