The Phonetic Analysis of Speech Corpora

Chapter 4 Querying annotation structures


The purpose of this Chapter is to provide an overview both of the different kinds of annotation structures that are possible in Emu and of some of the main types of queries for extracting annotations from them. This will take in a discussion of how annotations from different tiers can be linked, their relationship to a Praat TextGrid, and the way that they can be entered and semi-automated in Emu. This Chapter will begin with a brief review of the simplest kinds of queries that have been used in the preceding Chapters to make segment lists.

4.1 The Emu Query Tool, segment tiers and event tiers

As already discussed, whenever annotations are queried in Emu the output is a segment list containing the annotations, their start and end times and the utterances from which they were taken. One of the ways of making segment lists is with the emu.query() function in the R programming language. For example, this function was used in the preceding Chapter to make a segment list from all utterances beginning with gam (i.e., for the male speaker) in the second database of five types of vowels annotated at the Phonetic tier:

emu.query("second", "gam*", "Phonetic = i: | e: | a: | o: | u:")
The other, equivalent way to make a segment list is with the Emu Query Tool, which is accessed with Database Operations followed by Query Database from the Emu DB window as shown in Fig. 4.1. Once you have entered the information in Fig. 4.1, save the segment list under the file name seg.txt in a directory of your choice. You will then be able to read this segment list into R with the read.emusegs() function as follows:
read.emusegs("path/seg.txt")
where path is the name of the directory to which you saved seg.txt. This will give exactly the same output as you get from the emu.query() function above (and therefore illustrates the equivalence of these two methods).
Fig. 4.1 about here
In Chapter 2, a distinction was made between a segment tier, whose annotations have durations, and an event or point tier, in which an annotation is marked by a single point in time. The commands for querying annotations from either of these tiers are the same, but what is returned in the first case is a segment list and in the second a variation on a segment list called an event list, in which the end times are zero. An example of both from the author utterance of the aetobi database is as follows:
state.s = emu.query("aetobi", "author", "Word = state")
state.s
segment list from database: aetobi
query was: Word = state
  labels   start     end   utts
1 state  2389.44 2689.34 author

bitonal = emu.query("aetobi", "author", "Tone = L+H*")
bitonal
event list from database: aetobi
query was: Tone = L+H*
  labels   start end   utts
1 L+H*    472.51   0 author
2 L+H*   1157.08   0 author
Since the first of these is from a segment tier, the annotation has a start and end time and therefore a duration:
dur(state.s)

299.9
On the other hand, the two annotations found in the second query have no duration (and their end times are defined to be zero) because they are from an event tier:

dur(bitonal)

0 0
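
The difference between the two kinds of list can be sketched in base R without the Emu library. The data frames below simply copy the values printed above; the helper seg_dur() is a hypothetical stand-in for dur(), written on the assumption that an end time of zero marks an event.

```r
# Segment list values copied from the output above (Word = state).
state <- data.frame(labels = "state", start = 2389.44, end = 2689.34,
                    utts = "author", stringsAsFactors = FALSE)

# Event list values copied from the output above (Tone = L+H*); end is 0.
bitonal <- data.frame(labels = c("L+H*", "L+H*"),
                      start = c(472.51, 1157.08),
                      end = c(0, 0),
                      utts = "author", stringsAsFactors = FALSE)

# Hypothetical helper, not part of Emu: the duration is end - start for
# segments and 0 for events (assumed to be recognisable by an end time of 0).
seg_dur <- function(s) ifelse(s$end == 0, 0, s$end - s$start)

state_dur <- seg_dur(state)
bitonal_dur <- seg_dur(bitonal)
print(round(state_dur, 2))
print(bitonal_dur)
```

The check on the end time is only a convenience for this sketch; in Emu itself the list carries its type (segment or event) with it.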
4.2 Extending the range of queries: annotations from the same tier

As shown by various examples so far, the most basic query is of the form T = x where T is an annotation tier and x an annotation at that tier. The following extensions can be made to this basic query command for querying annotations from the same tier. In all cases, as before, the output is either a segment list or an event list.
The | (or) operator and classes (features)

An example of this has already been given: Phonetic = i: | e: | a: | o: | u: makes a segment list of all these vowels. A very useful way of simplifying this type of instruction is to define annotations in terms of classes (or features). In Emu, classes can be set up in the Legal Labels pane of the template file. For example, a number of annotations have been grouped for the second database at the Phonetic tier into the classes shown in Fig. 4.2. Consequently, a query for finding all rounded vowels can be more conveniently written as follows:

emu.query("second", "gam*", "Phonetic=round")
Fig. 4.2 about here
The != operator

T != x means all annotations except x. So one way to get at all annotations in tier T in the database is to set the right-hand side to an annotation that does not occur at that tier. For example, since abc is not an annotation that occurs at the Phonetic tier in the second database, Phonetic != abc returns all annotations from that tier. In the following example, this instruction is carried out for speaker gam in R and then the label() and table() functions are used to tabulate all the corresponding annotations:

seg.all = emu.query("second", "gam*", "Phonetic != abc")

table(label(seg.all))
 H a: au  b  d e:  g i: o: oe oy u:
72  9  9 24 24  9 24  9  9  9  9  9

The & operator

Apart from its use for queries between linearly linked tiers discussed in 4.3, this operator is mostly useful for defining annotations at the intersection of features. For example, the following feature combinations can be extracted from the second database as follows:

rounded high vowels      Phonetic = round & Phonetic = high
unrounded high vowels    Phonetic != round & Phonetic = high
mid vowels               Phonetic != high & Phonetic != low
mid rounded vowels       Phonetic != high & Phonetic != low & Phonetic = round

Such definitions are then equivalent to those found in some distinctive feature notations in phonology and linguistic phonetics. Thus the last instruction defines vowels that are [-high, -low, +round] and they can be read into R using emu.query() in the same manner as before:
midround = emu.query("second", "*", "Phonetic != high & Phonetic != low & Phonetic = round")

table(label(midround))
o: oe oy
18 18 18
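
The logic of intersecting features can be mimicked in base R with logical vectors. This is only a sketch: the label inventory is taken from the table() output for speaker gam above, but the class memberships (round_v, high_v, low_v) are assumptions made for illustration, since the actual classes are those defined in the template file (Fig. 4.2).

```r
# Phonetic labels of the second database, as listed by table() above.
labels <- c("H", "a:", "au", "b", "d", "e:", "g", "i:", "o:", "oe", "oy", "u:")

# ASSUMED class membership; the real classes are set in the Legal Labels pane.
round_v <- c("o:", "oe", "oy", "u:")
high_v  <- c("i:", "u:")
low_v   <- c("a:", "au")

is_round <- labels %in% round_v
is_high  <- labels %in% high_v
is_low   <- labels %in% low_v

# Counterpart of: Phonetic = round & Phonetic = high
round_high <- labels[is_round & is_high]

# Counterpart of: Phonetic != high & Phonetic != low & Phonetic = round
mid_round <- labels[!is_high & !is_low & is_round]

print(round_high)
print(mid_round)
```

Under these assumed classes, the last combination picks out the same three labels (o:, oe, oy) as the midround query.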
The -> operator

Any two queries at the same tier can be joined together with the -> operator which finds either a sequence of annotations, or an annotation in the context of another annotation. There are three cases to consider:
1. Make a segment list of a b (the segment list has a start time of a and end time of b): [T = a -> T = b]
2. Make a segment list of a if a precedes b: [#T = a -> T = b]
3. Make a segment list of b if b follows a: [T = a -> #T = b]

For example:
(i) A segment list of any word followed by of in the aetobi database:
emu.query("aetobi", "*", "[Word != x -> Word = of]")
  labels       start     end    utts
1 kind->of   2867.88 3106.87 amazing
2 lot->of    5315.62 5600.12 amazing
3 author->of  296.65  894.27  author
(ii) A segment list of all words preceding of:
emu.query("aetobi", "*", "[#Word != x -> Word = of]")
segment list from database: aetobi
query was: [#Word != x -> Word = of]
  labels   start     end    utts
1 kind   2867.88 3016.44 amazing
2 lot    5315.62 5505.39 amazing
3 author  296.65  737.80  author
(iii) As above, but here the segment list is of the second word, of:
emu.query("aetobi", "*", "[Word != x -> #Word = of]")
segment list from database: aetobi
query was: [Word != x -> #Word = of]
  labels   start     end    utts
1 of     3016.44 3106.87 amazing
2 of     5505.39 5600.12 amazing
3 of      737.80  894.27  author
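
The three cases of the -> operator can be illustrated with a small base R sketch. The data frame contains only the six words whose times appear in the outputs above (not the full Word tier of aetobi), and the adjacency test (same utterance, end time of the first word equal to the start time of the second) is an assumption made for this sketch rather than a description of how Emu finds sequences.

```r
# Only the words whose times are shown in the outputs above.
words <- data.frame(
  labels = c("kind", "of", "lot", "of", "author", "of"),
  start  = c(2867.88, 3016.44, 5315.62, 5505.39, 296.65, 737.80),
  end    = c(3016.44, 3106.87, 5505.39, 5600.12, 737.80, 894.27),
  utts   = c("amazing", "amazing", "amazing", "amazing", "author", "author"),
  stringsAsFactors = FALSE
)

# Candidate pairs of consecutive rows.
first  <- seq_len(nrow(words) - 1)
second <- first + 1

# A pair counts as a sequence "any word -> of" if both rows are from the
# same utterance, abut in time, and the second label is "of".
hit <- words$utts[first] == words$utts[second] &
  words$end[first] == words$start[second] &
  words$labels[second] == "of"

a <- words[first[hit], ]   # case 2: [#Word != x -> Word = of]
b <- words[second[hit], ]  # case 3: [Word != x -> #Word = of]

# Case 1: [Word != x -> Word = of]; start of a, end of b.
both <- data.frame(labels = paste(a$labels, b$labels, sep = "->"),
                   start = a$start, end = b$end, utts = a$utts,
                   stringsAsFactors = FALSE)
print(both)
```

The position of # therefore only changes which member of each matched pair is returned; the set of matched pairs is the same in all three cases.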
4.3 Inter-tier links and queries

An inter-tier query is, as its name suggests, any query that spans two or more annotation tiers. For the aetobi database, an example of an inter-tier query is: 'find pitch-accented words'. This is an inter-tier query because all annotations at the Word tier have to be found, but only if they are linked to annotations at the Tone tier. For this to be possible, some of the annotations between tiers must already have been linked (since otherwise Emu cannot know which words are associated with a pitch-accent). So the first issue to be considered is the different kinds of links that can be made between annotations of different tiers.

Fig. 4.3 about here
An initial distinction needs to be made between two tiers that are linearly and non-linearly linked. The first of these is straightforward: when two tiers are linearly linked, then one tier describes or enriches another. For example, a tier Category might be included as a separate tier from Word for marking words' grammatical category membership (thus each word might be marked as one of adjective, noun, verb etc.); or information about whether or not a syllable is stressed might be included on a separate Stress tier. In both cases, the tiers are linearly linked because for every annotation at Word or Syllable tiers, there are exactly corresponding annotations at the Category or Stress tiers. Moreover, the linearly linked annotations have the same times. In the downloadable database gt which contains utterances labeled more or less according to the conventions of the German Tones and Break Indices system GToBI (Grice et al, 2005), the tiers Word and Break are linearly linked. The Break tier contains annotations for so-called break-indices which define the phonetic juncture at word boundaries. Each word is associated with a break on a scale from 0 to 5, with lower numbers corresponding to less juncture. So if there is a full-pause between two words, then the word before the pause on the Break tier is marked with a high value e.g., 4 or 5. On the other hand, the first word in did you when produced as the assimilated form [dɪdʒə] would be 0 to denote the substantial overlap at the word boundary. Whereas Break and Word are linearly linked, all of the other tiers in the gt database stand in a non-linear relationship to each other. In Fig. 4.3, the relationship between Tone and Word (and therefore also between Tone and Break) must be non-linear, because there is evidently not one pitch-accent per word (not one annotation at the Tone tier for every annotation at the Word tier).

Fig. 4.3 also shows the organisation of the utterance into a prosodic hierarchy. In an annotation of this kind, an utterance (tier Utt) is made up of one or more intonation phrases (tier I). An intonation phrase is made up of at least one intermediate phrase (tier i) and an intermediate phrase is made up of one or more words. (Intonation and intermediate phrases are collectively referred to as prosodic phrases.) The criteria for marking these groupings depend to a certain extent on phonetic juncture. Thus, there is a greater juncture between word pairs that are in different phrases (morgen/fährt, Thorsten/ja, and Studio/bei in Fig. 4.3) than those within a prosodic phrase. Moreover, the break between adjacent words in different intonation phrases (Studio/bei) is greater than between adjacent words in different intermediate phrases.

The non-linear association extends beyond those tiers that are in a parent-child relationship³⁵. Thus since I is a parent of i and since the relationship between i and Word is non-linear, then the grandparent-child relationship between I and Word is also necessarily non-linear. This becomes completely clear in skipping the i tier and displaying the links between I and Word that fall out from these relationships, as in Fig. 4.4.
Fig. 4.4 about here
There are two further parameters that need to be mentioned and both only apply to non-linear relationships between tiers. The first is whether a non-linear association is one-to-many or many-to-many. All of the relationships between tiers in Fig. 4.3 are one-to-many because an annotation at a parent tier maps onto one or more annotations at a child tier, but not the other way round: thus an intermediate phrase can be made up of one or more words, but a word cannot map onto more than one intermediate phrase. In a many-to-many relationship by contrast, an annotation at the parent tier can map onto one or more annotations at the child tier, and vice-versa. Two examples of this type of many-to-many association are shown in Fig. 4.5. Firstly, the final syllable was produced with a final syllabic nasal [n̩] (that is with no discernible weak vowel in the final syllable) but in order to express the idea that this word could (in a more careful speech production) be produced with a weak vowel as [ən], the word is annotated as such at the Phoneme tier and both segments are linked to the single n annotation at the child tier. Since two annotations from a parent tier can map onto a child tier and vice-versa, the inter-tier relationship is many-to-many. Secondly, to express the idea inherent in some prosodic models that the medial /s/ in (non-rhotic varieties of) person is ambisyllabic (e.g., Kahn, 1976; Gussenhoven, 1986), the single s annotation is linked to both S and W (strong and weak) annotations at the Syllable tier. Evidently, Syllable and Phoneme also stand in a many-to-many relationship to each other because a syllable can be made up of more than one phoneme, but a phoneme can also map onto two syllables.

The final parameter that needs to be mentioned is whether the non-linear association between two tiers is hierarchical or autosegmental (see also Bird & Liberman, 2001 and Taylor, 2001 for a similar distinction in query languages). Two tiers are defined in Emu to be in a hierarchical relationship when the annotations of a parent tier are composed of those from the child tier (or seen from the bottom upwards, when the annotations of the child tier can be parsed into those of the parent tier). For example, syllables stand in a hierarchical relationship to phonemes in Fig. 4.5, because syllables are made up of a sequence of phonemes (phonemes are parsed into syllables). In the autosegmental-metrical model of intonation (Pierrehumbert 1980; Beckman & Pierrehumbert 1986; Ladd 1996), words are made up of a sequence of syllables, and for this reason, the tiers stand in a hierarchical relationship to each other. Where → means 'stands in a hierarchical relationship to' then for the GToBI annotation in Fig 4.3, Utt → I → i → Word. On the other hand, the meaning of autosegmental is 'belongs to' or 'is associated with'. In Fig. 4.3, Word and Tone stand in an autosegmental relationship to each other. Their relationship is not hierarchical because a word is evidently not made up of a sequence of pitch-accents (the units at the Tone tier) in the same way that it is always composed of a sequence of syllables or a sequence of phonemes.

In Emu, the difference between a hierarchical or autosegmental relationship depends on whether the time stamps from the different tiers can be predicted from each other. In an autosegmental relationship they cannot. For example, given a pitch-accent it is not possible to say anything about the start and end times of the word with which it is associated nor vice-versa (beyond the vague statement that a pitch-accent is likely to be annotated at a point on the f0-contour somewhere near the word's rhythmically strongest vowel). On the other hand, given that a word is composed of a sequence of syllables, then the start and end times of a word are necessarily predictable from those of the first and last syllable. Similarly, given the hierarchical relationship Utt → I → i → Word in Fig.4.3, then the duration of the first H% at tier I in this figure extends from the onset of the first word to the offset of the last word that it dominates, i.e., between the start time of jeden and the end time of Thorsten; similarly, the duration of the first H- at level i extends from the beginning to the end of jeden morgen that it dominates, and so on.
Fig. 4.5 about here
Specifying a relationship as hierarchical and many-to-many allows hierarchies to overlap with each other in time at their edges. For example, since the Syllable and Phoneme tiers are hierarchically related to each other in Fig. 4.5, the duration of the first syllable (S) of person extends across the segments that it dominates, i.e., from the onset of p to the offset of s; but since the second syllable (W) also extends across the segments it dominates, its duration is from the onset of the same s to the end of the word: that is, the two syllables overlap in time across the durational extent of the medial s. For analogous reasons, since @ (schwa) and n at the Phoneme tier both map onto the same n annotation at the Phonetic tier, and since Phoneme and Phonetic stand in a hierarchical relationship to each other, these annotations are both defined to have the same start times and the same end times: thus since both @ and n inherit their times from the same annotation at the Phonetic tier, they are defined to be temporally overlapping.

When two tiers, T and U, are non-linearly (therefore either autosegmentally or hierarchically) linked, then their annotations can be queried with [T = a ^ U = b] where a and b are annotations at those tiers: such a query finds all a annotations at tier T that are linked to b annotations at tier U. Some examples with respect to the thorsten utterance of the gt database (Fig. 4.3) are given below. As discussed earlier, these search instructions can be entered either in the Emu Query Tool in the manner shown in Fig. 4.1, or by using the emu.query() function in R. For example, the search instruction

[Tone=L* ^ Word=morgen | Studio]
can be embedded as argument to the emu.query() function as shown in the following examples.
(i) All L* tones at the Tone tier linked to either morgen or Studio at the Word tier.

emu.query("gt","thorsten","[Tone=L* ^ Word=morgen | Studio]")
  labels    start end     utts
1 L*      325.625   0 thorsten
2 L*     2649.770   0 thorsten
(ii) As (i), but return the corresponding words

[Tone = L* ^ #Word = morgen | Studio]
  labels    start      end     utts
1 morgen  250.366  608.032 thorsten
2 Studio 2438.090 2962.670 thorsten
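
What the ^ operator does can be pictured as a lookup across explicit links between two tables. The sketch below is base R only and does not reproduce Emu's internal representation: the word column holding the link is hypothetical, and only three words (with the times printed in this section) and the two L* events from (i) are included.

```r
# Three words of the thorsten utterance, with times as printed in this section.
words <- data.frame(
  labels = c("morgen", "Studio", "bei"),
  start  = c(250.366, 2438.090, 2962.670),
  end    = c(608.032, 2962.670, 3165.340),
  stringsAsFactors = FALSE
)

# The two L* events from (i). The 'word' column is a hypothetical way of
# storing the link: it holds the row number of the linked word.
tones <- data.frame(
  labels = c("L*", "L*"),
  start  = c(325.625, 2649.770),
  end    = c(0, 0),
  word   = c(1, 2),
  stringsAsFactors = FALSE
)

# Counterpart of [Tone = L* ^ Word = morgen | Studio]: keep the tones.
linked <- tones$labels == "L*" &
  words$labels[tones$word] %in% c("morgen", "Studio")
tone_hits <- tones[linked, c("labels", "start", "end")]

# Counterpart of [Tone = L* ^ #Word = morgen | Studio]: keep the words.
word_hits <- words[unique(tones$word[linked]), ]

# Words linked to any tone at all; bei has no link and drops out.
accented <- words[unique(tones$word), ]

print(tone_hits)
print(word_hits)
```

As in the sequence queries, the # marker only decides which side of the link is returned: the tones keep their zero end times, whereas the words come back with their own start and end times.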

The != operator discussed in 4.2 can be used to find all annotations. For example, the following search instruction finds all pitch-accented words, i.e., any annotation at the Word tier that is linked to any annotation at the Tone tier:

[Word != x ^ Tone != x]
1 morgen    250.366  608.032 thorsten
2 Thorsten 1573.730 2086.380 thorsten
3 Studio   2438.090 2962.670 thorsten
4 Chaos    3165.340 3493.200 thorsten
5 Stunden  3982.010 4274.110 thorsten

Queries can be made across intervening tiers as long as all the tiers are linked. Thus since tiers i and Tone are linked via Word, a query such as 'find any intermediate phrase containing an L*' (any annotation at tier i that is linked to L* at the Tone tier) is defined for this database:
[i != x ^ Tone = L*]
  labels    start      end     utts
1 H-       47.689  608.032 thorsten
2 L-     2086.380 2962.670 thorsten

Two intermediate phrases are returned, which are the two that are associated with L* via the Word tier (see Fig. 4.3). Notice that, although the query is made with respect to an event which has no duration (L*), since the intermediate phrases inherit their durations from the Word tier, they have a duration equal to the words that they dominate.

It is possible to nest non-linear queries inside other non-linear queries. For example, the following query finds the words that occur in an H- intermediate phrase and in an L% intonational phrase (any annotation at the Word tier linked both to H- at the i tier and to L% at the I tier):

[[Word != x ^ i = H-] ^ I = L%]
1 bei     2962.67 3165.34 thorsten
2 Chaos   3165.34 3493.20 thorsten
3 Kaja    3493.20 3815.10 thorsten
4 vier    3815.10 3982.01 thorsten
5 Stunden 3982.01 4274.11 thorsten
6 lang    4274.11 4691.39 thorsten
The following does the same but under the additional condition that the words should be pitch-accented, (linked to an annotation at the Tone tier):
[[[Word != x ^ Tone != x] ^ i = H-] ^ I = L%]
or equivalently:
[[[Tone != x ^ #Word != x] ^ i = H-] ^ I = L%]
1 Chaos   3165.34 3493.20 thorsten
2 Stunden 3982.01 4274.11 thorsten
4.4 Entering structured annotations with Emu

A helpful first step in entering annotations such as the one shown in Fig 4.3 is to summarise the inter-tier relationships for a database in a path as in Fig. 4.6. In this Figure, (S) and (E) denote segment and event tiers which, for convenience, can be collectively referred to as time tiers. All other tiers (Utt, I, i, Break) are (initially) timeless i.e., they inherit their times from another tier. An arrow between two tiers means that they stand in a non-linear relationship to each other, either one-to-many (a single downward arrow) or many-to-many (a double arrow). Any adjacent two tiers not connected by an arrow stand in a linear relationship to each other (Word, Break). When a single or double arrow extends between any two time tiers (Word, Tone) then the relationship is autosegmental; otherwise if either one (i, Word) or both (Utt, I and I, i) of the tiers is timeless, then the relationship is hierarchical. A timeless tier inherits its times from the child tier that it dominates. Thus, times percolate up the tree: i inherits its times from Word, I from i, and Utt from I. In linear relationships, a timeless tier inherits its times from the tier with which it is linearly associated (Break inherits its times from Word).
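
The way in which times percolate up from a time tier to a timeless parent tier can be sketched in base R. The word times are those printed in 4.3 for the thorsten utterance; the parent column, and the assumption that these six words are all dominated by one and the same H- intermediate phrase, are introduced here for illustration only.

```r
# Word times as printed in 4.3. The 'parent' column is hypothetical: it
# assumes that all six words belong to a single H- intermediate phrase.
words <- data.frame(
  labels = c("bei", "Chaos", "Kaja", "vier", "Stunden", "lang"),
  start  = c(2962.67, 3165.34, 3493.20, 3815.10, 3982.01, 4274.11),
  end    = c(3165.34, 3493.20, 3815.10, 3982.01, 4274.11, 4691.39),
  parent = "H-",
  stringsAsFactors = FALSE
)

# A timeless parent starts where its first child starts and ends where
# its last child ends.
starts <- tapply(words$start, words$parent, min)
ends   <- tapply(words$end, words$parent, max)

phrases <- data.frame(labels = names(starts),
                      start = as.numeric(starts),
                      end = as.numeric(ends),
                      stringsAsFactors = FALSE)
print(phrases)
```

Applying the same step again to the phrases table (grouped by the intonation phrase that dominates each intermediate phrase) would give the times at tier I, and once more those at Utt.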

Fig. 4.6 about here
The tiers that stand in a non-linear relationship to each other are entered into Emu in the Levels pane of the template file (Fig. 4.6) in which the child:parent relationship is specified as one-to-many or many-to-many. A tier that is linearly linked to another tier is declared in the Labels pane. The distinction between hierarchical and autosegmental depends on whether a tier is marked as timeless in the Labfiles pane. Unless a tier is defined as a segment or event tier (Fig. 4.6), it is timeless. Thus Utt → I → i → Word defines a hierarchical relationship in this template file for two reasons: firstly, because of the child:parent relationships that are declared in the Levels pane, and secondly because Word is the only one of these tiers declared to be linked to times in the Labfiles pane. As described earlier, Utt, I and i inherit their times from Word both because they dominate it, and because Word is associated with times. Moreover, because these tiers do not have their own independent times, they do not appear in the utterance's signal view window: thus when you open any utterance in the gt database, you will only see Word/Break and Tone in the signal view window and you have to switch to the hierarchy view window to see the other tiers (Fig. 4.3).

Fig. 4.7 about here

There is one utterance in the gt database, dort, for which the non-linear links have not been set and these can be incorporated following the details below. Begin by opening the hierarchy window of this utterance in the manner described in Fig. 4.3. Once you have the hierarchy window in the top panel of Fig. 4.7, select Simple Tree so that the annotations do not overlap. The utterance dann treffen wir uns dort am Haupteingang (lit: then meet we us there at the main-entrance, i.e. then we'll meet each other there, at the main entrance) was produced as one intonational phrase and two intermediate phrases and with accented words dann, dort, and Haupteingang. The corresponding pitch-accents have already been entered at the Tone tier: the task, then, is to link the annotations to end up with the display shown in the lower pane of Fig. 4.7. Move the mouse over the leftmost H* which causes the annotation to turn blue. Then hold down the left button and without letting go, drag the mouse to the word to which this pitch-accent belongs, treffen. Once treffen is highlighted (it will also change colour to blue), then release the mouse button. This should set the link. It is quite a good idea to practice deleting the link, in case you make a mistake. To do this, select Delete, and move the mouse over the line you have just drawn and it will then turn red. Clicking the mouse once, deletes the link. You can in fact delete not just lines but annotations in this way. When you finish deleting, select Edit before continuing, otherwise you will delete material such as links and annotations (unfortunately, there is currently no undo button - so if you accidentally delete material, close the annotation window without saving).

In order to set the links between the i and Word tiers, move the mouse anywhere to the same height in the window as the i tier and click twice (twice because two intermediate phrases have to be entered). This will cause two asterisks to appear. Proceed in the same way as described earlier for setting the non-linear links between the asterisks and the words as shown in Fig. 4.7: thus, move the mouse to the first word dann until it changes to blue, then hold the mouse button down and sweep to the first asterisk until it also changes to blue. Set all of the links between these tiers in the same way. You should end up with two asterisks at the i tier, the first of which is linked to the first five words, the second to the last two. To enter text, click on the asterisk to get a cursor and enter the annotations. Set the labels and links at the Utt and I tiers in the same manner as described above. Finally, enter the annotations of the Break tier (these could also be entered in the signal window because any tier that is linearly linked to a time tier - in this case to Word - also appears in the signal window)³⁶ and then save the file.

Fig. 4.8 about here
The result of saving annotations is that one file per time tier and one so-called hierarchical label file will be created or updated. Since for this database, there are two time tiers, Tone and Word, then the annotations at the Tone tier and the annotations at the Word tier are stored in their own separate (plain text) files. The location of these files is given in the Labfiles pane of the template file (lower pane, Fig. 4.6). The other annotations from timeless tiers are stored in the hierarchical label file which always has the extension hlb and which is stored in the path given in the Levels pane of the template file. As you will see if you open any (plain text) hlb file, the information is a code that is equivalent to the kinds of structures shown in Fig. 4.8. (The actual numbers might differ depending on the order in which the annotations were created). For example, some of the lines from dort.hlb might look like this:
0
1 7
4 8
6 9
10 5 6 9
11 0 1 2 3 4 7 8

The number on the left denotes a parent of any of those on the right. So 1 7 in the second line means that annotation 1 is a parent (or grand-parent or great-great grand-parent etc.) of annotation 7 while annotation 11 of the last line stands in an analogous relationship to the adjacent annotations 0, 1, 2, 3, 4, 7, 8. The relationship between these numeric and actual annotations can be deduced from the other information in the same hlb file or alternatively by opening a hierarchy window and showing both types of annotations as in Fig. 4.8. This figure shows that 11 corresponds to the first H- at tier i while the other numbers to the right include the annotations at the Word tier of which it is a parent as well as those of the Tone tier of which it is a grand-parent.
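
The parent-child code in these lines can be unpacked with a few lines of base R. This is a sketch that handles only the link lines quoted above, typed in as a character vector; it makes no claim about the remaining sections of a real hlb file, and the helper names children_of() and parents_of() are hypothetical.

```r
# The link lines quoted above, one string per line.
hlb_lines <- c("0", "1 7", "4 8", "6 9", "10 5 6 9", "11 0 1 2 3 4 7 8")

# Split each line into integers; the first number is the parent.
ids <- lapply(strsplit(hlb_lines, " +"), as.integer)

# Lines with a single number (such as "0") list no children and are skipped.
ids <- ids[lengths(ids) > 1]

# One row per parent-child link.
links <- do.call(rbind, lapply(ids, function(x) {
  data.frame(parent = x[1], child = x[-1])
}))

# Hypothetical helpers for looking up links in either direction.
children_of <- function(p) links$child[links$parent == p]
parents_of  <- function(k) links$parent[links$child == k]

print(children_of(11))
print(parents_of(7))
```

Here parents_of(7) returns both 1 and 11, which reflects the point made above that the number on the left may be a parent or a more distant ancestor of the numbers to its right.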
