Hermit Crab Parsing Engine Specification





Date: 31.07.2017

4.5 Definition of Application of a Stratum


The following defines the application of a single Stratum of rules to a Lexical Entry.

4.5.1 Application of a Noncyclic Stratum


Let S_i be a noncyclic stratum. Applying one morphological rule of stratum S_i to a lexical entry of stratum S_i produces a storable lexical entry of stratum S_i.

If stratum S_(i+1) is cyclic, a lexical entry of stratum S_i becomes a storable lexical entry of stratum S_(i+1) through the following sequence: application of the relevant Affix Template of S_i (if any), application of all the phonological rules of S_i, erasure of any boundary markers, and application of all the phonological rules of S_(i+1).

Otherwise (if stratum S_(i+1) is noncyclic), the same sequence without its final step produces a storable lexical entry of stratum S_(i+1): the relevant Affix Template of S_i (if any), then all the phonological rules of S_i, then erasure of any boundary markers.
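The following Python sketch traces the noncyclic case under simplifying assumptions that are not part of this specification: a phonological shape is a plain string, every rule (morphological, phonological, or affix template) is a function from string to string, a single optional template stands in for "the relevant Affix Template", and "+" is a hypothetical boundary marker. The names Stratum, Entry, apply_morph_rule, and exit_noncyclic are illustrative only.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional

# Hypothetical modelling: a rule is any function from a shape string to a shape string.
Rule = Callable[[str], str]
BOUNDARY = "+"  # assumed boundary-marker symbol (not specified in this section)


@dataclass
class Stratum:
    name: str
    cyclic: bool
    morph_rules: List[Rule] = field(default_factory=list)
    phon_rules: List[Rule] = field(default_factory=list)
    affix_template: Optional[Rule] = None  # simplified: at most one relevant template


@dataclass
class Entry:
    shape: str
    stratum: str


def apply_morph_rule(entry: Entry, rule: Rule) -> Entry:
    """One morphological rule of a noncyclic stratum; the result stays in that stratum."""
    return replace(entry, shape=rule(entry.shape))


def exit_noncyclic(entry: Entry, s_i: Stratum, s_next: Stratum) -> Entry:
    """Produce a storable entry of S_(i+1) from an entry of noncyclic stratum S_i."""
    shape = entry.shape
    if s_i.affix_template is not None:
        shape = s_i.affix_template(shape)
    for rule in s_i.phon_rules:
        shape = rule(shape)
    shape = shape.replace(BOUNDARY, "")  # erase boundary markers
    if s_next.cyclic:  # only a cyclic successor contributes its phonological rules here
        for rule in s_next.phon_rules:
            shape = rule(shape)
    return Entry(shape, s_next.name)


# Usage: a suffixing rule plus an e-deletion rule in S_1, then an upper-casing "rule" in S_2.
s1 = Stratum("s1", cyclic=False,
             morph_rules=[lambda s: s + "+ed"],
             phon_rules=[lambda s: s.replace("e+e", "+e")])
s2 = Stratum("s2", cyclic=True, phon_rules=[str.upper])
baked = exit_noncyclic(apply_morph_rule(Entry("bake", "s1"), s1.morph_rules[0]), s1, s2)
```

As read here, the phonological rules of S_(i+1) run at this point only when S_(i+1) is cyclic; a noncyclic successor applies its own phonological rules when it, in turn, is exited.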

4.5.2 Application of a Cyclic Stratum


Let S_j be a stratum of cyclic rules. Applying one or more cycles to a storable lexical entry of stratum S_j yields another storable lexical entry of stratum S_j. (A “cycle” is defined as the application of one morphological rule of the stratum, followed by the application of all phonological rules of that stratum, followed by the erasure of any boundary markers.)

If stratum S_(j+1) is also cyclic, then applying all the phonological rules of stratum S_(j+1) to a storable lexical entry of stratum S_j, followed by the relevant Affix Template of S_j (if any), yields a storable lexical entry of stratum S_(j+1). Otherwise (if stratum S_(j+1) is noncyclic), a storable lexical entry of stratum S_j to which the relevant Affix Template of S_j (if any) has been applied is also a storable lexical entry of stratum S_(j+1).
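A minimal Python sketch of the cyclic case, assuming (as this section does not specify) that shapes are plain strings, rules are string-to-string functions, and "+" marks a boundary. The helpers cycle and exit_cyclic are hypothetical; exit_cyclic follows the order stated above, with the phonological rules of a cyclic S_(j+1) applied before the affix template of S_j.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Rule = Callable[[str], str]  # hypothetical: a rule rewrites a shape string
BOUNDARY = "+"               # assumed boundary-marker symbol


@dataclass
class Stratum:
    name: str
    cyclic: bool
    phon_rules: List[Rule] = field(default_factory=list)
    affix_template: Optional[Rule] = None


def run_all(rules: List[Rule], shape: str) -> str:
    for rule in rules:
        shape = rule(shape)
    return shape


def cycle(shape: str, morph_rule: Rule, s_j: Stratum) -> str:
    """One cycle: one morphological rule, all phonological rules of S_j, boundary erasure."""
    shape = morph_rule(shape)
    shape = run_all(s_j.phon_rules, shape)
    return shape.replace(BOUNDARY, "")


def exit_cyclic(shape: str, s_j: Stratum, s_next: Stratum) -> str:
    """Storable entry of S_j -> storable entry of S_(j+1), in the order given in 4.5.2."""
    if s_next.cyclic:
        shape = run_all(s_next.phon_rules, shape)
    if s_j.affix_template is not None:
        shape = s_j.affix_template(shape)
    return shape


def add_i(s: str) -> str:
    return s + "+i"  # hypothetical suffixing rule


# Usage: a palatalization rule t -> c before the boundary and suffix "+i".
sj = Stratum("sj", cyclic=True, phon_rules=[lambda s: s.replace("t+i", "c+i")])
once = cycle("kat", add_i, sj)    # first cycle
twice = cycle(once, add_i, sj)    # second cycle, applied to the output of the first
```

In the usage lines, the second cycle sees only the output of the first cycle (kaci), whose boundary has already been erased.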

4.6 Definition of Generation of a Surface Lexical Entry


For convenience, the pseudo-stratum *surface* is defined as the final stratum; it has no rules and is considered a non-cyclic stratum for purposes of the following definition. (That is, a lexical entry belonging to the *surface* stratum may have no further rules, morphological or phonological, applied to it. The user should not define another stratum with the name *surface*.)

Let LE be a lexical entry of stratum S_1 to which no morphological or phonological rules have applied, and let RzHF be a set of Head Features which are to be realized on LE. Then LE may be converted into a Derived Lexical Entry of the Surface Stratum by first setting the Head Features list of LE to RzHF plus any existing Head Features of LE that do not conflict with RzHF, and then applying all the strata in order, from S_1 through the Surface Stratum.
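The Head Features step can be sketched as a dictionary merge in which RzHF wins. In this hypothetical Python sketch a feature list is a dict from feature name to a list of values, two features conflict when they share a name, and each stratum (ending with *surface*) is a function over the shape and features; none of these representations is prescribed by the specification.

```python
from typing import Callable, Dict, List, Tuple

FeatureList = Dict[str, List[str]]  # assumed: feature name -> list of values
# A stratum is modelled, hypothetically, as a function over (shape, head features).
StratumFn = Callable[[str, FeatureList], Tuple[str, FeatureList]]


def realize_head_features(rzhf: FeatureList, existing: FeatureList) -> FeatureList:
    """RzHF plus every existing Head Feature that does not conflict with it."""
    merged = {name: list(vals) for name, vals in rzhf.items()}
    for name, vals in existing.items():
        if name not in merged:  # assumption: a conflict means the same feature name
            merged[name] = list(vals)
    return merged


def generate(shape: str, head_features: FeatureList, rzhf: FeatureList,
             strata: List[StratumFn]) -> Tuple[str, FeatureList]:
    """Set the Head Features, then apply the strata from S_1 through *surface* in order."""
    hf = realize_head_features(rzhf, head_features)
    for apply_stratum in strata:
        shape, hf = apply_stratum(shape, hf)
    return shape, hf


def s1(shape: str, hf: FeatureList) -> Tuple[str, FeatureList]:
    # Hypothetical stratum: suffix "ed" when tense is past.
    return (shape + "ed" if hf.get("tense") == ["past"] else shape), hf


def surface(shape: str, hf: FeatureList) -> Tuple[str, FeatureList]:
    return shape, hf  # the *surface* pseudo-stratum has no rules


result = generate("walk", {"tense": ["pres"], "trans": ["yes"]},
                  {"tense": ["past"]}, [s1, surface])
```

Here the existing tense value is overridden by RzHF, while the non-conflicting trans feature is carried over.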


5 Data Structures

5.1 Input Data Format


The data input to the morpher module for the commands morph_and_lookup_word and morph_and_lookup_list is the output of the Preprocessor module (see chapter five), and contains the data to be morphed. To summarize that chapter: the input to the morpher is a list of one or more Token Record data structures, each containing the print form of the word and its normalized form, and representing a single word of the input string.

The Phonetic Shape field of those records is visible to the morpher, while the Orthographic Shape field is invisible to the morpher rules (although the morpher module passes it on to downstream modules in the Orthographic Shape field of Lexical Entry records).

The function morph_and_lookup_word accepts a list of exactly three Token Record data structures, representing the input word, the preceding word, and the following word, in that order. The function morph_and_lookup_list accepts a list of Token Record data structures of any length. The morpher morphs each word separately; however, the preceding and following words (if any) are accessible to phonological rules through the phonological rule fields prev_word and next_word.
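The per-word context described above can be illustrated with a small Python sketch. The TokenRecord field names are hypothetical, the mapping of print form to Orthographic Shape and normalized form to Phonetic Shape is an assumption, and None stands in for a missing neighbour at either end of the input, a case this section does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TokenRecord:
    # Hypothetical field names; the real record is defined in the Preprocessor chapter.
    orthographic_shape: str  # assumed: print form, passed through but invisible to rules
    phonetic_shape: str      # assumed: normalized form, visible to the morpher


# (word, prev_word, next_word): the order given for morph_and_lookup_word
Window = Tuple[TokenRecord, Optional[TokenRecord], Optional[TokenRecord]]


def context_windows(tokens: List[TokenRecord]) -> List[Window]:
    """Pair each token with its neighbours so each word can be morphed separately."""
    windows: List[Window] = []
    for i, tok in enumerate(tokens):
        prev_tok = tokens[i - 1] if i > 0 else None  # None: no preceding word
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None  # None: no following word
        windows.append((tok, prev_tok, next_tok))
    return windows
```

Each window could then be passed to the per-word analysis, with its second and third members exposed to phonological rules as prev_word and next_word.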

The input to the morpher module for the commands generate_word, apply_stratum, and apply_morpher_rule is similar; it is described under each command.


5.2 Lexical Entry Data Structure


Lexical Entries are record structures; as described above (see Lexical Entries, section 3), each lexical entry represents a root, stem or word. The Lexical Entry data structure is used in the lexicon and in the output of the morpher. (A nearly identical structure is used in the syntactic parser to represent terminal nodes; see chapter seven, Parse Tree Format—Terminal Node Record Structure.)

This section describes the record structure of a lexical entry.



Note: The Lexical Entry structure may be augmented in future versions of Hermit Crab by the addition of fields, e.g. for indicating functional structure.

Record Label: lexical_entry

Fields:

5.2.1 Lexical Entry ID


Optionality: obligatory

Label: id

Type: string

Contents: A code which uniquely identifies this lexical entry data structure.

Purpose: used in debugging to refer to lexical entries.

A derived lexical entry inherits the lex ID of the lexical entry from which it is derived.

A real lexical entry's lex ID remains valid during a single session of Hermit Crab; a virtual lexical entry's lex ID remains valid only until the next time either the function morph_and_lookup_word or the function morph_and_lookup_list is called. Deleting a (real) lexical entry also causes its lex ID to become invalid, as does resetting the lexicon (see reset_lexicon, section 6.4.6).
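The validity rules for lex IDs amount to simple bookkeeping. The hypothetical LexIdRegistry below (not part of Hermit Crab's interface) shows one way to track them in Python: virtual IDs are discarded at each morph call, real IDs on deletion or on reset_lexicon. It does not model ID inheritance by derived entries.

```python
import itertools
from typing import Set


class LexIdRegistry:
    """Hypothetical tracker for the lex ID validity rules of 5.2.1."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._real: Set[str] = set()
        self._virtual: Set[str] = set()

    def new_real(self) -> str:
        lex_id = f"r{next(self._counter)}"
        self._real.add(lex_id)
        return lex_id

    def new_virtual(self) -> str:
        lex_id = f"v{next(self._counter)}"
        self._virtual.add(lex_id)
        return lex_id

    def on_morph_call(self) -> None:
        # Called when morph_and_lookup_word or morph_and_lookup_list starts.
        self._virtual.clear()

    def delete_real(self, lex_id: str) -> None:
        self._real.discard(lex_id)

    def reset_lexicon(self) -> None:
        self._real.clear()

    def is_valid(self, lex_id: str) -> bool:
        return lex_id in self._real or lex_id in self._virtual
```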

5.2.2 Phonetic Shape


Optionality: obligatory in Real Lexical Entries; pertains to Virtual Lexical Entries only during debugging

Label: sh

Type: string

Contents: A string which represents the phonological form of the lexical entry. For lexical entries which represent entire tokens in the input, this field is copied from the field of the same name in the input Token Record data structure; in the case of lexical entries in the lexicon, it is the result of lexical lookup. In the case of virtual lexical entries, this field is translated from the phonetic sequence which represents its phonological form; this translation is only necessary when matching a storable lexical entry against a real lexical entry, or during debugging.

Implementation note: The translation of the phonetic sequence of a virtual lexical entry into a string may be ambiguous; see Translation from Phonetic Sequence to Regular Expression, section 4.1.1.2.

5.2.3 Family


Optionality: optional, used only in Real Lexical Entries

Label: fam

Type: atom

Contents: Gives the family to which a given (real) lexical entry belongs.

Purpose: To allow blocking of derivations by irregular forms listed in the lexicon.

It may be useful for the shell to treat families of lexical entries as units when the user is editing lexical entries, so that changes to one member of the family are consistently propagated to others. An inheritance schema is one way this might be implemented.


5.2.4 Gloss


Optionality: optional

Label: gl

Type: string

Contents: A translation of the lexical item as listed in the dictionary (for real lexical entries) or as morphed (for virtual lexical entries).

If this field is empty in a real lexical item, the default string “?” is used, as described below (see Morphological Rule Notation—Gloss String, section 7.2.1.14).



Purpose: To represent the morpher's analysis of the word's meaning. The intention is that it will contain the translation of one or more of the morphemes composing the word. This field may also be used by the Display Module as a label for the word.

Glosses are included in Hermit Crab’s output if the global variable *show_glosses* is true (the default); otherwise they are omitted.


5.2.5 Part of Speech


Optionality: obligatory

Label: pos

Type: atom

Contents: The name of the part of speech of the lexical item.

5.2.6 Subcategorization


Optionality: optional

Label: sub

Type: list

Contents: A list of atoms, each one of which is the name of a syntactic (parser) rule which the lexical item subcategorizes. If this field is absent, the lexical item does not subcategorize any rules.

Purpose: To allow the lexical item to subcategorize certain syntactic rules. Morphological rules may also be constrained to require that the lexical entry to which they apply subcategorize a specified rule.

Warning: The morpher does not check whether the rules in this list actually exist in the parser's rulebase.

5.2.7 Grammatical Function Information


Optionality: empty

Label: gf

Type: atom

Purpose: This field is meant to carry information specified in syntactic rules as to the function of this node. This information is added by the Parser and/or Functional Structure Modules; the field is always empty in the Morpher module, and may therefore be omitted from all lexical entries within this module. (It is mentioned here only for completeness.)

5.2.8 Morphological Rules


Optionality: optional (defaults to “?”)

Label: mrs

Type: list

Contents: The names (atoms) of the morphological rules (if any) which have applied to form this lexical entry; the left-to-right order of this list represents the order in which morpher rules applied to produce this lexical entry (in the synthesis sense). This field will often be the empty list for real lexical entries. However, if a real lexical entry represents a stem rather than a root, it may be desirable to indicate the morphological rules which “would have” applied, in order to prevent their applying. (For instance, if the irregular past tense verb ran is listed in the lexicon, its lexical entry might list the past tense rule as having applied, to avoid generating *ranned.)

Purpose: Used to prevent multiple application of morphological rules, and in debugging.
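A minimal Python sketch of how the mrs list can block repeated application, including the ran example. The Entry type and apply_suffix_rule are hypothetical, and real morphological rules do far more than append a suffix.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    shape: str
    mrs: Tuple[str, ...] = ()  # rules applied so far, in synthesis order


def apply_suffix_rule(entry: Entry, rule_name: str, suffix: str) -> Optional[Entry]:
    """Apply a hypothetical suffixing rule unless mrs shows it has already applied."""
    if rule_name in entry.mrs:
        return None  # blocked: no multiple application
    return replace(entry, shape=entry.shape + suffix, mrs=entry.mrs + (rule_name,))


ran = Entry("ran", mrs=("past_tense",))  # irregular form listed as if the rule had applied
walked = apply_suffix_rule(Entry("walk"), "past_tense", "ed")
```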

5.2.9 Stratum


Optionality: obligatory

Label: str

Type: atom

Contents: The name of a rule stratum.

Purpose: This encodes the stratum of rules which may apply to this lexical entry.

The value of *surface* means that no more rules may apply to the lexical entry (it is a surface form).

For real lexical entries, the value of this field must be supplied by the user. For virtual lexical entries, the value is automatically supplied by the morpher.

See also: Storable Lexical Entries (section 3.3)

5.2.10 Morphological/Phonological Rule Features


Optionality: optional

Label: rf

Type: list

Contents: zero or more atoms, each of which is the name of a Morphological/Phonological Rule (MPR) feature.

Purpose: These rule features govern which morphological or phonological rules a lexical entry will exceptionally undergo or not undergo. They may be used to encode such things as conjugation class and gender.

If this field is absent, the lexical entry has no MPR features.

If membership in a conjugation class or gender class is important in the syntax, the class membership should be indicated as a Head Feature, since syntactic rules make reference only to Head and Foot Features. Head and Foot Features are visible both to morphological and phonological rules (except that Foot Features are invisible to phonological rules; see Foot Features, section 5.2.12) and to syntactic (phrase structure) rules, whereas MPR features are visible only to morphological/phonological rules.

5.2.11 Head Features


Optionality: optional

Label: hf

Type: list-valued feature list

Purpose: This list represents the assigned (non-default) Head Features of the lexical entry.

If this field is absent, the values of all Head Features of the lexical entry are the default values.
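Reading a Head Feature therefore falls back to a default when the field, or the feature within it, is absent. The Python sketch below assumes a dict representation and invents a HEAD_DEFAULTS table purely for illustration; actual defaults come from declarations outside this section.

```python
from typing import Dict, List, Optional

FeatureList = Dict[str, List[str]]

# Hypothetical defaults: the real default values come from feature declarations
# that are not part of this section.
HEAD_DEFAULTS: FeatureList = {"number": ["sg"], "person": ["3"]}


def head_value(hf: Optional[FeatureList], name: str,
               defaults: FeatureList = HEAD_DEFAULTS) -> List[str]:
    """Assigned value of a Head Feature, or its default if hf is absent or lacks it."""
    if hf is not None and name in hf:
        return list(hf[name])
    return list(defaults.get(name, []))
```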



See also: Foot Features (section 5.2.12); Morphological/Phonological Rule Features (5.2.10)

5.2.12 Foot Features


Optionality: optional

Label: ff

Type: list-valued feature list

Purpose: This list represents the assigned (non-default) Foot Features of the lexical entry.

If this field is absent, the values of all Foot Features of the lexical entry are the default values.

Foot features are invisible to phonological rules.

See also: Head Features (section 5.2.11); Morphological/Phonological Rule Features (5.2.10)

5.2.13 Obligatory Head Features


Optionality: optional

Label: of

Type: list

Contents: A list of atoms, each of which is the name of a Head Feature.

Purpose: For each feature name listed, some value must be assigned to that feature by the end of the derivation (in the synthesis sense, i.e. to the complete word). This feature value will usually be supplied by an affix yet to be attached to the stem represented by this lexical entry. If, at the end of the derivation, no value has been assigned to such a feature, the derivation is ruled out.

If this field is omitted, there are no obligatory Head Features.
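The end-of-derivation check can be sketched in a few lines of Python. It assumes the final Head Features are a dict from feature name to value list and treats an absent or empty entry as unassigned; both are assumptions rather than requirements of the specification, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List

FeatureList = Dict[str, List[str]]


def missing_obligatory(of: Iterable[str], final_hf: FeatureList) -> List[str]:
    """Obligatory Head Features still unassigned when the derivation ends."""
    # Assumption: an absent feature or an empty value list counts as "no value".
    return [name for name in of if not final_hf.get(name)]


def derivation_survives(of: Iterable[str], final_hf: FeatureList) -> bool:
    """A completed derivation is ruled out if any obligatory feature lacks a value."""
    return not missing_obligatory(of, final_hf)
```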



See also: Complete Lexical Entries (section 3.5); Morphological Rule Notation—Output Side Record Structure—Obligatory Features (section 7.2.1.13)

5.2.14 Pseudo Lexical Entry Flag


Optionality: optional, relevant only to storable lexical entries

Label: ps

Type: Boolean

Default: false

Purpose: This field has the value false for Real Lexical Entries and for all Storable Lexical Entries which are derived from Real Lexical Entries. Storable Lexical Entries not derived from Real Lexical Entries (visible to the user only with the function show_morphings) have the value true for this field. In other words, this field is used in debugging to flag lexical entries which are not derivable from the lexicon.

See also: show_morphings (section 6.6.10).
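For reference, the fields of sections 5.2.1 through 5.2.14 can be collected into a single hypothetical Python dataclass. Attribute names follow the record labels, except that str is renamed stratum to avoid shadowing the Python builtin; the tuple and dict types are assumed representations. The derive helper illustrates two stated rules: a derived entry inherits its parent's lex ID (5.2.1), and an entry derived from a real lexical entry keeps ps false (5.2.14).

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional, Tuple

FeatureList = Dict[str, List[str]]  # assumed representation of a list-valued feature list


@dataclass
class LexicalEntry:
    id: str                                        # id  (obligatory)
    pos: str                                       # pos (obligatory)
    stratum: str                                   # str (obligatory), renamed
    sh: Optional[str] = None                       # phonetic shape
    fam: Optional[str] = None                      # family, real entries only
    gl: Optional[str] = None                       # gloss
    sub: Tuple[str, ...] = ()                      # subcategorized syntactic rules
    gf: Optional[str] = None                       # always empty within the morpher
    mrs: Tuple[str, ...] = ()                      # morphological rules applied, in order
    rf: Tuple[str, ...] = ()                       # MPR features
    hf: FeatureList = field(default_factory=dict)  # assigned Head Features
    ff: FeatureList = field(default_factory=dict)  # assigned Foot Features
    of: Tuple[str, ...] = ()                       # obligatory Head Features
    ps: bool = False                               # pseudo lexical entry flag


def derive(parent: LexicalEntry, rule_name: str, new_shape: str) -> LexicalEntry:
    """Hypothetical derivation step: new shape, rule recorded in mrs; id and ps inherited."""
    # dict() gives the derived entry its own (shallow) copies of the feature lists.
    return replace(parent, sh=new_shape, mrs=parent.mrs + (rule_name,),
                   hf=dict(parent.hf), ff=dict(parent.ff))
```

Because ps is copied from the parent, an entry built from a pseudo entry (ps true) also stays flagged, matching the description above.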

