Hithes doc Version 14





<--

*************************************************************************

** Hithes.doc Version 1.14 **

*************************************************************************

Copyright (C) MICRA, Inc. 1991, 1992.
Hithes.doc is a hierarchically-organized thesaurus derived by reorganization of the version of Roget's Thesaurus published in 1911. The new organization is intended to allow use of ISA and other semantic relational markers.
Last edit 10-4-92. Completed shifting of most individual Roget categories to new positions, but the initial internal reorganization has only reached P2.3.1.3.9.3.3 (20%), at page 215 of 639 (424 pages remaining).

No systematic additions have been done yet.


=========================================================================
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

NOTATIONS

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-===-=

This thesaurus is organized, like the original Roget's, by major word categories, of which there are about 1060. Each main entry contains words semantically related to the headword, organized in subsections. The first subdivision of the main entry is the paragraph. All the words in any paragraph have the same semantic relation to the headword. A main entry may be headed by a Roget entry number (a number prefixed by a "#" symbol), or by a hierarchical entry number (a number prefixed by an "A", "P", or "M").


ROGET-NUMBERED ENTRIES

The 1911 Roget category numbers have been retained, and in some cases subdivided with a, b, c, where significant parts of a Roget entry have been broken out into a separate entry. This numbering serves merely as a temporary cross-reference to other entries, until all Roget entries have been assigned a hierarchical number. It also serves to indicate the origin of most of the entries. Eventually most Roget entries (with some exceptions) should have at least one hierarchical entry number, but as of the last edit this has not been completed.


HIERARCHICAL NUMBERING

The hierarchical numbering is in 3 groups: Abstract, Physical, and Mental, in which the hierarchy numbers are preceded by an A, P, or M, respectively. Subdivisions are marked by decimal points separating the hierarchical levels (segment numbers), as in "P2.3.2.3.1.5". A word in the hierarchical noun segment of each subgroup should be a proper subtype of all of the higher categories.
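As a sketch of how a program might use this numbering, the following Python assumes that a code is a group letter followed by dot-separated integers, and treats one category as lying beneath another when its segments extend the other's. The names `parse_hier` and `is_under` are hypothetical; they are not part of the distribution.

```python
from typing import NamedTuple

# The three top-level groups; the letter itself is not part of the ISA hierarchy.
GROUPS = {"A": "Abstract", "P": "Physical", "M": "Mental"}


class HierNumber(NamedTuple):
    group: str                  # "A", "P" or "M"
    segments: tuple[int, ...]   # one segment number per hierarchical level


def parse_hier(code: str) -> HierNumber:
    """Split a code such as "P2.3.2.3.1.5" into its group letter and segment numbers."""
    group, rest = code[:1], code[1:]
    if group not in GROUPS:
        raise ValueError(f"unknown group letter in {code!r}")
    # A trailing period (as in "A1.1.1.") leaves an empty piece, which is skipped.
    segments = tuple(int(piece) for piece in rest.split(".") if piece)
    return HierNumber(group, segments)


def is_under(child: str, parent: str) -> bool:
    """True if `child` lies strictly below `parent`, i.e. it should be a subtype of it."""
    c, p = parse_hier(child), parse_hier(parent)
    n = len(p.segments)
    return c.group == p.group and len(c.segments) > n and c.segments[:n] == p.segments
```

For example, `is_under("P2.3.2.3.1.5", "P2.3")` is true, while `is_under("P2.31", "P2.3")` is false, because segments are compared as whole numbers rather than as text.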

Within a paragraph, words which are approximately synonymous are grouped together, separated by commas; each synonym-group is separated by a semicolon (";") from other words which have a lot of similarity, but are not so closely synonymous. All paragraphs begin with a tab (indentation) and end with a period. The objective is to group within each paragraph words which have similar semantic categories, and a similar semantic relation to the headword.
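A minimal sketch of the paragraph layout just described, assuming the bracketed relation markers have already been removed (a comma inside a bracket would otherwise be split). `synonym_groups` is a hypothetical helper name.

```python
def synonym_groups(paragraph: str) -> list[list[str]]:
    """Split a paragraph into synonym groups (separated by ";"), each a list of words (separated by ",")."""
    body = paragraph.strip()          # drops the leading tab
    if body.endswith("."):
        body = body[:-1]              # every paragraph ends with a period
    groups = []
    for chunk in body.split(";"):
        words = [w.strip() for w in chunk.split(",") if w.strip()]
        if words:
            groups.append(words)
    return groups
```

For example, `synonym_groups("\tfile; polish, burnish.")` returns `[["file"], ["polish", "burnish"]]`: "file" forms one group and "polish, burnish" another.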

SYNONYMS AND SUBTYPES

There may be several paragraphs under each main heading. Within the first paragraph, the first synonym set generally contains words which are synonymous with the headword. The remaining synonym groups of the first paragraph contain words which are closely similar in meaning to the headword; this is equivalent to the {{similar_to}} relation. Other paragraphs within the set of nouns, unless marked with different semantic relations, contain subtypes of the headword.


OTHER SEMANTIC RELATIONS

Words which are not synonymous, but have other semantic relations to the headword, are contained in paragraphs headed by the semantic relation in double curly brackets. These other semantic relations are to be interpreted as default conditions, to which there may be exceptions. Comments in square brackets are generally either explanatory comments or semantic relations for which a well-defined semantic relation has not yet been formulated.

In general, between the beginning of a main entry and the "end-hierarchy" mark "$$", any paragraph not headed by a semantic relation has the default semantic relation of a subtype of the headword, i.e., it is part of the hierarchy proper. This may also be written explicitly as {{has_subtype(headword)}}. However, if a direct semantic relation (in double curly brackets) heads a paragraph (e.g. {{has_part}}), this relation is assumed to be the only semantic relation of the headword to the paragraph words. Such "direct" semantic relations do not contain a parenthesis within the double curly brackets. In this case, if a group of words has a semantic relation in addition to the subtype relation, the subtype relation must be specified explicitly. (There is an exception for the semantic relations dealing with quantity or intensity. These relations imply a subtype relation: e.g. {{has_high_intensity}} is a direct semantic relation, but the words thus categorized are also subtypes of the headword, although with a distinctive property differentiating them from the other subtype words.) In general, semantic relations of other words to the words which have their primary definition in a paragraph do NOT nullify the subtype relation. For example, under Interpretation, R522, the paragraph:

{{uses(mysticism)}} anagoge, anagogy.

shows that the words anagoge and anagogy, in addition to being subtypes of interpretation, are also devices employed in mysticism.
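The default rule can be sketched as a small decision function. This is one reading of the rule under stated assumptions: the markers are passed in with their braces removed, a "direct" relation is recognized simply by the absence of a parenthesis, and only {{has_high_intensity}} is treated as an intensity relation because it is the only one named here. `implied_subtype` and `relation_name` are hypothetical.

```python
# Quantity/intensity relations imply a subtype relation even when written "direct".
# Only has_high_intensity is named in the text; any others would be added here.
INTENSITY_RELATIONS = {"has_high_intensity"}


def relation_name(marker: str) -> str:
    """Strip the "&" flag, any parenthesized argument and any colon part from a marker."""
    return marker.lstrip("&").split("(", 1)[0].split(":", 1)[0].strip()


def implied_subtype(markers: list[str]) -> bool:
    """Decide whether the words of a paragraph are subtypes of the headword.

    `markers` are the {{...}} relations heading the paragraph, braces removed,
    e.g. ["uses(mysticism)"] or ["has_part"].
    """
    if not markers:
        return True  # no heading relation: part of the hierarchy proper
    names = {relation_name(m) for m in markers}
    if "has_subtype" in names or names & INTENSITY_RELATIONS:
        return True  # explicit subtype, or an intensity relation that implies one
    # A direct relation (no parenthesis) replaces the default subtype relation;
    # relations naming another word in parentheses leave it in force.
    return all("(" in m for m in markers)
```

Under this reading, `implied_subtype(["uses(mysticism)"])` is true (anagoge remains a subtype of interpretation), while `implied_subtype(["has_part"])` is false.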

Likewise, if the specified semantic relation is another subtype relation (i.e., a subtype relation to a word other than the headword), that subtype relation is assumed to apply in addition to the subtype relation to the headword. A paragraph may have any number of relations listed, all of which apply to all the words in the paragraph. For example, under R85a,

Computation:

{{&used_in(number, R84)}} {{used_in(symbol)}} operation, mathematical operation; operator.

Here operator and operation both use numbers and symbols, and both are a type of computation. The "&" in front of the semantic relation means "sometimes" or "may". In this case, the "sometimes" symbol is used because mathematical operations may operate on things other than numbers (but in this paragraph, the second relation states that a mathematical operation always uses symbols).
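A sketch of how the heading relations of such a paragraph might be read, keeping the "&" flag with each relation. It handles only markers of the form {{name}} or {{name(args)}}, with no colon part; the `Relation` class and `parse_paragraph` function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Relation:
    name: str
    args: list[str]   # arguments inside the parentheses, if any
    sometimes: bool   # True when prefixed with "&" ("sometimes" or "may")


# One {{...}} marker at the current position, optionally preceded by whitespace.
HEADER = re.compile(r"\s*\{\{(&?)([A-Za-z_]+)(?:\(([^)]*)\))?\}\}")


def parse_paragraph(text: str) -> tuple[list[Relation], str]:
    """Return the relations heading a paragraph and the remaining word text.

    Every returned relation applies to every word in the remaining text.
    """
    relations = []
    pos = 0
    while (m := HEADER.match(text, pos)) is not None:
        amp, name, args = m.groups()
        arg_list = [a.strip() for a in args.split(",")] if args else []
        relations.append(Relation(name, arg_list, sometimes=bool(amp)))
        pos = m.end()
    return relations, text[pos:].strip()
```

For the Computation paragraph, this yields two `used_in` relations, the first with `sometimes=True` and arguments `["number", "R84"]`, and leaves `"operation, mathematical operation; operator."` as the word text.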

The interpretation of combinations of relations is not yet well defined. One example of a combination is in the entry under "Smooth", where various methods of smoothing are given, among them smoothing by abrasion and smoothing by cutting. In the verb section:


V. smooth, smoothen[obs3].

[render smooth] {{has_subtype(smoothing)}} {{has_method(abrade, @P2.4.2.5.3)}} file; polish, burnish.

Interpretation: "filing is a type of smoothing by abrasion"

{{has_subtype(smoothing)}} {{has_method(cutting, R44)}} plane; mow; shave.

Interpretation: "planing, mowing, and shaving are types of smoothing by cutting"

Thus, in ordinary English, the "method" relation is usually expressed as "by".


Another example is the relation between wood and sawdust. Of course, sawdust is composed of wood, but more specifically, we write:
{{constitutes(wood)}} {{has_subtype(powder)}} {{forms(cut)}} {[with_object(wood)]} sawdust.
This means that "cutting wood forms a powdery material composed of wood, called sawdust".

# The pound sign serves as the start of an entry. The segment of the entry which is directly included in the hierarchy (as opposed to non-hierarchical semantic relations) is exclusively in the noun section, beginning with the "N." symbol.

$$ ends the hierarchical segment started by a "N." within an entry.

This is a temporary device to allow recognition of paragraphs containing subtypes, even though non-subtype paragraphs have not been given their proper semantic markers. At present, only a few entries have this boundary explicitly marked.
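A sketch of splitting one main entry at its "#" header and "$$" mark, assuming the header has the form "#1. Existence.-- N." used in the main text, and that an entry without "$$" has no recoverable boundary yet. `split_entry` is a hypothetical helper.

```python
import re
from typing import Optional

# "#", the Roget number (with an optional a/b/c suffix), a period, the headword, ".--".
ENTRY_HEADER = re.compile(r"^#(\d+[a-z]?)\.\s*([^.]+)\.--", re.MULTILINE)


def split_entry(entry: str) -> tuple[str, str, Optional[str], str]:
    """Return (roget_number, headword, hierarchical_segment, remainder) for one entry.

    The hierarchical segment runs from the "N." marker to the "$$" mark. When "$$"
    is absent, the boundary is unknown, so the segment is None and the whole body
    is returned as the remainder.
    """
    m = ENTRY_HEADER.search(entry)
    if m is None:
        raise ValueError("no '#' entry header found")
    number, headword = m.group(1), m.group(2).strip()
    body = entry[m.end():]
    noun = body.find("N.")
    start = noun + 2 if noun != -1 else 0
    end = body.find("$$", start)
    if end == -1:
        return number, headword, None, body.strip()
    # Skip the period that follows "$$" in the main text ("$$.").
    return number, headword, body[start:end].strip(), body[end + 2:].lstrip(".").strip()
```

Returning None rather than guessing keeps unmarked entries distinguishable from marked ones while the "$$" boundaries are still being added.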

{{}} Double curly brackets designate the defined semantic relation (see the list in the file "relation.doc") which the immediately following segment (paragraph) has to the immediately preceding hierarchically organized main entry. These brackets precede the word(s) being categorized, and the relation applies to all entries up to the end of the paragraph. For usage, see below. These relations are essentially predicate relations between two arguments. (Where the relation does not contain a verb, the implied verb is "is"; e.g. "part_of" actually means "[headword] is_part_of [following word]".) Where the first argument is not the main entry word(s) for that segment, the first argument appears in parentheses after the relation; thus "{{has_property(light): color}}" means that light has a property named color (and that color in this sense has its primary entry elsewhere).
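Reading a single marker as a two-argument predicate can be sketched as follows, assuming one {{...}} marker per paragraph, no "&" flag, and no property value. When no parenthesized first argument is present, the headword is used; when the related word sits inside the brackets after the colon, the triple is flagged as a reference entry. `Triple` and `to_triple` are hypothetical.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str     # first argument: the headword unless given in parentheses
    relation: str
    obj: str         # second argument
    reference: bool  # True when the related word is inside the brackets


MARKER = re.compile(
    r"\{\{(?P<name>[A-Za-z_]+)"          # relation name
    r"(?:\((?P<first>[^)]*)\))?"        # optional first argument in parentheses
    r"(?::\s*(?P<inside>[^}]*))?\}\}"   # optional ": word" inside the brackets
    r"\s*(?P<outside>.*)",              # words following the marker
    re.S,
)


def to_triple(paragraph: str, headword: str) -> Triple:
    m = MARKER.match(paragraph.strip())
    if m is None:
        raise ValueError("paragraph does not start with a {{...}} marker")
    subject = m["first"].strip() if m["first"] else headword
    if m["inside"]:
        return Triple(subject, m["name"], m["inside"].strip().rstrip("."), True)
    return Triple(subject, m["name"], m["outside"].strip().rstrip("."), False)
```

So `to_triple("{{has_property(light): color}}.", ...)` gives light, has_property, color as a reference entry, while `to_triple("{{studied_in}} ontology.", "Existence")` gives Existence, studied_in, ontology as a primary entry.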

Where the relation is a property, the property may have a value, and in this case the value may also appear inside the relational markers. The value is then preceded by a second colon within the relation. If the second concept of the relation is outside the relational markers (a primary entry), the two colons are immediately adjacent. Thus, for a primary entry:

{{has_subtype(polygon)}} {{property_of(number of sides):: six}} hexagon.

and for a reference entry:

{{has_subtype(polygon)}} {{property_of(number of sides): hexagon: six}}.
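The colon rules can be checked mechanically. This sketch handles only the property_of relation used in the two examples above; `PropertyValue` and `parse_property` are hypothetical names.

```python
import re
from typing import NamedTuple


class PropertyValue(NamedTuple):
    prop: str
    word: str
    value: str
    primary: bool  # True: the word is outside the brackets (primary entry)


PROP = re.compile(r"\{\{property_of\((?P<prop>[^)]*)\):(?P<inner>[^}]*)\}\}(?P<outer>[^.]*)")


def parse_property(paragraph: str) -> PropertyValue:
    m = PROP.search(paragraph)
    if m is None:
        raise ValueError("no property_of marker with a value")
    prop, inner, outer = m["prop"].strip(), m["inner"], m["outer"].strip()
    if inner.startswith(":"):
        # "::" - the value follows directly; the word is outside the brackets
        return PropertyValue(prop, outer, inner[1:].strip(), True)
    # ": word: value" - word and value are both inside the brackets
    word, _, value = inner.partition(":")
    return PropertyValue(prop, word.strip(), value.strip(), False)
```

Both example paragraphs yield number of sides, hexagon, six; only the `primary` flag differs.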


In order to find a conceptual path without excessive addition of new relations, it was sometimes necessary to use a derivative word not found in a typical dictionary, but easily recognizable, e.g. "pulverizability" in the grouping:
{{has_requirement(pulverization)}} pulverizability.

{{has_high_intensity(pulverizability)}} friability.

In cases where the semantic relation sometimes holds between the two words, and sometimes doesn't, an ampersand (&) is placed before the relation. This symbol is not symmetrical, i.e., it does not work the same way in both directions. For example, only some types of conversion are caused by radioactivity. But radioactivity always causes conversion of the radioactive element. Thus the non-symmetry:

{{&result_of(conversion): radioactivity}}.

{{has_result(radioactivity): conversion, transmutation}}.

(read as: "conversion is sometimes the result of radioactivity", or "some types of conversion are the result of radioactivity").

{[]} Curly brackets enclosing square brackets indicate a semantic relation of usage in context. This includes usage of the type often indicated in dictionaries with the phrase "of ..." or "in ...", i.e., the specific meaning of a word when used in relation to a particular subject. For this purpose the relation "in_frame" is used, e.g. under "limit":

{[in_frame(gambling)]} maximum bet.

In most cases, there will be only one word in parentheses, indicating the relationship of the following word to the headword. Occasionally, two words will be in parentheses, and in this case the second word takes the place of the headword in the relationship. These relations may apply to the subject or object of a verb, the noun modified by an adjective, or an activity (such as gambling, above) in which the term is used.

{} single curly brackets signify semantic relations in a manner similar to double curly brackets, but the semantic relations are not included in the defined list. Typically the semantic relations in single curly brackets are more complex than those in double curly brackets.

[] square brackets contain explanatory comments, and in some cases may contain semantic relations which have not yet been classified.
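Because these bracket types share opening characters, a tokenizer has to try the two-character openers first. A sketch, assuming brackets are never nested inside one another, that labels each bracketed span with its kind and with whether it heads the paragraph (and so applies to every word in it) or follows a word (and so applies to that word only). `classify_markers` is hypothetical.

```python
import re

# One alternation, tried left to right; "{[" and "{{" come before a lone "{".
TOKEN = re.compile(
    r"\{\[(?P<context>.*?)\]\}"      # {[...]}  usage-in-context relation, e.g. in_frame
    r"|\{\{(?P<defined>.*?)\}\}"     # {{...}}  relation from the defined list
    r"|\{(?P<undefined>.*?)\}"       # {...}    relation not (yet) in the defined list
    r"|\[(?P<comment>.*?)\]"         # [...]    explanatory comment or unclassified relation
)


def classify_markers(paragraph: str) -> list[tuple[str, str, str]]:
    """Return (kind, content, scope) for each bracketed span in a paragraph.

    scope is "paragraph" for spans before the first word and "word" otherwise.
    """
    markers = []
    seen_word = False
    pos = 0
    for m in TOKEN.finditer(paragraph):
        if paragraph[pos:m.start()].strip():
            seen_word = True  # plain text appeared before this span
        kind = m.lastgroup    # name of the alternative that matched
        markers.append((kind, m.group(kind), "word" if seen_word else "paragraph"))
        pos = m.end()
    return markers
```

For instance, "{[in_frame(gambling)]} maximum bet." yields a single context relation with paragraph scope, and the "[obs3]" in "smooth, smoothen[obs3]." is a comment with word scope.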

Brackets which appear at the head of a paragraph apply to all words in the paragraph, but brackets which appear at the end of a word or phrase apply only to that word or phrase. One type of comment, for example, is the context in which a word most often appears in the specific sense (e.g. MED means medical context, CHEM is chemistry, coll. means colloquial usage, etc.). Such comments appear after the relevant word. -->

<-- In this file, comments which are not a proper part of the thesaurus itself are contained within arrow brackets thus: "<-- comment -->".

-->


<--

&& signifies locations where rearrangement or additions are needed

/ The forward slash in any grouping serves as an "or" function; any member of the group will serve in that location, or for that purpose.

% Section headings, which are not an actual part of the thesaurus proper, are included between percent (%) markers, or the equivalent combination "<% . . . %>".

@ References to numbers starting with "@" indicate that the reference or relation is to a word within the designated main entry, but the reference word is not the headword itself (or its synonyms). In this case, the reference word is usually a word semantically related to the headword, with its primary occurrence at that location.
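The "/" notation can be expanded mechanically into its alternatives. A sketch, assuming each alternative is a single word without spaces (multi-word alternatives would need a different delimiter rule); `expand_alternatives` is hypothetical.

```python
import re

# A run of two or more single-word alternatives joined by "/".
SLASH_GROUP = re.compile(r"[^\s(),:;/]+(?:/[^\s(),:;/]+)+")


def expand_alternatives(text: str) -> list[str]:
    """Expand every slash group in `text` into one string per combination of alternatives."""
    m = SLASH_GROUP.search(text)
    if m is None:
        return [text]
    expanded = [text[: m.start()] + alt + text[m.end():] for alt in m.group().split("/")]
    # Recurse so that any further slash groups are expanded too.
    return [result for item in expanded for result in expand_alternatives(item)]
```

Applied to "has_property(characteristic/property/feature)", this gives one has_property marker for each of the three words.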

---------------------------------------------------------


Example of usage of relational brackets:

The default usage is that the relation links the main entry to the following words. If the related words are outside the brackets, the related words have their primary definition at this location, and these related words are counted as a separate sense (primary entry) of the words. If the related words are inside the brackets, the relation is followed by a colon, and this appearance of the related words is not counted as a separate sense (the word is a reference entry). If the symbol "@" appears as the first entry of a relation, then all subsequent words in that paragraph will be found at the referenced entry, and all are reference entries.
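A sketch of sorting the words of one paragraph into primary and reference entries under these rules. It assumes the "@" rule applies when the first word inside a marker's brackets begins with "@", drops bare cross-reference numbers (such as R161 or A4.1) from the word lists, and does not handle property values; these are interpretations, and `classify_entries` is hypothetical.

```python
import re

# A {{...}} marker; the part after the first colon (if any) lists reference entries.
MARKER = re.compile(r"\{\{[^}:]*(?::(?P<inside>[^}]*))?\}\}")
# Cross-reference numbers: hierarchical (A/P/M), Roget (R...), or bare numbers.
XREF = re.compile(r"^(?:[APM]\d[\d.]*|R\d+[a-z]?|\d+)$")


def _words(text: str) -> list[str]:
    """Split on commas and semicolons, dropping empty pieces and stray periods."""
    return [w.strip(" .") for w in re.split(r"[;,]", text) if w.strip(" .")]


def classify_entries(paragraph: str) -> dict[str, list[str]]:
    reference, located_at = [], []
    for m in MARKER.finditer(paragraph):
        inside = _words(m["inside"] or "")
        if inside and inside[0].startswith("@"):
            located_at.append(inside[0])  # words are found at the referenced entry
            inside = inside[1:]
        reference += [w for w in inside if not XREF.match(w)]
    outside = [w for w in _words(MARKER.sub(" ", paragraph)) if not XREF.match(w)]
    if located_at:
        # "@" first: every subsequent word in the paragraph is a reference entry.
        return {"primary": [], "reference": reference + outside, "located_at": located_at}
    return {"primary": outside, "reference": reference, "located_at": []}
```

Under these assumptions, "{{studied_in}} ontology." gives ontology as a primary entry, and "{{caused_by: creation, R161}}." gives creation as a reference entry.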


If the first word of the relation is not the primary entry word, the first word of the relation is included in a parenthesis following the relation. This notation is used where the first word of the relation is not a primary entry word, and has its primary definition (in that specific sense) within the same main entry.

Entry word: pattern.

{{is_a: group[of characteristics]}}.

{{has_member}} characteristic, property, feature.

{{has_subtype}} disposition, arrangement, array.

{{has_property: order, A4.1}}.

{{has_property(characteristic/property/feature)}} value.

{{analogous_to: vector}}.


(note that what this says is that a property may have a property called a value!)
NOTE that because "characteristic" in this sense has its primary definition here, the has_property relation on that word is also located in this entry.

Where several words in their primary locations all have an is_a relation to another word, the inverse relationship is used for convenience.

Thus a mimeograph is_a copy, but for convenience this is included with other writings which are copies, i.e.
Entry word: Writing (Document)

{{has_subtype(copy,R21)}} mimeograph, xerox, facsimile; reprint, offprint; photo offset.


and, correspondingly, under Copy, R21:

{{has_subtype(writing): mimeograph, xerox, facsimile; reprint, offprint; photo offset}}.
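Keeping the two halves of such a pair consistent is mechanical, so a program could generate the reference paragraph from the primary one. A sketch, assuming the primary paragraph has the form "{{relation(target, Rnn)}} words." as in the Writing example; `mirror` is hypothetical.

```python
import re

# "{{relation(target[, Rnn])}} words." as used for primary entries
PRIMARY = re.compile(
    r"\{\{(?P<rel>\w+)\((?P<target>[^,)]+)(?:,\s*(?P<ref>[^)]+))?\)\}\}\s*(?P<words>.+?)\.?$",
    re.S,
)


def mirror(headword: str, paragraph: str) -> tuple[str, str]:
    """Return (target entry, reference paragraph to place under that entry)."""
    m = PRIMARY.match(paragraph.strip())
    if m is None:
        raise ValueError("expected '{{relation(target[, Rnn])}} words.'")
    target = m["target"].strip() + (f", {m['ref'].strip()}" if m["ref"] else "")
    # The words move inside the brackets, so they are not counted as new senses.
    return target, "{{" + f"{m['rel']}({headword.lower()}): {m['words']}" + "}}."
```

Given the Writing paragraph above, this returns "copy, R21" together with exactly the reference paragraph shown under Copy.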


In many cases, specific subtypes of a concept which are simply combinations of words are included explicitly, e.g. for "critic" are listed specific types of critic (music critic, drama critic). Although these concepts could be easily derived from the individual concepts, the purpose of having such combinations explicitly included is to make it easier for a parser to recognize the combinations as a single defined unit, making errors of interpretation less likely.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on individual relations:

For a complete explanation, see the file "relation.doc".

Closely related, almost synonymous concepts are marked as {{similar_to:}}. This indicates a closer similarity than [ref: ]. An intermediate degree of similarity is marked with the "sometimes, somewhat" operator "&", as in {{&similar_to:}}.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

DICTIONARY GENERATION

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The plan is to organize this thesaurus so that a dictionary outline of meanings (without glosses, pronunciation, etc.) can be generated automatically by a scan of the thesaurus; a large part of the semantically important information about a word should be derivable from its location in this hierarchy. Each word in a specific sense should appear only once in its proper location in the hierarchy. References to that word in that specific sense should be contained within brackets. The dictionary-outlining program will classify words which are not within brackets by their location in the hierarchy, and each new occurrence of a particular such word will be considered as a different sense from all others. Words within brackets will not be considered as a new sense of a particular word, although the indexing program will index them.

These semantic references may occur inside or outside of brackets, depending on whether the semantic relation contains the primary sense of that word. A common example of words whose primary location is within a semantic pointer is the parts of machines which are peculiar to that specific machine; that is, in this thesaurus, component parts of objects whose primary function is to serve specifically as part of that object are classified together with the object itself in the hierarchy, within a has_part relational group.
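The core of such a dictionary-outlining scan can be sketched in a few lines, assuming paragraphs are supplied together with their hierarchy locations and that bracketed spans do not nest. Every word outside brackets starts a new sense; words inside brackets are skipped here (a full indexing program would still record them). `sense_outline` is hypothetical.

```python
import re
from collections import defaultdict

# Any bracketed span: {{...}}, {[...]}, {...} or [...]; words inside are not new senses.
BRACKETED = re.compile(r"\{\{.*?\}\}|\{\[.*?\]\}|\{.*?\}|\[.*?\]")


def sense_outline(paragraphs):
    """Map each word to the list of locations of its senses, in order of appearance.

    `paragraphs` is an iterable of (location, paragraph_text) pairs.
    """
    senses = defaultdict(list)
    for location, text in paragraphs:
        bare = BRACKETED.sub(" ", text).strip().rstrip(".")
        for word in re.split(r"[;,]", bare):
            word = word.strip()
            if word:
                senses[word].append(location)  # each unbracketed occurrence = new sense
    return dict(senses)
```

A word appearing unbracketed at two locations thus gets two senses, while a word that appears only inside a reference marker gets none.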


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The individual numbered entries of this thesaurus were originally derived from the 1911 Roget's thesaurus. The following differences will be noted between this version and the original edition of the printed 1911 thesaurus:

(1) the space-saving abbreviations in the original, using hyphens to represent common words, prefixes or suffixes, have been expanded into the full words or phrases.

(2) the side-by-side format for words and their opposites has been abandoned. Words are listed in order of their entry number.

(3) each main entry (1035 entries) has a pound sign "#" in front of the number to facilitate computerized search.

(4) Greek words and phrases are transliterated and included between brackets in the format greek word/gr>.

(5) where italics occurred in the original, italics are used in the Microsoft Word(R) format file. In the plain ASCII file, this formatting is lost.

(6) in the original book, words which were obsolete (in 1911) were marked with a dagger. In this version, those words are marked with an "obsolete" notation: "[obs1]". There are over 300 of these.

Some of the words which were still current in 1911, but are no longer found in a current college-size dictionary (presently obsolete words), or which are no longer used in the specific indicated sense, have been marked with an "obsolete" notation: "[obs2]". However, this marking process is purely opportunistic, with no systematic attempt to cover the entire thesaurus; only a small portion of the words which are now obsolete have been thus marked. Most, though not all, of the foreign-language phrases are now obsolete.

The "obsolete" notation [obs3] indicates that the previous word (or some word in the previous phrase) is not recognized by the word processor's spelling checker, and also is either NOT in a modern college-sized dictionary, or is noted there as being "ARCHAIC".

(7) This file contains only the main body of the thesaurus; neither the outline nor the index is contained here. The outline of the original thesaurus, with an overview of the organization of the concepts, is contained in a separate file, "outline.doc", on the distribution disk.
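Because the three obsolescence markers are plain text, they are easy to count or filter. A sketch, assuming the marker immediately follows the word it qualifies (as in "smoothen[obs3]"); `obsolete_counts` and `drop_obsolete` are hypothetical.

```python
import re
from collections import Counter

# [obs1]: daggered (obsolete) in the 1911 original
# [obs2]: obsolete since 1911, marked opportunistically
# [obs3]: unknown to the spelling checker and absent from (or archaic in) a college dictionary
OBS = re.compile(r"\[obs([123])\]")


def obsolete_counts(text: str) -> Counter:
    """Count each kind of obsolescence marker in a stretch of thesaurus text."""
    return Counter(f"obs{n}" for n in OBS.findall(text))


def drop_obsolete(words: list[str], levels: tuple[str, ...] = ("obs1",)) -> list[str]:
    """Remove words carrying any of the given markers, e.g. to keep current vocabulary only."""
    return [w for w in words if not any(f"[{lvl}]" in w for lvl in levels)]
```

By default only [obs1] words are dropped; passing `levels=("obs1", "obs2", "obs3")` removes every marked word.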

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
This is a very small-scale (individual) project, which will not be competitive with academic or commercial efforts such as the CYC project; it is intended to provide a convenient resource primarily for our own programs, and also for experimentation in natural-language processing by individuals or small groups with no access to more complete systems. Anyone who is currently engaged in or contemplating a similar thesaurus or dictionary project, and who would be willing to collaborate on this project, is encouraged to contact us, so that unnecessary duplication of effort can be avoided. We would also appreciate being notified of typos, errors, or omissions in any version. Send inquiries or comments to:
Patrick Cassidy

MICRA Inc.

735 Belvidere Ave.

Plainfield, NJ 07062-2054
voice: (908) 668-5252

fax: (908) 668-5904
(If no one answers, please leave a message.)

==========================================================================

-->

<-- HTENTRY.DOC

This is the detailed arrangement of the hierarchical thesaurus based on Roget's 1911 thesaurus. Only the 1000+ main category entries are listed here.


Plan of Classification: the three major categories are not themselves included within the ISA hierarchy. The first (top) level of the ISA hierarchy starts with the categories designated by capital letters.

The top three categories are:

A: abstract relations
P: physical universe
M: mental universe

All categories begin with one of these letters.

The "R" numbers refer to the main heading numbers (1 to 1000, with some subheadings, such as 615a) from the 1911 Roget's Thesaurus.

-->


<--

------------------------------------------------------------------------



=========================================================================

** DETAILED PLAN OF CLASSIFICATION **

=========================================================================

***************************************************************************

** HITHES **

** A hierarchical thesaurus with inheritance **

** MAIN TEXT OF THESAURUS **

***************************************************************************

-->[[hithes]]


A. Abstract Relations

A1. Existence R1 to R8


A1.1.1. ABSTRACT EXISTENCE

#1. Existence.-- N. {{antonym: nonexistence, R2}}

existence, being, entity, ens[Lat], esse[Lat], subsistence.

center of life, essence, inmost nature, inner reality, vital principle.

[real existence in the physical world] reality, actuality; positiveness &c. adj.; fact, matter of fact, sober reality; truth &c. 494; actual existence; stubborn fact, hard fact, irreducible fact; not a dream &c. 515; no joke.

$$.

[ref: presence &c. (existence in space) 186; coexistence &c. 120].



{{studied_in}} ontology.

{{caused_by: creation, R161}}.



V. exist, be; have being &c. n.

stand, obtain, be the case; occur &c. (event) 151; have place, prevail.

[of animate things] subsist, live, breathe, find oneself, pass the time, vegetate.

consist in, lie in; be comprised in, be contained in, be constituted by.

come into existence, come into being; arise &c. (begin) 66; come forth &c. (appear) 446. become &c. (be converted) 144.

actualize; bring into existence &c. 161.

abide, continue, endure, last, remain, stay.

