Dmlis 540, Spring 2004: Information System Project
Mecca: Pop and Jazz Music for Family Learning and Enjoyment
All team members participated in the development of the spec.
Project Manager: Carolyn Karis
Information Architect: Emily Wheeler




5.2 Full-text data structure(s)


Markup of IR Record Detail




5.2.2 Field and Index Summary


| Record Type | Field | Indexing Methods | Building the Index |
|---|---|---|---|
| Song_Lyrics | Chorus | Word and phrase searching will be required on the Chorus, because this is the most repeated text in the song and users searching by song lyrics will probably enter a phrase from the Chorus. | Word indexing can be done by machine, but a person should determine the phrases to include from the chorus. |
| | Verse Text | Verse will be searched by word. The words will be matched against the inverted index. The IR record is linked by an Accession number, which acts as the primary key for the Lyrics text data store. The Accession number is a foreign key in the RDB. | |
| Artist_Biography | Author | We do not think that users will search by the author of an artist biography, but we feel that this is useful information to include in the results. We will not include this field in the indexing. | The indexing for the biography text can be completed by a machine. |
| | Date | The date the biography was written will be useful information for the user, but will probably not be searched. We will not include this field in the indexing. | |
| | Text | The biography will be searchable by word. The IR record is linked by an Accession number, which acts as the primary key for the biography text data store. The Accession number is a foreign key in the RDB. | |
| | Artist_Name | We think our young users may use the name of the artist as the link to the biography of an Artist. This access will be through the Artist_Name or Artist_ID rather than through a name search in the IR. | An Artist_Biography accessed in this manner will rely on the RDB and the RDB index rather than the IR index. The user may also retrieve the name by searching with the Artists choice selected in the exclusive radio button control that forms part of the Search feature found on all main pages of Mecca. |
| Album_Commentary | Reviewer | We will not index this field. | The indexing for the album commentary can be completed by a machine. We believe that our human indexing should be focused on those documents that will require some phrase searching. Because Album Commentary will be indexed by word, a simple computer-generated inverted file will be sufficient. |
| | Date | We will not index this field. | |
| | Text | The commentary text should be searched by word. The IR record is linked by an Accession number, which acts as the primary key for the Album text data store. The Accession number is a foreign key in the RDB. | |
| Style_Description | Text | The text description of style will not be searchable but will instead be displayed to the user as the result of a table lookup. | This text will be a memo field in the RDB and will not be searchable as full text. |
| Instrument_Background | Author | We will not index this field. | The word indexing of Instrument descriptions can be completed by machine, but educational reviewers will scan the documents and add any important musical subjects to the inverted file. An example of the phrases included in the inverted file would be names of musical eras, such as “Harlem Renaissance.” |
| | Date | We will not index this field. | |
| | Text | The instrument background should be indexed by word, by some important musical phrases, and by use of terms from the Musical Instrument Thesaurus. | |

The Mecca Information System includes several types of full-text structures. In addition to the lyrics of songs, the system includes biographies and background of the artists, reviews (of song recordings, individually or as part of albums, and of the artists or their work), and informational text about instruments. Although Image_Description contains full text, this text will not be searchable but rather will be presented to the user as part of a Results or Details Page that includes the image. The Image_Description text will allow users, the home-schooled children in particular, to learn information without needing to perform an active Search query. Style_Description is also non-searchable text. Because the main users of our system will be young, the system will allow access to the text data in a number of ways. For example, Style_Description contains text that is not searchable but will be returned to the user through processing code that uses the RDB index.

Indexing will be accomplished with a combination of human and machine indexing, as outlined in the table above. In addition to the machine indexing of the text in Style_Description and Instrument_Background, human indexers will select important phrases for indexing in the Inverted Files and for the creation of the Controlled Vocabularies. The system will include several Controlled Vocabularies: for the Recording Style_Name, for Style_Name, for Style_Description, and for Musical_Instrument_Name. We have included an example of a Controlled Vocabulary, the Musical Instrument Thesaurus (see Figures 5.2.3.A and 5.2.3.B). The indexing of the song lyrics, biographies and background of the artists, album commentary, and background on the instruments will mainly use machine indexing, but a person will determine important phrases. Repeated word co-occurrences will be indexed as a phrase if they occur more than three times in the same text record. For example, the chorus of “I’m Like a Bird” appears five times in the song lyrics of the text record. In this instance, the chorus would be indexed as a phrase. The phrase will become an additional searchable field in the text base.
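The "more than three times" rule could be checked by machine before a person reviews the candidates. The sketch below is only an illustration: it assumes that a phrase is a whole line of the text record and that lines are compared after lowercasing and collapsing whitespace. The function names and the sample record are hypothetical, not part of the Mecca design.

```python
from collections import Counter

# Rule from the text: index as a phrase if it occurs MORE than three times.
PHRASE_THRESHOLD = 3


def normalize(line: str) -> str:
    """Lowercase a line and collapse runs of whitespace (an assumed normalization)."""
    return " ".join(line.lower().split())


def candidate_phrases(lines, threshold=PHRASE_THRESHOLD):
    """Return normalized lines that occur more than `threshold` times in one record."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return [phrase for phrase, n in counts.items() if n > threshold]


# Hypothetical text record: one line repeated five times, another three times.
record = (
    ["Sing along with me"] * 5
    + ["A line from the first verse"]
    + ["A bridge line"] * 3
)
print(candidate_phrases(record))
```

A human indexer would still decide which of the returned candidates are worth keeping as searchable phrases.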

The Recording is central to the Mecca Information System, as can be seen in the RDB. The structure is created in this way because of the characteristics of our users, the home-schooled children and their parents. We anticipate that users will reach much of the system, especially the full-text items, through the exclusive-option search features and controls that appear on the homepage or other main pages (see the basic Search feature on the Mecca homepage and the descriptions in Page Cycles, Section 6.1.1). For example, the Featured Artists and other items featured on the homepage (see the mockup in Section 6.1.1) may lead the user to click to listen or to learn information from the various types of full text detailed in the previous paragraph.

We expect that young users may access the lyrics by clicking on the title of the song recording on a results page. The full song lyrics will be retrieved by passing the Accession number from the results page to the details page. The Accession number in the IR index and text data store connects to the Song_Lyrics ID in the RDB. In all cases, the Accession number will not be displayed on details pages. We do not think that this information would be of use or interest to our users.

Many of the song lyrics will be added to the text data store in groups, album by album, and we expect our young users to retrieve the lyrics of all songs from an album. To support both, our numbering/Accession number system should follow a pattern. As an example of the schema, Figure 5.2.2.A applies the numbering system to the “Whoa, Nelly!” album, which appears in the scenario and in many of the examples throughout this prospectus for the Mecca Information System.


Figure 5.2.2.A Example of the Suggested Schema for Numbering/Accession Numbers, as Applied to Two Albums

Albums: “900nn”

Songs: Album + “-nn” [the song suffix begins with a hyphen and the first “n” digit from the album number; the number of the song on the album (in its order of appearance on the album) is then added.]

Note: this is merely a suggested schema, which may be modified. However, the purpose of the schema, which is to show the connection of the song lyrics to an album and the ordering of the songs on the album, should be retained.


AL 90010= Whoa, Nelly!

Artist_Composer and Artist_Performer = Nelly Furtado (from RDB)

-11. Hey, Man!

-12. Shit on the Radio (Remember the Days)

-13. Baby Girl

-14. Legend

-15. I'm Like a Bird

-16. Turn Off the Light

-17. Trynna Finda Way

-18. Party

-19. Well, Well

-110. My Love Grows Deeper, Pt. 1

-111. I Will Make U Cry

-112. Scared of You

-113. Onde Estás (UK album)

Therefore, the Accession Number for the Lyrics of “I’m Like a Bird” would be LY90010-15


AL 90020= Folklore

Artist_Composer and Artist_Performer = Nelly Furtado

-21. One-Trick Pony

-22. Powerless (Say What You Want)

-23. Explode

-24. Try

-25. Fresh Off the Boat

-26. Força

-27. Saturdays

-28. Picture Perfect

-29. The Grass Is Green

-210. Build You Up

-211. Island of Wonder

-212. Childhood Dreams
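The numbering pattern in Figure 5.2.2.A can be sketched in a few lines. This is a minimal illustration, assuming that the album digit is the fourth character of the five-digit “900nn” album number (90010 gives 1, 90020 gives 2); the function names are hypothetical and the schema itself is only a suggestion.

```python
def song_suffix(album_number: str, track: int) -> str:
    """Return the song suffix, e.g. '-15' for track 5 of album 90010."""
    if len(album_number) != 5 or not album_number.isdigit():
        raise ValueError(f"expected a five-digit album number, got {album_number!r}")
    # Assumption: the album digit is the first "n" of "900nn",
    # i.e. the fourth character (90010 -> "1", 90020 -> "2").
    album_digit = album_number[3]
    return f"-{album_digit}{track}"


def lyrics_accession(album_number: str, track: int) -> str:
    """Build the Lyrics Accession number: 'LY' + album number + song suffix."""
    return f"LY{album_number}{song_suffix(album_number, track)}"


print(lyrics_accession("90010", 5))   # track 5 of "Whoa, Nelly!"
print(lyrics_accession("90020", 12))  # track 12 of "Folklore"
```

Because the album number is a prefix of every song's Accession number, all lyrics from one album can be found with a simple prefix match.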


Figure 5.2.2.B Example of the Deconstructed Full-Text Record

Lyrics of “I’m Like a Bird”

Individual Full-Text Record Deconstructed

90010-15


AR: Artist: Nelly Furtado

AL Album: 90010 Whoa Nelly


LY90010-15 "I'm Like A Bird"
L1 L2 L3 L4 L5

You’re beautiful, that's for sure


L6 L7 L8 L9

You'll never ever fade


L10 L11 L12 L13 L14 L15 L16

You're lovely but it's not for sure


L17 L18 L19 L20 L21

That I won't ever change


L22 L23 L24 L25 L26 L27

And though my love is rare


L28 L29 L30 L31 L32

Though my love is true


LC

L33 =L34 through L75 (L33= phrase, the chorus)



[Chorus:]1
L34 L35 L36 L37 L38 L39 L40 L41

I'm like a bird, I'll only fly away


L42 L43 L44 L45 L46 L47 L48 L49 L50 L51

I don't know where my soul is, I don't know

L52 L53 L54 L55

where my home is


L56 L57 L58 L59 L60 L61 L62 L63 L64 L65

(and baby all I need for you to know is)


L66(Repeat)

L34 L35 L36 L37 L38 L39 L40 L41

I'm like a bird, I'll only fly away
L67 (Repeat) (this phrase= repeat of L42 to L55)

L42 L43 L44 L45 L46 L47 L48 L49 L50 L51

I don't know where my soul is, I don't know

L52 L53 L54 L55

where my home is
L68 L69 L70 L71 L72 L73 L74 L75

All I need for you to know is


L76 L77 L78 L79 L80 L81 L82 L83

Your faith in me brings me to tears


L84 L85 L86 L87 L88

Even after all these years


L89 L90 L91 L92 L93 L94 L95 L96

And it pains me so much to tell


L97 L98 L99 L100 L101 L102 L103

That you don't know me that well


L104 L105 L106 L107 L108 L109

And though my love is rare


L110 L111 L112 L113 L114

Though my love is true


L115 =L33[Chorus]
L116 L117 L118 L119 L120 L121 L122

It's not that I wanna say goodbye


L123 L124 L125 L126 L127 L128 L129 L130

It's just that every time you try to

L131 L132 L133 L134 L135 L136

tell me that you love me


L137 L138 L139 L140 L141 L142 L143

Each and every single day I know


L144 L145 L146 L147 L148 L149 L150 L151 L152

I'm going to have to eventually give you away


L153 L154 L155 L156 L157 L158

And though my love is rare


L159 L160 L161 L162 L163 L164

And though my love is true


L165 L166 L167 L168

Hey I'm just scared


L169 L170 L171 L172 L173

That we may fall through


L174 = L33 (Chorus)

L175 = L33 (Chorus)

L176 = L33 (Chorus)
Figure 5.2.2.C – Sample Inverted File
Inverted Index—Nelly Furtado: I’m Like a Bird

Album: AL 90010 Whoa Nelly

Song_ID 90010-15 “I’m Like a Bird” (Song_Name)

LY90010-15 I’m Like a Bird (full text lyrics)


(Stop words are marked with an asterisk.)

| Term | Album ID (Album) | Song ID (Song) | Lyric ID (Lyrics) | Position |
|---|---|---|---|---|
| A* | 90010 | 90010-15 | LY 90010-15 | L36 |
| After | 90010 | 90010-15 | LY 90010-15 | L85 |
| All | 90010 | 90010-15 | LY 90010-15 | L58 L68 L86 |
| And* | 90010 | 90010-15 | LY 90010-15 | L22 L56 L89 L104 L138 L153 L159 |
| Away | 90010 | 90010-15 | LY 90010-15 | L41 L152 |
| Baby | 90010 | 90010-15 | LY 90010-15 | L57 |
| Beautiful | 90010 | 90010-15 | LY 90010-15 | L2 |
| Bird | 90010 | 90010-15 | LY 90010-15 | L37 |
| Brings | 90010 | 90010-15 | LY 90010-15 | L80 |
| But | 90010 | 90010-15 | LY 90010-15 | L12 |
| Change | 90010 | 90010-15 | LY 90010-15 | L21 |
| Chorus 1 | 90010 | 90010-15 | LY 90010-15 | L33 |
| Chorus 2 | 90010 | 90010-15 | LY 90010-15 | L115 |
| Chorus 3 | 90010 | 90010-15 | LY 90010-15 | L174 |
| Chorus 4 | 90010 | 90010-15 | LY 90010-15 | L175 |
| Chorus 5 | 90010 | 90010-15 | LY 90010-15 | L176 |
| Day | 90010 | 90010-15 | LY 90010-15 | L141 |
| Don’t | 90010 | 90010-15 | LY 90010-15 | L43 L50 L99 |
| Each | 90010 | 90010-15 | LY 90010-15 | L137 |
| Even | 90010 | 90010-15 | LY 90010-15 | L84 |
| Eventually | 90010 | 90010-15 | LY 90010-15 | L149 |
| Ever | 90010 | 90010-15 | LY 90010-15 | L8 L20 |
| Every | 90010 | 90010-15 | LY 90010-15 | L126 L139 |
| Fade | 90010 | 90010-15 | LY 90010-15 | L9 |
| Faith | 90010 | 90010-15 | LY 90010-15 | L77 |
| Fall | 90010 | 90010-15 | LY 90010-15 | L172 |
| Fly | 90010 | 90010-15 | LY 90010-15 | L40 |
| For* | 90010 | 90010-15 | LY 90010-15 | L4 L15 L61 L71 |
| Give | 90010 | 90010-15 | LY 90010-15 | L150 |
| Going | 90010 | 90010-15 | LY 90010-15 | L145 |
| Goodbye | 90010 | 90010-15 | LY 90010-15 | L122 |
| Have | 90010 | 90010-15 | LY 90010-15 | L147 |
| Hey | 90010 | 90010-15 | LY 90010-15 | L165 |
| Home | 90010 | 90010-15 | LY 90010-15 | L54 |
| I | 90010 | 90010-15 | LY 90010-15 | L18 L42 L49 L59 L69 L119 L142 |
| I don’t know where my soul is, I don’t know where my home is | 90010 | 90010-15 | LY 90010-15 | L67 |
| I’ll | 90010 | 90010-15 | LY 90010-15 | L38 |
| I’m | 90010 | 90010-15 | LY 90010-15 | L34 L144 L166 |
| I’m like a bird, I’ll only fly away | 90010 | 90010-15 | LY 90010-15 | L66 |
| In* | 90010 | 90010-15 | LY 90010-15 | L78 |
| Is* | 90010 | 90010-15 | LY 90010-15 | L26 L31 L48 L55 L65 L75 L108 L113 L157 L163 |
| It* | 90010 | 90010-15 | LY 90010-15 | L90 |
| It’s | 90010 | 90010-15 | LY 90010-15 | L13 L116 L123 |
| Just | 90010 | 90010-15 | LY 90010-15 | L124 L167 |
| Know | 90010 | 90010-15 | LY 90010-15 | L44 L51 L64 L74 L100 L143 |
| Like | 90010 | 90010-15 | LY 90010-15 | L35 |
| Love | 90010 | 90010-15 | LY 90010-15 | L25 L30 L107 L112 L135 L156 L162 |
| Lovely | 90010 | 90010-15 | LY 90010-15 | L11 |
| May | 90010 | 90010-15 | LY 90010-15 | L171 |
| Me | 90010 | 90010-15 | LY 90010-15 | L79 L81 L92 L132 L136 |
| Much | 90010 | 90010-15 | LY 90010-15 | L94 |
| My | 90010 | 90010-15 | LY 90010-15 | L24 L29 L46 L53 L106 L111 L155 L161 |
| Need | 90010 | 90010-15 | LY 90010-15 | L60 L70 |
| Never | 90010 | 90010-15 | LY 90010-15 | L7 |
| Not | 90010 | 90010-15 | LY 90010-15 | L14 L117 |
| Only | 90010 | 90010-15 | LY 90010-15 | L39 |
| Pains | 90010 | 90010-15 | LY 90010-15 | L91 |
| Rare | 90010 | 90010-15 | LY 90010-15 | L27 L109 L158 |
| Say | 90010 | 90010-15 | LY 90010-15 | L121 |
| Scared | 90010 | 90010-15 | LY 90010-15 | L168 |
| Single | 90010 | 90010-15 | LY 90010-15 | L140 |
| So | 90010 | 90010-15 | LY 90010-15 | L93 |
| Soul | 90010 | 90010-15 | LY 90010-15 | L47 |
| Sure | 90010 | 90010-15 | LY 90010-15 | L5 L16 |
| Tears | 90010 | 90010-15 | LY 90010-15 | L83 |
| Tell | 90010 | 90010-15 | LY 90010-15 | L96 L131 |
| That | 90010 | 90010-15 | LY 90010-15 | L17 L97 L102 L118 L125 L133 L169 |
| That’s | 90010 | 90010-15 | LY 90010-15 | L3 |
| These | 90010 | 90010-15 | LY 90010-15 | L87 |
| Though | 90010 | 90010-15 | LY 90010-15 | L23 L28 L105 L110 L154 L160 L173 |
| Time | 90010 | 90010-15 | LY 90010-15 | L127 |
| To* | 90010 | 90010-15 | LY 90010-15 | L63 L73 L82 L95 L130 L146 L148 |
| True | 90010 | 90010-15 | LY 90010-15 | L32 L114 L164 |
| Try | 90010 | 90010-15 | LY 90010-15 | L129 |
| Wanna | 90010 | 90010-15 | LY 90010-15 | L120 |
| We | 90010 | 90010-15 | LY 90010-15 | L170 |
| Well | 90010 | 90010-15 | LY 90010-15 | L103 |
| Where | 90010 | 90010-15 | LY 90010-15 | L45 L52 |
| Won’t | 90010 | 90010-15 | LY 90010-15 | L19 |
| Years | 90010 | 90010-15 | LY 90010-15 | L88 |
| You | 90010 | 90010-15 | LY 90010-15 | L62 L72 L98 L128 L134 L151 |
| You’ll | 90010 | 90010-15 | LY 90010-15 | L6 |
| You’re | 90010 | 90010-15 | LY 90010-15 | L1 L10 |
| Your | 90010 | 90010-15 | LY 90010-15 | L76 |

















5.2.3 Query and Retrieval Structures
The retrieval functions of our IR system have been designed specifically to address the needs of a younger audience. We anticipate that our users will navigate our system with minimal or simple searching. We expect them to have minimal understanding of Boolean operators and to misspell terms frequently. In some cases, this has made our task easier. For instance, we do not expect our users to use truncation in their searching, so we will not need to support this retrieval function in the Mecca IR system. However, we do expect to use other techniques to improve the recall and precision of our IR system. For example, controlled vocabularies for instrument names and musical styles will be searchable with drop-down menus.

To facilitate quick lyric searches and proximity searching of the lyrics text file, the system will use an inverted file index to access the full-text data store. We will index four types of free-text documents in our system: song lyrics, artist biographies, album commentaries, and instrument descriptions. Retrieved full text will be displayed as in the example for the lyrics of “I’m Like a Bird” found in Section 5.2.1. Prior to creation of the Inverted Index, text will be deconstructed; see the deconstruction of the song “I’m Like a Bird” in Figure 5.2.2.B. From this deconstruction, or preprocessing, of the text, an inverted index will be generated (Figure 5.2.2.C). The inverted file will list each word, the document in which the word can be found, and the placement of the word within that field. This last column of the inverted file (the position) will allow us to determine the proximity of multiple words in a query and to rank results by the proximity of the query words. The inverted index will contain an alphabetical listing of the words of the document. Certain words that appear frequently and are not likely to be used in a search will not be included in the index. The Stop Words for our system are: a, an, and, by, for, in, is, it, of, the, to.
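The inverted file described above can be sketched as a mapping from each term to the records and word positions where it occurs. The code below is a minimal sketch under stated assumptions: the tokenizer, the `build_index` name, and the sample record "DEMO-1" are hypothetical, and positions are counted per word from 1. The stop-word list is the one given in the text, and contractions are kept as single tokens.

```python
import re
from collections import defaultdict

# Stop-word list as given in the text.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "is", "it", "of", "the", "to"}

# Keep apostrophes inside a token so contractions stay single words.
TOKEN = re.compile(r"[a-z0-9]+(?:['’][a-z0-9]+)*")


def build_index(records):
    """Map term -> {accession number: [word positions]} for a dict of texts."""
    index = defaultdict(dict)
    for accession, text in records.items():
        tokens = TOKEN.findall(text.lower())
        for position, token in enumerate(tokens, start=1):
            if token in STOP_WORDS:
                continue  # the word still consumes a position but gets no entry
            index[token].setdefault(accession, []).append(position)
    return dict(index)


# Hypothetical record, not taken from the Mecca text base.
index = build_index({"DEMO-1": "The bird won't fly to the tree"})
for term in sorted(index):
    print(term, index[term])
```

Counting positions before dropping stop words keeps the distances between the remaining words meaningful for proximity ranking.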

The Inverted Index will be composed mostly of individual words. However, because many of the songs, especially Pop songs such as those by Nelly Furtado, include repeated phrases and choruses, we have chosen to index by both words and phrases. Phrases are limited to repeated lines, which are indexed by individual words when they first appear. Each repeat of a lyric line is assigned its own index number and position. Choruses are treated in the same manner: the first appearance of the chorus is indexed by individual words, and each subsequent chorus is assigned a position number for the phrase (chorus). The chorus itself is assigned a grouping identifier (e.g., L33 for the Chorus in “Bird”). Each repeat of the chorus is also assigned a position number so that the lyrics can be reconstructed completely. As mentioned earlier, word co-occurrences appearing more than three times in a text record will be considered a phrase and indexed as such. We have chosen this approach because a chorus is quite memorable, especially if it is repeated four or more times within a song (as is the case for Nelly Furtado’s “I’m Like a Bird”). We expect that our youthful users will tend to remember the chorus, or at least a large portion of it, and use it to search for lyrics. The ability to evaluate the proximity of terms will enable users to retrieve most phrases, whether they are repeated lines or choruses.
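One way to use the position column for proximity ranking is to score each record by the smallest window of word positions that contains every query term. This is a sketch of that idea only; the scoring rule, the function names, and the sample index are assumptions rather than the documented Mecca ranking method.

```python
from itertools import product


def min_span(position_lists):
    """Smallest window (in word positions) holding one occurrence of each term."""
    if not position_lists or any(not positions for positions in position_lists):
        return None
    # Brute force over all combinations; adequate for short queries.
    return min(max(combo) - min(combo) for combo in product(*position_lists))


def rank_by_proximity(index, terms):
    """Return accession numbers containing all terms, tightest span first.

    `index` maps term -> {accession number: [positions]}.
    """
    postings = [index.get(term, {}) for term in terms]
    if not postings:
        return []
    common = set(postings[0]).intersection(*postings[1:])
    scored = [(min_span([p[acc] for p in postings]), acc) for acc in common]
    return [acc for _, acc in sorted(scored)]


# Hypothetical positional index for two records.
sample_index = {
    "bird": {"REC-A": [2, 40], "REC-B": [5]},
    "fly": {"REC-A": [4], "REC-B": [30]},
}
print(rank_by_proximity(sample_index, ["bird", "fly"]))
```

A query made of words from a chorus would then rank highest the records where those words sit closest together, even without exact phrase matching.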

Because of the youthfulness of our main users, we decided not to include stemming in the index. The high cost of doing this would not be justified for our users. We believe that they also will not use Boolean searching but rather will enter terms or natural-language words as free text. We have designed the system with delimited searching through exclusive-choice controls such as radio/option buttons (Search by Artists or Songs or Lyrics), and we have provided Advanced Search capabilities through the Advanced Search Features and Functionality (see Section 7). Because of these limitations, we have decided not to deconstruct contractions in the lyrics but to leave them represented as single words. If the current approach to IR does not meet the needs of our users, it can be modified in the future.

We will need to use thesauri to address potential inconsistencies in spelling and terminology. Artist names are often difficult to spell, and we do not want the children using our site to become frustrated because they cannot find a particular artist. We want to both encourage them to learn the correct spelling and direct them towards the artist they want. To meet this need, we will compile a thesaurus of artist names and possible misspelled names. Queries that match a term in the misspelled thesaurus will generate a message such as “Did you mean ___?” We will employ two methods to develop this thesaurus. First, we will develop or obtain a preliminary list of common misspellings of artist names. We will also keep track of search queries that generate no results and try to determine if those queries might be misspellings of artist names. In this way, our thesaurus will expand to include common misspellings by our users.
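The misspelling thesaurus and the log of zero-result queries might work together roughly as follows. This is a hedged sketch: the dictionary entries, the `search_artist` function, and the in-memory log are hypothetical stand-ins for whatever store the system would actually use.

```python
# Hypothetical data: correct names, plus known misspellings mapped to them.
KNOWN_ARTISTS = {"nelly furtado": "Nelly Furtado"}
MISSPELLINGS = {
    "nelly furtardo": "Nelly Furtado",
    "nely furtado": "Nelly Furtado",
}

# Queries that matched nothing, kept for later review by a person.
zero_result_log = []


def search_artist(query):
    """Return a dict with the matched artist and an optional suggestion message."""
    key = " ".join(query.lower().split())
    if key in KNOWN_ARTISTS:
        return {"match": KNOWN_ARTISTS[key], "message": None}
    if key in MISSPELLINGS:
        return {"match": None, "message": f"Did you mean {MISSPELLINGS[key]}?"}
    # No match at all: record the query so the thesaurus can grow over time.
    zero_result_log.append(query)
    return {"match": None, "message": None}


print(search_artist("Nely Furtado"))
print(search_artist("Neli Fertado"))
print(zero_result_log)
```

Reviewing the log periodically is what lets the thesaurus expand to cover the misspellings that users actually type.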

Another thesaurus will address the multiple names and descriptions for musical instruments. Figures 5.2.3.A and 5.2.3.B list the terms and give example entries. This thesaurus, or controlled vocabulary, was developed for the Mecca system to give its home-schooled users better access to the instruments and to information about them. The instruments will be accessed mainly through the Listen to Browse Page by means of drop-down controls. We wished to maximize users' access to the instruments by standardizing names, since the youthful users of the system might misspell instrument names if they were accessible only through a text-box Search. Also, some of the instruments have several names, and the same word might apply to different types of instruments. For example, “bass” could apply to the string bass or to a bass guitar.
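The USE and NT relationships in the instrument thesaurus lend themselves to a simple lookup. The sketch below uses a few entries from Figure 5.2.3.B; the dictionary layout and the function names are assumptions made for illustration, not a description of how the thesaurus is stored.

```python
# Non-preferred term (lowercased) -> preferred term, from the USE/UF entries.
USE = {
    "brasses (musical instruments)": "Brass instruments",
    "drum kit": "Drum set",
    "drumset": "Drum set",
    "trap kit": "Drum set",
    "spanish guitar": "Guitar",
    "guitar, electric": "Electric guitar",
    "guitar, steel": "Hawaiian guitar",
}

# Preferred term -> narrower terms (NT), shown here for one entry only.
NARROWER = {
    "Drum": [
        "Bass drum",
        "Bongo",
        "Frame drums",
        "Snare drum",
        "Steel drum (Musical instrument)",
        "Timpani",
    ],
}


def preferred_term(term: str) -> str:
    """Resolve a non-preferred name to its preferred thesaurus term."""
    cleaned = term.strip()
    return USE.get(cleaned.lower(), cleaned)


def expand(term: str) -> list:
    """Return the preferred term followed by its narrower terms, if any."""
    preferred = preferred_term(term)
    return [preferred] + NARROWER.get(preferred, [])


print(preferred_term("Drum kit"))
print(expand("Drum"))
```

Expanding a broad choice such as Drum into its narrower terms would let a drop-down selection retrieve recordings indexed under any specific kind of drum.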


Figure 5.2.3.A - List of Terms in the Musical Instruments Controlled Vocabulary

(Based on the Thesaurus of Musical Instruments http://alteriseculo.com/instruments)




  • Accordion

  • Alto flute

  • Alto Horn

  • Alto Trombone

  • Autoharp

  • Band organ

  • Bass clarinet

  • Bass drum

  • Bass guitar

  • Bass trombone

  • Bongo

  • Castanets

  • Cello

  • Clarinet

  • Conga (Drum)

  • Cornet

  • Cowbells

  • Cymbals

  • Double bass

  • Drum

  • Drum machine

  • Drum set

  • Electric guitar

  • Electronic keyboard (Synthesizer)

  • Electronic percussion instruments

  • Electronic piano



  • Euphonium

  • Fender guitar

  • Flute

  • Frame drums

  • Gong

  • Guitar

  • Guitarra portuguesa

  • Harmonica

  • Horn (Musical instrument)

  • Jawbone (Musical instrument)

  • Kazoo

  • Keyboard controller (Musical instrument)

  • Keyboard instruments

  • Keyboards (Music)

  • Keyed fiddle

  • Lute

  • Lyre

  • Mandolin

  • Marimba

  • Martin guitar

  • Mechanical musical instruments

  • Mechanical pianos

  • MIDI controllers

  • Musical instruments

  • Musical saw

  • Notched rattle

  • Oboe

  • Ocarina

  • Organ

  • Pedal piano

  • Penny whistle

  • Percussion controller (Musical instrument)

  • Percussion instruments

  • Piano

  • Piccolo

  • Pipe (Musical instrument)

  • Player piano

  • Plucked instruments

  • Racket (Musical instrument)




  • Rattle (Musical instrument)

  • Saxophone

  • Snare drum

  • Steel drum (Musical instrument)

  • Stringed instruments

  • Stringed Instruments, Bowed

  • Tam-tam

  • Tambourine

  • Timpani

  • Triangle (Musical instrument)

  • Trombone

  • Trumpet

  • Tuba

  • Viola

  • Violin

  • Violoncello

  • Whistles

  • Wind instruments

  • Woodwind instruments

  • Xylophone

  • Zither

Figure 5.2.3.B - Example Listings in the Musical Instrument Thesaurus




Brass instruments
    UF  Brasses (Musical instruments)
    BT  Wind instruments
    NT  Baritone (Musical instrument)
        Bugle
        Cornet
        Cornett
        Euphonium
        Flügelhorn
        Helicon
        Horn (Musical instrument)
        Post horn
        Sarrusophone
        Saxhorn
        Trombone
        Trumpet
        Tuba

Brasses (Musical instruments)
    USE Brass instruments

Drum
    BT  Percussion instruments
    NT  Bass drum
        Bongo
        Frame drums
        Snare drum
        Steel drum (Musical instrument)
        Timpani

Drum kit
    USE Drum set

Drum machine
    BT  Electronic percussion instruments

Drum set
    UF  Drum kit
        Drumset
        Trap kit
    BT  Percussion instruments

Guitarra portuguesa
    BT  Lute
        Musical instruments -- Portugal

Guitar
    UF  Spanish guitar
    BT  Plucked instruments
    NT  Electric guitar
        English guitar
        Gretsch guitar
        Hawaiian guitar
        Martin guitar
        Ukulele
        Viola d’arame

Guitar, Electric
    USE Electric guitar

Guitar, Steel
    USE Hawaiian guitar




