Dmlis 540, Spring 2004
Information System Project
Mecca: Pop and Jazz Music for Family Learning and Enjoyment
All team members participated in the development of the spec.
Project Manager: Carolyn Karis
Information Architect: Emily Wheeler
The Mecca Information System includes several types of full-text structures. In addition to song lyrics, the system includes biographies and background of the artists, reviews (of song recordings, individually or as part of albums, and of the artists or their work), and informational text about instruments. Although Image_Description contains full text, this text will not be searchable; instead, it will be presented to the user as part of a Results or Details Page that includes the image. The Image_Description text will allow users, the home-schooled children in particular, to learn information without performing an active Search query. Style_Description is also non-searchable text. Because the main users of our system will be young, the system will allow access to the text data in a number of ways. For example, the Style_Description text, although not searchable, will be returned to the user through the processing code that uses the RDB Index.

Indexing will be accomplished with a combination of human and machine indexing, as outlined in the above table. In addition to the machine indexing of the text in Style_Description and Instrument_Background, human indexers will select important phrases for indexing in the Inverted Files and for the creation of the Controlled Vocabularies. The system will include several Controlled Vocabularies: for the Recording Style_Name, for Style_Name, for Style_Description, and for Musical_Instrument_Name. We have included an example of a Controlled Vocabulary, the Musical Instrument Thesaurus (see Figures 5.2.3.A and 5.2.3.B).

The indexing of the song lyrics, biographies and background of the artists, album commentary, and background on the instruments will mainly use machine indexing, but a person will determine important phrases. Repeated word co-occurrences will be indexed as a phrase if they occur more than three times in the same text record.
For example, the chorus of “I’m Like a Bird” appears five times in the song lyrics of the text record. In this instance, the chorus would be indexed as a phrase. The phrase will become an additional searchable field in the text base.

The Recording is central to the Mecca Information System, as can be seen in the RDB. The structure is created in this way because of the characteristics of our users, the home-schooled children and their parents. We anticipate that much of the system, especially the full-text items, will be accessed through the exclusive-option search features and controls that appear on the homepage or other main pages. (See the basic Search feature on the Mecca homepage and the descriptions in Page Cycles, Section 6.1.1.) For example, the Featured Artists and other items featured on the homepage (see the mockup in Section 6.1.1) may lead the user to click to listen or to learn information from the various types of full text (detailed in the previous paragraph).

We expect that young users may access the lyrics by clicking on the title of the song recording on a results page. The full song lyrics will be retrieved by passing the Accession number from the results page to the details page. The Accession number in the IR index and text data store connects to the Song_Lyrics ID in the RDB. The Accession number will never be displayed on details pages; we do not think this information would be of use or interest to our users.

Many of the song lyrics will be added to the text data store in groupings from the album on which they appear, and we want to speed retrieval of the lyrics of all songs from an album (something we expect our young users to do). Our numbering/Accession number system should therefore have a pattern. To provide an example of the schema, Figure 5.2.2.A applies the number system to the “Whoa, Nelly!”
album, the one which appears in the scenario and in many of the examples found throughout this prospectus for the Mecca Information System.

Figure 5.2.2.A Example of the Suggested Schema for Numbering/Accession Numbers, as Applied to Two Albums

Albums: “900nn”
Songs: Album + “-nn” [the song accession number begins with a hyphen and the “n*” number from the album. The number of the song on the album (in its order of appearance on the album) is then added.]

Note: this is merely a suggested schema, which may be modified. However, the purpose of the schema (providing a means to know the connection of the song lyrics to an album and the ordering of the songs on the album) should be retained.

AL 90010 = Whoa, Nelly!
Artist_Composer and Artist_Performer = Nelly Furtado (from RDB)
-11. Hey, Man!
-12. Shit on the Radio (Remember the Days)
-13. Baby Girl
-14. Legend
-15. I'm Like a Bird
-16. Turn Off the Light
-17. Trynna Finda Way
-18. Party
-19. Well, Well
-110. My Love Grows Deeper, Pt. 1
-111. I Will Make U Cry
-112. Scared of You
-113. Onde Estás (UK album)

Therefore, the Accession Number for the Lyrics of “I’m Like a Bird” would be LY90010-15.

AL 90020 = Folklore
Artist_Composer and Artist_Performer = Nelly Furtado
-21. One-Trick Pony
-22. Powerless (Say What You Want)
-23. Explode
-24. Try
-25. Fresh Off the Boat
-26. Força
-27. Saturdays
-28. Picture Perfect
-29. The Grass Is Green
-210. Build You Up
-211. Island of Wonder
-212. Childhood Dreams

Figure 5.2.2.B Example of the Deconstructed Full-Text Record: Lyrics of “I’m Like a Bird”
Individual Full-Text Record Deconstructed
90010-15
AR: Artist: Nelly Furtado
AL Album: 90010 Whoa Nelly
LY90010-15 "I'm Like A Bird"

(Each word is assigned one position number; a range gives the positions of the first and last word of the line.)

L1-L5 You’re beautiful, that's for sure
L6-L9 You'll never ever fade
L10-L16 You're lovely but it's not for sure
L17-L21 That I won't ever change
L22-L27 And though my love is rare
L28-L32 Though my love is true

LC L33 = L34 through L75 (L33 = phrase, the chorus)
[Chorus:]
L34-L41 I'm like a bird, I'll only fly away
L42-L51 I don't know where my soul is, I don't know
L52-L55 where my home is
L56-L65 (and baby all I need for you to know is)
L66 (Repeat)
L34-L41 I'm like a bird, I'll only fly away
L42-L51 I don't know where my soul is, I don't know
L52-L55 where my home is
All I need for you to know is

L76-L83 Your faith in me brings me to tears
L84-L88 Even after all these years
L89-L96 And it pains me so much to tell
L97-L103 That you don't know me that well
L104-L109 And though my love is rare
L110-L114 Though my love is true
L115 = L33 [Chorus]

L116-L122 It's not that I wanna say goodbye
L123-L130 It's just that every time you try to
L131-L136 tell me that you love me
L137-L143 Each and every single day I know
L144-L152 I'm going to have to eventually give you away
L153-L158 And though my love is rare
L159-L164 And though my love is true
L165-L168 Hey I'm just scared
L169-L173 That we may fall through
L174 = L33 (Chorus)
L175 = L33 (Chorus)
L176 = L33 (Chorus)
Album: AL 90010 Whoa Nelly Song_ID 90010-15 “I’m Like a Bird” (Song_Name) LY90010-15 I’m Like a Bird (full text lyrics) (Stop words are shown with Strike through)
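As a minimal sketch of the suggested numbering schema in Figure 5.2.2.A, the Python below builds a lyrics accession number from an album code and a track number. The function names are hypothetical, and the sketch assumes that the album's “n*” number is the fourth digit of the five-digit album code (1 for 90010, 2 for 90020), which is how the two example albums read; the spec itself says the schema may be modified.

```python
def lyrics_accession(album_code: str, track: int) -> str:
    """Build a lyrics accession number such as 'LY90010-15'.

    Assumption: album codes follow the '900nn' pattern and the album's
    "n*" number is the fourth digit of the code.
    """
    if len(album_code) != 5 or not album_code.isdigit() or not album_code.startswith("900"):
        raise ValueError(f"album code must look like 900nn, got {album_code!r}")
    if track < 1:
        raise ValueError("track numbers start at 1")
    album_digit = album_code[3]  # the "n*" number taken from the album code
    return f"LY{album_code}-{album_digit}{track}"


def album_of(accession: str) -> str:
    """Recover the album code from a lyrics accession number."""
    prefix = accession.partition("-")[0]
    return prefix[2:] if prefix.startswith("LY") else prefix


# Track 5 of "Whoa, Nelly!" (AL 90010) is "I'm Like a Bird".
bird = lyrics_accession("90010", 5)
```

Because the album code is embedded in every song's accession number, all lyrics from one album share a common prefix, which is the property the schema is meant to preserve.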
5.2.3 Query and Retrieval Structures

The retrieval functions of our IR system have been designed to address the needs of a younger audience. We anticipate that our users will navigate our system with minimal or simple searching. We expect them to have minimal understanding of Boolean operators and to misspell words frequently. In some cases, this has made our task easier. For instance, we do not expect our users to use truncation in their searching, so we will not need to support this retrieval function in the Mecca IR system. However, we do expect to use other techniques to improve the recall and precision of our IR system. For example, controlled vocabularies for instrument names and musical styles will be searchable through drop-down menus.

To facilitate quick lyric searches and proximity searching of the lyrics text file, the system will use an inverted file index to access the full-text data store. We will index four types of free-text documents in our system: song lyrics, artist biographies, album commentaries, and instrument descriptions. Retrieved full text will be displayed as in the example for the lyrics of “I’m Like a Bird” found in Section 5.2.1.

Prior to creation of the Inverted Index, text will be deconstructed. See the example of deconstruction of the song “I’m Like a Bird” found in Section 5.2.2. From this deconstruction, or preprocessing, of the text, an inverted index will be generated (Figure 5.2.2.A). The inverted file will list each word, the document in which the word can be found, and the placement of the word within that field. This last column of the inverted file (the position) will allow us to determine the proximity of multiple words in a query and to rank results by the proximity of the query words. The inverted index will contain an alphabetical listing of the words of the document.
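The inverted file described above (word, document, position) and its use for proximity ranking can be sketched in Python as follows. This is an illustration under stated assumptions, not the Mecca implementation: the function names are hypothetical, tokenization is a simple whitespace split, and the record "DEMO-1" is made-up test text rather than real lyrics.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace, trimming surrounding punctuation.
    Contractions such as "i'm" stay whole, as the spec intends."""
    words = []
    for raw in text.lower().split():
        word = raw.strip('.,!?;:()[]"')
        if word:
            words.append(word)
    return words


def build_index(records):
    """Map each word to {accession number: [1-based word positions]}."""
    index = defaultdict(dict)
    for accession, text in records.items():
        for position, word in enumerate(tokenize(text), start=1):
            index[word].setdefault(accession, []).append(position)
    return index


def min_distance(index, accession, first, second):
    """Smallest gap between any positions of two words in one record."""
    a = index.get(first, {}).get(accession)
    b = index.get(second, {}).get(accession)
    if not a or not b:
        return None
    return min(abs(p - q) for p in a for q in b)


def rank_by_proximity(index, first, second):
    """Records containing both words, closest occurrences first."""
    shared = set(index.get(first, {})) & set(index.get(second, {}))
    return sorted(shared, key=lambda acc: (min_distance(index, acc, first, second), acc))


records = {
    "LY90010-15": "I'm like a bird, I'll only fly away",  # line from the figure
    "DEMO-1": "bird songs at dawn then they fly",          # made-up test text
}
index = build_index(records)
ranked = rank_by_proximity(index, "bird", "fly")
```

Storing positions rather than just document identifiers is what makes the proximity ranking possible; an alphabetical listing can be produced at any time with `sorted(index)`.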
Certain words that appear frequently and are not likely to be used in a search will not be included in the index. The Stop Words for our system are: a, an, and, by, for, in, is, it, of, the, to.

The Inverted Index will be composed mostly of individual words. However, because many of the songs, especially Pop songs such as those by Nelly Furtado, include repeated phrases and choruses, we have chosen to index by both words and phrases. Phrases are limited to repeated lines, which are indexed by individual words when they first appear. Repeats of a lyric line are assigned an individual index number and position. Choruses are treated in the same manner: the first appearance of the chorus is indexed by individual words, and each subsequent chorus is assigned a position number for the phrase (chorus). The chorus itself is assigned a grouping identifier (e.g., L33 for the chorus in “I’m Like a Bird”). Each repeat of the chorus is also assigned a position number so that the lyrics can be reconstructed completely.

As mentioned earlier, word co-occurrences appearing more than three times in a text record will be considered a phrase and indexed as such. We have chosen this approach because a chorus is quite memorable, especially if it is repeated four or more times within a song (as is the case for “I’m Like a Bird” by Nelly Furtado). We expect that our youthful users will tend to remember and use the chorus, or at least a large portion of it, as a way to search for lyrics. The ability to evaluate the proximity of terms will enable users to retrieve most phrases, whether they are word phrases from repeated lines or chorus phrases.

Because of the youthfulness of our main users, we decided not to include stemming in the index; its high cost would not be justified for our users. We believe that they also will not use Boolean searching but rather will enter terms or natural-language words as free text.
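The stop-word list and the "more than three times" phrase rule described earlier in this section could be applied roughly as in the Python sketch below. It is a simplification under stated assumptions: the function names are hypothetical, and a whole lyric line is treated as the unit of repetition, whereas the spec also groups a multi-line chorus under one identifier.

```python
from collections import Counter

# Stop Words exactly as listed in this section.
STOP_WORDS = frozenset(
    {"a", "an", "and", "by", "for", "in", "is", "it", "of", "the", "to"}
)
PHRASE_THRESHOLD = 3  # a phrase must occur MORE than three times


def normalize(line):
    """Lowercase a line and collapse runs of whitespace."""
    return " ".join(line.lower().split())


def repeated_phrases(lines):
    """Lines occurring more than PHRASE_THRESHOLD times, in first-seen order."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    phrases = []
    for line in lines:
        key = normalize(line)
        if key and counts[key] > PHRASE_THRESHOLD and key not in phrases:
            phrases.append(key)
    return phrases


def index_terms(line):
    """Words of a line that would enter the index: stop words are dropped
    and contractions are kept as single words."""
    words = (raw.strip('.,!?;:()[]"') for raw in line.lower().split())
    return [word for word in words if word and word not in STOP_WORDS]


# Counts follow the deconstructed record: the chorus line appears five
# times, "And though my love is rare" three times.
lines = ["I'm like a bird, I'll only fly away"] * 5 + ["And though my love is rare"] * 3
phrases = repeated_phrases(lines)
```

With these counts only the chorus line qualifies as a phrase; a line repeated exactly three times falls just under the threshold.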
We have designed the system with delimited searching through exclusive-choice controls such as radio/option buttons (Search by Artists or Songs or Lyrics), but have also provided Advanced Search capabilities through the Advanced Search Features and Functionality (see Section 7). Because of these limitations, we have decided not to deconstruct contractions in the lyrics but to leave them represented as single words. If the current approach to IR does not meet the needs of our users, it can be modified in the future.

We will need to use thesauri to address potential inconsistencies in spelling and terminology. Artist names are often difficult to spell, and we do not want the children using our site to become frustrated because they cannot find a particular artist. We want both to encourage them to learn the correct spelling and to direct them towards the artist they want. To meet this need, we will compile a thesaurus of artist names and possible misspellings. Queries that match a term in the misspelling thesaurus will generate a message such as “Did you mean ___?” We will employ two methods to develop this thesaurus. First, we will develop or obtain a preliminary list of common misspellings of artist names. Second, we will keep track of search queries that generate no results and try to determine whether those queries might be misspellings of artist names. In this way, our thesaurus will expand to include common misspellings by our users.

Another thesaurus will address the multiple names and descriptions for musical instruments. Figures 5.2.3.A and 5.2.3.B list the terms and give example entries. This thesaurus, or controlled vocabulary, was developed for the Mecca system to give its home-schooled users better access to the instruments and to information about them. The instruments will be accessed mainly through the Listen to Browse Page by means of the drop-down controls.
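The artist-name misspelling thesaurus described above, with its “Did you mean ___?” message and its log of zero-result queries, might look something like the following Python sketch. All names and the sample misspellings are hypothetical; the spec does not say how the preliminary list will be stored or how often a failed query must recur before a person reviews it.

```python
from collections import Counter


def normalize(query):
    """Lowercase a query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def did_you_mean(query, misspellings):
    """Return a suggestion message if the query is a known misspelling."""
    target = misspellings.get(normalize(query))
    return f"Did you mean {target}?" if target else None


def log_zero_result(query, log):
    """Record a query that returned no results, for later human review."""
    log[normalize(query)] += 1


def review_candidates(log, min_count=2):
    """Zero-result queries seen at least min_count times, most frequent first."""
    return [query for query, count in log.most_common() if count >= min_count]


# Hypothetical preliminary list of misspelled artist names.
misspellings = {
    "nelly furtardo": "Nelly Furtado",
    "nellie furtado": "Nelly Furtado",
}
message = did_you_mean("Nellie  Furtado", misspellings)

# Queries that matched nothing are counted until someone reviews them.
zero_results = Counter()
for failed in ["Nely Furtado", " nely  furtado", "xyz"]:
    log_zero_result(failed, zero_results)
candidates = review_candidates(zero_results)
```

A reviewer who confirms a candidate would add it to `misspellings`, which is how the thesaurus grows from actual user behavior.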
We wished to maximize users' access to the instruments by standardizing names, since the youthful users of the system might misspell instrument names if they were accessible only through a text-box Search. Also, some instruments have several names, and the same word might apply to different types of instruments. For example, “bass” could apply to the string bass or to a bass guitar.

Figure 5.2.3.A List of Terms in the Musical Instruments Controlled Vocabulary (based on the Thesaurus of Musical Instruments, http://alteriseculo.com/instruments)
Figure 5.2.3.B Example Listings in the Musical Instrument Thesaurus

Examples of Thesaurus listings

Brass instruments
  UF Brasses (Musical instruments)
  BT Wind instruments
  NT Baritone (Musical instrument)
     Bugle
     Cornet
     Cornett
     Euphonium
     Flügelhorn
     Helicon
     Horn (Musical instrument)
     Post horn
     Sarrusophone
     Saxhorn
     Trombone
     Trumpet
     Tuba

  USE Brass instruments

  BT Percussion instruments
  NT Bass drum
     Bongo
     Frame Drums
     Snare drum
     Steel drum (Musical instrument)
     Timpani

Drum kit
  USE Drum set

Drum machine
  BT Electronic percussion instruments

Drum set
  UF Drum kit
     Drumset
     Trap kit
  BT Percussion instruments

Guitarra portuguesa
  BT Lute
     Musical instruments -- Portugal

Guitar
  UF Spanish guitar
  BT Plucked instruments
  NT Electric guitar
     English guitar
     Gretsch guitar
     Hawaiian guitar
     Martin guitar
     Ukelele
     Viola d’arame

Guitar, Electric
  USE Electric guitar

Guitar, Steel
  USE Hawaiian guitar
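The USE and UF cross-references in the Musical Instrument Thesaurus listings (Figure 5.2.3.B) can be resolved to a single preferred term with a simple lookup, sketched below in Python. The class and function names are hypothetical, only a few entries from the figure are included, and the UF entries for Electric guitar and Hawaiian guitar are inferred from the corresponding USE lines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One preferred term with its non-preferred (UF) and broader (BT) terms."""
    term: str
    used_for: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)


def build_lookup(entries):
    """Map every preferred and non-preferred name to its preferred term."""
    lookup = {}
    for entry in entries:
        lookup[entry.term.lower()] = entry.term
        for variant in entry.used_for:
            lookup[variant.lower()] = entry.term
    return lookup


def preferred_term(name, lookup):
    """Resolve a name through USE references; None if it is not in the thesaurus."""
    return lookup.get(" ".join(name.lower().split()))


def dropdown_options(entries):
    """Preferred terms only, sorted for a drop-down control."""
    return sorted(entry.term for entry in entries)


entries = [
    Entry("Drum set", ["Drum kit", "Drumset", "Trap kit"], ["Percussion instruments"]),
    Entry("Guitar", ["Spanish guitar"], ["Plucked instruments"]),
    Entry("Electric guitar", ["Guitar, Electric"], ["Guitar"]),
    Entry("Hawaiian guitar", ["Guitar, Steel"], ["Guitar"]),
]
lookup = build_lookup(entries)
options = dropdown_options(entries)
```

Offering only the preferred terms in the drop-down keeps young users from having to type, and therefore spell, instrument names, while the lookup still lets a typed variant such as "Trap kit" reach the right entry.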