Instructor: Geunbae Lee, Eng 2-211, firstname.lastname@example.org, 279-2254
1. Course objectives
This course introduces various recent statistical methods in natural language processing.
We will cover basic statistical tools for computational linguistics and their application to part-of-speech tagging,
statistical parsing, word sense disambiguation, machine translation, information retrieval and statistical discourse processing.
If time permits, we will briefly touch on statistical language models for speech recognition and text-to-speech systems.
2. Course prerequisites
CSE561 Linguistic Fundamentals for Natural Language Processing, or instructor approval
class participation 10%
4. Required texts or references
Text : Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
References : Brigitte Krenn and Christer Samuelsson. The Linguist's Guide to Statistics. Internet shareware, http://www.coli.uni-sb.de/~krenn/edu.html
- E. Brill. A simple rule-based part-of-speech tagger. Proceedings of the 3rd conference on applied NLP, 1992
- E. Roche and Y. Schabes. Deterministic part-of-speech tagging with finite state transducers. Computational linguistics 21, 1995.
- B. Merialdo. Tagging English text with a probabilistic model. Computational linguistics 20, 1994.
- Jeongwon Cha, Geunbae Lee, Jong-Hyeok Lee. Generalized unknown morpheme guessing for hybrid POS tagging of Korean. Proceedings of the Sixth Workshop on Very Large Corpora, COLING-ACL 98, Montreal, 1998.
- K. Lari and S. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer speech and language 4, 1990
- F. Pereira and Y. Schabes. Inside-outside reestimation from partially bracketed corpora. ACL 30, 1992
- T. Briscoe and J. Carroll. Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars. Computational linguistics 19, 1993.
- E. Black et al. Towards history-based grammars: using richer models for probabilistic parsing. ACL 31, 1993.
- D. Magerman. Statistical decision-tree models for parsing. ACL 33, 1995.
- Brill. Automatic grammar induction and parsing free text: a transformation-based approach. ACL 31, 1993
- D. Hindle and M. Rooth. Structural ambiguity and lexical relations. Computational linguistics 19, 1993
- Brown et al. A statistical approach to machine translation. Computational linguistics 16, 1990
- Wu. Aligning a parallel English-Chinese corpus statistically with lexical criteria. ACL 32, 1994
- Church. Char_align: A program for aligning parallel texts at the character level. ACL 31, 1993
- Sproat et al. A stochastic finite-state word segmentation algorithm for Chinese. ACL 32, 1994
(lexical knowledge acquisition)
- Manning. Automatic acquisition of a large subcategorization dictionary from corpora. ACL 31, 1993
- Smadja. Retrieving collocations from text: Xtract, Computational linguistics, 1993
(speech and others)
- Brown et al. Class-based n-gram models of natural language, computational linguistics, 18(4), 1992
- Chien et al. A best-first language processing model integrating the unification-based grammar and markov language model for speech recognition applications. IEEE trans. on speech and audio processing, 1(2), 1993
- Derouault and Merialdo. Natural language modeling for phoneme-to-text transcription, IEEE trans. PAMI, 8(6), 1986
- Seneff. Tina: a natural language system for spoken language applications, computational linguistics, 18(1), 1992
- Gupta et al. A language model for very large-vocabulary speech recognition, computer speech and language 6, 1992