Corpora in the classroom without scaring the students

Corpora in the classroom without concordances


Concordance tools typically give the user many options for how a search is specified and displayed, whereas dictionaries provide fewer: the lexicographers have decided what the learner should know about the word, so there is no need to complicate the learner’s experience with lots of options. So when disguising a corpus as a dictionary, the designer should make the choices rather than leave them to the user.

We can use the word sketch machinery to find grammatical and collocational patterns, and GDEX to select examples for those patterns. Together these give us a fully automatic collocations dictionary. The entry for space in this dictionary is shown in Fig 3 (see note 6).

space (n)

watch: We are also hoping to hold another Dinner in the Autumn – watch this space !
confine: Out of the bag the process will be quicker , but badges in confined spaces should last better for these reasons .
occupy: A separate music library was formed in 1975 utilising space formerly occupied by committee rooms .
allocate: Externally there is a single garage which is situated en bloc plus an allocated parking space .
fill: Food food safe filler is used to fill any spaces in the container .
enclose: Where a temple is found within an enclosed space , this is in most cases a rectangular space aligned with the temple .

open: Open Spaces There will be a clean up day in November .
green: The Green Flag Award scheme is the national standard for parks and green spaces .
outer: Moreover , we shall not be the first to place any weapons in outer space .
ample: The bright reception room is a generous size , allowing ample space for a dining table .
empty: Void size is a measure of how much of the medium consists of empty space .
public: Streets are well overlooked helping to make public spaces feel safe .

shuttle: Fly my space shuttle into the sun on my 105th birthday .
exploration: Why am I so concerned about Britain 's role in space exploration ?

parking: However , there would be a lot of scope for disputes where the number of physical parking spaces exceeded the number in the licence .
disk: We do not set any limit on how much disk space is used .
storage: There was no storage space for his personal possessions .
exhibition: Recently lottery approval was received to completely restructure the existing building to provide considerably increased exhibition space built to the highest standards of design .
office: The removal of interior partitions will also allow new office space to be created .
living: Building and decorating companies are available now to help you to create the ideal living space for you .
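
To make the idea of automatic example selection concrete, here is a minimal sketch of how one example per collocate might be chosen from a set of candidate corpus sentences. It is only loosely in the spirit of GDEX: the features used (moderate length, initial capital, final punctuation), their weights, and the names example_score and build_entry are all assumptions made for illustration, not the actual GDEX heuristics or any Sketch Engine API.

```python
def example_score(sentence, min_len=8, max_len=25):
    """Toy score for a candidate example sentence (higher is better).

    The features and weights are illustrative assumptions, not GDEX's own.
    """
    tokens = sentence.split()
    score = 0.0
    if min_len <= len(tokens) <= max_len:  # neither too short nor too long
        score += 1.0
    if sentence[:1].isupper():  # looks like the start of a sentence
        score += 0.5
    if sentence.rstrip().endswith((".", "!", "?")):  # looks complete
        score += 0.5
    return score


def build_entry(candidates):
    """Map each collocate to its highest-scoring candidate sentence."""
    return {
        collocate: max(sentences, key=example_score)
        for collocate, sentences in candidates.items()
        if sentences  # skip collocates that have no candidate sentences
    }


# Hypothetical candidates for the headword "space", keyed by collocate.
candidates = {
    "disk": [
        "disk space",
        "We do not set any limit on how much disk space is used .",
    ],
    "storage": [
        "There was no storage space for his personal possessions .",
    ],
}

for collocate, example in build_entry(candidates).items():
    print(f"{collocate}: {example}")
```

The point of the sketch is the division of labour: one component supplies collocates and candidate sentences, another ranks the sentences, and the learner sees only the winner for each collocate.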

Corpora that motivate the students

Wouldn’t it be nice if each student could work on English texts about a topic they were genuinely interested in? Rather than reading about safe textbook subjects like the family, or holiday traditions, or Harry Potter, they could find and learn from texts on gaming, hip hop, manga, or whatever they find fun. That would go a long way to addressing motivation.

We have a tool that allows students to collect corpora on a topic of their choice: WebBootCaT (Baroni et al 2006), which draws on the vast resource of the web. The user inputs five or six ‘seed terms’ in the domain of interest. Triples of seed terms are then sent as queries to the Yahoo search engine, which returns a page of search hits for each query; the program then gathers the pages those hits point to and builds a corpus from them. A corpus of 300,000 words typically takes a few minutes to build, though, if the corpus is to be used extensively, further rounds of examining and improving the corpus are recommended (and supported by the software).
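
The first step of this procedure, turning seed terms into search queries, can be sketched as follows. This is a minimal illustration under stated assumptions, not WebBootCaT's own code: the function name seed_queries and the parameters tuple_size, max_queries and rng_seed are hypothetical, and the search-engine call and page download are deliberately left out.

```python
from itertools import combinations
import random


def seed_queries(seed_terms, tuple_size=3, max_queries=10, rng_seed=0):
    """Combine seed terms into search queries, one query per tuple of terms.

    All parameter names and defaults are assumptions for illustration.
    """
    # Drop empty entries and duplicates; sort so the result is reproducible.
    terms = sorted({t.strip() for t in seed_terms if t.strip()})
    tuples = list(combinations(terms, tuple_size))
    random.Random(rng_seed).shuffle(tuples)  # sample tuples in random order

    def quote(term):
        # Multi-word seed terms are quoted so they are searched as phrases.
        return f'"{term}"' if " " in term else term

    return [" ".join(quote(t) for t in combo) for combo in tuples[:max_queries]]


# Hypothetical seed terms a student interested in gaming might choose.
seeds = ["gaming", "console", "multiplayer", "controller", "esports"]
for query in seed_queries(seeds):
    print(query)
```

Five seed terms yield ten distinct triples and six yield twenty, so a handful of terms is enough to produce a varied set of queries.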

Smith (2009) offered his Taiwanese first-year undergraduates a project in which they used WebBootCaT to build their own corpus. If the students are sufficiently engaged in the topic, they won’t be scared off.

Summary

In this paper I have given some background on what corpora are and how they have been used in a variety of fields, focusing on ELT. Corpus use has become standard in dictionary and textbook preparation, but using corpora directly with students remains a specialist activity, which I attribute mainly to the difficulty that learners have in reading concordances. I present an alternative strategy, in which corpora do come into the classroom (to help students where the dictionary does not tell them enough) but are presented as dictionaries. Automatic techniques allow us to do this well: a word sketch is midway between corpus and dictionary. We show how this can be done with an automatic collocations ‘dictionary’.

We also present a response to the question of motivation: we provide technology for students to build their own corpora. I’ll leave the final words to one of Smith’s (2009) students who took the project:

“I find it special to have your own corpus. It is unique! You can make corpuses by your interests. That can make you know words easily because words are about your own interests.”

References

Baroni, M. and S. Bernardini. 2004. BootCaT: Bootstrapping corpora and terms from the web. Proceedings of LREC, Lisbon: ELDA. 1313-1316.

Baroni, M., A. Kilgarriff, J. Pomikalek and P. Rychly. 2006. WebBootCaT: a web tool for instant corpora. Proc. Euralex. Torino, Italy.

Boulton, A. 2009. Testing the limits of data-driven learning: language proficiency and training. ReCALL 21 (1): 37-54.

Chan, T. P. and H. C. Liou. 2005. Effects of web-based concordancing instruction on EFL students’ learning of verb–noun collocations. Computer Assisted Language Learning 18 (3): 231-250.

Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton.

Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.

Cobb, T. 1999. Breadth and depth of vocabulary acquisition with hands-on concordancing. Computer Assisted Language Learning 12: 345-360.

Ferraresi A., E. Zanchetta, S. Bernardini and M. Baroni 2008. Introducing and evaluating UKWaC, a very large Web-derived corpus of English. In Proc. 4th Web as Corpus Workshop. Morocco.

Francis, N. and H. Kučera. 1982. Frequency Analysis of English Usage. Boston: Houghton Mifflin.

Gabrielatos, C. 2005. Corpora and language teaching: Just a fling, or wedding bells? TESL-EJ, 8 (4). pp. 1-37.

Johns, T. 1991. From printout to handout: Grammar and vocabulary teaching in the context of data-driven learning. In Johns & King (Eds.), Classroom concordancing. ELR Journal 4, 27–45. University of Birmingham.

Johns, T., H. C. Lee and L. Wang. 2008. Integrating corpus-based CALL programs in teaching English through children's literature. Computer Assisted Language Learning 21 (5): 483-506.

Kilgarriff, A. 2007. Googleology is bad science. Computational Linguistics 33 (1): 147-151.

Kilgarriff A. and G. Grefenstette 2003. Introduction to the Special Issue on Web as Corpus. Computational Linguistics 29 (3).

Kilgarriff, A., P. Rychly, P. Smrz and D. Tugwell. 2004. The Sketch Engine. Proc. Euralex. Lorient, France, July: 105-116.

Kilgarriff A., M. Husák, K. McAdam, M. Rundell and P. Rychlý 2008. GDEX: Automatically finding good dictionary examples in a corpus. Proc EURALEX, Barcelona, Spain.

Kučera, H. and N. Francis. 1967. Computational Analysis of Present-Day American English. Brown University Press.

Lewis, M. 1993. The Lexical Approach: the state of ELT and the way forward. Language Teaching Publications.

McCarthy, M. 1990. Vocabulary. Oxford University Press

Nation, P. 2001. Learning vocabulary in another language. Cambridge University Press.

Quirk, R., S. Greenbaum, G. Leech and J. Svartvik. 1972. A Grammar of Contemporary English. Longman.

Sinclair, J. 1997. Corpus Concordance Collocation. Oxford University Press. 

Sinclair, J. 2003. Reading Concordances. Longman/Pearson, London/New York.

Smith, S. 2009. Learner construction of corpora for General English. In draft.

Sun, Y. C. 2007. Learner Perceptions of a Concordancing Tool for Academic Writing. Computer Assisted Language Learning 20 (4): 323-343.

Sun, Y. C. and L. Y. Wang. 2003. Concordancers in the EFL classroom: Cognitive approaches and collocation difficulty. Computer Assisted Language Learning 16 (1): 83-94.

Thorndike, E. and I. Lorge 1944. The Teacher’s WordBook of 30,000 words. Teachers College, Columbia University

West, M. 1953. A General Service List of English Words, Longman.

1 See for a summary.



4 Some brief and stimulating examples of Johns’ style of work can be seen at . Sadly, Tim Johns died in 2009.

5 Word sketches are one of the functions in the Sketch Engine (Kilgarriff et al 2004), a leading corpus query tool in use at a number of dictionary publishers and universities worldwide. A free trial is available at .

6 Available at
