Corpora in the classroom without scaring the students


Between dictionary and corpus




Learners’ dictionaries are designed to help learners understand and use words and phrases. A corpus is another resource to help with the same task. How do they relate to each other?
They are both records of the language. The corpus is a sample of the language in the raw; the dictionary is a highly condensed version of roughly the same material. The relation between the two is easy to see when we consider how modern corpus-based dictionaries are prepared. One of the main inputs, at leading dictionary publishers including Collins, Macmillan and Oxford University Press, is the word sketch: a one-page, corpus-based summary of a word’s grammatical and collocational behaviour, as in Fig 2. Is this more corpus-like or dictionary-like? It is automatically produced output from the corpus, which makes it corpus-like, but it is a condensed summary of what was found there, which makes it dictionary-like. On a continuum from corpus to dictionary, it sits somewhere in the middle.
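To make the idea of a word sketch more concrete, here is a minimal Python sketch of how one might be assembled from parsed corpus data. It assumes the corpus has already been reduced to hypothetical (headword, relation, collocate) triples. It ranks collocates with logDice, 14 + log2(2·f_xy / (f_x + f_y)), an association score used for word sketches in the Sketch Engine. We do not assume the scores in Fig 2 were computed this way, and using plain corpus frequencies for f_x and f_y is a simplification.

```python
import math
from collections import Counter, defaultdict


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def word_sketch(triples, headword, word_freq, top_n=12):
    """Group (head, relation, collocate) triples for `headword` by relation
    and rank each relation's collocates by logDice, best first.

    `triples` and `word_freq` are hypothetical inputs: a list of parsed
    triples and a dict of overall word frequencies.
    """
    # How often each (relation, collocate) pair occurs with the headword.
    pair_freq = Counter((rel, col) for head, rel, col in triples if head == headword)
    sketch = defaultdict(list)
    for (rel, col), f_xy in pair_freq.items():
        score = log_dice(f_xy, word_freq[headword], word_freq[col])
        sketch[rel].append((col, f_xy, round(score, 2)))
    # Sort collocates within each relation by score and keep the top few.
    return {
        rel: sorted(items, key=lambda item: item[2], reverse=True)[:top_n]
        for rel, items in sketch.items()
    }


# Toy usage with invented counts (not ukWaC data).
triples = [("space", "object_of", "watch")] * 3 + [
    ("space", "object_of", "fill"),
    ("space", "a_modifier", "open"),
    ("space", "a_modifier", "open"),
]
word_freq = {"space": 10, "watch": 5, "fill": 20, "open": 6}
for rel, collocates in word_sketch(triples, "space", word_freq).items():
    print(rel, collocates)
```

The sketch groups by relation first and ranks within each relation, which mirrors the layout of Fig 2: each grammatical relation heads its own list of collocates.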
Most learners do not want to be corpus linguists, and concordances are unfamiliar and difficult objects. But dictionaries are familiar from an early age, sometimes even loved. Learners will not be put off if they are expected to look items up in a new kind of dictionary. This suggests a strategy for bringing corpora into the classroom: disguise them as dictionaries.
Dictionary users often find that the examples are the most useful part of a dictionary entry. Moreover, where dictionaries are electronic rather than on paper, the traditional space limitation on examples disappears: there is room for lots of examples. This is an area where the corpus can help, since a corpus is nothing but examples. However, they are not selected or edited examples. Choosing examples for a dictionary is an advanced lexicographical skill. A good example should be short, use familiar words and avoid irrelevant grammatical complexity. It should also be typical of how the word is used and provide a context that helps the learner understand what the word means.
While we cannot yet program computers to do this task anything like as well as people can, we can perform some parts of it automatically. We can rule out sentences that are too long or too short, that contain obscure words, or that have many capitalised words, lots of numbers, square brackets or other characters that are rare in the kind of simple, straightforward sentences we are looking for. We have done this in a program called GDEX (Good Dictionary Example finder; Kilgarriff et al 2008). The initial project was a collaboration with Macmillan, and GDEX served as a first-pass filter for selecting additional examples for their dictionary. The machinery has since been embedded into the Sketch Engine, where concordance lines can now be sorted by GDEX score, so that the ‘best’ examples are the first search hits the user sees.
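The filtering heuristics described above can be sketched in Python as follows. This is a hypothetical illustration, not GDEX itself. The length thresholds, the set of rare characters, the common-word list and the way penalties are combined into a single score are all assumptions chosen for clarity.

```python
import re

# Illustrative assumptions, not the real GDEX configuration.
MIN_TOKENS, MAX_TOKENS = 6, 20
RARE_CHARS = set("[]{}<>|@#/\\")


def gdex_score(sentence: str, common_words: set) -> float:
    """Score a candidate example in [0, 1]; 0 means it is ruled out."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    words = [t for t in tokens if t.isalpha()]
    # Hard rules: too short, too long, or containing rare characters.
    if not (MIN_TOKENS <= len(tokens) <= MAX_TOKENS):
        return 0.0
    if any(ch in RARE_CHARS for ch in sentence):
        return 0.0
    if not words:
        return 0.0
    # Soft penalties: obscure words, capitalised words after the first
    # word, and numbers, each counted relative to sentence length.
    obscure = sum(1 for w in words if w.lower() not in common_words)
    capitals = sum(1 for w in words[1:] if w[0].isupper())
    numbers = sum(1 for t in tokens if t.isdigit())
    penalty = (obscure + capitals + numbers) / len(words)
    return max(0.0, 1.0 - penalty)


def sort_concordance(lines, common_words):
    """Return lines best-first by score, dropping ruled-out lines.

    sorted() is stable, so ties keep their original concordance order.
    """
    scored = [(gdex_score(s, common_words), s) for s in lines]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for score, s in ranked if score > 0]


# Toy usage: a tiny hypothetical "familiar words" list.
common = {"there", "is", "not", "enough", "space", "in", "the", "car",
          "for", "all", "of", "us", "we", "need", "a", "more", "our", "luggage"}
lines = [
    "There is not enough space in the car for all of us.",
    "See [12] for space complexity.",
    "Space.",
    "We need a more capacious space for our luggage.",
]
for line in sort_concordance(lines, common):
    print(f"{gdex_score(line, common):.2f}  {line}")
```

In this sketch, lines that break a hard rule score 0 and are dropped. The remaining lines are ordered best-first. A real system might keep every line and only reorder them.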


space    ukWaC freq = 273022

relation / collocate    freq      score

object_of               86021     2.2
    watch               4113      9.21
    confine             1200      8.65
    occupy              1230      8.22
    allocate            912       7.95
    limit               1163      7.76
    fill                1179      7.68
    enclose             566       7.41
    create              3038      7.39
    save                1097      7.24
    reserve             523       7.08
    devote              381       6.8
    breathe             344       6.79

pp_between-i            2685      10.0
    paragraph           44        4.46
    particle            22        3.64
    row                 24        3.5
    column              25        3.2
    word                143       3.1
    tooth               17        3.09
    building            83        2.22
    seat                20        2.2
    letter              39        2.1
    star                20        2.07
    wall                32        1.97
    cell                29        1.73

pp_above-i              205       5.9
    shop                29        1.73

pp_per-i                410       4.7
    sq.m                16        9.55
    dwelling            44        5.48
    unit                18        0.57
    person              26        0.57

pp_around-i             515       4.6
    building            19        0.1

pp_within-i             983       4.2
    building            63        1.83
    city                28        1.26
    area                58        0.18
    centre              19        0.14

pp_down-i               76        4.2
    left                17        2.75
    right               29        0.51

pp_for-i                15742     4.1
    wheelchair          109       6.31
    fridge/freezer      40        6.13
    freezer             49        5.93
    fridge              52        5.62
    cot                 35        5.56
    contemplation       27        5.43
    reflection          91        5.42
    recreation          47        5.4
    luggage             35        5.16
    update              108       5.01
    dryer               24        4.91
    storage             99        4.9

a_modifier              106533    2.7
    open                10693     9.69
    green               3787      9.18
    outer               1732      8.75
    public              5574      8.18
    empty               1268      8.09
    ample               970       8.05
    enough              1564      7.78
    limited             1110      7.54
    extra               1325      7.46
    urban               959       7.45
    short               2051      7.37
    much                1740      7.21

n_modifier              64957     2.6
    parking             6052      10.16
    disk                2762      9.43
    storage             3084      9.32
    exhibition          1689      8.08
    office              2984      7.75
    breathing           488       7.59
    loft                426       7.57
    gallery             872       7.5
    roof                812       7.47
    floor               1535      7.41
    living              798       7.39
    studio              755       7.37

adj_subject_of          8065      2.3
    available           3224      6.63
    adjacent            91        6.37
    cramped             20        5.79
    tight               70        5.76
    limited             130       5.45
    finite              17        4.83
    scarce              16        4.8
    empty               41        4.64
    accessible          69        4.59
    inadequate          20        4.41
    suitable            91        4.41
    efficient           44        4.08

Fig 2. Word sketch for space drawn from ukWaC (truncated)


While GDEX could be much better, it already makes it more likely that corpus examples will be readable.


