Karen Spärck Jones (1935-2007)
Professor (emeritus) of Computers and Information, University of Cambridge.
Karen Spärck Jones contributed significantly to two separate fields, Information Retrieval (IR) and Natural Language Processing (NLP), and in her later years was concerned with their relationship within general schemes of representation in AI. She died on 4th April 2007 after the return of a cancer, and was working until a week before her death. Her major and lasting contributions will almost certainly be her original PhD thesis and the inverse document frequency (idf) measure of the relevance of terms (1972): the notion that a document is relevant not only because key terms are frequent in it, but because those terms are not frequent in other, non-relevant, documents. That notion is now part of the basics of IR.
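The idf idea can be illustrated with a minimal sketch. It assumes the common textbook form idf(t) = log(N / n_t), where N is the number of documents and n_t the number containing term t; this is not necessarily the exact weighting of the 1972 paper, and the function name and toy documents are hypothetical.

```python
import math

def inverse_document_frequency(term, documents):
    """Return log(N / n_t) for a term over a list of documents.

    Assumption: each document is represented as a set of terms.
    A term that occurs in no document gets weight 0.0 by convention here.
    """
    n_docs = len(documents)
    n_containing = sum(1 for doc in documents if term in doc)
    if n_containing == 0:
        return 0.0
    return math.log(n_docs / n_containing)

# Toy collection (illustrative only).
docs = [
    {"the", "cat", "sat"},
    {"the", "dog", "barked"},
    {"the", "thesaurus", "clusters"},
]

common = inverse_document_frequency("the", docs)        # occurs in every document
rare = inverse_document_frequency("thesaurus", docs)    # occurs in one document
```

A term that appears in every document receives weight zero, while a term confined to a few documents receives a higher weight, which is the intuition described above.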
Before going to the Computer Laboratory in 1968, she wrote her thesis “Synonymy and Semantic Classification” (1964) at the Cambridge Language Research Unit (CLRU), run by Margaret Masterman, under the supervision of Masterman’s husband, the philosopher Richard Braithwaite. This work was far ahead of its time (see Wilks and Tait, 2005) but was not published until some twenty years later, in the Edinburgh University AI series (1986), and she had to be persuaded then that it was still relevant. It was in fact the first application of statistical clustering methods to lexical data (in her case the whole of Roget’s Thesaurus on punched cards) and an ambitious attempt to create some notion of primitive concepts for machine translation on an empirical basis. It can now be seen as the ancestor of a range of empirical semantics research, from the semi-synonymous rows of terms (synsets) in WordNet to much later work on statistical clustering to determine semantic relationships. The historian in her produced an extraordinary appendix to the thesis on artificial languages for coding meaning. The algorithms she used were those of the Theory of Clumps, the same ones that her husband Roger Needham had developed and used in his own thesis work on automatic classification. They were also the ones she used when she moved to the University Computer Laboratory to begin work on IR, since its then Director would not allow work explicitly on AI or NLP, although IR he deemed respectable and scientific.
Karen was born in 1935 in Huddersfield, Yorkshire, of English and Norwegian parents. She studied history at Cambridge, but moved to philosophy (then called Moral Sciences) in her last year, so that when, after a brief spell of school teaching, she accepted Margaret Masterman’s invitation to join the Cambridge Language Research Unit, it was a philosophy doctorate she started. Masterman (2005) remained a major inspiration to her and is, along with Roger Needham, the person thanked at the end of Karen’s acceptance speech on receiving the ACL Lifetime Achievement Award (2005); that speech remains the best overview of the many interleaved themes in her work.
Her first published conference paper (1958), with Masterman and Needham, is called “The analogy between mechanical translation and library retrieval”, a title of great prescience for her career. At that time it referred to the use of thesauri to resolve meaning problems in the two technologies, but it was a link that preoccupied her all her life and to which she returned with her “Information Retrieval and Artificial Intelligence” (1999), where she argued that AI in general, and NLP in particular, should make more use of the statistical methodology of IR.
In 1968 the need for more serious computer facilities took her out of CLRU and into the University Computer Laboratory. By then she had been a three-year Research Fellow of Newnham College and then a Royal Society Fellow, the post with which she began her new career in IR, a subject on which she became a world authority. Eventually, Needham became Director of the Laboratory and she was able to revisit her early interests in NLP, taking on students and producing major work in language front-ends to databases, automatic summarisation, content retrieval from video, evaluation methods, and belief revision.
Academic promotion was slow in coming: most of her career was as an Assistant Director of Research on soft money and it was only in 1999 that she was awarded a personal professorship. Meanwhile she had taken on a wider role: she managed much of the Alvey Research Programme (from 1985) in the UK, was an outstanding President of the ACL (1994), took leading roles in the US DARPA/NIST evaluation projects (1992), and later was on the Advisory Committee for the DARPA TIDES Program. She gained many later honours, some of which she did not live to receive (though she had recorded acceptance speeches): Fellowships of the American and European AI Societies, the Fellowship of the British Academy, the ACL Lifetime Achievement Award, the Lovelace Medal of the British Computer Society, the SIGIR Salton Award, the American Society for Information Science and Technology’s Award of Merit, and the ACM-AAAI Allen Newell Award.
In retirement she was as active as ever, returning again to issues of representation and to her early interest in semantic primitives (including the last publication on her website, 2007), but always tempered by her powerful slogan “Words stand only for themselves”. She remained finely balanced on the issue of whether or not NLP can help IR, conscious that most claimed non-statistical advantages can be reproduced later by statistical means. And yet she wanted NLP to matter. Although she had attributed statistical influence on AI to IR, she knew well that it was above all Jelinek’s Machine Translation research at IBM that had driven NLP to take up statistical methods. Even so, she remained skeptical that the tasks of machine understanding could all be seen as “recovery processes”, in the way the answer recovers the question, the document recovers the original query, and the transcription recovers the speech signal. She asked (2005): can we really see machine translation of Shakespeare into Spanish as recovering his hidden Spanish within the English? She produced a stimulating late paper on how the Semantic Web movement faces up to these questions. She never forgot that Masterman had been a student of Wittgenstein, so that she was only one step away from him, nor how close her slogan above was to his demand to look not for the meaning but the use.
She also campaigned hard for more women to enter computing, and was conscious that she, like Masterman before her, had a husband with a more powerful formal role; and we can now examine, in both cases, on which side the more creative achievements lay. She was, with Needham, an accomplished sailor and they built their house themselves; she made wonderful things from objets trouvés.
Masterman, M., Needham, R. M. and K. Spärck Jones (1959) The analogy between mechanical translation and library retrieval, In Proc. International Conference on Scientific Information (1958), National Academy of Sciences - National Research Council, Washington, D.C.
Masterman, M. (2005) Language, Cohesion and Form, (Ed. Y. Wilks, with commentaries by Y. Wilks and K. Spärck Jones), Cambridge University Press: Cambridge.
Spärck Jones, K. (1964/1986) Synonymy and Semantic Classification, Ph.D. thesis, University of Cambridge, reprinted, Edinburgh University Press AI series (Eds. S. Michaelson and Y. Wilks): Edinburgh.
Spärck Jones, K. (1972) A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, 28.
Spärck Jones, K. (1999) Information retrieval and artificial intelligence, Artificial Intelligence, 114.
Spärck Jones, K. (2007) Semantic primitives: the tip of the iceberg, In Words and intelligence: Part II: Essays in Honour of Yorick Wilks, (Eds. K. Ahmad, C. Brewster and M. Stevenson), Springer: Berlin.
Wilks, Y. and J. Tait (2005) A Retrospective View of Synonymy and Semantic Classification, In Charting a New Course: Natural Language Processing and Information Retrieval, Essays in Honour of Karen Spärck Jones, (Ed. J. Tait) Springer: Berlin.