This is a list of resources shared among the various departments. See also Doug's old CL/MT pages. For a list of resources available on the Web, see Chris Manning's Stat NLP pages.

Corpora

Many of these corpora are installed on a University-accessible server under /ufs/corpora (\\corpora\corpora from Windows machines). For details on access, ask Doug.

  • al-Hayat corpus (CS; ask Abdul Goweder)
  • British National Corpus (LAL; ask Doug)
  • Brown Corpus (CS; ask Massimo)
  • Essex Arabic Summaries Corpus (CS; ask Mahmoud)
  • GNOME corpus (CS; ask Massimo)
  • ICAME CD, containing the Brown corpus, LOB, London-Lund, and a few other corpora (CS; ask Massimo)
  • LOB (CS, LAL; ask Doug or Massimo)
  • London-Lund (CS, LAL; ask Doug or Massimo)
  • MUC-6, MUC-7 (CS; ask Massimo)
  • Reuters (CS; ask Udo)
  • Switchboard (CS; ask Massimo)
  • TREC (CS; ask Massimo)
  • Verbmobil (CS; ask Massimo)
  • Arabic Human Rights Corpus (CS; ask Ayman; also available for download)

Lexical Resources

  • Concise Medical Dictionary (OUP)
  • New Oxford Thesaurus of English (OUP)
  • Oxford English Dictionary (OUP)
  • Oxford Spanish Dictionary (OUP)
  • Pocket Oxford Italian Dictionary (OUP)
  • WordNet (CS, LAL; ask Massimo; accessible from machines in the CS Labs)

(Items marked 'OUP' are available for research purposes under a three-year licence granted by OUP to Massimo; ask him for access.)

Tools

  • Connexor Machinese Syntax (CS; installed in the Labs)
  • GATE (CS; installed in the Labs): a multi-purpose natural language processing toolkit
  • LT-XML tools from LTG, Edinburgh (CS; ask Massimo): POS tagger, chunker, and tokenizers
  • QTAG (CS; installed in the Labs): a POS tagger from the University of Birmingham

Unless otherwise stated, the content of this page is licensed under the Creative Commons Attribution-ShareAlike 3.0 License.