This is a list of resources available across the various departments. See also Doug's old CL/MT pages. For a list of resources available on the Web, see Chris Manning's Stat NLP pages.
Many of these corpora are installed on a University-accessible server under /ufs/corpora (\\corpora\corpora from Windows machines). For access details, ask Doug.
Corpora
- al-Hayat corpus (CS, ask Abdul Goweder)
- British National Corpus (LAL, ask Doug)
- Brown Corpus (CS, ask Massimo)
- Essex Arabic Summaries Corpus (CS, ask Mahmoud)
- GNOME corpus (CS; ask Massimo)
- ICAME CD (contains the Brown corpus, LOB, London-Lund, and a few other corpora) (CS, ask Massimo)
- LOB (CS, LAL; ask Doug or Massimo)
- London-Lund (CS, LAL; ask Doug or Massimo)
- MUC6, MUC7 (CS; ask Massimo)
- Reuters (CS, ask Udo)
- Switchboard (CS, ask Massimo)
- TREC (CS; ask Massimo)
- Verbmobil (CS, ask Massimo)
- Arabic Human Rights Corpus (CS, ask Ayman; available for download)
Lexical Resources
- Concise Medical Dictionary (OUP)
- New Oxford Thesaurus of English (OUP)
- Oxford English Dictionary (OUP)
- Oxford Spanish Dictionary (OUP)
- Pocket Oxford Italian Dictionary (OUP)
- WordNet (CS, LAL; ask Massimo; accessible from machines in the CS Labs)
(The items marked 'OUP' are available for research purposes under a three-year licence granted by OUP to Massimo; ask him for access.)
Tools
- Connexor Machinese Syntax (CS, installed in the Labs)
- GATE (CS, installed in the Labs) - a multi-purpose natural language processing tool
- LT-XML tools from LTG, Edinburgh (CS, ask Massimo) - POS tagger, chunker, and tokenizers
- QTAG (CS, installed in the Labs) - a POS tagger from Birmingham
page revision: 8, last edited: 11 Jul 2017 15:00