Because of the growing availability of large amounts of natural language data in electronic format (in the form of corpora and the Web), computational methods are playing an increasing role in linguistic research, while, at the same time, Natural Language Engineering (NLE) techniques are becoming more widespread in areas such as data mining or web search.
As the problems tackled in the scientific study of language and by developers of Natural Language Engineering applications are often the same, the interaction between researchers using computational methods to study language and researchers interested in NLE applications is likely to be beneficial to both. The University of Essex has a long tradition of research in the area of Language and Computation (which has a variety of names, including computational linguistics, or natural language processing), starting with work on Machine Translation and Parsing in the 1980s, as well as work on formal semantics (e.g., Property Theory).
The Language and Computation group is an interdisciplinary group created to foster such interaction between researchers within the University, and includes staff and students from the School of Computer Science and Electronic Engineering, the Department of Language and Linguistics, and the UK Data Archive.
The research by the members of the Language and Computation group covers most areas of Computational Linguistics and Natural Language Engineering.
- Grammars and Parsing:
- research focuses on constraint-based grammar formalisms (Arnold, Borsley, Sadler, Spencer) and statistical and symbolic parsing (Arnold, Borsley)
- Semantics and semantic interpretation:
- our research in this area includes work on the formal semantics of language and its logical foundations (Fox), on psychologically motivated computational models of semantic processing (Poesio), and on vector-based models of lexical and text representation, used, e.g., in anaphora resolution (Poesio).
- Applications:
- some of the areas of interest include generation, e.g., for dynamic web page generation (Poesio), information retrieval and web search (Kruschwitz, Robinson), machine translation (Arnold, Sadler), and spoken dialogue systems (Kruschwitz, Poesio).
Current Projects
- ESRC Large Centre on Human Rights in the Era of Big Data (2015-2020). (McGregor (PI), Fox, Kruschwitz, Poesio). The objective of the Centre is to investigate the potential risks and advantages offered by Big Data technology in the context of Human Rights.
- Using NLP to Support the Generation of Human Rights Violations Reports (2014-2017). (Poesio (PI), Alhelbawi, Kruschwitz). A KTP Project with Minority Rights Group (MRG). The objective of the project is to use NLP methods to collect, classify and filter reports of human rights violations reported via social media or SMS to support the reporting activities of MRG.
- SENSEI (2013-2016). (http://www.sensei-conversation.eu/) (Poesio (PI), Kabadjov, Kruschwitz). An EU project on using discourse analysis tools in support of the summarization of conversations - both spoken and online. As part of the project we co-organized the 2015 MULTILING Shared Task on Summarizing Online Forums (http://multiling.iit.demokritos.gr/pages/view/1531/task-onforums-data-and-information).
Past Projects
- Automatic Adaptation of Knowledge Structures for Assisted Information Seeking (AutoAdapt), a project involving the Department of Computing and Electronic Systems, Robert Gordon University Aberdeen and the Open University. This project aims to develop and evaluate methods for adapting automatically constructed domain models to the population of users' search or browsing behaviour. Application and large-scale evaluation of the developed methods in two information seeking scenarios - namely, interactive search and browsing - will be performed on a number of domains.
- Development of an intelligent mail server, a project involving the Department of Computing and Electronic Systems and Active Web Solutions (AWS), an Ipswich-based company devoted to cutting-edge enterprise information systems.
- Creating anaphorically annotated resources through semantic wikis (AnaWiki) (EPSRC)
- Learning Disabilities Data and Information Infrastructure Project (ESRC)
- Markup-Based Knowledge Extraction Project (EPSRC)
- Anaphora Resolution and Underspecification (EPSRC)
We welcome PhD applications in any area of research into language and language processing, including computational linguistics, natural language processing, information retrieval and formal semantics. Our research page gives an indication of some of the topics that we currently work on. Feel free to contact us if you have any questions about PhD study, potential topics, or short-term research visits.
Both Computer Science & Electronic Engineering and Linguistics offer opportunities for studies in Language and Computation:
- the School of Computer Science and Electronic Engineering offers modules in natural language engineering and projects in NLP/IR areas
- the Department of Language and Linguistics offers an MA in Computational Linguistics for Linguistics students
Students following these courses are allowed to take modules offered by both Departments. The Language and Computation group also runs joint research seminars attended by both faculty and students in which recent work is discussed. These courses would also give students an excellent background for subsequent PhD studies, particularly in areas such as Constraint-Based Grammars, formal and corpus-based semantics, and the acquisition of lexical and commonsense information.
This is a list of resources available between the various departments. Check also Doug's old CL/MT pages: here. For a list of resources available on the Web, check Chris Manning's Stat NLP pages.
Corpora
Many of these corpora are installed on a University-accessible server under /ufs/corpora (\\corpora\corpora from Windows machines). For details about access, ask Doug.
- al-Hayat corpus (CS, ask Abdul Goweder)
- British National Corpus (LAL, ask Doug)
- Brown Corpus (CS, ask Massimo)
- Essex Arabic Summaries Corpus (CS, ask Mahmoud)
- GNOME corpus (CS; ask Massimo)
- ICAME CD (contains the Brown corpus, LOB, London-Lund, and a few other corpora) (CS, ask Massimo)
- LOB (CS, LAL; ask Doug or Massimo)
- London-Lund (CS, LAL; ask Doug or Massimo)
- MUC6, MUC7 (CS; ask Massimo)
- Reuters (CS, ask Udo)
- Switchboard (CS, ask Massimo)
- TREC (CS; ask Massimo)
- Verbmobil (CS, ask Massimo)
- Arabic Human Rights Corpus (CS, ask Ayman) download
Lexical Resources
- Concise Medical Dictionary (OUP)
- New Oxford Thesaurus of English (OUP)
- Oxford English Dictionary (OUP)
- Oxford Spanish Dictionary (OUP)
- Pocket Oxford Italian Dictionary (OUP)
- WordNet (CS, LAL; ask Massimo; accessible from machines in the CS Labs)
(The items marked 'OUP' are available for research purposes under a three-year licence from OUP to Massimo - ask.)
Software
- Connexor Machinese Syntax (CS, installed in the Labs)
- GATE (CS, installed in the Labs) - a multi-purpose NL tool.
- LT-XML tools from LTG, Edinburgh (CS, ask Massimo) - POS tagger, chunker, and tokenizers
- QTAG (CS, installed in the Labs) - a POS tagger from Birmingham
Language and Computation Day
A good way to learn about the activities of the group is to attend the annual Language and Computation Day, in which we all present the research done in the past year. This year's Language and Computation Day (2011) takes place on 7th October 2011. Last year's event took place on Friday, 8th October 2010.
The Flatlands meetings
Once a year we meet with the members of the other Computational Linguistics groups of the area - Cambridge, the Open University, and Oxford - for what has become known as the Flatlands meetings. These are workshops where our PhD students can present their work. This year's event was held at the University of Essex.
The Language and Computation Reading Seminar
The Language and Computation Reading Seminar meets weekly from October each year, breaking for the summer holidays in line with the University of Essex's timetable.
Information about upcoming meetings is posted on this wiki and circulated via the LACWORKSHOP mailing list, which also reaches all members of the group.
If you are a University of Essex student or member of staff, you can subscribe via the university's mailing lists page.
Past Seminars
2014/15
2013/14
2012/13
2011/12
- 2010/11
- We went through part of Koller's book on probabilistic graphical models and Bayesian methods.
- 2009/10
- R workshop
- 2008/09
- We went through various papers in ACL 2008, and members presented details of their current research.
- 2007/08
- We went through Nugues' book on NLP using Prolog and Perl.
- 2005/06
- Empirical Methods in NLP.
- 2004/05
- Language and Computation Seminar: Underspecification and Incrementality in Semantic Processing.
- 2003/04
- The Acquisition of Lexical and Ontological Knowledge.
Language and Computation Speakers in Computer Science and Linguistics
The seminar series in both Computer Science & Electronic Engineering and Language & Linguistics include a number of seminars of interest to L&C types.
The Language and Computation group includes researchers working on language using computational methods from Computer Science & Electronic Engineering, the Data Archive, and Language & Linguistics.
To get in touch with the people below, add @ and essex.ac.uk to the listed email address (given in parentheses).
Computer Science and Electronic Engineering
(Language, Logic and Information/Natural Language Engineering Group)
- Alghamdi, Ans (adalgh)
- information extraction, active learning
- AlHelbawi, Ayman
- entity disambiguation, Arabic language processing
- Chamberlain, Jon (jchamb)
- games development, user interfaces, anaphora resolution
- Fox, Chris (foxcj)
- formal semantics, property theory, plurals & mass terms, anaphora, underspecification, intensional representations, imperatives, deontic reasoning, philosophy of language
- Garcia, Alba (alba.garcia)
- computer vision for biomedical imaging, image retrieval and evaluation
- Kruschwitz, Udo (udo)
- intelligent web search, information retrieval, dialogue, concept hierarchies, markup
- Martinez-Alvarez, Miguel (mmartid)
- adaptive search, topic classification, user profiling, information extraction
- Poesio, Massimo (poesio)
- anaphora and anaphora resolution; ambiguity and underspecification; conceptual models combining neural evidence and evidence from corpora; lexical acquisition and distributional models; deception detection; Arabic NLP; the semantics of dialogue and spoken dialogue systems, and applications of the latter, e.g., in intelligent buildings; semantics and semantic processing; computational psycholinguistics
- Scherp, Ansgar (ansgar.scherp)
- work on NLP and graph analytics with neural networks, IR methods, and the Semantic Web
- Sutcliffe, Richard (rsutcl)
- crosslingual IR, question-answering
- Villavicencio, Aline (avill)
- lexical semantics, multilinguality, and cognitively motivated NLP
Data Archive
- Balkan, Lorna (balka)
- thesauri, machine translation
- Chatsiou, Kakia
- Lungley, Deirdre (dmlung)
- intelligent web search, information retrieval, topic classification, sentiment analysis, named entity extraction, biomedical NLP, lattices, formal concept analysis, concept hierarchies
Language and Linguistics
- Arnold, Doug (doug)
- constraint-based grammar, statistical parsing, semantics, machine translation
- Borsley, Bob (rborsley)
- constraint-based grammar, Welsh linguistics
- Eisenbeiss, Sonja (seisen)
- language acquisition, language and cognition, morphology and the mental lexicon, corpus linguistics
- Sadler, Louisa (louisa)
- lexicalist syntactic theories (principally LFG and HPSG), Welsh syntax, argument structure and the syntax-lexical semantics interface, computational linguistics, machine translation
Alumni
- Al-Bakour, Hala
- intelligent web search
- Al-Bakour, M-Dyaa (malbak)
- natural language engineering, knowledge acquisition
- Alarfaj, Fawaz
- searching entities at web scale
- Almuhareb, Abdulrahman
- the acquisition of lexical and ontological knowledge
- Althobaiti, Maha (mjaltha)
- Arabic language processing, named entity extraction, semi-supervised learning, distant learning
- Andrikou, Elina
- spell checking
- Bailey, Carolina (cmbail)
- user profiles, adaptation
- Češka, Zdeněk
- plagiarism detection
- Chatsiou, Kakia (achats)
- LFG, computational grammars, Greek linguistics
- El-Haj, Mahmoud (melhaj)
- Arabic multi-document summarisation
- Flouraki, Maria
- semantics, tense
- Ghotsoulia, Voula (vghotsoulia at yahoo.co.uk)
- shallow and deep methods in language processing; semantic role labelling
- Glover, Kevin
- animacy in anaphora resolution
- Goweder, Abduelbaset
- Arabic NLP, morphology, stemming, information retrieval
- Kabadjov, Mijail
- coreference, summarization, semantic processing in social media
- Kawata, Yasuhiro
- part of speech tagging
- Linardaki, Evita
- statistical parsing, data-oriented parsing
- Sanchez-Graillet, Olivia (osanch)
- text mining, acquisition of causal knowledge, Bayesian Nets
- Urgelles, Miriam (murgel)
- HPSG
Former Members
- Al-Haddad, Mohammed
- Artstein, Ron
- anaphora; reliability statistics for corpus annotation; formal semantics and semantics-prosody interaction; compositional semantics below the word level; focus; coordination; temporal quantification
- Reynolds, Jeff
- machine learning, statistical machine translation, speech