LAC Day Talk by Annie Louis
Date: 29th September, 2015
When we read articles, we spontaneously judge whether they are well written or not, boring or interesting, too dense or not contentful enough. The goal of text quality prediction is to enable automatic systems to make similar judgements about texts. The capacity to make such predictions has great potential for article recommendation, educational assessment and improving text generation systems. Computational work on this topic was previously limited to detecting spelling, grammar and organization quality, where models operate on words, single sentences or pairs of adjacent sentences. My interests lie in predicting aspects of text quality that require discourse-level or document-level understanding and modeling of text properties. In this talk, I will present some of my work along these lines.
First, I will introduce a model for assessing whether a science journalism article will be perceived as interesting. We have created a corpus of science journalism articles categorized by interest value. I will describe how we developed metrics related to visual nature, story-telling format, and beautiful and surprising language use, and how these measures relate to and are indicative of the quality categories in our corpus. In the second part of the talk, I will present an approach for measuring text verbosity. Here we propose an approximate model that captures the relationship between an article's length and the type of content appropriate for an article of that length. Using this model's behaviour on test articles, we identify whether an article is verbose or, on the other hand, laconic.
Annie Louis is currently a Research Associate in the Institute for Language, Cognition and Computation at the University of Edinburgh. Previously she spent two years as a Newton International Fellow at Edinburgh. In January 2013, she completed her PhD at the University of Pennsylvania. Annie is joining the University of Essex as a lecturer in January 2016.