The Oxford Children’s Corpus: Lessons for Learning to Read
This is a new project, funded by The Leverhulme Trust. The scientific study of reading has taught us much about the early stages of learning to read. Critically, however, little is known about how children develop from novice to expert: how do children move from the laborious process of “sounding out” words to fluent and apparently effortless reading later on? We will take a novel approach to this question by combining corpus-based analyses with empirical studies of children’s reading behaviour. Specifically, we will investigate when and how often children encounter words, and in what types of contexts, to reveal how different experiences with words drive the development of reading.
In collaboration with Professor Stephen Pulman (Department of Computer Science) and Professor Victoria Murphy (Department of Education), we are exploring the nature and contents of children's reading materials and relating these to how children learn to read words. We are working closely with the Children's Dictionaries Department at Oxford University Press. Our project is funded by the Leverhulme Trust.
The project will combine two established methodologies in a novel way to allow us to quantify the richness of reading experience and to help us understand how experience shapes learning. The first methodology is corpus linguistics, a computational approach that analyses very large datasets of natural language, looking for patterns, distributions and statistical regularities that can tell us how language works. A corpus is a “word bank”: an electronic record of natural language samples. We will develop computer algorithms to interrogate a large corpus of children’s written language. The Oxford Children’s Corpus is a dynamic and growing corpus, currently comprising more than 185 million words written by and for children. To our knowledge, it is a world first, being the only large corpus of children’s written language. By computing various statistics across this vast and complex database, informed by psychological and linguistic theory (e.g., the number of words and the number of times they have been repeated, co-occurrence between different words, and the nature of the context in which particular words or phrases appear), we will gain a unique insight into what children are likely to have been exposed to during their reading experiences.
Our second methodology stems from psycholinguistics, the study of the psychological or cognitive processes that underpin our capacity for language. We will measure how well children read and understand sets of words, and relate this to specific properties of those words, as identified in our corpus analyses. This will allow us to make connections between children’s experience with written language, what they learn from that experience, and how this relates to reading development.
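To give a flavour of the corpus statistics mentioned above (how often words occur and which words appear near one another), here is a minimal Python sketch. It is purely illustrative and is not the project's own software: the function names, the simple tokeniser, the two-word window and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens (a crude, assumed tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how many times each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cooccurrences(texts, window=2):
    """Count unordered word pairs that appear within `window` tokens of each other."""
    pairs = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, left in enumerate(tokens):
            # Look ahead at the next `window` tokens only, so each pair is counted once per position.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    pairs[tuple(sorted((left, right)))] += 1
    return pairs


# Hypothetical sample sentences standing in for corpus texts.
samples = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]
freqs = word_frequencies(samples)
pairs = cooccurrences(samples)
print(freqs.most_common(3))
print(pairs[("on", "sat")])
```

On a real corpus the same counts would be computed over millions of words, and the window size and tokenisation rules would need to be chosen with the research question in mind.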
Read about our new project on the Leverhulme Trust's website.