Diversifying Digital Writing Archive to Include Spanish Heritage Speakers

May 29, 2019

Funded by a Digital Extension Grant from the American Council of Learned Societies, a UA-led project will diversify a digital writing archive and improve training in academic writing for Spanish heritage speakers.


UA English Associate Professor Shelley Staples meets with graduate students Aleksey Novikov and Adriana Picoral about the ACLS-funded project.

Photo by Anna Augustowska


Many variables influence how a college student learns how to write in their first-year English course: Is English their second language? What course materials did they receive? What genre are they writing in?

A cross-university team of researchers, including UA English Associate Professor Shelley Staples, has spent several years developing a Corpus and Repository of Writing, or Crow, to answer these and other questions.

Thanks to a new Digital Extension Grant of $150,000 from the American Council of Learned Societies, or ACLS, the team will expand their collection of texts from multilingual writers to include heritage Spanish speakers at the University of Arizona.

Staples is the principal investigator of the project “Expanding the Corpus and Repository of Writing: An Archive of Multilingual Writing in English.” Bradley Dilger, an associate professor of English at Purdue University, is co-principal investigator, and graduate and undergraduate students from both universities will contribute to the project.

Only five projects across the country were awarded an ACLS Digital Extension Grant.

“Our team is thrilled to be the first writing research project funded by ACLS,” Staples said.

According to the ACLS website, the Digital Extension Grant program aims to advance the digital transformation of humanities scholarship by extending the reach of existing digital projects to new communities of users and by adding diversity to the digital record. This program is made possible by a grant from The Andrew W. Mellon Foundation.

The funded project leveraged internal grants from the Office of Research, Discovery and Innovation and from the Social and Behavioral Sciences Research Institute.

“This is the UA’s first Digital Extension Grant since ACLS started the program in 2016,” said Interim Vice President for Research Kim Ogden. “Research at the University of Arizona is transformative. This project will use a model of mentoring to train underrepresented teachers and researchers.”

“Dr. Staples is an extraordinarily innovative scholar, as well as a gifted teacher, who brings profound expertise in corpus linguistics, second language writing, and global Englishes to the UA,” said Jane Zavisca, associate dean of research and graduate studies for the College of Social and Behavioral Sciences. “The ACLS-funded project will advance knowledge and improve training in academic writing for Spanish heritage speakers, an important and timely focus given the UA's recent designation as a Hispanic Serving Institution.”

Facilitating data-driven learning

Staples, an applied linguist, is always looking for ways to improve the teaching and learning of second languages, which is one reason she has been working on Crow since 2015 in collaboration with researchers from Purdue University and Michigan State University.


Crow facilitates data-driven learning, Staples said. “It’s based on the idea that if you can see lots of examples of something, it will activate your learning. You will see patterns emerging from the data that can be used to inform your own use of language.”
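A common way to surface such patterns in corpus work is a keyword-in-context (KWIC) display, which lines up every occurrence of a word with the text around it. The short Python sketch below illustrates the general idea only; it is not Crow's code, and the function name `kwic`, the `width` parameter, and the sample sentences are all hypothetical.

```python
import re

def kwic(texts, keyword, width=30):
    """Return one keyword-in-context line per match of `keyword`.

    Hypothetical helper for illustration; not part of the Crow platform.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            # Take up to `width` characters on each side of the match.
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample sentences, standing in for de-identified student texts.
sample_texts = [
    "However, the results suggest that more research is needed.",
    "The author argues, however, that the data are incomplete.",
]

for line in kwic(sample_texts, "however"):
    print(line)
```

Reading down the aligned column, a learner could notice, for example, that "however" appears both at the start of a sentence and mid-sentence between commas.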

Crow, which currently receives funding from the Humanities Without Walls Consortium, is the first web-based archive linking a corpus of English texts produced by undergraduate, multilingual writers with a repository of resources used to write those texts.

A corpus is a collection of texts; in this case, the texts are student assignments, from first to final draft, written in first-year college writing courses. Right now the corpus consists primarily of texts by Chinese students, because they are one of the largest international groups studying English. However, the end goal of the project is to help teach all students, not just international ones.

The repository refers to the pedagogical materials – such as the syllabi, lesson plans, and activity sheets – associated with the student writing assignment.

The researchers look at the relationships between the writing and the course materials. Staples calls this intertextuality, or connections between texts. How is first-year writing being taught? How are the course materials impacting the student writing?
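One way to picture the link between corpus and repository is as records that share an assignment identifier. The Python sketch below is an illustration under that assumption, not Crow's actual data model; every class, field, and sample value in it is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Material:
    """A repository item, e.g. a syllabus, lesson plan, or activity sheet."""
    material_id: str
    kind: str
    title: str

@dataclass
class Assignment:
    """A writing assignment and the pedagogical materials tied to it."""
    assignment_id: str
    genre: str
    materials: list = field(default_factory=list)

@dataclass
class StudentText:
    """One de-identified draft in the corpus."""
    text_id: str
    assignment_id: str
    draft: int  # 1 = first draft
    body: str

def materials_for(text, assignments):
    """Look up the course materials behind a given student text."""
    return assignments[text.assignment_id].materials

# Invented sample records (assumptions for illustration only).
assignments = {
    "A1": Assignment(
        assignment_id="A1",
        genre="research paper",
        materials=[
            Material("M1", "syllabus", "Course syllabus"),
            Material("M2", "activity sheet", "Source evaluation worksheet"),
        ],
    )
}
draft = StudentText("T1", "A1", draft=1, body="(student text here)")

for material in materials_for(draft, assignments):
    print(material.kind, "-", material.title)
```

With a shared identifier like this, a researcher could start from a draft and find the prompt and handouts behind it, or start from a lesson plan and retrieve the writing it produced.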

Expanding project to heritage Spanish speakers

The ACLS-funded project, which will run from August 2019 until December 2020, will diversify the Crow project by expanding the data collection to include heritage Spanish speakers, meaning students who learned Spanish in the home.

The first part of the project will be data collection at the UA, which will be coordinated by Aleksey Novikov, a graduate student in Second Language Acquisition and Teaching. With the students’ consent, teachers give the researchers access to the students’ writing and to their course materials. The student texts are “de-identified,” so that they contain no personal information.

Led by Adriana Picoral, a graduate student in Second Language Acquisition and Teaching, the second part of the project is to develop a machine-learning tool to speed up the coding and classification of the texts.

The third goal for the project is outreach. Staples and others on the Crow team will conduct workshops to train underrepresented teachers and researchers to use the Crow platform and to add their own texts to it. The outreach efforts will include teachers in high schools, community colleges, and Universidad de Sonora.

The team will also reach out to developers who are interested in replicating the Crow interface.

“While we're already publishing the Crow source code through GitHub, we recognize that providing code is only the first step towards inviting developers to use it,” Dilger said. “So we're grateful to have support which will allow us to engage with developers to ensure Crow's application program interface (API) is robust and well-documented.”

The ACLS-funded project supports the researchers’ goal of analyzing how writing is affected by several kinds of variation, including the cultural background of the student, linguistic variation, and genre (e.g., autobiography, academic proposal, research paper), since genres have different conventions and audiences. The researchers also look at how academic writing has changed over time and across disciplines.

“One of the goals of the project, in addition to creating tools for teachers, is to understand the writing itself,” Staples said.

Dilger also points to the broader applications of the technology.

“Crow currently focuses on first-year writing because there's a strong need to study it with data-driven methods,” Dilger said. “But many other types of texts could benefit from Crow's linkage of corpus and repository. Any context where writers approach a prompt or assignment, broadly speaking, could be engaged: grant applications and corresponding RFPs, or job applications and the position announcements. We'd love to see other teams use Crow to study the texts which interest them.”