A team of UA researchers is sifting through thousands of research papers to improve treatment for cancer patients, one algorithm at a time.
"They'll be the Microsofts and Googles of biomedicine," Morrison said.
Its potential has mass appeal and big implications: fast, individualized and precise biomedical care.
"The REACH project is applied to cancer biology, but we have an even bigger vision than that, although cancer biology is big enough," Morrison said.
If big data is a two-part challenge, Morrison said, then storing it and moving it around is the first part. The second part is understanding it.
REACH works on the understanding part in three phases: extraction, assembly and inference.
Extraction was put to the test this summer. Over the course of a year, researchers led by Mihai Surdeanu, associate professor in the School of Information and REACH's principal investigator, trained a computer system to read papers using hundreds of algorithms. One, for example, allows it to understand that "mouse," "mice" and "Mus musculus" all refer to the same thing.
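The article gives no detail on how REACH resolves such synonyms. As a rough illustration only, one simple way to map different mentions to a single canonical name is a lookup table. The table contents and the function name `normalize_species` below are hypothetical and are not taken from the REACH system.

```python
from typing import Optional

# Hypothetical synonym table for illustration; not taken from REACH.
SPECIES_SYNONYMS = {
    "mouse": "Mus musculus",
    "mice": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def normalize_species(mention: str) -> Optional[str]:
    """Map a species mention to a canonical name, or None if unknown."""
    # Lowercase and collapse whitespace so "Mus  Musculus" still matches.
    key = " ".join(mention.lower().split())
    return SPECIES_SYNONYMS.get(key)


if __name__ == "__main__":
    for text in ["mouse", "Mice", "Mus musculus", "yeast"]:
        print(text, "->", normalize_species(text))
```

A real reading system would need far more than a fixed table, such as handling context and ambiguous names, but the sketch shows the basic idea of treating several surface forms as one entity.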
Others on the UA research team include Ryan Gutenkunst, assistant professor of molecular and cellular biology; Guang Yao, assistant professor of molecular and cellular biology; and Kobus Barnard, professor of computer science.
Morrison, who also has a strong academic background in developmental psychology, said, "I think that collaborative computers are going to be like children, and we'll have to raise them, in a way. They'll be as smart as we're able to teach them, and we need them to be able to communicate with us."
In the recent evaluation of this first phase of REACH, the system was able to process 1,000 papers on RAS-related cancers in a matter of hours, yielding results that exceeded state-of-the-art predecessors — all by relying on algorithms. Asking a human scientist to do the same would be outrageous.
Focusing their efforts on modeling how RAS functions in cancer cells was an easy choice, for a couple of reasons.
First, RAS proteins control the chemical pathways responsible for growth, migration and survival within a cell. Basically, they've got a big job. Second, RAS oncogenes are mutated in 33 percent of all human cancers, making them one of the most highly researched classes of oncogenes. And when you need thousands of papers on one subject, highly researched is important.
Now that the REACH system knows how to read, it needs context. Morrison is currently building that in by teaching it to differentiate between species (a yeast cell is different from a mouse). REACH is already familiar with 30 different species affected by RAS-related cancers. It also will need to understand differences among cell types, organs and tissue types. This is all part of the project's assembly phase.
By the end of the four-year project, REACH should be able to make inferences. In other words, it will hypothesize much as a scientist or a doctor might.
"I would like to see this usher in computers understanding complex things at a level that we just can't," Morrison said.
"It's awesome. I can't tell you how excited and passionate I get that I'm able to take things I've developed and apply them to something that could potentially, directly improve people's lives."