Assistant Professor of Computer Science Matteo Riondato works at the intersection of computer science and statistics, in an area known today as data science. Specifically, he figures out how to extract the best (i.e., most accurate) information from enormous data sets. This burgeoning specialty has implications for cancer research, virology and more. This fall, the National Science Foundation awarded him a new, $370,000 grant for research and course development.
How has modern data collection changed the scientific process?
It’s become easier and cheaper to collect data in essentially all disciplines, including cell biology and the health sciences, like cancer research and genomics. Having a lot of data allows scientists to test more hypotheses, and more complex ones. For example, in cancer research, it doesn’t make sense to test the impact of just one specific mutation. Rather, you want to test billions of combinations of mutations. The combination of large data sets and a growing number of ever more complex hypotheses requires efficient computational methods to perform these tests.
Is that where you come in?
Science has gone through a period of self-reflection as published studies have turned out not to be reproducible because of errors in the data analysis. For example, there were papers on the coronavirus that were retracted because of data manipulation; they were essentially based on fake data. I develop computational methods to address these issues. Computer science is a young discipline, but it is now mature enough that it can be helpful to other sciences.
How new is this area of computer science?
You would not have found a computer scientist working in this area 10 years ago. Computer science has always been about working with data, but the idea that data analyses should be statistically sound is quite novel for computer scientists. There are maybe three groups in the world that do this kind of research.
Are other disciplines receptive to help?
Yes. We have seen the creation of fields such as computational biology and computational chemistry. There’s a growing expectation that all sciences and almost any discipline will have a computational component in the future. This is an amazing time to be a computer scientist, because everybody’s using a lot of data, and the only way to process it is with computers. This is our moment to be of service to others.
What will the NSF grant mean for students?
Part of the project is developing educational material, including courses. My Spring 2021 data mining class will teach the basics of robust data analysis to obtain trustworthy results. I will have funding for three research assistants every semester for three years, including the summers. So the grant will allow me to really expand the research opportunities in computer science, and not just for majors. My hope is that more students in other disciplines will become aware of the statistical and computational challenges that exist in doing science.
What’s it like to build those cross-disciplinary relationships at Amherst?
It’s great to do this kind of interdisciplinary work at a liberal arts college that welcomes the idea of doing something that benefits the whole of society and the world. My research and learning group, called the Amherst College Data Mammoths, already has a fruitful collaboration with Professor Kate Follette in the physics and astronomy department, and I hope to establish more such relationships. In a time when science is under attack, I like the idea of giving weapons to scientists to do their jobs better.