Brown University (2011), Ph.D.
Brown University (2008), M.A.
Amherst College (2004), B.A.
I am interested in computational problems involving big data. Computer science is an incredibly broad and dynamic field that is becoming increasingly interdisciplinary. My publications range from parallelizing large matrix computations to developing efficient algorithms for analyzing genomic data. The question common to all of my research, however, is how to store and analyze massive data sets efficiently.
Before arriving at Amherst, I worked as a software consultant in Boston, collaborating with Fortune 100 clients on big data enterprise computing problems. It was during that period that I came to appreciate that big data poses perhaps the most compelling challenge to the development of modern computing systems.
The tremendous advances in computational power we have witnessed over the past few decades owe primarily to strong and sustained growth in processing power, driven by the steady doubling of transistor counts described by Moore’s Law. Historically, as processors have grown faster, computer applications have sped up proportionally. Today, however, we are at a tipping point. Scientists and corporations are collecting and storing vast amounts of data at an ever-increasing rate. As a result, the performance characteristics of modern applications are changing: big data applications are no longer dominated by processing time, but by the time needed to read data from storage into main memory and the time needed for inter-processor communication. Consequently, many canonical algorithms developed before this shift no longer suffice.
My current focus is on leveraging open-source big data platforms, such as Apache Hadoop and Apache Spark, for scientific computing applications. Although these platforms were designed primarily to address enterprise computing requirements, several interesting scientific problems admit efficient solutions using them. I am examining canonical serial scientific algorithms, such as genome sequence alignment algorithms, and reformulating them to run in parallel on larger data sets using a compute cluster or the cloud. In so doing, I hope to enable the analysis of genomic data sets that are orders of magnitude larger than those that can be analyzed today, yielding insights that were previously unattainable.
I enjoy teaching courses that emphasize and build upon computer science fundamentals. In my Big Data course, we discuss the design of various computational platforms and their use of data structures and algorithms. Along the way, we explore the tradeoffs among memory usage, compute time, communication costs, and read access time for big data applications. In my course on the Principles of Database Design, we ground our discussion in relational algebra before exploring the data structures and algorithms that database management systems employ to achieve transactional integrity and efficiency. My seminar on Computational Biology also cultivates an appreciation for efficient data structures and algorithms, this time in the context of computational problems that arise from the analysis of genomic data. Finally, I especially enjoy teaching the first semester of the Introduction to Computer Science series, where students have their first experience solving problems using computers, which is often a transformative intellectual moment.
Awards and Honors
National Science Foundation Graduate Research Fellowship, 2005-2008
Fulbright Scholarship, Italy, 2004-2005
Phi Beta Kappa, Amherst College, 2004
B.A., magna cum laude, with honors, Amherst College, 2004
Amherst College Computer Science Award, 2004
Scholarly and Professional Activities
Five-College Affiliate, UMass Center for Data Science