My undergraduate degree at Stanford was in “Symbolic Systems,” exploring the connections among computer science, psychology, philosophy and linguistics. I then worked as a software engineer at a financial engineering company, helping hedge funds and institutional investors find opportunities. I learned about web development and optimization algorithms by getting thrown into the fire. I became an engineer for an online advertising exchange where billions of transactions were happening per day, working as a first responder to any major issues in the exchange. When Yahoo acquired the company, I helped integrate our systems with Yahoo’s overall advertising resources before joining MediaMath, one of the clients of our exchange, where I moved into a more technical role within the client services department.
However, I wanted to do something with more of an impact on the world, and I realized I really enjoyed technology, computer programming and solving problems. There was a need for people to make sense of large-scale biological data. I volunteered as a research assistant with Christopher Mason to learn more. I was introduced to genomics, using sequencing to probe DNA, the genetic architecture of organisms, on a scale not possible before. I started in the graduate program at Yale and ultimately decided to pursue my PhD working with Dr. Mason at Cornell. Flexibility and support for students’ research goals were what drew me, plus the ability to leverage resources from three different institutions and work collaboratively with a large number of faculty members both in Ithaca and New York City. There aren’t rigid course requirements; you design your own course work with the help of the program administration. Some people, like me, come in with a strong computational background and some come in with a strong biological background, so the courses we need are going to be different. The most important thing you can do is figure out what interests you. Right now I like the idea of working as a data scientist, analyzing biological data, health records, or information collected on the Internet; there are lots of opportunities to leverage the skills I am learning.
The first project I worked on was the sequencing of a lemur from Madagascar. You couldn’t make comparisons by hand; you needed an infrastructure that could handle massive amounts of data and make sense of it. In the Mason Lab, we can begin to understand what a genome is supposed to look like and explore what happens when it mutates. So I learned about DNA sequencing, I learned about assembly, and I started thinking: what if we could enter a world where all known organisms have assembled genomes?
What if we could identify an unknown sample of DNA sequencing data by leveraging the reference genomes that we put together? You have to compare billions of nucleotides of DNA to say that a sample of DNA comes from a particular organism. I am working on a method that condenses that information so that the comparison doesn’t take too much computational time when we decide whether a particular sample of DNA comes from one of these many organisms.
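To make the idea of condensing sequence information concrete, here is a minimal Python sketch of one widely used approach: a bottom-k MinHash over k-mers (short overlapping substrings of the DNA). This is an illustrative assumption, not necessarily the method described above. The function names (kmers, sketch, similarity, best_match) and the parameters k and size are hypothetical choices made for the example.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def sketch(seq, k=8, size=64):
    """Condense a sequence into its `size` smallest k-mer hash values."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(km.encode(), digest_size=8).digest(), "big"
        )
        for km in kmers(seq.upper(), k)
    }
    return sorted(hashes)[:size]


def similarity(a, b, size=64):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    merged = sorted(set(a) | set(b))[:size]  # smallest hashes of the union
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def best_match(sample, references, k=8, size=64):
    """Return the name of the reference whose sketch is closest to the sample.

    `references` is a hypothetical dict mapping organism name -> sequence.
    """
    sample_sketch = sketch(sample, k, size)
    return max(
        references,
        key=lambda name: similarity(
            sample_sketch, sketch(references[name], k, size), size
        ),
    )
```

Each genome is reduced to at most 64 integers here, so comparing a sample against many references means comparing short lists rather than billions of nucleotides. In practice the reference sketches would be computed once and stored, and real data would call for a larger k and sketch size than these toy defaults.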
Another hot topic in biology right now is the human microbiome, the idea that we have more foreign organisms in our body than we do human cells, so we rely on these other organisms to live. If we can identify what distinguishes a healthy microbiome from the microbiome associated with a particular disease, then we can try to find ways to make the disease microbiome look more like the healthy one. It would allow us to take a sample from someone who has a particularly healthy microbiome, for instance, and one from someone who is suffering from a particular disease, and compare which organisms are present and what that tells us about the disease.