My Projects #
TOSS Partitioner #
Tools - Python, AWS Athena, AWS SageMaker, PostgreSQL
- Built TOSS (Temporally-Oscillating Satellite Schedules), a tool that ingests satellite data and produces a satellite schedule optimized to minimize the maximum capacity of any group on a satellite
- Used AWS Athena for straightforward authentication and data lake querying, and PostgreSQL for time-series and geospatial analysis
- Implemented a dual annealing algorithm to search a complex space derived from a time-series dataset and converge on an optimal time
- Redesigned workflow with TOSS to streamline data ingestion, aggregation, schedule production, and evaluation
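As a rough illustration of the min-max idea behind the dual annealing step, here is a minimal sketch using `scipy.optimize.dual_annealing`. It is not the TOSS code: the cyclic 24-slot schedule, the `durations` and `demands` arrays, and the `peak_load` helper are all hypothetical stand-ins invented for this example.

```python
import numpy as np
from scipy.optimize import dual_annealing

# Hypothetical toy data (not from TOSS): each task occupies `duration`
# consecutive slots in a cyclic 24-slot schedule and adds `demand` load.
N_SLOTS = 24
durations = np.array([6, 4, 8, 5, 3])
demands = np.array([3.0, 2.0, 4.0, 1.5, 2.5])


def peak_load(start_times):
    """Return the maximum load in any slot for the given start times."""
    load = np.zeros(N_SLOTS)
    for start, duration, demand in zip(start_times, durations, demands):
        # Truncate the continuous start time to a slot index and wrap around.
        slots = (int(start) + np.arange(duration)) % N_SLOTS
        load[slots] += demand
    return load.max()


# One continuous start time per task, kept just below N_SLOTS.
bounds = [(0, N_SLOTS - 1e-9)] * len(durations)

# Baseline for comparison: every task starts in slot 0.
baseline = peak_load(np.zeros(len(durations)))

# Dual annealing searches the start times to minimize the peak load.
result = dual_annealing(peak_load, bounds, maxiter=200)

print("baseline peak load:", baseline)
print("optimized peak load:", result.fun)
print("start slots:", [int(s) for s in result.x])
```

No random seed is set, so the start slots it reports can differ between runs. A real schedule would also need constraints that this toy objective ignores.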
Lyft Price Prediction Model #
- Employed machine learning techniques to predict Lyft prices in Boston based on a variety of variables. Some of the variables used include:
- weather
- time of day
- surge multiplier
- geographic region of Boston
- Utilized Principal Component Analysis for dimensionality reduction and compared various models (kNN, Naïve Bayes, Linear Regression, Decision Trees, Neural Networks, Random Forests) to find the best predictive model for Lyft prices
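The PCA-then-compare workflow described above can be sketched with scikit-learn pipelines. This is only an illustration under stated assumptions: the data below is synthetic (not the Lyft dataset), the six feature columns and the choice of four principal components are arbitrary, and only a subset of the listed model families is shown, in regression form.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the ride data: six numeric features and a
# price-like target. None of this comes from the real dataset.
rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 6))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

# Candidate models to compare (a subset of the families named above).
models = {
    "kNN": KNeighborsRegressor(n_neighbors=5),
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Scale, reduce dimensionality with PCA, then fit the model.
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=4), model)
    # Mean R^2 over 5-fold cross-validation.
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
print("best model:", best)
```

Putting the scaler and PCA inside the pipeline means they are refit on each training fold, so the cross-validated scores are not inflated by information from the held-out fold.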
CellWalkR Integration for Genomics Data #
Tools - Python, R, TensorFlow
Patch-sequencing, or patch-seq, is a method for characterizing neurons that combines multiple data modalities, that is, different types or forms of data. Generally speaking, it can improve cell labeling in noisy data and annotate cell type-specific regulatory elements in bulk data. I have been using CellWalkR, an R package designed to help identify cell type-specific regulatory regions, to integrate patch-seq data with single-cell RNA-seq data and look for similarities between the two.
The significance of my project comes from the newly developed CellWalkR package. Neural tissue is among the most transcriptomically diverse tissues, which makes single-cell RNA sequencing both difficult and time-consuming. Using patch-seq and the CellWalkR algorithm, I can run these computations faster and more easily than before. My research could contribute to biological advances such as precision medicine, where the similarities found between the patch-seq data and single-cell RNA-seq data can help identify potential targets for more effective treatment.
Covid Policy Analysis with Microsoft Azure #
This project was an exciting opportunity to put some of the data engineering skills learned in Data Mechanics (BU’s DS 310) to work and combine them with my existing knowledge and skills for data analysis. My team and I were able to gain more experience with several Microsoft Azure services including:
- Cosmos DB
- Data Factory for creating data pipelines to perform ETL
- Synapse Analytics
To create visualizations for various data sources, we used Microsoft Power BI, which was installed and running on an Azure VM.
Capital One Coding For Good Hackathon Finalist #
Working with three other undergraduate students from across the country, I helped develop Collegiate Coin, a competitive savings app for mobile devices aimed at younger generations, specifically college-aged banking clients. Our mobile app was packed with features including:
- A Financial Literacy Chatbot
- Financial savings and spending visualizations and infographics
- Gamification to encourage positive financial practices among our users