
Siddharth Sabata
Machine Learning Engineer
I'm a master's student at Carnegie Mellon University, developing machine learning methods to better understand cancer.
Projects

CodeDeck
I built CodeDeck to make LeetCode practice smarter: think Anki flashcards, but for coding problems. It tracks my attempts, lets me log insights, and keeps everything under version control with Git. Built with Next.js, Prisma, and Tailwind CSS, it's my personal interview-prep companion.

Mase-phi HPC
I took an early-stage prototype from my lab and turned it into a fully automated, modular pipeline for selecting genetic markers from multi-region cancer sequencing data. I set up a robust HPC workflow with Slurm, standardized and refactored the codebase, and designed the system so it is easy to extend as requirements change. It was a great experience in turning research ideas into reliable, production-ready software that supports personalized cancer monitoring.

Population-Specific Multiomics Graph Analysis of ACE Protein Expression
I built a graph-based multiomics pipeline to pinpoint genetic variants that shape ACE protein expression across different populations. Using PyTorch Geometric, I constructed GPU-friendly graph representations and mapped out regulatory relationships. The project won "Most Innovative Project."

Medical Reasoning with Distilled Models
We tackled a range of technical challenges: fine-tuning an 8B-parameter LLM (DeepSeek-R1-Distill-Llama-8B), running large jobs on an HPC cluster with Slurm, and building an automated evaluation pipeline from scratch. The most exciting result? Our model showed notable pass@k gains on medical benchmarks, suggesting it could often find the right answer among several sampled attempts even when it struggled to get it right on the first try. Working on this was a lot of fun and gave me a front-row seat to how language models learn from both reinforcement learning (RL) and fine-tuning on clinical reasoning tasks.
Education
Carnegie Mellon University
Master of Science, Quantitative Biology and Bioinformatics
University of California, Santa Barbara
Bachelor of Science, Statistics and Data Science