Siddharth Sabata

Machine Learning Engineer

I'm a master's student at Carnegie Mellon University developing ML solutions for understanding cancer.

Skills

Python, SQL, Go, R, Git, Slurm, Docker, Google Cloud Platform, PyTorch, Transformers, Accelerate, scikit-learn, pandas, numpy, matplotlib, seaborn

Projects

codedeck - LeetCode flashcard manager

codedeck

Docker, Next.js
Role: Developer
Organization: Personal Project
Timeline: May 2025

I built CodeDeck to make LeetCode practice smarter: think Anki cards, but for coding problems. It tracks my attempts, lets me log insights, and keeps everything versioned with Git. Built with Next.js, Prisma, and Tailwind, it's my personal interview prep companion.

Mase-phi HPC - Phylogenetic tree visualization

Mase-phi HPC

Slurm, NumPy, Gurobi
Role: Research Assistant
Organization: Schwartz Lab @ CMU
Timeline: September 2024 — Present

I took an early-stage prototype from my lab and turned it into a fully automated, modular pipeline for selecting genetic markers from multi-region cancer sequencing data. I set up a robust HPC workflow with Slurm, standardized and refactored the codebase, and made the system flexible for future needs. It was a great experience in transforming innovative research ideas into reliable, production-ready software that helps advance personalized cancer monitoring.

Multiomics Graph Analysis - DNA and network visualization

Population-Specific Multiomics Graph Analysis of ACE Protein Expression

PyTorch Geometric
Role: ML Engineer
Organization: ML & AI Approaches to Multimodal Problems in Computational Biology Hackathon
Timeline: March 2025

I built a graph-based multiomics pipeline to pinpoint genetic variants that shape ACE protein expression in different populations. Using PyTorch Geometric, I put together GPU-friendly graphs and mapped out regulatory relationships. The project won "Most Innovative Project!"

Medical Reasoning - AI brain and neural network visualization

Medical Reasoning with Distilled Models

Slurm, Transformers, Accelerate
Role: ML Engineer
Organization: Introduction to Deep Learning (CMU 11-785)
Timeline: January 2025 — May 2025

We tackled a range of technical challenges: fine-tuning an LLM (DeepSeek-R1-Distill-Llama-8B), running large jobs on HPC (Slurm), and building an automated evaluation pipeline from scratch. The coolest part? Our model showed notable pass@k gains on medical benchmarks, suggesting it could find the right answer within k sampled attempts even when it struggled to rank that answer first. Working on this was a lot of fun and gave me a front-row seat to how language models learn from both RL and fine-tuning on medical reasoning tasks.

Education

Carnegie Mellon University

Master of Science, Quantitative Biology and Bioinformatics

2025

University of California, Santa Barbara

Bachelor of Science, Statistics and Data Science

2024