Ph.D. Data Scientist and AI Researcher with 8+ years of combined academic and industry experience designing, building, and delivering machine learning systems across biomedical AI, scientific computing, high-performance computing, forecasting, and applied analytics. Demonstrated ability to translate complex research into reproducible, production-grade pipelines with measurable outcomes — including a clinical cancer detection model achieving AUC-ROC 0.950 over 277,000+ pathology image patches, an HPC-automated genomics pipeline processing 356 RNA-seq runs, and end-to-end ML delivery for seven SaaS clients with a 35% reduction in pipeline latency. Equally effective in data scientist, machine learning engineer, applied scientist, data analyst, and AI research roles where rigorous analysis, strong quantitative reasoning, and production discipline are required.
I build ML and data systems that are measurable, documented, and repeatable — not one-off experiments. My background spans the full pipeline from raw data ingestion and feature engineering through model training, evaluation, deployment, and monitoring. I bring strong statistical foundations (A/B testing, causal inference, Bayesian methods), hands-on experience with large-scale compute environments (SLURM, GPU clusters, AWS, Azure Databricks), and a research record that bridges academic rigor with engineering pragmatism. I am most effective on teams that value clear metrics, honest evaluation, and systems that actually work in production.
Languages: Python, SQL, PySpark, R, Rust, CUDA, C++, Java
ML / AI Frameworks: PyTorch, TensorFlow, Scikit-learn, XGBoost, LightGBM, Hugging Face Transformers, LangChain, SimCLR, FAISS, ChromaDB
Data & Cloud: AWS, Azure Databricks, GCP, Apache Spark, Airflow, MLflow, Docker, Git, PostgreSQL, MySQL, MongoDB, Neo4j
HPC & Systems: SLURM, MPI, GPU Clusters, HISAT2, featureCounts, Kallisto, bcftools
-
AUC-ROC 0.950: Built clinical-grade invasive ductal carcinoma detection pipeline processing 277,000+ pathology image patches using SimCLR domain-specific pretraining. False negative rate as low as 0.34%. Explainability via Grad-CAM, UMAP, and t-SNE.
-
356 RNA-seq SRA runs: Designed and automated end-to-end Ebola outbreak genomics pipeline on Ohio Supercomputer Center SLURM HPC cluster (HISAT2, Kallisto, bcftools). Delivered 14 publication-grade outputs with checkpoint-based resume.
-
35% pipeline latency reduction: Led ML and ETL delivery for 7 SaaS clients as Technical Lead at Orcinus IT Solutions. Improved deployment reliability by 30% and reduced pipeline failures across production environments.
-
GPA 3.91/4.00: M.S. Computer Science (AI Track), University of Toledo. Combined with 28 Google Scholar citations, h-index 3, i10-index 1, and 6 publications under review or published in peer-reviewed international venues.
-
37,000+ transportation records: Built ensemble ML pipelines and deployed reproducible AWS and Azure Databricks workflows for large-scale experimentation at the Transportation Systems Research Lab.
Status: Actively seeking full-time opportunities · Available immediately
Work Authorization: Authorized to work in the United States
Location: Ohio / Michigan, USA · Open to relocation nationwide
Work Model: Open to remote, hybrid, or on-site positions