/ jacob ioffe 🚠

Deep Embedded Clustering Analysis of Depression's Proteomic Architecture
ML Computational Biology Python UK Biobank

A deep learning system for analyzing depression's proteomic architecture using UK Biobank data (53,000+ participants, 2,900 proteins). Implements Improved Deep Embedded Clustering with a symmetric autoencoder (d-500-500-2000-16 architecture) to investigate biological subtypes. Key findings demonstrate continuous rather than categorical protein expression patterns, with robust demographic signal detection and validation across multiple clustering approaches (PCA+k-means, UMAP+HDBSCAN). Research done with Weill Cornell Medicine under Dr. Logan Grosenick, Dr. Connor Liston, and Elias Scheer.
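
The clustering objective used by Deep Embedded Clustering methods can be sketched in a few lines. This is a minimal NumPy illustration of the standard soft-assignment and target-distribution step, not the project's code: the toy data, the choice of 4 clusters, and the function names are all assumptions, and the autoencoder itself is omitted.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij between embeddings and centroids."""
    # Squared Euclidean distance from every embedding to every centroid.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target: square q, divide by cluster frequency, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def clustering_loss(q, p):
    """KL(P || Q), summed over samples and clusters."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 16))        # toy 16-dim latent codes (assumption)
centroids = rng.normal(size=(4, 16))  # 4 clusters is a hypothetical choice
q = soft_assign(z, centroids)
p = target_distribution(q)
loss = clustering_loss(q, p)
```

In the improved variant, this clustering loss is typically combined with the autoencoder's reconstruction loss so the 16-dimensional embedding keeps local structure while clusters sharpen.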

Waypoint.

Technical Co-Founder

An AI-powered research compliance automation system that streamlines the intake and routing of institutional review requests. The platform uses specialized agents to evaluate and process different types of compliance documents (IRB protocols, data use agreements, BAAs), automatically determining appropriate workflows and institutional requirements. Features include document classification, requirement extraction, and intelligent routing to relevant compliance offices, with custom CRM integration for tracking request status and maintaining audit trails.

Optimizing LLM Inference Through CPU-GPU Hybrid Execution
Deep Learning ML Systems Python

A hybrid CPU-GPU execution system for enabling ultra-long context LLM inference on consumer hardware. Built for the LLaMA 3.2 1B model, the system intelligently offloads decoding to CPU while maintaining GPU-accelerated prefill, achieving one-third of GPU-only throughput while doubling or quadrupling maximum context length. Features dynamic Key-Value cache management and quantization optimizations for memory-constrained environments.
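
To see why the Key-Value cache limits context length, a rough memory estimate helps. The sketch below is back-of-the-envelope arithmetic, not the project's implementation: the default layer count, KV-head count, and head dimension are assumed values for a 1B-class model with grouped-query attention, and the function names are hypothetical.

```python
def kv_cache_bytes(seq_len, n_layers=16, n_kv_heads=8, head_dim=64, bytes_per_elem=2):
    """Approximate KV cache size for one sequence (all defaults are assumptions)."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

def max_context(budget_bytes, **config):
    """Longest context whose KV cache fits in the given memory budget."""
    return budget_bytes // kv_cache_bytes(1, **config)

GIB = 2 ** 30
per_token_fp16 = kv_cache_bytes(1)                      # 16-bit cache entries
ctx_fp16 = max_context(4 * GIB)                         # 4 GiB budget, 16-bit cache
ctx_int8 = max_context(4 * GIB, bytes_per_elem=1)       # same budget, 8-bit cache
```

Under these assumptions, halving the bytes per cache element doubles the context that fits in a fixed budget, which is the kind of trade-off that cache quantization and moving the cache to host memory target.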

Sustainably Advancing Health AI

Sustainable Computing LLM Inference

Exploring how implementation of LLMs impacts energy, sustainability, and cost in the Stanford Medical System.

BlueTape
AgTech Product Design

BlueTape is a real-time, chemical-specific monitoring platform that raises awareness of chemical drift. Currently piloting with BlueWhite. First-place winner of the MindState Ideation Competition.

Molecular Dynamics Profiling at J&J Innovative Medicine

NVIDIA GROMACS Drug Discovery Linux

A benchmarking and optimization framework for GROMACS molecular dynamics simulations using NVIDIA MPS. The system enables efficient GPU sharing across multiple simulation instances through intelligent resource allocation and workload scheduling. Achieved 30% throughput improvement for molecular simulations ranging from 6k to 12M atoms, with automated performance profiling and cost optimization for drug discovery pipelines.

DNN Primitives
TVM Deep Learning

A TVM-based implementation of fundamental deep learning primitives (1D Conv, GEMM, 2D Depthwise) with hardware-specific optimizations. Achieved 220% speedup over baseline TVM implementation for CPU Conv and 40% improvement for GPU Conv/GEMM through systematic application of tiling, vectorization, and memory hierarchy optimizations. Features hardware-aware scheduling strategies with automated parameter tuning for cache sizes, SIMD capabilities, and thread block configurations.
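
Tiling, the first optimization named above, can be illustrated without TVM. The following is a plain NumPy sketch of a blocked GEMM under the assumption of a single square tile size. It shows the loop structure that a tiled schedule produces, not the project's TVM schedules, and `tiled_matmul` is a hypothetical name.

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Blocked matrix multiply: compute C one (tile x tile) block at a time."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    for i0 in range(0, m, tile):          # block rows of C
        for j0 in range(0, n, tile):      # block columns of C
            for k0 in range(0, k, tile):  # accumulate over the reduction axis
                # Slices past the array edge are clipped by NumPy, so
                # dimensions need not be multiples of the tile size.
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c

rng = np.random.default_rng(0)
a = rng.normal(size=(70, 45))
b = rng.normal(size=(45, 50))
c = tiled_matmul(a, b, tile=16)
```

Each inner step touches only three small blocks, so a tile size chosen to fit the cache lets those blocks be reused before they are evicted. Tuning that size per target is the sort of parameter search the description refers to.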

TinyConv

NVIDIA Gromacs Drug Discovery Linux

A hardware-optimized speech recognition model for resource-constrained devices, implementing structured pruning and quantization-aware training. Achieved 75% parameter reduction and 18% runtime improvement while maintaining 85% accuracy. Key features include channel-aligned structured pruning for predictable memory access, minifloat quantization with custom exponent/mantissa configurations, and a comprehensive hardware-in-the-loop validation pipeline for Arduino deployment.
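
One plausible reading of channel-aligned structured pruning is sketched below: rank a convolution layer's output channels by L1 norm and keep a count rounded up to a multiple of an alignment width. The ranking criterion, the `keep_ratio` and `align` values, and the function name are assumptions made for illustration, not details taken from the project.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5, align=4):
    """Keep the highest-L1-norm output channels, with the kept count aligned.

    weight: array of shape (out_channels, in_channels, kernel_size).
    Returns the pruned weight and the sorted indices of kept channels.
    """
    out_ch = weight.shape[0]
    # Round the kept count up to a multiple of `align`, capped at out_ch.
    n_keep = int(np.ceil(out_ch * keep_ratio / align)) * align
    n_keep = min(max(n_keep, align), out_ch)
    # L1 norm of each output channel's filter.
    norms = np.abs(weight).reshape(out_ch, -1).sum(axis=1)
    keep = np.sort(np.argsort(norms)[-n_keep:])
    return weight[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(30, 8, 3))
w_pruned, kept = prune_channels(w)
```

Removing whole channels keeps the remaining weights dense and contiguous, which is what makes memory access predictable on a microcontroller. The layer that consumes this output would need its input channels sliced with the same indices.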

Heart Murmur Detection
Python Machine Learning Signal Processing

A machine learning system for detecting heart murmurs in phonocardiogram recordings, optimized for resource-constrained healthcare settings. Built on the CirCor DigiScope dataset (5230 recordings), the system uses careful signal processing and traditional ML models to achieve 86.67% recall with 96.03% precision. Key features include location-specific models (PV, AV, TV, MV), MFCC-based feature extraction, and recall-optimized classification thresholds. Deployed as a Streamlit application with SHAP value explanations for clinical interpretability.
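
A recall-optimized threshold can be chosen by scanning candidate cutoffs from highest to lowest and stopping at the first one that meets a recall target. The sketch below assumes binary labels and per-recording scores. The toy data and function names are hypothetical and unrelated to the CirCor results quoted above.

```python
import numpy as np

def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

def threshold_for_recall(y_true, scores, target_recall):
    """Highest threshold whose recall meets the target."""
    for t in np.sort(np.unique(scores))[::-1]:
        if precision_recall(y_true, scores, t)[1] >= target_recall:
            return float(t)
    return float(scores.min())

# Toy labels (1 = murmur) and classifier scores, for illustration only.
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1])
t = threshold_for_recall(y_true, scores, target_recall=0.75)
```

Taking the highest qualifying threshold preserves as much precision as possible for the required recall. In a screening setting the target would normally be fixed on a validation split rather than on the test data.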

MiniTorch

PyTorch NumPy Python

MiniTorch is a DIY teaching library for machine learning engineers who want to learn the internal concepts underlying deep learning systems. It is a pure Python re-implementation of the Torch API designed to be simple, easy to read, tested, and incremental.

JournalAI
Python RAG

JournalAI was a hackathon project that won 1st place at LlamaHack, a 100-person Cornell Tech hackathon open to all Ivy League students.

* It's like coffee with a y