I work on neural architectures, training methods, and computing hardware, mostly trying to rebuild things from scratch rather than extend what exists.
Transistor-free computing on hydrogen-passivated silicon. Computation happens via resonant electron absorption in 5-atom dangling-bond clusters. Memory and compute share the same atoms: no cache hierarchy, no DRAM bus. On a 3 cm² die: 14.1 TB in-situ memory at 79 mW.
Three attention projections instead of four. Content routes through a rank-r state → action matrix grounded in RL navigation. 16× smaller KV-cache than standard attention at ≤1.6 ppl penalty. Trains across text, vision, audio, world states, and cancer genomics under one block class.
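To put the 16× figure in concrete terms, here is a rough back-of-the-envelope sketch. The model shape (32 layers, 32 KV heads, head dimension 128, 4096-token context, 2-byte elements) is a hypothetical baseline chosen for illustration, not the configuration behind the number above, and the function only models a standard attention cache, not the rank-r mechanism itself.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of a standard attention KV-cache for one sequence, in bytes."""
    # Factor of 2: one cached tensor for keys, one for values.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical baseline shape (an assumption, not taken from the project).
baseline = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)

# What a 16x reduction would mean for that same shape.
reduced = baseline // 16

print(baseline / 2**30, "GiB baseline")
print(reduced / 2**20, "MiB at 16x smaller")
```

Under these assumed numbers the per-sequence cache drops from 2 GiB to 128 MiB, which is the difference between a handful of concurrent sequences and dozens on the same accelerator.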
Modular multi-agent cognitive architecture. 12 specialized domain experts collaborating through Web-of-Thought reasoning.
Fine-tuning that skips samples the model already knows. Compute is routed to hard samples; mastered ones are frozen. Up to 80% compute savings at scale.
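A minimal sketch of the routing idea, assuming "already knows" is approximated by a per-sample loss at or below a fixed threshold. The threshold rule, the function names, and the toy losses are all illustrative assumptions; the actual mastery criterion may differ.

```python
from typing import List, Sequence, Tuple


def split_by_mastery(
    losses: Sequence[float], threshold: float
) -> Tuple[List[int], List[int]]:
    """Return (hard, mastered) index lists.

    A sample counts as mastered when its loss is at or below the threshold
    (an assumed criterion for this sketch).
    """
    hard = [i for i, loss in enumerate(losses) if loss > threshold]
    mastered = [i for i, loss in enumerate(losses) if loss <= threshold]
    return hard, mastered


def compute_saved(num_total: int, num_hard: int) -> float:
    """Fraction of training updates skipped when only hard samples are used."""
    if num_total == 0:
        return 0.0
    return 1.0 - num_hard / num_total


# Toy per-sample losses from a scoring pass (made-up values).
losses = [0.02, 1.30, 0.05, 0.90, 0.01]
hard, mastered = split_by_mastery(losses, threshold=0.1)
saved = compute_saved(len(losses), len(hard))

print("train on:", hard)
print("frozen:", mastered)
print("fraction skipped:", saved)
```

The estimate counts skipped updates only. Scoring each sample still costs a forward pass, so the real saving depends on how often mastery is re-checked.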

