Junxiong Wang
I am an AI researcher at TogetherAI, where I work on efficient language models, speculative decoding, sparsity, distillation, and related topics in ML systems.
I obtained my PhD in Computer Science from Cornell University, where I worked at the intersection of systems and large language models (though I'm not sure they were large enough).
If you have any projects or openings or would like to chat about research, feel free to reach out to me by email.
My research focuses on:
ML and systems approaches to modeling long sequences:
- We introduce BiGS, the first bidirectional linear-complexity language model to match BERT performance without using attention.
- We demonstrate that linear RNNs also outperform transformers in byte-level language modeling (high-resolution data), enabling a universal representation across modalities and formats.
- Training LLMs from scratch is costly, so we explore distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only academic-budget resources yet outperforms some models trained from scratch on industry-scale GPUs.
I was incredibly fortunate to spend my summers working with outstanding researchers at Apple AI/ML, Siri & Information Intelligence (2023), and Microsoft Research (2020).
Recent Publications
-
Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
In submission
-
Daniele Paliotta*, Junxiong Wang*, Matteo Pagliardini*, Kevin Y Li*, Aviv Bick, J Zico Kolter, Albert Gu, François Fleuret, Tri Dao
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
In submission
A shorter version appeared at the ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
-
Junxiong Wang*, Daniele Paliotta*, Avner May, Alexander M. Rush, Tri Dao
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Models, Video, Code, Blog
Neural Information Processing Systems (NeurIPS), 2024
A shorter version appeared at the ICML 2024 2nd Workshop on Efficient Systems for Foundation Models (ES-FoMo)
-
Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
MambaByte: Token-free Selective State Space Model
Models, Video
Conference on Language Modeling (CoLM), 2024
-
Junxiong Wang, Ali Mousavi, Omar Attia, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li
Entity Disambiguation via Fusion Entity Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
-
Junxiong Wang*, Kaiwen Wang*, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning
Reinforcement Learning Conference (RLC), 2024, Code
-
Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush
Pretraining Without Attention
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2023
Code, Models, Slides
-
Junxiong Wang, Mitchell Gray, Immanuel Trummer, Ahmet Kara, Dan Olteanu
ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning
International Conference on Very Large Data Bases (VLDB), 2023, Code
-
Junxiong Wang, Immanuel Trummer, Debabrota Basu
UDO: Universal Database Optimization using Reinforcement Learning
International Conference on Very Large Data Bases (VLDB), 2022, Code
Email: Firstname@cs.cornell.edu / Github / HuggingFace / Papers