Junxiong Wang
I obtained my PhD in Computer Science from Cornell University, where I worked at the intersection of large language models and systems, with a focus on linear models and their hybrid variants.
I lead multiple research projects at Together AI, covering adaptive speculative decoding, inference-time training, and efficient RL rollouts.
Recent Publications
-
Junxiong Wang*†, Fengxiang Bie*†, Jisen Li†, Zhongzhu Zhou†, Zelei Shao†, Yubo Wang†, Yinghui Liu†, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang, Tri Dao, Percy Liang, Ben Athiwaratkun, Shuaiwen Leon Song, Chenfeng Xu†, Xiaoxia Wu†
When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
-
Zelei Shao*, Vikranth Srivatsa*, Sanjana Srivastava, Qingyang Wu, Alpay Ariyak, Xiaoxia Wu, Ameen Patel, Jue Wang, Percy Liang, Tri Dao, Ce Zhang, Yiying Zhang, Ben Athiwaratkun, Chenfeng Xu, Junxiong Wang
Beat the long tail: Distribution-Aware Speculative Decoding for RL Training
Conference on Machine Learning and Systems (MLSys), 2026
-
Haojun Xia*, Xiaoxia Wu*, Jisen Li*, Robert Wu, Junxiong Wang, Jue Wang, Chenxi Li, Aman Singhal, Alay Dilipbhai Shah, Alpay Ariyak, Donglin Zhuang, Zhongzhu Zhou, Ben Athiwaratkun, Zhen Zheng, Shuaiwen Leon Song
Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost
Conference on Machine Learning and Systems (MLSys), 2026
-
Jiaqi Leng*, Xiang Hu*, Junxiong Wang, Jianguo Li, Wei Wu, Yucheng Lu
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
International Conference on Learning Representations (ICLR), 2026
-
Zhongzhu Zhou, Fengxiang Bie, Ziyan Chen, Zhenyu Zhang, Yibo Yang, Junxiong Wang, Ben Athiwaratkun, Xiaoxia Wu, Shuaiwen Leon Song
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
International Conference on Learning Representations (ICLR), 2026
-
Woojeong Kim, Junxiong Wang, Jing Nathan Yan, Mohamed S. Abdelfattah, Alexander M. Rush
Overfill: Two-Stage Models for Efficient Language Model Decoding
Conference on Language Modeling (CoLM), 2025
-
Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Workshop on Efficient Reasoning (Best Paper Award), Neural Information Processing Systems (NeurIPS), 2025
-
Junxiong Wang*, Daniele Paliotta*, Avner May, Alexander M. Rush, Tri Dao
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Models, Video, Code, Blog
Neural Information Processing Systems (NeurIPS), 2024
A shorter version appeared at the 2nd Workshop on Efficient Systems for Foundation Models (ES-FoMo), ICML 2024
Email: Firstname@cs.cornell.edu /
Github /
HuggingFace Models /
Papers /
Twitter