Junxiong Wang
I am an AI researcher at TogetherAI, where I work on efficient language models, speculative decoding, sparsity, distillation, and related topics in ML systems.
I obtained my PhD in Computer Science from Cornell University, where I worked at the intersection of systems and large language models (though I'm not sure they were large enough).
If you have any projects or openings or would like to chat about research, feel free to reach out to me by email.
My research focuses on:
ML and systems approaches to modeling long sequences:
- We introduce BiGS, the first bidirectional linear-complexity language model to match BERT performance without using attention.
- We demonstrate that linear RNNs also outperform transformers in byte-level language modeling (high-resolution data), enabling a universal representation across modalities and formats.
- Training LLMs from scratch is costly, so we explore distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only academic-budget resources yet outperforms some models trained from scratch on industry-scale GPUs.
I was incredibly fortunate to spend my summers working with outstanding researchers at Apple AI/ML, Siri & Information Intelligence (2023), and Microsoft Research (2020).
Recent Publications
-
Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
In submission
-
Daniele Paliotta*, Junxiong Wang*, Matteo Pagliardini*, Kevin Y Li*, Aviv Bick, J Zico Kolter, Albert Gu, François Fleuret, Tri Dao
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
In submission
A shorter version appeared at the ICLR 2025 Workshop on Reasoning and Planning for Large Language Models
-
Junxiong Wang*, Daniele Paliotta*, Avner May, Alexander M. Rush, Tri Dao
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Models, Video, Code, Blog
Neural Information Processing Systems (NeurIPS), 2024
A shorter version appeared at the ICML 2024 2nd Workshop on Efficient Systems for Foundation Models (ES-FoMo)
-
Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
MambaByte: Token-free Selective State Space Model
Models, Video
Conference on Language Modeling (CoLM), 2024
-
Junxiong Wang, Ali Mousavi, Omar Attia, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li
Entity Disambiguation via Fusion Entity Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
-
Junxiong Wang*, Kaiwen Wang*, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning
Reinforcement Learning Conference (RLC), 2024, Code
-
Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush
Pretraining Without Attention
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2023
Code, Models, Slides
-
Junxiong Wang, Mitchell Gray, Immanuel Trummer, Ahmet Kara, Dan Olteanu
ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning
International Conference on Very Large Data Bases (VLDB), 2023, Code
-
Junxiong Wang, Immanuel Trummer, Debabrota Basu
UDO: Universal Database Optimization using Reinforcement Learning
International Conference on Very Large Data Bases (VLDB), 2022, Code
Email: Firstname@cs.cornell.edu / Github / HuggingFace / Papers