Junxiong Wang
I obtained my PhD in Computer Science from Cornell University, where I worked at the intersection of systems and large language models (though I'm not sure they were large enough).
If you would like to chat about research, feel free to reach out to me by email.
My research focuses on ML and systems approaches to modeling long sequences:
- We introduce BiGS, the first bidirectional linear-complexity language model, which matches BERT performance without using attention.
- We demonstrate that linear RNNs also outperform transformers in byte-level language modeling (high-resolution data), enabling a universal representation across modalities and formats.
- Training LLMs from scratch is costly, so we explore distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only academic-scale compute and outperforms some models trained from scratch on industry-scale GPUs.
I was incredibly fortunate to have spent my summers working with outstanding researchers at Apple AI/ML Siri & Information Intelligence (2023) and Microsoft Research (2020).
Recent Publications
-
Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
In submission
-
Woojeong Kim, Junxiong Wang, Jing Nathan Yan, Mohamed S. Abdelfattah, Alexander M. Rush
Overfill: Two-Stage Models for Efficient Language Model Decoding
In submission
-
Daniele Paliotta*, Junxiong Wang*, Matteo Pagliardini*, Kevin Y. Li*, Aviv Bick, J. Zico Kolter, Albert Gu, François Fleuret, Tri Dao
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
In submission
A shorter version at ICLR 2024, Workshop on Reasoning and Planning for Large Language Models
-
Junxiong Wang*, Daniele Paliotta*, Avner May, Alexander M. Rush, Tri Dao
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Models, Video, Code, Blog
Neural Information Processing Systems (NeurIPS), 2024
A shorter version at ICML 2024, 2nd Workshop on Efficient Systems for Foundation Models (ES-FoMo)
-
Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
MambaByte: Token-free Selective State Space Model
Models, Video
Conference on Language Modeling (CoLM), 2024
-
Junxiong Wang, Ali Mousavi, Omar Attia, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li
Entity Disambiguation via Fusion Entity Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
-
Junxiong Wang*, Kaiwen Wang*, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning
Reinforcement Learning Conference (RLC), 2024
Code
-
Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush
Pretraining Without Attention
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2023
Code, Models, Slides
-
Immanuel Trummer, Junxiong Wang, Deepak Maram, Saehan Jo, Samuel Moseley, Joseph Antonakakis
SkinnerDB: Regret-bounded Query Evaluation via Reinforcement Learning
ACM SIGMOD International Conference on Management of Data (SIGMOD), 2019
Best of SIGMOD, extended version in ACM Transactions on Database Systems (TODS), 2021
Email: Firstname@cs.cornell.edu / Github / HuggingFace / Papers