Junxiong Wang
I am a CS PhD from Cornell University, where I worked at the intersection of systems and large language models (though I am not sure whether they are large enough).
My research focuses on:
ML and systems approaches to modeling long sequences:
- We introduce BiGS, the first bidirectional linear-complexity language model, which matches BERT performance without using attention.
- We demonstrate that linear RNNs also outperform transformers at byte-level language modeling (high-resolution data), enabling a universal representation across modalities and formats.
- Training LLMs from scratch is costly, so we explore distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only academic-scale resources (< 800 H100 GPU hours) and outperforms models trained from scratch with industry-scale compute (> 1 million H100 GPU hours); a minimal sketch of the core distillation step appears after this list.
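The full MambaInLlama recipe (progressively replacing attention layers and further alignment stages) is more involved than a single loss term; the sketch below shows only the generic logit-matching distillation step it builds on. The `teacher` and `student` modules and their `.logits` outputs are assumptions for illustration, not the actual training code.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, input_ids, temperature=2.0):
    """One logit-matching distillation step.

    `teacher` is a frozen pretrained transformer and `student` is the
    linear-RNN (Mamba-style) model being trained; both are assumed to
    return an object with a `.logits` field of shape
    (batch, seq_len, vocab). This is a generic sketch, not the exact
    MambaInLlama training code.
    """
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits
    student_logits = student(input_ids).logits

    t = temperature
    # KL divergence between temperature-softened distributions;
    # scaling by t**2 keeps gradient magnitudes comparable across t.
    loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    return loss
```

Because only the student receives gradients, a step like this lets a linear-RNN model inherit the teacher's output distribution at a fraction of the cost of pretraining from scratch.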
I have also worked on ML for data systems, mainly using reinforcement learning for query optimization and database tuning (see JoinGym, ADOPT, and UDO below); a toy sketch of the RL framing follows.
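In that line of work, query optimization is cast as a sequential decision problem: the agent picks which relation to join next, and the reward reflects (negative) query cost. The environment below is a self-contained toy illustration in that spirit, not the actual JoinGym API; the cardinalities and fixed selectivity are made-up numbers.

```python
import random

# Toy join-ordering "environment": the agent picks which relation to
# join next; the reward is the negative size of each intermediate
# result, a stand-in for query cost. All numbers are illustrative.
CARDINALITIES = {"R": 1_000, "S": 5_000, "T": 200, "U": 10_000}
SELECTIVITY = 0.001  # assumed constant join selectivity

class ToyJoinEnv:
    def reset(self):
        self.remaining = set(CARDINALITIES)
        self.rows = None
        return frozenset(self.remaining)  # observation: relations left

    def step(self, action):
        assert action in self.remaining, "must pick an un-joined relation"
        self.remaining.discard(action)
        if self.rows is None:
            self.rows, reward = CARDINALITIES[action], 0.0
        else:
            self.rows = self.rows * CARDINALITIES[action] * SELECTIVITY
            reward = -self.rows  # smaller intermediates = higher reward
        done = not self.remaining
        return frozenset(self.remaining), reward, done

# Random-policy rollout; a real agent (e.g. Q-learning) would learn to
# prefer orders that keep intermediate results small.
env = ToyJoinEnv()
obs, done, ret = env.reset(), False, 0.0
while not done:
    action = random.choice(sorted(obs))
    obs, reward, done = env.step(action)
    ret += reward
print("episode return (negative total cost):", ret)
```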
I was incredibly fortunate to spend my summers working with outstanding researchers at Apple AI/ML, Siri & Information Intelligence (2023) and Microsoft Research (2020).
Recent Publications
-
Junxiong Wang*, Daniele Paliotta*, Avner May, Alexander M. Rush, Tri Dao
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Models, Video, Code, Blog
Neural Information Processing Systems (NeurIPS), 2024
A shorter version appeared at the ICML 2024 2nd Workshop on Efficient Systems for Foundation Models (ES-FoMo)
-
Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M. Rush
MambaByte: Token-free Selective State Space Model
Models, Video
Conference on Language Modeling (COLM), 2024
-
Junxiong Wang, Ali Mousavi, Omar Attia, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li
Entity Disambiguation via Fusion Entity Decoding
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
-
Junxiong Wang*, Kaiwen Wang*, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning
Reinforcement Learning Conference (RLC), 2024, Code
-
Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush
Pretraining Without Attention
Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Code, Models, Slides
-
Junxiong Wang, Mitchell Gray, Immanuel Trummer, Ahmet Kara, Dan Olteanu
ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning
International Conference on Very Large Data Bases (VLDB), 2023, Code
-
Junxiong Wang, Immanuel Trummer, Debabrota Basu
UDO: Universal Database Optimization using Reinforcement Learning
International Conference on Very Large Data Bases (VLDB), 2022, Code
Email: Firstname@cs.cornell.edu / Github / HuggingFace / Papers