Junxiong Wang

I am an AI researcher at TogetherAI, where I work on efficient language models, speculative decoding, sparsity, distillation, and related topics in ML systems. I obtained my PhD in Computer Science from Cornell University, where I worked at the intersection of systems and large language models (though I'm not sure they were large enough). If you have a project or an opening, or would simply like to chat about research, feel free to reach out to me by email. My research focuses on:

ML and systems approaches to modeling long sequences:

  • We introduce BiGS, the first bidirectional linear-complexity language model, which matches BERT performance without using attention.
  • We demonstrate that linear RNNs also outperform transformers in byte-level language modeling (high-resolution data), enabling a universal representation across modalities and formats.
  • Training LLMs from scratch is costly, so we explore distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only an academic compute budget and outperforms some models trained from scratch on industry-scale GPUs.
I was incredibly fortunate to have spent my summers working with outstanding researchers at Apple AI/ML, Siri & Information Intelligence (2023), and Microsoft Research (2020).

Recent Publications

Email: Firstname@cs.cornell.edu  /  GitHub  /  HuggingFace  /  Papers
