Junxiong Wang

I obtained my PhD in Computer Science from Cornell University, where I worked at the intersection of systems and large language models. If you would like to see my CV, please feel free to contact me by email. My research focuses on:

ML and systems approaches to modeling long sequences:

  • We introduce BiGS, the first bidirectional linear-complexity language model, which matches BERT performance without using attention.
  • We demonstrate that linear RNNs also outperform transformers in byte-level language modeling (high-resolution data), enabling a universal representation across modalities and formats.
  • Training LLMs from scratch is costly, so we explore distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only academic-budget resources yet outperforms some models trained from scratch on industry-scale GPUs (a minimal sketch of the distillation objective appears below).
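
At a high level, this kind of distillation can be framed as a word-level KL divergence between the teacher transformer's output distribution and the student linear RNN's. Below is a minimal PyTorch sketch under that assumption; `teacher`, `student`, `distill_step`, and the temperature value are illustrative placeholders, not the exact MambaInLlama training recipe:

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, input_ids, optimizer, temperature=2.0):
    """One step of word-level KL distillation from a transformer teacher
    into a linear-RNN student (illustrative sketch, not the paper's recipe)."""
    # The frozen teacher provides soft targets over the vocabulary.
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits
    student_logits = student(input_ids).logits

    # KL divergence between temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay consistent across temperatures.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Part of what keeps the budget small in practice is initializing the student from the teacher's existing weights rather than training the linear RNN from random initialization.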
I was incredibly fortunate to have spent my summers working with outstanding researchers at Apple AI/ML Siri & Information Intelligence (2023) and Microsoft Research (2020).

Recent Publications

Email: Firstname@cs.cornell.edu  /  GitHub  /  HuggingFace Models  /  Papers
