Junxiong Wang

I obtained my PhD in Computer Science from Cornell University, where I worked at the intersection of systems and large language models (though I'm not sure they were large enough). If you would like to chat about research, feel free to reach out to me by email. My research focuses on:

ML and systems approaches to modeling long sequences:

  • We introduce BiGS, the first bidirectional linear-complexity language model, which matches BERT performance without using attention (see the recurrence sketch after this list).
  • We demonstrate that linear RNNs also outperform transformers on byte-level language modeling (high-resolution data), enabling a universal representation across modalities and formats.
  • Training LLMs from scratch is costly, so we explore distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only academic-budget resources yet outperforms some models trained from scratch on industry-scale GPUs.
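
For context, "linear-complexity" here means sequence mixing via a recurrence rather than pairwise attention. Below is a minimal, illustrative sketch of a linear RNN scan, not the actual BiGS or MambaInLlama code; the scalar decay `a` and projection `B` are stand-ins for the learned (often input-dependent) parameters real models use:

```python
import numpy as np

def linear_rnn_scan(x, a, B):
    """Illustrative linear RNN: h_t = a * h_{t-1} + B @ x_t.

    Runs in O(L) time in sequence length L, versus O(L^2) for
    full self-attention. `a` and `B` are toy placeholders for
    the learned parameters used in practice.
    """
    L, _ = x.shape
    d = B.shape[0]
    h = np.zeros(d)
    states = np.empty((L, d))
    for t in range(L):
        h = a * h + B @ x[t]  # one O(d) update per token
        states[t] = h
    return states

# Toy usage: 16-token sequence, 4-dim inputs, 8-dim state.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))
a = 0.9                            # scalar decay for simplicity
B = rng.standard_normal((8, 4)) * 0.1
print(linear_rnn_scan(x, a, B).shape)  # (16, 8)
```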
I was incredibly fortunate to spend my summers working with outstanding researchers at Apple AI/ML, Siri & Information Intelligence (2023) and at Microsoft Research (2020).

Recent Publications

Email: Firstname@cs.cornell.edu  /  GitHub  /  HuggingFace  /  Papers
