Junxiong Wang

I am a CS PhD from Cornell University, where I worked at the intersection of systems and large language models (though I am not sure whether they are large enough). My research focuses on:

ML and systems approaches to modeling long sequences:

  • We introduced BiGS, the first bidirectional linear-complexity language model, which matches BERT performance without using attention.
  • We demonstrated that linear RNNs also outperform transformers in byte-level language modeling (high-resolution data), enabling a universal representation across different modalities and formats.
  • Since training LLMs from scratch is costly, we explored distilling large transformers into linear RNNs. Our distillation approach, MambaInLlama, uses only academic-budget resources (< 800 H100 GPU hours) and outperforms models trained from scratch with industry-scale compute (> 1 million H100 GPU hours).
I have also looked at

ML for data systems.

I was incredibly fortunate to have spent my summers working with outstanding researchers at Apple AI/ML, Siri & Information Intelligence (2023) and Microsoft Research (2020).

Recent Publications

Email: Firstname@cs.cornell.edu / GitHub / HuggingFace / Papers
