About me

I am a Research Software Engineer at Google Research, Australia, based in Sydney. I currently focus on ML efficiency, especially LLM inference optimization. I obtained my Ph.D. from the School of Computer Science, Faculty of Engineering at The University of Sydney (USYD). Prior to that, I received my Master’s and Bachelor’s degrees from the University of Science and Technology of China (USTC).

My research lies at the intersection of Computer Systems and Machine Learning (MLSys). I am passionate about breaking through the memory wall and removing communication bottlenecks in generative AI. My work primarily focuses on algorithm-system co-design for Large Language Models (LLMs), including:

  • High-Performance Kernel Optimization: Designing system-level acceleration and custom GPU/TPU kernels for generative AI, including efficient kernels for MLP, Attention, and MoE layers.
  • Extreme Model Compression: Developing low-bit weight and KV cache quantization (e.g., FP6 and other sub-byte formats) and exploiting unstructured sparsity to maximize inference throughput.

By innovating systematically across the algorithm, runtime, and architecture boundaries, I ultimately aim to build memory-efficient, high-throughput, and scalable infrastructure that democratizes next-generation AI foundation models.