CV
Education
- Ph.D. in Computer Science, The University of Sydney (USYD), 2022 – 2026
- Advisor: Shuaiwen Leon Song
- Thesis: Compression-Driven Memory-Efficient and High-Throughput GPU Systems for LLM Inference.
- M.S. in Computer Architecture, University of Science and Technology of China (USTC), 2018 – 2021
- Thesis: The design and implementation of a lightweight Automata Processor.
- Honors: First Class Scholarship
- B.S. in Computer Science and Technology, University of Science and Technology of China (USTC), 2014 – 2018
- Thesis: FPGA Based CNN Accelerator Design.
- Honors: Talent Program
Work Experience
| Research Software Engineer | Google Research |
- Nov 2025 – Present (Full-time, On-site)
- Duties: Optimizing production-level TPU-based LLM inference systems.
| Research Consultant | Together AI |
- Mar 2024 – Sep 2024 (Part-time, Remote)
- Duties: Optimizing the LLM inference system, identifying and mitigating performance issues in LoRA inference, and developing the FP8 MHA Decoding GPU kernel using OpenAI Triton.
| Research Intern | Alibaba Cloud |
- Feb 2022 – Aug 2023 (Remote, Unpaid)
- Duties: Extended the earlier research project on large-scale ML model acceleration frameworks.
| Research Intern | Alibaba Cloud |
- Aug 2021 – Jan 2022 (On-site)
- Duties: Part of the Alibaba Innovative Research (AIR) program. Investigated SOTA system support for LLMs, and researched and developed a novel large-scale ML model acceleration framework.
Skills
- Research Interests: Performance Optimization, Machine Learning Systems, Runtime Systems, Computer Architecture, Domain-Specific Architectures, GPU/TPU Kernel Design
- Programming Languages: JAX, Pallas (TPU), C/C++, Python, CUDA (GPU), Triton (GPU), Verilog HDL (FPGA)
- Software & Frameworks: Machine Learning Frameworks (e.g., PyTorch, Hugging Face Transformers, FasterTransformer), Embedded System Design and Implementation (e.g., Xilinx FPGA + ARM CPUs)
Selected Publications
- [MLSys’26] Haojun Xia, Xiaoxia Wu, Jisen Li, et al. “Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost.”
- [ATC’24] Haojun Xia, Zhen Zheng, Xiaoxia Wu, et al. “FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.”
- [VLDB’24] Haojun Xia, Zhen Zheng, Yuchao Li, et al. “Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity.”
- [OSDI’24] Donglin Zhuang, Zhen Zheng, Haojun Xia, et al. “MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures.”
Haojun Xia. "Compression-Driven Memory-Efficient and High-Throughput GPU Systems for LLM Inference." PhD Thesis, The University of Sydney, 2026.
Haojun Xia, Xiaoxia Wu, Jisen Li, Tsai-chuan Wu, Junxiong Wang, Jue Wang, Chenxi Li, Aman Singhal, Alay Dilipbhai Shah, Alpay Ariyak, Donglin Zhuang, Zhongzhu Zhou, Ben Athiwaratkun, Zhen Zheng, Shuaiwen Song. "Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost." Annual Conference on Machine Learning and Systems (MLSys), 2026.
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song. "Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity." International Conference on Very Large Data Bases (VLDB), 2024.
Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song. "FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design." USENIX Annual Technical Conference (ATC), 2024.
Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu. "Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving." In Proceedings of the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2021.
Haojun Xia. "The design and implementation of a lightweight automata processor." Master's Thesis, University of Science and Technology of China, 2021.
Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael Taylor, Shuaiwen Leon Song. "η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities." In Proceedings of the International Symposium on Computer Architecture (ISCA), 2021.
Haojun Xia, Lei Gong, Chao Wang, Xianglan Chen, Xuehai Zhou. "LAP: A Lightweight Automata Processor for Pattern Matching Tasks." In Proceedings of the Design, Automation and Test in Europe Conference (DATE), 2021.
Talks
July 12, 2024
Talk at USENIX Annual Technical Conference (ATC '24), Santa Clara, CA, USA
Teaching
- Teaching Assistant, USTC
- Mar 2019 – Jun 2019: Computer Architecture 2019
- Mar 2018 – Jun 2018: Computer Architecture 2018
- Sep 2017 – Dec 2017: Digital Circuit Theory 2017
- Sep 2016 – Dec 2016: Digital Circuit Experiments 2016
Honors and Achievements
- 2022 – 2026: Faculty of Engineering Research Scholarship, PhD study, USYD
- 2021: Outstanding Graduates (Top 15%), Master’s study, USTC
- 2020: Suzhou Park Scholarship, Master’s study, USTC
- 2018 – 2021: First Class Academic Scholarship, Master’s study, USTC
- 2018: Yang Yuanqing Education Fund - Top Research Scholarship, Bachelor’s study, USTC
- 2018: Outstanding Graduates (Top 15%), Bachelor’s study, USTC
- 2014 – 2018: Talent program in computer science and technology, USTC