PhD Thesis: Compression-Driven Memory-Efficient and High-Throughput GPU Systems for LLM Inference
Published in The University of Sydney (USYD), 2026
My doctoral dissertation addresses the memory-wall and communication bottlenecks of Large Language Model (LLM) inference through algorithm-system co-design, covering low-bit quantization and the acceleration of unstructured sparsity.
Recommended citation: Haojun Xia. "Compression-Driven Memory-Efficient and High-Throughput GPU Systems for LLM Inference." PhD Thesis, The University of Sydney, 2026.