Optimizing and Characterizing High-Throughput Low-Latency LLM Inference in MLCEngine
Oct 10, 2024
WebLLM: A High-Performance In-Browser LLM Inference Engine
Jun 13, 2024
MLC-LLM: Universal LLM Deployment Engine with ML Compilation
Jun 7, 2024
GPU-Accelerated LLM on a $100 Orange Pi
Apr 20, 2024
Scalable Language Model Inference on Multiple NVIDIA and AMD GPUs
Oct 19, 2023
Making AMD GPUs competitive for LLM inference
Aug 9, 2023
Bringing Open Large Language Models to Consumer Devices
May 22, 2023
Bringing Hardware Accelerated Language Models to Android Devices
May 8, 2023
Bringing Hardware Accelerated Language Models to Consumer Devices
May 1, 2023