- Achieving Efficient, Flexible, and Portable Structured Generation with XGrammar — Nov 22, 2024
- Optimizing and Characterizing High-Throughput Low-Latency LLM Inference in MLCEngine — Oct 10, 2024
- WebLLM: A High-Performance In-Browser LLM Inference Engine — Jun 13, 2024
- MLC-LLM: Universal LLM Deployment Engine with ML Compilation — Jun 7, 2024
- GPU-Accelerated LLM on a $100 Orange Pi — Apr 20, 2024
- Scalable Language Model Inference on Multiple NVIDIA and AMD GPUs — Oct 19, 2023
- Making AMD GPUs Competitive for LLM Inference — Aug 9, 2023
- Bringing Open Large Language Models to Consumer Devices — May 22, 2023
- Bringing Hardware Accelerated Language Models to Android Devices — May 8, 2023
- Bringing Hardware Accelerated Language Models to Consumer Devices — May 1, 2023