Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

AMD's Lemonade is a fast, open-source local LLM server optimized for GPUs and NPUs, featuring a lightweight C++ backend and OpenAI API compatibility.
Refreshingly fast
on GPUs and NPUs
Built by the local AI community for every PC.
Lemonade exists because local AI should be free, open, fast, and private.
Works with great apps.
Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard.
Built for practical local AI workflows.
Everything from install to runtime is optimized for fast setup, broad compatibility, and local-first execution.
Native C++ Backend
Lightweight service that is only 2MB.
One Minute Install
Simple installer that sets up the stack automatically.
OpenAI API Compatible
Works with hundreds of apps out-of-box and integrates in minutes.
Auto-configures for your hardware
Configures dependencies for your GPU and NPU.
Multi-engine compatibility
Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more.
Multiple Models at Once
Run more than one model at the same time.
Cross-platform
A consistent experience across Windows, Linux, and macOS (beta).
Built-in app
A GUI that lets you download, try, and switch models quickly.
One local service for every modality.
Point your app at Lemonade and get chat, vision, image gen, transcription, speech gen, and more with standard APIs.
Always improving.
Track the newest improvements and highlights from the Lemonade release stream.
Source: Hacker News












