Keysmash Local Tech Stack, March 2026
Lemonade Server and Goose on an AMD Strix Halo machine running good old Fedora Linux round out the March '26 stack. Strix Halo devices expose CPU, GPU, and NPU compute that Lemonade Server can serve up to Goose: a powerful (and power-efficient) local AI host right on the desktop.
The State of Affairs
OpenClaw has been making the most waves, with Pi following as the trendy pick, but one AI agent has gone mostly ignored: the humble Goose. Similarly, in a world of vLLM, llama.cpp, and Ollama we find Lemonade Server. The two pair particularly well with the latest crop of wonderful Strix Halo machines like the Minisforum MS-S1 and the darling Framework Desktop.
Goose
Goose is written in Rust and ships as both a console application and a desktop application for Windows, Linux, and macOS. Its agent system lets you create subagents from recipes, and each recipe can use a separate model as desired. You can define a recipe that uses, for example, SDXL Turbo for image generation and then tell your current session, which is running a different model, to use that recipe. Voila! Your top-level agent running Gemini CLI or Claude Code can now dispatch to a local model.
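Such a recipe might look roughly like the following. This is a minimal sketch of the YAML recipe format in recent Goose releases; the exact field names, the provider id, and the sdxl-turbo model id are assumptions, so check the recipe reference for your installed version.

```yaml
# Hypothetical Goose recipe: delegate image generation to a local model.
# Field names follow the recipe schema of recent Goose releases; verify
# against your installed version before relying on this sketch.
version: 1.0.0
title: local-image-gen
description: Generate images with a local SDXL Turbo endpoint
instructions: |
  You generate images from text prompts. Return the path of the saved image.
prompt: Generate an image matching the user's description.
settings:
  goose_provider: openai   # any OpenAI-compatible endpoint, e.g. Lemonade Server
  goose_model: sdxl-turbo  # assumed model id exposed by the local server
```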
Similarly, and more simply, Goose ships with a Lead/Worker configuration that lets a "smart" supervisor model spin up cheaper workers.
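In recent Goose releases, enabling the Lead/Worker split is a matter of environment variables. A minimal sketch; GOOSE_LEAD_MODEL comes from Goose's lead/worker documentation, while the provider and model ids below are placeholders:

```shell
# Hypothetical lead/worker setup: a capable lead model plans,
# a cheaper local worker (the default model) executes.
export GOOSE_PROVIDER="openai"          # OpenAI-compatible, e.g. Lemonade Server
export GOOSE_MODEL="qwen2.5-7b"         # placeholder worker model id
export GOOSE_LEAD_MODEL="qwen2.5-72b"   # placeholder lead model id
```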
These configurations not only enable functionality that a single model can't provide, such as the image generation example, but also let you save tokens and protect your context by delegating to cheaper (and local) workers.
Rounding out the benefits of Goose is its membership in the Linux Foundation's Agentic AI Foundation.
With the ability to orchestrate multiple models across various configurations, all we need now is a high-performance server to back it up.
Strix Halo
AMD's platform for AI, Strix Halo is a leap forward, featuring not only an NPU but also unified memory.
For systems like the Framework Desktop and Minisforum MS-S1, that means the entire system RAM is available to large models. As a bonus, these systems need neither a rack nor the 1000W PSU of an equivalent desktop PC - they are power-efficient solutions with a small footprint.
To make a Strix Halo AI host even more efficient, consider using the NPU. It is an extremely power-efficient way to run models. Performance will not match the integrated GPU (let alone a desktop or server GPU, of course), but the power draw is trivial in comparison. It is the perfect place to "stuff" an embeddings model or a speech-to-text engine, leaving the GPU free for heavy lifting.
Lemonade Server
The magic of the stack is the open source Lemonade Server. Under the hood it wraps llama.cpp and FastFlowLM, offering a complete solution for CPU, GPU (ROCm), GPU (Vulkan), and NPU workloads.
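Lemonade Server exposes an OpenAI-compatible chat completions API, so any stock client can talk to it. A minimal sketch using only the Python standard library; the base URL below (localhost, port 8000, /api/v1) is Lemonade Server's documented default but worth verifying for your install, and the model id is a placeholder:

```python
import json
import urllib.request

# Assumed default address of a local Lemonade Server; adjust for your install.
BASE_URL = "http://localhost:8000/api/v1"

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("qwen2.5-7b", "What is unified memory?")  # placeholder model id
# With a running server: urllib.request.urlopen(req) returns the completion JSON.
```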
Windows can host Lemonade Server with full NPU support. With some tweaking, you can host Lemonade Server on any Linux system with a recent kernel, amdnpu firmware, and the amdxdna driver.
As a bonus, splitting out small and fast models on the NPU does not contend with the GPU scheduling of larger models.
Combining It All Together
Lemonade Server can fully leverage everything Strix Halo offers. A compact Strix Halo system on your desktop can be densely packed with great models for an AI agent to use: Whisper and embeddings loaded on the NPU, Stable Diffusion on the GPU, and Qwen ready to answer questions, while Goose is free to coordinate subagents across all the available endpoints and all the available hardware.
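The division of labor above can be sketched as a simple routing table that a Goose session (or a set of recipes) could follow. The model ids and device placement are illustrative assumptions, not Lemonade Server's actual catalog:

```python
# Hypothetical task -> (model, device) routing for one Strix Halo box.
# Model ids are placeholders; Lemonade Server's real catalog will differ.
ROUTES = {
    "transcribe": ("whisper-base", "npu"),   # speech-to-text on the NPU
    "embed":      ("bge-small",    "npu"),   # embeddings on the NPU
    "image":      ("sdxl-turbo",   "gpu"),   # diffusion on the GPU
    "chat":       ("qwen2.5-7b",   "gpu"),   # general Q&A on the GPU
}

def route(task: str) -> tuple[str, str]:
    """Pick the model and device for a task, defaulting to the chat model."""
    return ROUTES.get(task, ROUTES["chat"])
```

Keeping the small, always-on models pinned to the NPU means a burst of chat or image traffic on the GPU never stalls transcription or embedding lookups.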