Introduction
Run AI models locally and expose them to your self-hosted apps.
Local inference is currently in private alpha. It will move to public beta once the API stabilizes.
Lawn acts as an inference broker for your self-hosted apps. Define providers — local models, machines on your network, cloud APIs — and Lawn routes each request to the right place. One endpoint for your apps, full control over where inference happens.
What to expect
- Define your providers — configure local models, cloud APIs like OpenRouter, or machines on your network as inference providers, all in one place
- Hardware acceleration — models run on your Mac's GPU using MLX and Metal
- Distributed inference — use local clusters to spread the load across machines on your network, including NVIDIA GPUs on Linux
- Exposed to containers — containerized apps call a single endpoint, and Lawn handles the rest
- Inference routing — route requests to local models, remote machines, or cloud providers — you control which requests go where
- Your rules — keep everything local, use the cloud as a fallback, or mix and match per app
How it works
Apps make inference requests to Lawn's local broker, which routes them to a local model running on Apple Silicon or to a cloud provider like OpenRouter — all behind a single endpoint.
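The broker's fallback behavior can be sketched as an ordered routing table: each model maps to a preference list of providers, and the first available one wins. This is a minimal illustration of the idea, not Lawn's implementation; the provider names, the `ROUTES` table, and the `pick_provider` helper are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    kind: str  # "local", "remote", or "cloud"

# Hypothetical routing table: model name -> ordered provider preference.
# Illustrates one endpoint fanning requests out to several providers,
# with the cloud entry acting as the fallback.
ROUTES = {
    "qwen3": [Provider("mac-local", "local"), Provider("openrouter", "cloud")],
    "glm-4.5v": [Provider("linux-gpu", "remote"), Provider("openrouter", "cloud")],
}

def pick_provider(model: str, available: set) -> Provider:
    """Return the first available provider for a model, falling back in order."""
    for provider in ROUTES.get(model, []):
        if provider.name in available:
            return provider
    raise LookupError(f"no provider available for {model}")

# Local Mac is up, so the request stays on-device.
print(pick_provider("qwen3", {"mac-local", "openrouter"}).name)  # mac-local
# Local machine is down: fall back to the cloud provider.
print(pick_provider("qwen3", {"openrouter"}).name)               # openrouter
```

Because apps only ever see the broker's endpoint, swapping a provider in or out changes the routing table, not the apps.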
Example setup
An OpenClaw instance routing inference across three providers — a local Mac running Qwen3, a Linux machine with an NVIDIA GPU running GLM-4.5V, and OpenRouter as a cloud fallback.
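A setup like that might be expressed along these lines. The schema below is illustrative only; field names, the `routing` block, and the host address are assumptions, not Lawn's actual configuration format.

```yaml
# Hypothetical provider definitions for the three-provider setup above.
providers:
  - name: mac-local
    type: local          # runs on the Mac's GPU via MLX and Metal
    model: qwen3
  - name: linux-gpu
    type: remote         # NVIDIA GPU machine on the local network
    host: 192.168.1.42   # placeholder address
    model: glm-4.5v
  - name: openrouter
    type: cloud          # used only when local capacity is unavailable
routing:
  default: [mac-local, linux-gpu, openrouter]
```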
Continue reading