The LiteRT-LM CLI can start a local HTTP server that is compatible with the OpenAI API. This lets you point existing OpenAI-based applications and workflows at LiteRT-LM, provided they only rely on the supported endpoints.
Start the Server
Use the serve command to start the server. By default, it starts an
OpenAI-compatible server on port 9379.
The server dynamically loads and serves any models in your local registry. To manage the models available to the server (such as importing new models or listing existing ones), see Model Management.
litert-lm serve
Configuration Options
You can customize the server using the following options:
- --host: The host to listen on (default: 0.0.0.0).
- --port: The port to listen on (default: 9379).
- --verbose: Enable verbose logging.
Example with custom host and port:
litert-lm serve --host 127.0.0.1 --port 8080
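A client has to target the same host and port the server was started with. The following Python sketch is one way to do that with the standard library only; the helper names `base_url` and `list_models` are hypothetical and not part of LiteRT-LM. It assumes only what this page states: the default port is 9379 and the server answers GET /v1/models.

```python
import json
import urllib.request


def base_url(host: str = "localhost", port: int = 9379) -> str:
    """Build the URL a client should use for a given --host/--port pair."""
    # 0.0.0.0 means "listen on all interfaces". It is a bind address, not a
    # destination, so a client on the same machine should use localhost.
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"


def list_models(url: str, timeout: float = 10.0):
    """GET /v1/models and return the decoded JSON body unchanged."""
    with urllib.request.urlopen(f"{url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)
```

For the example above, `list_models(base_url("127.0.0.1", 8080))` would query the server started with `--host 127.0.0.1 --port 8080`. The shape of the returned JSON is not documented here, so the sketch returns it as-is.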
Supported Endpoints
The server emulates the following OpenAI API endpoints:
- List Models: GET /v1/models
  Lists the models that are available to the server.
- Chat Completions: POST /v1/chat/completions
  Generates text completions for a given chat conversation. Supports streaming responses.
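This page does not describe the wire format of streaming responses. The sketch below assumes the server follows the OpenAI convention: the request sets `"stream": true`, the response is a series of `data: <json>` lines ending with `data: [DONE]`, and each chunk carries text under `choices[].delta.content`. Treat all of that as an assumption to verify against your server; `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style event stream (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel, as in the OpenAI API
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # "delta" may be missing or null; treat both as "no text".
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` iterates over lines as bytes, so it can be passed directly as `lines` and the fragments printed as they arrive.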
Choosing the Backend and Configuration
When sending requests to the server, you can dynamically choose the execution
backend (CPU, GPU, or NPU) and configure the maximum number of tokens (context
length) by formatting the model field in your request payload.
The model field supports the following format:
model_id[,backend][,max_tokens]
Where model_id corresponds to any model ID in your local registry (see
Model Management for details on how to list or import
models).
Examples
- gemma4-2b: default backend (CPU) with default max tokens.
- gemma4-2b,gpu: GPU backend with default max tokens.
- gemma4-2b,gpu,32768: GPU backend with max tokens 32768.
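When the model field is assembled in code, a small helper keeps the comma-separated format consistent. The sketch below is a hypothetical client-side helper (`model_field` is not part of LiteRT-LM). It makes two conservative assumptions that this page does not confirm: backends are written in lowercase as in the examples, and max_tokens is only sent together with a backend, since every example shows it in third position.

```python
from typing import Optional

# Backends named on this page, lowercase as in the examples.
BACKENDS = {"cpu", "gpu", "npu"}


def model_field(model_id: str, backend: Optional[str] = None,
                max_tokens: Optional[int] = None) -> str:
    """Format the request's model field as model_id[,backend][,max_tokens]."""
    if not model_id or "," in model_id:
        raise ValueError("model_id must be non-empty and contain no commas")
    parts = [model_id]
    if backend is not None:
        name = backend.lower()
        if name not in BACKENDS:
            raise ValueError(f"unknown backend: {backend!r}")
        parts.append(name)
    if max_tokens is not None:
        # Assumption: keep max_tokens in third position, as in the examples.
        if backend is None:
            raise ValueError("pass a backend when setting max_tokens")
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        parts.append(str(max_tokens))
    return ",".join(parts)
```

For instance, `model_field("gemma4-2b", "gpu", 32768)` produces the third example string from the list above.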
Usage Example
Once the server is running, you can interact with it by sending HTTP requests.
Sending HTTP Requests
Linux/macOS
curl http://localhost:9379/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma4-2b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Windows (PowerShell)
Invoke-RestMethod -Uri "http://localhost:9379/v1/chat/completions" `
-Method Post `
-ContentType "application/json" `
-Body '{"model": "gemma4-2b", "messages": [{"role": "user", "content": "Hello!"}]}'
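The same request can be sent from Python without third-party packages. This is a minimal sketch using `urllib.request`; `chat_request` and `chat` are hypothetical helper names, and the URL, model ID, and payload mirror the commands above.

```python
import json
import urllib.request


def chat_request(content: str, model: str = "gemma4-2b",
                 base_url: str = "http://localhost:9379") -> urllib.request.Request:
    """Build the POST /v1/chat/completions request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, timeout: float = 60.0, **kwargs):
    """Send the request and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(chat_request(content, **kwargs), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `chat("Hello!")` with the server running returns the parsed response. If the server mirrors the OpenAI response shape, the reply text would be under `choices[0].message.content`, but that shape is an assumption and is not confirmed on this page.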