Gemma 4 released with text, audio and image input and long up to 256K context window! Learn more

Run Gemma with MLX

MLX is an array framework for machine learning on Apple silicon.

Quick start

Install from the Python Package Index (PyPI)

pip install mlx mlx-lm mlx-vlm

Example command:

# Text Generation
mlx_lm.generate --model mlx-community/gemma-4-e2b-it-4bit --prompt "Who are you?"

# Vision Task
mlx_vlm.generate --model mlx-community/gemma-4-e2b-it-4bit --prompt "Describe this image." --image <path_to_image>

You can start the server with:

mlx_vlm.server --port 8080

# Preload a model at startup (Hugging Face repo or local path)
mlx_vlm.server --model mlx-community/gemma-4-e2b-it-4bit

This creates a server that lets you access your model with the OpenAI-compatible endpoint (http://localhost:8080/v1).

For more information and instructions on how to use MLX with Gemma, refer to the official repository:

MLX on GitHub
MLX Community on Hugging Face