The LiteRT-LM Python API for Linux and macOS (Windows support is coming soon). Features such as multi-modality, tool use, and GPU acceleration are supported.
Introduction
Here is an example terminal chat application built with the Python API:
import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR)  # Hide logs for the TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
    with engine.create_conversation() as conversation:
        while True:
            user_input = input("\n>>> ")
            for chunk in conversation.send_message_async(user_input):
                print(chunk["content"][0]["text"], end="", flush=True)

Getting Started
LiteRT-LM is available as a Python library. You can install the nightly build from PyPI:
# Using pip
pip install litert-lm-api-nightly
# Using uv
uv pip install litert-lm-api-nightly
Initializing the Engine
The Engine is the entry point to the API. It handles model loading and resource management. Using it as a context manager (a with statement) ensures that native resources are released promptly.
Note: Initializing the engine can take several seconds while the model loads.
import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU (default) or Backend.GPU.
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
Creating a Conversation
A Conversation manages the state and history of your interaction with the model.
# Optional: Configure system instruction and initial messages
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass
Sending Messages
You can send messages synchronously or asynchronously (streaming).
Synchronous example:
# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])
# Or with full message structure
# response = conversation.send_message({"role": "user", "content": "..."})
Asynchronous (streaming) example:
# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
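When streaming, it is often useful to accumulate the chunks into the full response text as well as printing them. A minimal sketch, assuming the chunk structure shown above (a dict with a "content" list of typed items); the helper name is ours, not part of the API:

```python
def collect_text(chunks):
    """Concatenate the text items from a stream of response chunks.

    Assumes each chunk looks like:
    {"content": [{"type": "text", "text": "..."}]}
    """
    parts = []
    for chunk in chunks:
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return "".join(parts)

# Demonstrated with mock chunks; a real stream comes from send_message_async.
mock_stream = [
    {"content": [{"type": "text", "text": "Once upon "}]},
    {"content": [{"type": "text", "text": "a time..."}]},
]
print(collect_text(mock_stream))  # Once upon a time...
```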
🔴 New: Multi-Token Prediction (MTP)
Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is universally recommended for all tasks on the GPU backend.
To use MTP, enable speculative decoding when initializing the engine.
import litert_lm

# Enable MTP by setting enable_speculative_decoding=True
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    enable_speculative_decoding=True,
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("What is the capital of France?")
        print(response["content"][0]["text"])
Multi-Modality
# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU,
    vision_backend=litert_lm.Backend.GPU,
) as engine:
    with engine.create_conversation() as conversation:
        user_message = {
            "role": "user",
            "content": [
                {"type": "audio", "path": "/path/to/audio.wav"},
                {"type": "text", "text": "Describe this audio."},
            ],
        }
        response = conversation.send_message(user_message)
        print(response["content"][0]["text"])
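Building these nested message dicts by hand is error-prone, so a small helper can be handy. A sketch assuming the content-item shapes shown above ("path" for media items, "text" for text items); the helper itself is ours, not part of the API:

```python
def build_user_message(*parts):
    """Build a user message from text strings and (media_type, path) tuples."""
    content = []
    for part in parts:
        if isinstance(part, str):
            # Plain strings become text items
            content.append({"type": "text", "text": part})
        else:
            # Tuples like ("audio", "/path/to/audio.wav") become media items
            media_type, path = part
            content.append({"type": media_type, "path": path})
    return {"role": "user", "content": content}

msg = build_user_message(("audio", "/path/to/audio.wav"), "Describe this audio.")
# msg matches the user_message dict in the example above.
```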
Defining and Using Tools
You can define Python functions as tools that the model can call automatically.
def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])
LiteRT-LM uses the function's docstring and type hints to build the tool schema for the model.
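The schema generation itself happens inside LiteRT-LM, but the mechanism can be illustrated with standard-library introspection. A rough sketch of how a function's type hints and docstring can be read — not the library's actual implementation, and the schema shape here is ours:

```python
import inspect
import typing


def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b


def tool_schema(fn):
    """Derive a minimal tool schema from a function's hints and docstring."""
    hints = typing.get_type_hints(fn)
    # Map each parameter name to the name of its annotated type
    params = {
        name: hints[name].__name__
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        # Use the docstring's first line as the tool description
        "description": inspect.getdoc(fn).split("\n")[0],
        "parameters": params,
    }


print(tool_schema(add_numbers))
# {'name': 'add_numbers', 'description': 'Adds two numbers.', 'parameters': {'a': 'float', 'b': 'float'}}
```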