LiteRT-LM Python API

The LiteRT-LM Python API runs on Linux and macOS (Windows support is coming soon). It supports features such as multimodality, tool use, and GPU acceleration.

Introduction

Here is a sample terminal chat app built with the Python API:

import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR)  # Hide logs for the TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
    with engine.create_conversation() as conversation:
        while True:
            try:
                user_input = input("\n>>> ")
            except (EOFError, KeyboardInterrupt):
                break  # Exit cleanly on Ctrl+D or Ctrl+C
            for chunk in conversation.send_message_async(user_input):
                print(chunk["content"][0]["text"], end="", flush=True)

Get started

LiteRT-LM is available as a Python library. You can install the nightly build from PyPI:

# Using pip
pip install litert-lm-api-nightly

# Using uv
uv pip install litert-lm-api-nightly

Initialize the engine

Engine is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the with statement) ensures that native resources are released promptly.

Note: Engine initialization can take several seconds while the model loads.

import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU (default) or Backend.GPU.
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
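
To illustrate why the with form matters, here is a minimal stand-in sketch. It is not LiteRT-LM code: FakeEngine is a hypothetical class that only records whether it was closed. It shows that the cleanup in __exit__ runs even when the body raises, which is the guarantee a context manager gives for native resources.

```python
class FakeEngine:
    """Hypothetical stand-in for an engine; it only tracks open/closed state."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.closed = False

    def close(self) -> None:
        # A real engine would free native memory here.
        self.closed = True

    def __enter__(self) -> "FakeEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # Do not swallow exceptions raised in the with-body.


engine = FakeEngine("path/to/model.litertlm")
try:
    with engine:
        raise RuntimeError("simulated failure while using the engine")
except RuntimeError:
    pass

# The cleanup ran even though the body raised.
print(engine.closed)
```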

Create a conversation

A Conversation manages the state and history of your interaction with the model.

# Optional: Configure system instruction and initial messages
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass
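
The message dictionaries above follow a fixed shape: a role plus a list of typed content items. A small helper can reduce nesting mistakes when you build several of them. The text_message function below is a hypothetical convenience, not part of LiteRT-LM; it only builds plain dictionaries in the shape shown above.

```python
from typing import Any


def text_message(role: str, text: str) -> dict[str, Any]:
    """Builds a message with a single text item, in the shape used in this guide."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


# Same initial message list as in the example above.
messages = [
    text_message("system", "You are a helpful assistant."),
]
print(messages)
```

The resulting list can be passed as the messages argument exactly like the hand-written version.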

Send messages

You can send messages synchronously or asynchronously (streaming).

Synchronous example:

# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])

# Or with full message structure
# response = conversation.send_message({"role": "user", "content": "..."})

Asynchronous (streaming) example:

# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
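
If you also need the full reply as one string, you can accumulate the text items as they arrive. The sketch below assumes chunks have the dictionary shape used in the streaming example. collect_text is a hypothetical helper, and the fake_stream list stands in for the iterator returned by send_message_async so that the snippet runs without a model.

```python
from typing import Any, Iterable


def collect_text(chunks: Iterable[dict[str, Any]]) -> str:
    """Joins the text items of all chunks, skipping non-text content."""
    parts = []
    for chunk in chunks:
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return "".join(parts)


# Stand-in data; a real stream would come from conversation.send_message_async(...).
fake_stream = [
    {"content": [{"type": "text", "text": "Once upon "}]},
    {"content": [{"type": "text", "text": "a time."}]},
    {"content": []},  # A chunk with no content is tolerated.
]
full_text = collect_text(fake_stream)
print(full_text)
```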

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is recommended for all tasks on GPU backends.

To use MTP, enable speculative decoding when you initialize the engine.

import litert_lm

# Enable MTP by setting enable_speculative_decoding=True
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    enable_speculative_decoding=True,
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("What is the capital of France?")
        print(response["content"][0]["text"])

Multimodality

# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU,
    vision_backend=litert_lm.Backend.GPU,
) as engine:
    with engine.create_conversation() as conversation:
        user_message = {
            "role": "user",
            "content": [
                {"type": "audio", "path": "/path/to/audio.wav"},
                {"type": "text", "text": "Describe this audio."},
            ],
        }
        response = conversation.send_message(user_message)
        print(response["content"][0]["text"])
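
A missing media file is an easy mistake to make when building multimodal messages. The sketch below builds the same message shape as the example above and checks the path first. audio_message is a hypothetical helper, not a LiteRT-LM function, and how the library itself reacts to a missing file is not covered here.

```python
import os
import tempfile
from typing import Any


def audio_message(audio_path: str, prompt: str) -> dict[str, Any]:
    """Builds a user message with one audio item followed by one text item."""
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"Audio file not found: {audio_path}")
    return {
        "role": "user",
        "content": [
            {"type": "audio", "path": audio_path},
            {"type": "text", "text": prompt},
        ],
    }


# Demonstrate with an empty temporary file standing in for a real recording.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    demo_path = tmp.name
try:
    message = audio_message(demo_path, "Describe this audio.")
    print([item["type"] for item in message["content"]])
finally:
    os.remove(demo_path)
```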

Define and use tools

You can define Python functions as tools that the model can call automatically.

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])

LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
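
To make the idea concrete, here is a rough sketch of how a schema could be derived from a function using only the standard library. This is not LiteRT-LM's implementation, and the output layout is an assumption chosen for illustration; describe_tool is a hypothetical name.

```python
import inspect
from typing import Any, Callable, get_type_hints


def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    """Collects the name, docstring summary line, and parameter types of a function."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    summary = doc.splitlines()[0] if doc else ""
    params = {
        name: getattr(hints[name], "__name__", str(hints[name]))
        for name in inspect.signature(func).parameters
        if name in hints
    }
    return {"name": func.__name__, "description": summary, "parameters": params}


schema = describe_tool(add_numbers)
print(schema)
```

The sketch also shows why clear docstrings and complete type hints matter: a function without them would yield an empty description or an empty parameter map.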