LiteRT-LM Python API

The LiteRT-LM Python API for Linux, macOS, and Windows. It supports features such as multimodality, tool use, and GPU and NPU acceleration.

Introduction

Here is a sample terminal chat app built with the Python API:

import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR) # Hide log for TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
  with engine.create_conversation() as conversation:
    while True:
      user_input = input("\n>>> ")
      for chunk in conversation.send_message_async(user_input):
        print(chunk["content"][0]["text"], end="", flush=True)

Getting started

LiteRT-LM is available as a Python library. You can install the package from PyPI:

# Using pip
pip install litert-lm-api

# Using uv
uv pip install litert-lm-api

Initialize the engine

Engine is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the with statement) ensures that resources are released promptly.

Note: Initializing the engine can take several seconds while the model loads.

import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU() (default), Backend.GPU() or Backend.NPU().
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU(),
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
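The resource-release guarantee of the with statement can be illustrated without loading a real model. The sketch below uses a hypothetical stand-in class, FakeEngine, which is not part of litert_lm and only mimics the context-manager protocol. It shows that the cleanup step runs even when the body of the with block raises. How the real Engine frees its resources internally is not shown here.

```python
class FakeEngine:
    """Hypothetical stand-in for an engine; not part of litert_lm."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        self.closed = False

    def close(self) -> None:
        # Stand-in for releasing model memory and accelerator handles.
        self.closed = True

    def __enter__(self) -> "FakeEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # Do not swallow exceptions raised in the body.


def run_and_fail() -> FakeEngine:
    """Enters the context, raises inside it, and returns the engine afterwards."""
    engine = FakeEngine("path/to/model.litertlm")
    try:
        with engine:
            raise RuntimeError("simulated failure while using the engine")
    except RuntimeError:
        pass
    return engine


fake_engine = run_and_fail()
print(fake_engine.closed)  # True: close() ran despite the exception.
```

Because loading a model is slow, it usually makes sense to open one engine, create as many conversations as needed inside its with block, and let the block exit only when the program is done with the model.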

Create a conversation

A Conversation manages the state and history of your interaction with the model.

# Optional: Configure system instruction and initial messages
messages = [litert_lm.Message.system("You are a helpful assistant.")]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass

Sending messages

You can send messages synchronously or asynchronously (streaming).

The send_message and send_message_async methods accept:

A str (automatically wrapped as a user message).
A litert_lm.Contents object (for multimodal input).
A litert_lm.Message object (for the full message structure).
A JSON-like dictionary, used as input to the prompt template.

Synchronous example:

# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])

# Or with a Message object
# response = conversation.send_message(litert_lm.Message.user("What is the capital of France?"))

Asynchronous (streaming) example:

# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
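When the full reply is needed after streaming (for logging, say), the chunks can be accumulated while they are printed. The following is a minimal sketch that relies only on the dictionary shape used in the examples above: a "content" list whose text items carry "type" and "text" keys. The helper names extract_text and collect_stream are illustrative, not part of litert_lm, and the hand-written chunk list stands in for a real stream.

```python
from typing import Any, Dict, Iterable


def extract_text(message: Dict[str, Any]) -> str:
    """Joins the text items of one response or chunk dictionary."""
    parts = []
    for item in message.get("content", []):
        # Skip anything that is not a text item.
        if item.get("type") == "text":
            parts.append(item.get("text", ""))
    return "".join(parts)


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Prints each chunk as it arrives and returns the complete reply."""
    pieces = []
    for chunk in chunks:
        text = extract_text(chunk)
        print(text, end="", flush=True)
        pieces.append(text)
    print()
    return "".join(pieces)


# Hand-written chunks standing in for conversation.send_message_async(...).
fake_chunks = [
    {"content": [{"type": "text", "text": "Once upon "}]},
    {"content": [{"type": "text", "text": "a time."}]},
    {"content": []},
]
full_reply = collect_stream(fake_chunks)
```

With a real conversation, the call would look like collect_stream(conversation.send_message_async("Tell me a long story.")).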

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is recommended for all tasks on GPU backends.

To use MTP, enable speculative decoding when initializing the engine.

import litert_lm

# Enable MTP by setting enable_speculative_decoding=True
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU(),
    enable_speculative_decoding=True,
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("What is the capital of France?")
        print(response["content"][0]["text"])
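One way to check whether MTP helps on a given machine is to time the same prompt with and without enable_speculative_decoding. The sketch below is only a rough timing harness, not a benchmark methodology from LiteRT-LM. time_stream and fake_stream are hypothetical names, and the generator stands in for conversation.send_message_async(prompt). A chunk is not necessarily a single token, so chunk counts are only comparable across runs of the same prompt.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


def time_stream(
    make_stream: Callable[[], Iterable[Dict[str, Any]]],
) -> Tuple[str, int, float]:
    """Consumes a chunk stream and returns (text, chunk_count, seconds)."""
    start = time.perf_counter()
    pieces = []
    count = 0
    for chunk in make_stream():
        count += 1
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                pieces.append(item["text"])
    elapsed = time.perf_counter() - start
    return "".join(pieces), count, elapsed


def fake_stream():
    """Hypothetical stand-in for conversation.send_message_async(prompt)."""
    for word in ["Paris ", "is ", "the ", "capital."]:
        yield {"content": [{"type": "text", "text": word}]}


text, chunk_count, seconds = time_stream(fake_stream)
print(f"{chunk_count} chunks in {seconds:.6f} s")
```

With real engines, make_stream would be something like lambda: conversation.send_message_async(prompt), run once against an engine created with enable_speculative_decoding=True and once against one created without it.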

Multi-Modality

# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU(),
    vision_backend=litert_lm.Backend.GPU(),
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message(
            litert_lm.Contents.of(
                "Describe this audio.",
                litert_lm.Content.AudioFile(absolute_path="/path/to/audio.wav"),
            )
        )
        print(response["content"][0]["text"])
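The AudioFile example above takes a parameter named absolute_path, which suggests that relative paths should be resolved before they are passed in. A small sketch of that step, using only the standard library, follows. The helper name to_absolute and the file name are hypothetical, and whether the library itself rejects relative paths is an assumption that the source does not confirm.

```python
import os


def to_absolute(path_str: str) -> str:
    """Expands '~' and returns an absolute, normalized path string."""
    return os.path.abspath(os.path.expanduser(path_str))


# Hypothetical file name; the file does not need to exist for this step.
audio_path = to_absolute("recordings/sample.wav")
print(audio_path)
# The result could then be passed as absolute_path=audio_path.
```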

Defining and using tools

You can define Python functions as tools that the model can call automatically.

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])

LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
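To give a feel for what can be derived from a docstring and type hints, here is an illustrative sketch built on the standard inspect and typing modules. It is not LiteRT-LM's actual schema generator: the describe_tool helper, the output layout, and the type-name mapping are all assumptions chosen for the example.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Assumed mapping from Python types to JSON-schema-style type names.
_TYPE_NAMES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_tool(func: Callable[..., Any]) -> Dict[str, Any]:
    """Builds an illustrative schema dict from a function's hints and docstring."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped parameters fall back to "string".
        properties[name] = {"type": _TYPE_NAMES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        # The first docstring line serves as the description.
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b


schema = describe_tool(add_numbers)
print(schema["name"], schema["parameters"]["properties"])
```

The practical takeaway is that precise type hints and a clear first docstring line are the information a generator like this has to work with, so tools are worth documenting carefully.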