LiteRT-LM Python API

The LiteRT-LM Python API supports Linux and macOS (Windows support is planned). Supported features include multimodality, tool use, and GPU acceleration.

Introduction

Here is an example of a terminal chat app built with the Python API:

import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR) # Hide log for TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
  with engine.create_conversation() as conversation:
    while True:
      user_input = input("\n>>> ")
      for chunk in conversation.send_message_async(user_input):
        print(chunk["content"][0]["text"], end="", flush=True)
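
The loop above runs until the process is interrupted. The following is a minimal sketch of one way to add a clean exit path, assuming only the send_message_async call and the chunk layout shown in the example. The names chat_loop, read_line, write, and exit_words are hypothetical helpers for illustration and are not part of LiteRT-LM.

```python
from typing import Any, Callable, Iterable


def chat_loop(
    conversation: Any,
    read_line: Callable[[str], str] = input,
    write: Callable[[str], None] = lambda s: print(s, end="", flush=True),
    exit_words: Iterable[str] = ("exit", "quit"),
) -> None:
    """Run a prompt/response loop until EOF, Ctrl-C, or an exit word.

    `conversation` is any object exposing send_message_async(text), as in
    the example above. Everything else here is an illustrative assumption.
    """
    stop = {word.lower() for word in exit_words}
    while True:
        try:
            user_input = read_line("\n>>> ")
        except (EOFError, KeyboardInterrupt):
            break  # Ctrl-D or Ctrl-C ends the session.
        command = user_input.strip()
        if command.lower() in stop:
            break
        if not command:
            continue  # Skip empty lines instead of sending them to the model.
        for chunk in conversation.send_message_async(user_input):
            # Same chunk access as the original example.
            write(chunk["content"][0]["text"])


# Usage sketch (needs litert_lm and a real model file):
# import litert_lm
# with litert_lm.Engine("path/to/model.litertlm") as engine:
#   with engine.create_conversation() as conversation:
#     chat_loop(conversation)
```

Passing read_line and write as parameters keeps the loop testable without a terminal or a loaded model.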

Getting started

LiteRT-LM is available as a Python library. You can install the nightly build from PyPI:

# Using pip
pip install litert-lm-api-nightly

# Using uv
uv pip install litert-lm-api-nightly
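
After installing, it can be useful to confirm which nightly build is present in the active environment. This sketch uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical, and the distribution name is taken from the install command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "litert-lm-api-nightly") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version() returns the version string when the package is installed, and None otherwise.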

Initialize the engine

Engine is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the with statement) ensures that native resources are released promptly.

Note: Loading the model can take several seconds.

import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU (default) or Backend.GPU.
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
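
Because loading can take several seconds, it may be worth validating the inputs before constructing the engine. The sketch below checks that the model file exists and that the cache directory is writable. Both helpers (checked_model_path, ensure_cache_dir) are hypothetical; how Engine itself reports a missing file or an unwritable cache_dir is not documented here, so this is a precaution rather than a requirement.

```python
import os
from pathlib import Path


def checked_model_path(path: str) -> str:
    """Return the user-expanded path if it names an existing file."""
    model = Path(path).expanduser()
    if not model.is_file():
        raise FileNotFoundError(f"Model file not found: {model}")
    return str(model)


def ensure_cache_dir(path: str) -> str:
    """Create the cache directory if needed and confirm it is writable."""
    cache = Path(path).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    if not os.access(cache, os.W_OK):
        raise PermissionError(f"Cache directory is not writable: {cache}")
    return str(cache)


# Usage sketch (needs litert_lm and a real model file):
# import litert_lm
# with litert_lm.Engine(
#     checked_model_path("path/to/your/model.litertlm"),
#     backend=litert_lm.Backend.GPU,
#     cache_dir=ensure_cache_dir("/tmp/litert-lm-cache"),
# ) as engine:
#     pass
```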

Create a conversation

Conversation manages the state and history of your interaction with the model.

# Optional: Configure system instruction and initial messages
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass
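
Writing the nested message dictionaries by hand is error-prone once there are several of them. A small builder, sketched here, produces the same structure as the messages list above. The names text_message and initial_messages are hypothetical, and only the "system" and "user" roles that appear in this document are used.

```python
from typing import Any, Sequence

Message = dict[str, Any]


def text_message(role: str, text: str) -> Message:
    """Build a single-part text message in the structure shown above."""
    return {"role": role, "content": [{"type": "text", "text": text}]}


def initial_messages(
    system_prompt: str,
    turns: Sequence[tuple[str, str]] = (),
) -> list[Message]:
    """Return a system message followed by optional (role, text) turns."""
    messages = [text_message("system", system_prompt)]
    messages.extend(text_message(role, text) for role, text in turns)
    return messages


# Usage sketch (needs an engine created as shown earlier):
# messages = initial_messages("You are a helpful assistant.")
# with engine.create_conversation(messages=messages) as conversation:
#     pass
```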

Send messages

You can send messages synchronously or asynchronously (streaming).

Synchronous example:

# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])

# Or with full message structure
# response = conversation.send_message({"role": "user", "content": "..."})

Asynchronous (streaming) example:

# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
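
Streaming prints text as it arrives, but the full reply is often needed afterwards as well, for example for logging. The sketch below consumes a chunk stream, optionally forwards each text piece to a callback, and returns the concatenated result. It assumes only the chunk layout used in the example above; collect_text and on_text are hypothetical names.

```python
from typing import Any, Callable, Iterable, Optional


def collect_text(
    stream: Iterable[dict[str, Any]],
    on_text: Optional[Callable[[str], None]] = None,
) -> str:
    """Consume a chunk stream and return all text items joined together."""
    parts: list[str] = []
    for chunk in stream:
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                text = item.get("text", "")
                parts.append(text)
                if on_text is not None:
                    on_text(text)  # e.g. print the piece as it arrives
    return "".join(parts)


# Usage sketch (needs a conversation created as shown earlier):
# story = collect_text(
#     conversation.send_message_async("Tell me a long story."),
#     on_text=lambda piece: print(piece, end="", flush=True),
# )
```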

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is recommended for all tasks on GPU backends.

To use MTP, enable speculative decoding when you initialize the engine.

import litert_lm

# Enable MTP by setting enable_speculative_decoding=True
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU,
    enable_speculative_decoding=True,
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("What is the capital of France?")
        print(response["content"][0]["text"])
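
One way to check how much MTP helps for a given model and machine is to time the same streamed prompt with enable_speculative_decoding on and off. The sketch below wraps any chunk iterator and records the time to the first chunk and the total time. StreamTiming and timed_stream are hypothetical helpers; they measure wall-clock time on the Python side only, and a chunk does not necessarily correspond to one token.

```python
import time
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Optional


@dataclass
class StreamTiming:
    chunks: int = 0
    first_chunk_s: Optional[float] = None  # seconds until the first chunk
    total_s: float = 0.0                   # seconds until the stream ended


def timed_stream(stream: Iterable[Any], timing: StreamTiming) -> Iterator[Any]:
    """Yield chunks unchanged while recording timings into `timing`.

    total_s includes any time the caller spends between chunks and is only
    set once the stream has been fully consumed.
    """
    start = time.perf_counter()
    for chunk in stream:
        if timing.first_chunk_s is None:
            timing.first_chunk_s = time.perf_counter() - start
        timing.chunks += 1
        yield chunk
    timing.total_s = time.perf_counter() - start


# Usage sketch (needs a conversation from an engine created with or
# without enable_speculative_decoding=True):
# timing = StreamTiming()
# for chunk in timed_stream(conversation.send_message_async("..."), timing):
#     pass
# print(timing)
```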

Multimodality

# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU,
    vision_backend=litert_lm.Backend.GPU,
) as engine:
    with engine.create_conversation() as conversation:
        user_message = {
            "role": "user",
            "content": [
                {"type": "audio", "path": "/path/to/audio.wav"},
                {"type": "text", "text": "Describe this audio."},
            ],
        }
        response = conversation.send_message(user_message)
        print(response["content"][0]["text"])
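
The audio message above can be built with a small helper that also catches a missing file before the request is sent. This is a sketch under assumptions: audio_message is a hypothetical name, and only the "audio" and "text" item types shown above are used. The path is resolved to an absolute one because the example uses an absolute path; whether relative paths are accepted is not stated here.

```python
from pathlib import Path
from typing import Any


def audio_message(audio_path: str, prompt: str) -> dict[str, Any]:
    """Build a user message with one audio item followed by a text prompt."""
    path = Path(audio_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return {
        "role": "user",
        "content": [
            {"type": "audio", "path": str(path.resolve())},
            {"type": "text", "text": prompt},
        ],
    }


# Usage sketch (needs a conversation from a multimodal engine):
# response = conversation.send_message(
#     audio_message("/path/to/audio.wav", "Describe this audio.")
# )
```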

Define and use tools

You can define Python functions as tools that the model can call automatically.

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])

LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
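
To see what information a docstring and type hints expose, the standard library can be used to inspect a function directly. The sketch below is illustrative only: describe_tool is a hypothetical helper, and the dictionary it returns is not LiteRT-LM's actual schema format. It shows why complete type hints and a descriptive docstring matter for a tool function.

```python
import inspect
from typing import Any, Callable, get_type_hints


def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b


def _type_name(hint: Any) -> str:
    """Return a readable name for a type hint, or 'unspecified' if missing."""
    if hint is None:
        return "unspecified"
    return getattr(hint, "__name__", str(hint))


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    """Summarize a function's name, docstring, and annotated types.

    Hypothetical output layout, not the schema LiteRT-LM generates.
    """
    hints = get_type_hints(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: _type_name(hints.get(name))
            for name in inspect.signature(func).parameters
        },
        "returns": _type_name(hints.get("return")),
    }
```

A parameter without a type hint shows up as "unspecified" in this summary, which is a quick way to spot tool functions that give the model less to work with.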