LiteRT-LM Python API

The LiteRT-LM Python API runs on Linux, macOS, and Windows. It supports multimodal input, tool use, and GPU and NPU acceleration.

Introduction

Here is a sample terminal chat app built with the Python API:

import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR) # Hide log for TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
  with engine.create_conversation() as conversation:
    while True:
      user_input = input("\n>>> ")
      for chunk in conversation.send_message_async(user_input):
        print(chunk["content"][0]["text"], end="", flush=True)
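
The loop above runs until the process is interrupted. The sketch below shows one way to exit cleanly on Ctrl-D, Ctrl-C, or a quit command. The chat_loop helper, its send, read, and write parameters, and the "/quit" command are hypothetical names used for illustration and are not part of litert_lm; send stands in for a callable such as conversation.send_message_async.

```python
from typing import Callable, Iterable


def chat_loop(
    send: Callable[[str], Iterable[dict]],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = lambda s: print(s, end="", flush=True),
) -> int:
    """Runs a chat loop until EOF, Ctrl-C, or "/quit"; returns the turn count."""
    turns = 0
    while True:
        try:
            user_input = read("\n>>> ")
        except (EOFError, KeyboardInterrupt):
            break  # Ctrl-D or Ctrl-C ends the session.
        if user_input.strip() == "/quit":
            break
        if not user_input.strip():
            continue  # Skip empty lines instead of sending them to the model.
        # Each chunk is assumed to be a dict shaped like the responses in this document.
        for chunk in send(user_input):
            for item in chunk.get("content", []):
                if item.get("type") == "text":
                    write(item["text"])
        turns += 1
    return turns
```

Inside the two with blocks, the call would presumably look like chat_loop(conversation.send_message_async). Passing read and write as parameters keeps the loop testable without a terminal or a model.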

Get started

LiteRT-LM is available as a Python library. You can install the package from PyPI:

# Using pip
pip install litert-lm-api

# Using uv
uv pip install litert-lm-api

Initialize the engine

Engine is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the with statement) ensures that resources are released promptly.

Note: Initializing the engine can take several seconds because it loads the model.

import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU() (default), Backend.GPU() or Backend.NPU().
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU(),
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
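
To illustrate why the with statement releases resources promptly, here is a minimal sketch of the context manager protocol. FakeEngine is a hypothetical stand-in and does not reflect how litert_lm.Engine is implemented; the point is only that the exit hook runs even when the body raises.

```python
class FakeEngine:
    """Hypothetical stand-in for an engine; not the litert_lm implementation."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        self.closed = False

    def close(self) -> None:
        # A real engine would free model memory and accelerator handles here.
        self.closed = True

    def __enter__(self) -> "FakeEngine":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # Do not swallow exceptions raised inside the block.


def run_and_fail(engine: FakeEngine) -> None:
    with engine:
        raise RuntimeError("inference failed")


engine = FakeEngine("path/to/model.litertlm")
try:
    run_and_fail(engine)
except RuntimeError:
    pass
print(engine.closed)  # True: cleanup ran despite the exception
```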

Create a conversation

A Conversation manages the state and history of your interaction with the model.

# Optional: Configure system instruction and initial messages
messages = [litert_lm.Message.system("You are a helpful assistant.")]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass

Send messages

You can send messages synchronously or asynchronously (streaming).

The send_message and send_message_async methods accept any of the following:

A str (automatically wrapped as a user message).
A litert_lm.Contents object (for multimodal input).
A litert_lm.Message object (for the full message structure).
A JSON-like dict, used as input to the prompt template.

Synchronous example:

# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])

# Or with a Message object
# response = conversation.send_message(litert_lm.Message.user("What is the capital of France?"))

Asynchronous (streaming) example:

# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
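
When the full reply is needed as a single string after streaming, the chunks can be accumulated instead of printed. The sketch below assumes the chunk shape used in the example above (a "content" list of items with "type" and "text" keys); collect_text is a hypothetical helper, not part of litert_lm, and the sample chunks are made up for illustration.

```python
from typing import Iterable


def collect_text(chunks: Iterable[dict]) -> str:
    """Joins the text items of streamed chunks into one string."""
    parts = []
    for chunk in chunks:
        for item in chunk.get("content", []):
            # Non-text items (for example, tool calls) are skipped.
            if item.get("type") == "text":
                parts.append(item.get("text", ""))
    return "".join(parts)


# Made-up chunks standing in for conversation.send_message_async(...).
sample_chunks = [
    {"content": [{"type": "text", "text": "Once upon "}]},
    {"content": []},
    {"content": [{"type": "text", "text": "a time."}]},
]
print(collect_text(sample_chunks))  # Once upon a time.
```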

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is recommended for all tasks when running on the GPU backend.

To use MTP, enable speculative decoding when initializing the engine.

import litert_lm

# Enable MTP by setting enable_speculative_decoding=True
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU(),
    enable_speculative_decoding=True,
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("What is the capital of France?")
        print(response["content"][0]["text"])
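
For intuition about why speculative decoding can be faster, the toy below sketches the general draft-and-verify idea with greedy acceptance: a cheap drafter proposes several tokens, and the main model keeps the longest prefix it agrees with. This is not LiteRT-LM's MTP implementation; speculative_step, draft_model, target_model, and the sample sentence are all hypothetical, and a real system would verify the proposals in one batched pass rather than one call per token.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[str]], str]


def speculative_step(
    prefix: List[str], draft: NextToken, target: NextToken, k: int = 4
) -> List[str]:
    """Returns the tokens accepted in one draft-and-verify round."""
    # The drafter proposes k tokens autoregressively.
    proposed: List[str] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))
    # The target checks each proposal and stops at the first disagreement.
    accepted: List[str] = []
    for token in proposed:
        expected = target(prefix + accepted)
        if token != expected:
            accepted.append(expected)  # Keep the target's own token instead.
            return accepted
        accepted.append(token)
    # All proposals matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


SENTENCE = "the quick brown fox jumps over the lazy dog".split()


def target_model(context: List[str]) -> str:
    return SENTENCE[len(context)] if len(context) < len(SENTENCE) else "<eos>"


def draft_model(context: List[str]) -> str:
    # Agrees with the target everywhere except at position 3.
    return "dog" if len(context) == 3 else target_model(context)


def generate(k: int = 4) -> Tuple[List[str], int]:
    tokens: List[str] = []
    rounds = 0
    while "<eos>" not in tokens:
        tokens += speculative_step(tokens, draft_model, target_model, k)
        rounds += 1
    return tokens[: tokens.index("<eos>")], rounds
```

In this toy, the output always equals what the target alone would produce; the drafter only changes how many rounds are needed.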

Multimodality

# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU(),
    vision_backend=litert_lm.Backend.GPU(),
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message(
            litert_lm.Contents.of(
                "Describe this audio.",
                litert_lm.Content.AudioFile(absolute_path="/path/to/audio.wav"),
            )
        )
        print(response["content"][0]["text"])

Define and use tools

You can define Python functions as tools that the model can call automatically.

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])

LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
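
To make the idea concrete, the sketch below derives a JSON-Schema-like description from a function's docstring and type hints using only the standard library. The sketch_tool_schema helper, the output layout, and the type mapping are assumptions for illustration; the schema that LiteRT-LM actually generates may differ.

```python
import inspect
from typing import get_type_hints

# Assumed mapping from Python types to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def sketch_tool_schema(fn) -> dict:
    """Builds an illustrative tool schema from a Google-style docstring."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    doc = inspect.getdoc(fn) or ""
    summary = doc.splitlines()[0] if doc else ""

    # Collect "name: description" lines from the Args section.
    arg_docs = {}
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            if not stripped:
                break  # A blank line ends the Args section.
            name, sep, desc = stripped.partition(":")
            if sep and name in params:
                arg_docs[name] = desc.strip()

    properties = {}
    required = []
    for name, param in params.items():
        properties[name] = {
            "type": _JSON_TYPES.get(hints.get(name), "string"),
            "description": arg_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "name": fn.__name__,
        "description": summary,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b


schema = sketch_tool_schema(add_numbers)
```

Whatever the exact schema format, the practical takeaway is the same: clear docstrings and complete type hints give the model more accurate information about when and how to call a tool.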