LiteRT-LM Python API

The LiteRT-LM Python API for Linux, macOS, and Windows supports features such as multimodality, tool use, and GPU and NPU acceleration.

Introduction

Here is a sample terminal chat app built with the Python API:

import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR) # Hide log for TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
    with engine.create_conversation() as conversation:
        while True:
            user_input = input("\n>>> ")
            for chunk in conversation.send_message_async(user_input):
                print(chunk["content"][0]["text"], end="", flush=True)
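
The loop above runs until the process is interrupted. The following is a minimal sketch of one way to give it a clean exit. It assumes only the send_message_async call and the chunk shape shown in this document; chat_loop, read_line, and write are hypothetical names introduced for illustration and are not part of LiteRT-LM.

```python
from typing import Callable


def _write_stdout(text: str) -> None:
    # Print streamed text without adding newlines between chunks.
    print(text, end="", flush=True)


def chat_loop(
    conversation,
    read_line: Callable[[str], str] = input,
    write: Callable[[str], None] = _write_stdout,
) -> int:
    """Runs a chat loop until EOF, Ctrl+C, or an 'exit'/'quit' command.

    `conversation` is any object with a send_message_async(text) method
    that yields chunk dictionaries. Returns the number of completed turns.
    """
    turns = 0
    while True:
        try:
            user_input = read_line("\n>>> ")
        except (EOFError, KeyboardInterrupt):
            break
        command = user_input.strip().lower()
        if command in {"exit", "quit"}:
            break
        if not command:
            continue  # Ignore empty lines instead of sending them.
        for chunk in conversation.send_message_async(user_input):
            for item in chunk.get("content", []):
                if item.get("type") == "text":
                    write(item["text"])
        turns += 1
    return turns
```

Inside the with engine.create_conversation() block, this could be called as chat_loop(conversation). Passing read_line and write explicitly makes the loop easy to exercise with a stand-in conversation object.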

Get started

LiteRT-LM is available as a Python library. Install the package from PyPI:

# Using pip
pip install litert-lm-api

# Using uv
uv pip install litert-lm-api

Initialize the engine

Engine is the entry point to the API. It handles model loading and resource management. Using it as a context manager (in a with statement) ensures that resources are released promptly.

Note: Loading the model can take several seconds when the engine starts.

import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU() (default), Backend.GPU() or Backend.NPU().
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU(),
    # Optional: Pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass

Create a conversation

A Conversation manages the state and history of your interaction with the model.

# Optional: Configure system instruction and initial messages
messages = [litert_lm.Message.system("You are a helpful assistant.")]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass

Send messages

You can send messages synchronously or asynchronously (streaming).

Both send_message and send_message_async accept any of the following:

A str (automatically wrapped as a user message)
A litert_lm.Contents object (for multimodal input)
A litert_lm.Message object (for the full message structure)
A JSON-like dict (as prompt template input)

Synchronous example:

# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])

# Or with a Message object
# response = conversation.send_message(litert_lm.Message.user("What is the capital of France?"))

Asynchronous (streaming) example:

# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
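
When the full reply is needed as a single string (for logging or post-processing) the same chunk handling can be wrapped in a small helper. This is a sketch that assumes only the chunk shape shown above; collect_text is a hypothetical name, not a LiteRT-LM function.

```python
from typing import Any, Iterable, Mapping


def collect_text(chunks: Iterable[Mapping[str, Any]]) -> str:
    """Joins the text items of streamed chunks into one string.

    Chunks without a "content" key and items whose type is not "text"
    are skipped.
    """
    parts = []
    for chunk in chunks:
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return "".join(parts)


# Hand-written chunks standing in for a real stream.
sample_chunks = [
    {"content": [{"type": "text", "text": "Once upon "}]},
    {"content": [{"type": "text", "text": "a time."}]},
]
print(collect_text(sample_chunks))  # prints "Once upon a time."
```

With a real conversation, the call would be collect_text(conversation.send_message_async("Tell me a long story.")).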

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is generally recommended for all tasks on GPU backends.

To use MTP, enable speculative decoding when initializing the engine.

import litert_lm

# Enable MTP by setting enable_speculative_decoding=True
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.GPU(),
    enable_speculative_decoding=True,
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("What is the capital of France?")
        print(response["content"][0]["text"])
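
A rough way to check the effect on your own model and hardware is to time the same streamed prompt with and without enable_speculative_decoding. The sketch below is illustrative only: time_stream is a hypothetical helper, not part of LiteRT-LM, and it assumes only the chunk shape shown in the streaming example. It counts chunks and characters, not tokens, so treat the numbers as a relative comparison.

```python
import time
from typing import Any, Callable, Dict, Iterable, Mapping


def time_stream(
    make_stream: Callable[[], Iterable[Mapping[str, Any]]],
) -> Dict[str, float]:
    """Consumes a chunk stream and reports elapsed time and output size.

    `make_stream` is called inside the timed region so that the cost of
    starting the request is included in the measurement.
    """
    start = time.perf_counter()
    chunk_count = 0
    char_count = 0
    for chunk in make_stream():
        chunk_count += 1
        for item in chunk.get("content", []):
            if item.get("type") == "text":
                char_count += len(item["text"])
    elapsed = time.perf_counter() - start
    return {"seconds": elapsed, "chunks": chunk_count, "chars": char_count}
```

For example, time_stream(lambda: conversation.send_message_async("Tell me a long story.")) could be run once on an engine created with enable_speculative_decoding=True and once on an engine created without it, using the same prompt and backend.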

Multimodality

# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU(),
    vision_backend=litert_lm.Backend.GPU(),
) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message(
            litert_lm.Contents.of(
                "Describe this audio.",
                litert_lm.Content.AudioFile(absolute_path="/path/to/audio.wav"),
            )
        )
        print(response["content"][0]["text"])

Define and use tools

You can define Python functions as tools that the model can call automatically.

def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])

LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
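
To make that idea concrete, here is a sketch of how a schema could be derived from a signature and docstring using only the standard library. It is not LiteRT-LM's implementation, and the output format is an assumption chosen for illustration; sketch_tool_schema and the type mapping are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Illustrative mapping from Python annotations to JSON-schema-style names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Builds a JSON-schema-like description of `func`."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    # Use the first paragraph of the docstring as the tool description.
    description = doc.split("\n\n", 1)[0].strip()
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unmapped types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b


schema = sketch_tool_schema(add_numbers)
print(schema["name"], schema["parameters"]["required"])
```

The practical takeaway is that clear docstrings and complete type hints give the model more accurate information about when and how to call a tool.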