LiteRT-LM Swift API

The LiteRT-LM Swift API lets you integrate large language models natively into iOS and macOS applications. Features such as multimodality, tool use, and GPU acceleration (via Metal) are fully supported.

Introduction

The following example uses the Swift API to initialize a model and send a message:

import LiteRTLM

// 1. Initialize the Engine with your model
let config = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu, // Use .cpu() for CPU execution
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: config)
try await engine.initialize()

// 2. Start a new Conversation
let conversation = try await engine.createConversation()

// 3. Send a message and print the response
let response = try await conversation.sendMessage(Message("What is the capital of France?"))
print(response.toString)

Getting started

This section explains how to integrate the LiteRT-LM Swift API into your application.

Swift Package Manager (SPM)

You can add LiteRT-LM to your Xcode project using Swift Package Manager.

Open your project in Xcode and go to File > Add Package Dependencies...
Enter the package repository URL: https://github.com/google-ai-edge/LiteRT-LM
Select the LiteRTLM library to add to your app target.

If you are developing a package with Package.swift, add LiteRT-LM to your dependencies:

dependencies: [
  .package(url: "https://github.com/google-ai-edge/LiteRT-LM", from: "0.12.0")
]
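
For context, a complete manifest might look like the following minimal sketch. The package and target name MyApp and the tools version are placeholder assumptions. The product name LiteRTLM comes from the library name mentioned above, and the package identity LiteRT-LM is assumed to follow the last path component of the repository URL.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
  name: "MyApp", // hypothetical package name
  dependencies: [
    .package(url: "https://github.com/google-ai-edge/LiteRT-LM", from: "0.12.0")
  ],
  targets: [
    .target(
      name: "MyApp", // hypothetical target name
      dependencies: [
        // Product name taken from the library name above; the package
        // identity is assumed to match the repository name.
        .product(name: "LiteRTLM", package: "LiteRT-LM")
      ]
    )
  ]
)
```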

Core API guide

This section describes the basic components and workflows of the LiteRT-LM Swift API, including engine initialization, conversation management, and sending messages.

Initialize the engine

The Engine handles model loading, resource allocation, and lifecycle management.

import LiteRTLM

let engineConfig = try EngineConfig(
  modelPath: "path/to/your/model.litertlm",
  backend: .gpu, // Use .gpu for Metal hardware acceleration
  maxNumTokens: 512, // Size of the KV-cache
  cacheDir: NSTemporaryDirectory() // Writable directory for compilation cache
)

let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()
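
The cacheDir argument only needs to point at a writable directory. If you would rather not use the temporary directory, one option is a dedicated folder under the user's Caches directory. The sketch below uses only Foundation; the helper name makeCacheDirectory and the folder name litertlm-cache are hypothetical, and whether the compilation cache is reused across launches is an assumption the text above does not confirm.

```swift
import Foundation

/// Returns the path of a dedicated, writable cache directory, creating it if needed.
/// `folderName` is a hypothetical name chosen for this sketch.
func makeCacheDirectory(named folderName: String = "litertlm-cache") throws -> String {
  let fileManager = FileManager.default
  // Prefer the user's Caches directory; fall back to the temporary directory.
  let base = fileManager.urls(for: .cachesDirectory, in: .userDomainMask).first
    ?? fileManager.temporaryDirectory
  let directory = base.appendingPathComponent(folderName, isDirectory: true)
  // With intermediate directories enabled, this is a no-op if the folder exists.
  try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
  return directory.path
}

let cacheDir = try makeCacheDirectory()
print(cacheDir)
```

The resulting path could then be passed as cacheDir in place of NSTemporaryDirectory().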

Create a conversation

A Conversation manages the chat history, system instructions, and sampling configuration.

// Configure custom sampling parameters
let samplerConfig = try SamplerConfig(
  topK: 40,
  topP: 0.95,
  temperature: 0.7
)

// Create the conversation config with system instructions
let config = ConversationConfig(
  systemMessage: Message("You are a helpful assistant."),
  samplerConfig: samplerConfig
)

let conversation = try await engine.createConversation(with: config)
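
To give a feel for what topK, topP, and temperature control, here is a self-contained sketch of the general sampling technique. It is not LiteRT-LM's implementation: the function filterCandidates is hypothetical, and real samplers may apply these steps in a different order or with different tie-breaking.

```swift
import Foundation

/// Illustrative only: returns the candidate indices and renormalized probabilities
/// that remain after temperature scaling, then top-k, then top-p filtering.
func filterCandidates(
  logits: [Double], topK: Int, topP: Double, temperature: Double
) -> [(index: Int, probability: Double)] {
  precondition(!logits.isEmpty && topK > 0 && temperature > 0)
  // Temperature scaling: values below 1 sharpen the distribution.
  let scaled = logits.map { $0 / temperature }
  // Numerically stable softmax.
  let maxLogit = scaled.max()!
  let exps = scaled.map { exp($0 - maxLogit) }
  let total = exps.reduce(0, +)
  // Rank candidates by descending probability.
  let ranked = exps.enumerated()
    .map { (index: $0.offset, probability: $0.element / total) }
    .sorted { $0.probability > $1.probability }
  var kept: [(index: Int, probability: Double)] = []
  var cumulative = 0.0
  // Top-k: consider at most k candidates.
  for candidate in ranked.prefix(topK) {
    kept.append(candidate)
    cumulative += candidate.probability
    // Top-p: stop once the kept probability mass reaches the threshold.
    if cumulative >= topP { break }
  }
  // Renormalize so the kept probabilities sum to 1.
  return kept.map { (index: $0.index, probability: $0.probability / cumulative) }
}

let candidates = filterCandidates(
  logits: [2.0, 1.0, 0.5, -1.0], topK: 3, topP: 0.95, temperature: 0.7)
for candidate in candidates {
  print(candidate.index, candidate.probability)
}
```

In this sketch, a lower temperature or a smaller topK or topP narrows the set of tokens that can be drawn, which tends to make output more deterministic.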

Send messages

You can interact with the model either synchronously, awaiting the complete response in a single call, or asynchronously, streaming the response chunk by chunk.

Synchronous example

let response = try await conversation.sendMessage(Message("Hello!"))
print(response.toString)

Asynchronous (streaming) example

let message = Message("Tell me a long story.")

for try await chunk in conversation.sendMessageStream(message) {
  // Output response chunks in real-time
  print(chunk.toString, terminator: "")
}
print()
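
If you also need the full text after streaming finishes, you can accumulate the chunks while printing them. The sketch below shows the pattern with a stand-in AsyncThrowingStream of strings so that it runs without a model; the helper collectText and the demo stream are hypothetical and not part of LiteRT-LM.

```swift
/// Prints each chunk as it arrives and returns the concatenated text.
func collectText<S: AsyncSequence>(from chunks: S) async throws -> String
where S.Element == String {
  var fullText = ""
  for try await chunk in chunks {
    print(chunk, terminator: "") // show partial output immediately
    fullText += chunk
  }
  print()
  return fullText
}

// Stand-in stream for this sketch; it is not part of LiteRT-LM.
let demoStream = AsyncThrowingStream<String, Error> { continuation in
  for piece in ["Once ", "upon ", "a time."] {
    continuation.yield(piece)
  }
  continuation.finish()
}

let story = try await collectText(from: demoStream)
```

With a real conversation, the same accumulation could be done inside the sendMessageStream loop by appending chunk.toString to a string.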

Multimodality

To use vision or audio features, make sure to configure the dedicated backends when you initialize the engine.

let engineConfig = try EngineConfig(
  modelPath: "path/to/multimodal_model.litertlm",
  backend: .gpu,
  visionBackend: .cpu(), // Enable CPU vision executor
  audioBackend: .cpu(), // Enable CPU audio executor
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()

Image input (vision)

Provide the image as a file path or as raw bytes:

let imagePath = Bundle.main.path(forResource: "scenery", ofType: "jpg")!

let message = Message(contents: [
  Content.imageFile(imagePath),
  Content.text("Describe this image.")
])

let response = try await conversation.sendMessage(message)
print(response.toString)
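
The force unwrap on Bundle.main.path(forResource:ofType:) crashes if the resource is missing from the bundle. A minimal sketch of a throwing lookup, using only Foundation, is shown below; the helper resourcePath and the error type MissingResourceError are hypothetical names introduced for illustration.

```swift
import Foundation

/// Hypothetical error type for this sketch.
struct MissingResourceError: Error {
  let name: String
  let type: String
}

/// Returns the path of a bundled resource, or throws instead of crashing.
func resourcePath(
  named name: String, ofType type: String, in bundle: Bundle = .main
) throws -> String {
  guard let path = bundle.path(forResource: name, ofType: type) else {
    throw MissingResourceError(name: name, type: type)
  }
  return path
}

do {
  let imagePath = try resourcePath(named: "scenery", ofType: "jpg")
  print("Found image at \(imagePath)")
} catch {
  print("Image not available: \(error)")
}
```

The same helper would work for the audio file path.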

Audio input

Provide the path to an audio file:

let audioPath = Bundle.main.path(forResource: "recording", ofType: "wav")!

let message = Message(contents: [
  Content.audioFile(audioPath),
  Content.text("Transcribe this recording.")
])

let response = try await conversation.sendMessage(message)
print(response.toString)

🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly increases decoding speed. It is universally recommended for all tasks that use the GPU/Metal backend.

To use MTP, enable speculative decoding in the experimental flags before you initialize the engine.

import LiteRTLM

// Opt into experimental APIs to configure MTP
ExperimentalFlags.optIntoExperimentalAPIs()
ExperimentalFlags.enableSpeculativeDecoding = true

let engineConfig = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu,
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()
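
To check the effect on your own model and device, you could time a streamed response with and without the flag. The sketch below is a generic timing helper demonstrated on a stand-in stream; measureThroughput and the demo stream are hypothetical, and a streamed chunk is not necessarily a single token, so treat the result as a rough comparison only.

```swift
import Foundation

/// Consumes a stream of chunks and returns the chunk count and elapsed seconds.
func measureThroughput<S: AsyncSequence>(
  of chunks: S
) async throws -> (chunkCount: Int, seconds: Double) {
  let start = Date()
  var chunkCount = 0
  for try await _ in chunks {
    chunkCount += 1
  }
  return (chunkCount, Date().timeIntervalSince(start))
}

// Stand-in stream for this sketch; it is not part of LiteRT-LM.
let demoStream = AsyncThrowingStream<String, Error> { continuation in
  for piece in ["a", "b", "c"] { continuation.yield(piece) }
  continuation.finish()
}

let result = try await measureThroughput(of: demoStream)
print("\(result.chunkCount) chunks in \(result.seconds) s")
```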

Define and use tools

You can define Swift structs as tools that the model can invoke automatically to execute logic.

Conform to the Tool protocol.
Define parameters using the @ToolParam property wrapper.
Implement the run() method.

import LiteRTLM

// 1. Define your custom tool
struct GetCurrentWeatherTool: Tool {
  static let name = "get_current_weather"
  static let description = "Get the current weather for a location."

  @ToolParam(description: "The city and state, e.g. San Francisco, CA")
  var location: String

  @ToolParam(description: "The temperature unit to use (celsius or fahrenheit)")
  var unit: String = "celsius"

  func run() async throws -> Any {
    // Call your weather API here
    return [
      "location": location,
      "temperature": "22",
      "unit": unit,
      "condition": "sunny"
    ]
  }
}

// 2. Register the tool in your conversation configuration
let config = ConversationConfig(
  tools: [GetCurrentWeatherTool()]
)

let conversation = try await engine.createConversation(with: config)

// 3. The model will invoke the tool automatically if needed
let response = try await conversation.sendMessage(Message("What is the weather in Paris right now?"))
print(response.toString)
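
Because the unit parameter is a plain String, the model may supply a value your backend does not expect. One way to guard against that inside run() is to parse the text into an enum before using it. The sketch below is self-contained; TemperatureUnit, its modelValue initializer, and convert are hypothetical helpers, not part of LiteRT-LM.

```swift
import Foundation

/// Units accepted by the hypothetical weather backend in this sketch.
enum TemperatureUnit: String {
  case celsius
  case fahrenheit

  /// Parses model-supplied text leniently; returns nil for unknown values.
  init?(modelValue: String) {
    let trimmed = modelValue.trimmingCharacters(in: .whitespacesAndNewlines)
    self.init(rawValue: trimmed.lowercased())
  }
}

/// Converts a Celsius reading into the requested unit.
func convert(celsius: Double, to unit: TemperatureUnit) -> Double {
  switch unit {
  case .celsius: return celsius
  case .fahrenheit: return celsius * 9 / 5 + 32
  }
}

// Fall back to Celsius when the supplied unit is not recognized.
let unit = TemperatureUnit(modelValue: " Fahrenheit ") ?? .celsius
print(convert(celsius: 22, to: unit), unit.rawValue)
```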