LiteRT-LM Swift API

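The LiteRT-LM Swift API lets you integrate large language models natively into iOS and macOS apps. It fully supports features such as multimodality, tool use, and GPU acceleration (via Metal).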

Introduction

The following example shows how to initialize a model and send a message with the Swift API:

import LiteRTLM

// 1. Initialize the Engine with your model
let config = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu, // Use .cpu() for CPU execution
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: config)
try await engine.initialize()

// 2. Start a new Conversation
let conversation = try await engine.createConversation()

// 3. Send a message and print the response
let response = try await conversation.sendMessage(Message("What is the capital of France?"))
print(response.toString)

Get started

This section explains how to integrate the LiteRT-LM Swift API into your app.

Swift Package Manager (SPM)

You can use Swift Package Manager to add LiteRT-LM to your Xcode project.

  1. Open your project in Xcode and go to File > Add Package Dependencies…
  2. Enter the package repository URL: https://github.com/google-ai-edge/LiteRT-LM
  3. Select the LiteRTLM library and add it to your app target.

If you are developing a package with Package.swift, add LiteRT-LM to its dependencies:

dependencies: [
  .package(url: "https://github.com/google-ai-edge/LiteRT-LM", from: "0.12.0")
]
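The dependencies entry alone does not link the library into a target. A minimal sketch of a complete manifest follows, assuming the product is named LiteRTLM (the library name selected in the Xcode steps above) and that the package identity follows the last path component of the URL. The package and target name MyApp and the tools version are hypothetical placeholders.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
  name: "MyApp", // hypothetical package name
  dependencies: [
    // Same dependency line as shown above
    .package(url: "https://github.com/google-ai-edge/LiteRT-LM", from: "0.12.0")
  ],
  targets: [
    .target(
      name: "MyApp", // hypothetical target name
      dependencies: [
        // Assumption: the library product is "LiteRTLM" and the package
        // identity is derived from the repository URL.
        .product(name: "LiteRTLM", package: "LiteRT-LM")
      ]
    )
  ]
)
```

If the build reports an unknown product or package, check the names declared in the repository's own Package.swift.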

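Core API guide

This section describes the basic components and workflow of the LiteRT-LM Swift API, including engine initialization, conversation management, and sending messages.

Initialize the engine

Engine handles model loading, resource allocation, and lifecycle management.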

import LiteRTLM

let engineConfig = try EngineConfig(
  modelPath: "path/to/your/model.litertlm",
  backend: .gpu, // Use .gpu for Metal hardware acceleration
  maxNumTokens: 512, // Size of the KV-cache
  cacheDir: NSTemporaryDirectory() // Writable directory for compilation cache
)

let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()
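The cacheDir parameter needs a writable directory. NSTemporaryDirectory() is writable, but the system may purge it, so a compilation cache stored there might not survive between launches. Below is a minimal sketch of an alternative, assuming cacheDir accepts any writable directory path as a String (as the NSTemporaryDirectory() call suggests). The function makeCacheDirectory and the folder name litertlm-cache are hypothetical and not part of LiteRT-LM.

```swift
import Foundation

/// Returns the path of a subdirectory inside the user's Caches directory,
/// creating it if it does not exist yet.
/// Hypothetical helper; not part of the LiteRT-LM API.
func makeCacheDirectory(named name: String = "litertlm-cache") throws -> String {
  let fileManager = FileManager.default
  // Locate (and create if needed) the per-user Caches directory.
  let base = try fileManager.url(
    for: .cachesDirectory,
    in: .userDomainMask,
    appropriateFor: nil,
    create: true
  )
  let directory = base.appendingPathComponent(name, isDirectory: true)
  // No error is thrown if the directory already exists.
  try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
  return directory.path
}
```

The returned path could then be passed to EngineConfig as cacheDir: try makeCacheDirectory().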

Create a conversation

Conversation manages the conversation history, system instructions, and sampler configuration.

// Configure custom sampling parameters
let samplerConfig = try SamplerConfig(
  topK: 40,
  topP: 0.95,
  temperature: 0.7
)

// Create the conversation config with system instructions
let config = ConversationConfig(
  systemMessage: Message("You are a helpful assistant."),
  samplerConfig: samplerConfig
)

let conversation = try await engine.createConversation(with: config)

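Send messages

You can interact with the model synchronously, receiving the complete response in one call, or asynchronously, streaming it chunk by chunk.

Synchronous example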

let response = try await conversation.sendMessage(Message("Hello!"))
print(response.toString)

Asynchronous (streaming) example

let message = Message("Tell me a long story.")

for try await chunk in conversation.sendMessageStream(message) {
  // Output response chunks in real-time
  print(chunk.toString, terminator: "")
}
print()
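When the full text is also needed after streaming, for example to store it, the chunks can be accumulated while they arrive. The sketch below is self-contained and uses an AsyncThrowingStream of strings as a stand-in for the model's stream. It assumes each chunk carries only the newly generated text rather than the cumulative response, which this guide does not state explicitly. The names collectText and makeDemoStream are hypothetical.

```swift
import Foundation

/// Concatenates every text chunk of an async sequence into one string.
/// Hypothetical helper; not part of the LiteRT-LM API.
func collectText<S: AsyncSequence>(from chunks: S) async throws -> String
where S.Element == String {
  var text = ""
  for try await chunk in chunks {
    text += chunk
  }
  return text
}

/// Stand-in for a streamed model response, used only for this demonstration.
func makeDemoStream() -> AsyncThrowingStream<String, Error> {
  AsyncThrowingStream<String, Error> { continuation in
    for piece in ["Once ", "upon ", "a time."] {
      continuation.yield(piece)
    }
    continuation.finish()
  }
}
```

With the streaming loop above, the same idea amounts to appending chunk.toString to a String variable inside the loop body, assuming toString yields a String.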

Multimodality

To use vision or audio capabilities, you must configure the dedicated backends when you initialize the engine.

let engineConfig = try EngineConfig(
  modelPath: "path/to/multimodal_model.litertlm",
  backend: .gpu,
  visionBackend: .cpu(), // Enable CPU vision executor
  audioBackend: .cpu(), // Enable CPU audio executor
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()

Image input (vision)

Provide the image as a file path or as raw bytes. The following example uses a file path:

let imagePath = Bundle.main.path(forResource: "scenery", ofType: "jpg")!

let message = Message(contents: [
  Content.imageFile(imagePath),
  Content.text("Describe this image.")
])

let response = try await conversation.sendMessage(message)
print(response.toString)
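The force unwrap (!) on Bundle.main.path(forResource:ofType:) crashes the app if the file is missing from the bundle. A minimal sketch of a throwing lookup is shown below. ResourceError and resourcePath are hypothetical names, not part of LiteRT-LM.

```swift
import Foundation

/// Error thrown when a bundled resource cannot be found.
enum ResourceError: Error {
  case missing(String)
}

/// Looks up a bundled file and throws instead of crashing when it is absent.
/// Hypothetical helper; not part of the LiteRT-LM API.
func resourcePath(
  _ name: String,
  ofType fileExtension: String,
  in bundle: Bundle = .main
) throws -> String {
  guard let path = bundle.path(forResource: name, ofType: fileExtension) else {
    throw ResourceError.missing("\(name).\(fileExtension)")
  }
  return path
}
```

The image example could then use let imagePath = try resourcePath("scenery", ofType: "jpg") and handle the error alongside the other throwing calls.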

Audio input

Provide the path to an audio file:

let audioPath = Bundle.main.path(forResource: "recording", ofType: "wav")!

let message = Message(contents: [
  Content.audioFile(audioPath),
  Content.text("Transcribe this recording.")
])

let response = try await conversation.sendMessage(message)
print(response.toString)

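🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that substantially increases decoding speed. It is recommended for all workloads that use the GPU/Metal backend.

To use MTP, enable speculative decoding in the experimental flags before you initialize the engine.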

import LiteRTLM

// Opt into experimental APIs to configure MTP
ExperimentalFlags.optIntoExperimentalAPIs()
ExperimentalFlags.enableSpeculativeDecoding = true

let engineConfig = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu,
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()
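The speed-up depends on the model and device, so it can be worth measuring on your own workload. Here is a minimal sketch of a timing helper. The name timed is hypothetical and not part of LiteRT-LM, and wall-clock timing of a single run is only a rough indicator.

```swift
import Foundation

/// Runs `work` once and returns its result together with the elapsed seconds,
/// measured with the monotonic system uptime clock.
/// Hypothetical helper; not part of the LiteRT-LM API.
func timed<T>(
  _ work: () async throws -> T
) async rethrows -> (result: T, seconds: Double) {
  let start = ProcessInfo.processInfo.systemUptime
  let result = try await work()
  let elapsed = ProcessInfo.processInfo.systemUptime - start
  return (result, elapsed)
}
```

One way to use it is to wrap the same sendMessage call in timed twice, once with enableSpeculativeDecoding set and once without. Because the flag is set before the engine is initialized, each measurement would need its own engine.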

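Define and use tools

You can define Swift structs as tools that the model calls automatically to run your logic.

  1. Conform to the Tool protocol.
  2. Declare parameters with the @ToolParam property wrapper.
  3. Implement the run() method.

import LiteRTLM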

// 1. Define your custom tool
struct GetCurrentWeatherTool: Tool {
  static let name = "get_current_weather"
  static let description = "Get the current weather for a location."

  @ToolParam(description: "The city and state, e.g. San Francisco, CA")
  var location: String

  @ToolParam(description: "The temperature unit to use (celsius or fahrenheit)")
  var unit: String = "celsius"

  func run() async throws -> Any {
    // Call your weather API here
    return [
      "location": location,
      "temperature": "22",
      "unit": unit,
      "condition": "sunny"
    ]
  }
}

// 2. Register the tool in your conversation configuration
let config = ConversationConfig(
  tools: [GetCurrentWeatherTool()]
)

let conversation = try await engine.createConversation(with: config)

// 3. The model will invoke the tool automatically if needed
let response = try await conversation.sendMessage(Message("What is the weather in Paris right now?"))
print(response.toString)
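Keeping a tool's logic in a plain function makes it possible to test that logic without loading a model. The sketch below is self-contained and shows such a function for a hypothetical time-lookup tool. It returns a dictionary with string keys and values, mirroring the weather example. The function name currentTime and the key names are illustrative assumptions, and this guide does not describe how the model treats an "error" entry.

```swift
import Foundation

/// Pure helper that a tool's run() method could delegate to.
/// Hypothetical example; not part of the LiteRT-LM API.
func currentTime(
  inTimeZone identifier: String,
  now: Date = Date()
) -> [String: String] {
  // TimeZone(identifier:) returns nil for unknown identifiers.
  guard let zone = TimeZone(identifier: identifier) else {
    return ["error": "Unknown time zone: \(identifier)"]
  }
  let formatter = ISO8601DateFormatter()
  formatter.timeZone = zone
  return [
    "time_zone": identifier,
    "time": formatter.string(from: now)
  ]
}
```

A tool shaped like GetCurrentWeatherTool, with a single String parameter for the time zone, could return currentTime(inTimeZone:) from its run() method.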