LiteRT-LM Swift API

The LiteRT-LM Swift API lets you integrate large language models natively into iOS and macOS apps. It fully supports **multimodality**, **tool use**, and **GPU acceleration** (via Metal).

Introduction

The following example shows how to initialize a model and send a message with the Swift API:

import LiteRTLM

// 1. Initialize the Engine with your model
let config = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu, // Use .cpu() for CPU execution
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: config)
try await engine.initialize()

// 2. Start a new Conversation
let conversation = try await engine.createConversation()

// 3. Send a message and print the response
let response = try await conversation.sendMessage(Message("What is the capital of France?"))
print(response.toString)

Getting Started

This section explains how to integrate the LiteRT-LM Swift API into your app.

Swift Package Manager (SPM)

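You can use Swift Package Manager to integrate LiteRT-LM into your Xcode project.

  1. Open your project in Xcode and go to File > Add Package Dependencies...
  2. Enter the package repository URL: https://github.com/google-ai-edge/LiteRT-LM
  3. Select the LiteRTLM library and add it to your app target.

If you are developing a package with Package.swift, add it to your dependencies: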

dependencies: [
  .package(url: "https://github.com/google-ai-edge/LiteRT-LM", from: "0.12.0")
]
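A package target usually also has to list the library product before it can import it. The manifest below is a minimal sketch of that wiring. The package and target name `MyApp` and the tools version are placeholders, and the product name `LiteRTLM` is assumed from the library name shown in the Xcode steps above. No platform requirements are declared because the source does not state them.

```swift
// swift-tools-version:5.9
// Sketch of a complete manifest; names marked "hypothetical" are placeholders.
import PackageDescription

let package = Package(
  name: "MyApp", // hypothetical package name
  dependencies: [
    .package(url: "https://github.com/google-ai-edge/LiteRT-LM", from: "0.12.0")
  ],
  targets: [
    .target(
      name: "MyApp", // hypothetical target name
      dependencies: [
        // Assumption: the product is called LiteRTLM, and the package identity
        // is the last path component of the repository URL.
        .product(name: "LiteRTLM", package: "LiteRT-LM")
      ]
    )
  ]
)
```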

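Core API Guide

This section covers the basic components and workflow of the LiteRT-LM Swift API, including engine initialization, conversation management, and sending messages.

Initialize the Engine

The Engine handles model loading, resource allocation, and lifecycle management.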

import LiteRTLM

let engineConfig = try EngineConfig(
  modelPath: "path/to/your/model.litertlm",
  backend: .gpu, // Use .gpu for Metal hardware acceleration
  maxNumTokens: 512, // Size of the KV-cache
  cacheDir: NSTemporaryDirectory() // Writable directory for compilation cache
)

let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()
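The examples pass `NSTemporaryDirectory()` as `cacheDir`. Because the system can clear temporary files, an app that wants the compilation cache to survive longer might use a subdirectory of the user's Caches directory instead. The helper below is a sketch that uses only Foundation. The function name and the default subdirectory name are hypothetical, and whether LiteRT-LM benefits from a longer-lived cache is an assumption, not something the source states.

```swift
import Foundation

/// Returns the path of a writable cache subdirectory, creating it if needed.
/// The default subdirectory name is a hypothetical placeholder.
func makeCacheDirectory(named name: String = "litertlm-cache") throws -> String {
  let fileManager = FileManager.default
  // Locate (and create if missing) the per-user Caches directory.
  let base = try fileManager.url(
    for: .cachesDirectory,
    in: .userDomainMask,
    appropriateFor: nil,
    create: true
  )
  let directory = base.appendingPathComponent(name, isDirectory: true)
  // Succeeds without error if the directory already exists.
  try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
  return directory.path
}
```

The returned string could then be passed wherever the examples use `NSTemporaryDirectory()`.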

Create a Conversation

A Conversation manages the chat history, system instructions, and sampler configuration.

// Configure custom sampling parameters
let samplerConfig = try SamplerConfig(
  topK: 40,
  topP: 0.95,
  temperature: 0.7
)

// Create the conversation config with system instructions
let config = ConversationConfig(
  systemMessage: Message("You are a helpful assistant."),
  samplerConfig: samplerConfig
)

let conversation = try await engine.createConversation(with: config)
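To build intuition for what `topK`, `topP`, and `temperature` control, the sketch below shows one common way these three parameters are combined to narrow the next-token distribution. It is a standalone illustration in plain Swift, not LiteRT-LM's sampler. The function name and the order of the steps are assumptions made for this example.

```swift
import Foundation

/// Illustrative only: applies temperature, top-k, and top-p filtering to raw logits
/// and returns the surviving tokens with renormalized probabilities.
func candidateDistribution(
  logits: [Double],
  topK: Int,
  topP: Double,
  temperature: Double
) -> [(token: Int, probability: Double)] {
  precondition(!logits.isEmpty && topK > 0 && temperature > 0)

  // 1. Temperature scaling followed by a numerically stable softmax.
  let scaled = logits.map { $0 / temperature }
  let maxLogit = scaled.max()!
  let weights = scaled.map { exp($0 - maxLogit) }
  let total = weights.reduce(0, +)

  // 2. Top-k: keep only the k most probable tokens.
  let ranked = weights.enumerated()
    .map { (token: $0.offset, probability: $0.element / total) }
    .sorted { $0.probability > $1.probability }
    .prefix(topK)

  // 3. Top-p: keep the smallest prefix whose cumulative probability reaches topP.
  var kept: [(token: Int, probability: Double)] = []
  var cumulative = 0.0
  for candidate in ranked {
    kept.append(candidate)
    cumulative += candidate.probability
    if cumulative >= topP { break }
  }

  // 4. Renormalize so the surviving probabilities sum to 1.
  return kept.map { (token: $0.token, probability: $0.probability / cumulative) }
}
```

A lower temperature sharpens the distribution before filtering, while smaller `topK` or `topP` values shrink the set of tokens the sampler may pick from.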

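Send Messages

You can interact with the model synchronously (awaiting the complete response) or asynchronously (streaming chunks as they are generated).

Synchronous example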

let response = try await conversation.sendMessage(Message("Hello!"))
print(response.toString)

Asynchronous (streaming) example

let message = Message("Tell me a long story.")

for try await chunk in conversation.sendMessageStream(message) {
  // Output response chunks in real-time
  print(chunk.toString, terminator: "")
}
print()
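When streaming, an app often needs the full text as well as the individual chunks, for example to store the reply once it completes. The helper below is a sketch of that pattern for any asynchronous sequence of strings. The function name `collectText` is hypothetical and it does not depend on LiteRT-LM types.

```swift
/// Collects streamed text chunks into one string,
/// forwarding each chunk to an optional callback as it arrives.
func collectText<S: AsyncSequence>(
  from chunks: S,
  onChunk: (String) -> Void = { _ in }
) async throws -> String where S.Element == String {
  var fullText = ""
  for try await chunk in chunks {
    onChunk(chunk)     // e.g. update the UI incrementally
    fullText += chunk  // keep the complete reply
  }
  return fullText
}
```

Assuming each streamed chunk exposes its text through `toString`, as in the loop above, the chunks could be mapped to strings before being passed to this helper.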

Multimodality

To use vision or audio capabilities, you must configure a dedicated backend for each during engine initialization.

let engineConfig = try EngineConfig(
  modelPath: "path/to/multimodal_model.litertlm",
  backend: .gpu,
  visionBackend: .cpu(), // Enable CPU vision executor
  audioBackend: .cpu(), // Enable CPU audio executor
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()

Image Input (Vision)

Provide the image as a file path or as raw bytes:

let imagePath = Bundle.main.path(forResource: "scenery", ofType: "jpg")!

let message = Message(contents: [
  Content.imageFile(imagePath),
  Content.text("Describe this image.")
])

let response = try await conversation.sendMessage(message)
print(response.toString)

Audio Input

Provide the path to an audio file:

let audioPath = Bundle.main.path(forResource: "recording", ofType: "wav")!

let message = Message(contents: [
  Content.audioFile(audioPath),
  Content.text("Transcribe this recording.")
])

let response = try await conversation.sendMessage(message)
print(response.toString)
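The image and audio examples force-unwrap `Bundle.main.path(forResource:ofType:)`, which crashes if the file is missing from the bundle. The sketch below shows one way to turn that case into a thrown error instead. The `ResourceError` type and the `resourcePath` function are hypothetical names introduced for this example.

```swift
import Foundation

/// Hypothetical error type for this sketch.
enum ResourceError: Error, Equatable {
  case notFound(String)
}

/// Resolves a bundled resource to a file path,
/// throwing instead of crashing when the file is missing.
func resourcePath(
  named name: String,
  ofType type: String,
  in bundle: Bundle = .main
) throws -> String {
  guard let path = bundle.path(forResource: name, ofType: type),
        FileManager.default.fileExists(atPath: path) else {
    throw ResourceError.notFound("\(name).\(type)")
  }
  return path
}
```

The resulting path can be used in place of the force-unwrapped `imagePath` or `audioPath` values shown above.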

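🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. We recommend it for all tasks that use the GPU/Metal backend.

To use MTP, enable speculative decoding in the experimental flags before initializing the engine.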

import LiteRTLM

// Opt into experimental APIs to configure MTP
ExperimentalFlags.optIntoExperimentalAPIs()
ExperimentalFlags.enableSpeculativeDecoding = true

let engineConfig = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu,
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: engineConfig)
try await engine.initialize()

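Define and Use Tools

You can define Swift structs as tools that the model can call automatically to run your logic.

  1. Conform to the Tool protocol.
  2. Declare parameters with the @ToolParam property wrapper.
  3. Implement the run() method.

import LiteRTLM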

// 1. Define your custom tool
struct GetCurrentWeatherTool: Tool {
  static let name = "get_current_weather"
  static let description = "Get the current weather for a location."

  @ToolParam(description: "The city and state, e.g. San Francisco, CA")
  var location: String

  @ToolParam(description: "The temperature unit to use (celsius or fahrenheit)")
  var unit: String = "celsius"

  func run() async throws -> Any {
    // Call your weather API here
    return [
      "location": location,
      "temperature": "22",
      "unit": unit,
      "condition": "sunny"
    ]
  }
}

// 2. Register the tool in your conversation configuration
let config = ConversationConfig(
  tools: [GetCurrentWeatherTool()]
)

let conversation = try await engine.createConversation(with: config)

// 3. The model will invoke the tool automatically if needed
let response = try await conversation.sendMessage(Message("What is the weather in Paris right now?"))
print(response.toString)
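Because `run()` returns `Any`, the compiler cannot check the shape of a tool's result. The source does not say how the result is passed back to the model. Assuming it has to be JSON-compatible, like the dictionary of strings above, a small Foundation-only helper can verify that in a unit test. The function name below is hypothetical.

```swift
import Foundation

/// Encodes a tool result as a compact JSON string,
/// or returns nil if the value is not JSON-compatible.
func jsonString(fromToolResult result: Any) -> String? {
  guard JSONSerialization.isValidJSONObject(result),
        let data = try? JSONSerialization.data(
          withJSONObject: result,
          options: [.sortedKeys] // stable key order makes assertions easier
        )
  else { return nil }
  return String(data: data, encoding: .utf8)
}
```

A test could call the tool's `run()` method directly and assert that this helper returns a non-nil string.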