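LiteRT-LM Cross-Platform C++ API

Conversation is a high-level API that represents a single, stateful conversation with the LLM, and it is the recommended starting point for most users. It manages a Session internally and handles the complex data processing for you: maintaining the initial context, managing tool definitions, preprocessing multimodal data, and applying Jinja prompt templates with role-based message formatting.

Conversation API workflow

The typical lifecycle for using the Conversation API is:

Create an Engine: Initialize a single Engine with the model path and configuration. This is a heavyweight object that holds the model weights.
Create a Conversation: Use the Engine to create one or more lightweight Conversation objects.
Send messages: Use the Conversation object's methods to send messages to the LLM and receive responses, which enables chat-like interaction.

The methods below are the simplest way to send a message and get the model's response, and they are recommended for most use cases. They mirror the Gemini Chat APIs.

SendMessage: A blocking call that takes the user input and returns the complete model response.
SendMessageAsync: A non-blocking call that streams the model's response back chunk by chunk through a callback.

Example code snippets:

Text-only content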

#include "runtime/engine/engine.h"

// ...

// 1. Define model assets and engine settings.
auto model_assets = ModelAssets::Create(model_path);
CHECK_OK(model_assets);

auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU);
CHECK_OK(engine_settings);

// 2. Create the main Engine object.
absl::StatusOr<std::unique_ptr<Engine>> engine =
    Engine::CreateEngine(*engine_settings);
CHECK_OK(engine);

// 3. Create a Conversation
auto conversation_config = ConversationConfig::CreateDefault(**engine);
CHECK_OK(conversation_config);
absl::StatusOr<std::unique_ptr<Conversation>> conversation = Conversation::Create(**engine, *conversation_config);
CHECK_OK(conversation);

// 4. Send message to the LLM with blocking call.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", "What is the tallest building in the world?"}
    });
CHECK_OK(model_message);

// 5. Print the model message.
std::cout << *model_message << std::endl;

// 6. Send message to the LLM with an asynchronous call, where
// CreatePrintMessageCallback is a user-implemented callback that processes
// each chunk of the model output as it is received.
std::stringstream captured_output;
(*conversation)->SendMessageAsync(
    JsonMessage{
        {"role", "user"},
        {"content", "What is the tallest building in the world?"}
    },
    CreatePrintMessageCallback(captured_output)
);
// Wait until the asynchronous call finishes or times out.
(*engine)->WaitUntilDone(absl::Seconds(10));

Example CreatePrintMessageCallback

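// Forward declaration so the callback below can call PrintJsonMessage, which
// is defined after it.
absl::Status PrintJsonMessage(const JsonMessage& message,
                              std::stringstream& captured_output,
                              bool streaming);

absl::AnyInvocable<void(absl::StatusOr<Message>)> CreatePrintMessageCallback(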
    std::stringstream& captured_output) {
  return [&captured_output](absl::StatusOr<Message> message) {
    if (!message.ok()) {
      std::cout << message.status().message() << std::endl;
      return;
    }
    if (auto json_message = std::get_if<JsonMessage>(&(*message))) {
      if (json_message->is_null()) {
        std::cout << std::endl << std::flush;
        return;
      }
      ABSL_CHECK_OK(PrintJsonMessage(*json_message, captured_output,
                                     /*streaming=*/true));
    }
  };
}

absl::Status PrintJsonMessage(const JsonMessage& message,
                              std::stringstream& captured_output,
                              bool streaming = false) {
  if (message["content"].is_array()) {
    for (const auto& content : message["content"]) {
      if (content["type"] == "text") {
        captured_output << content["text"].get<std::string>();
        std::cout << content["text"].get<std::string>();
      }
    }
    if (!streaming) {
      captured_output << std::endl << std::flush;
      std::cout << std::endl << std::flush;
    } else {
      captured_output << std::flush;
      std::cout << std::flush;
    }
  } else if (message["content"]["text"].is_string()) {
    if (!streaming) {
      captured_output << message["content"]["text"].get<std::string>()
                      << std::endl
                      << std::flush;
      std::cout << message["content"]["text"].get<std::string>() << std::endl
                << std::flush;
    } else {
      captured_output << message["content"]["text"].get<std::string>()
                      << std::flush;
      std::cout << message["content"]["text"].get<std::string>() << std::flush;
    }
  } else {
    return absl::InvalidArgumentError("Invalid message: " + message.dump());
  }
  return absl::OkStatus();
}

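🔴 New: Multi-Token Prediction (MTP)

Multi-Token Prediction (MTP) is a performance optimization that significantly speeds up decoding. MTP is recommended for all tasks that run on the GPU backend.

To use MTP, enable speculative decoding in the advanced settings of the engine settings.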

// 1. Define model assets and engine settings.
auto model_assets = ModelAssets::Create(model_path);
CHECK_OK(model_assets);

auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::GPU);
CHECK_OK(engine_settings);

// 2. Enable MTP via speculative decoding in advanced settings.
litert::lm::AdvancedSettings advanced_settings;
advanced_settings.enable_speculative_decoding = true;
engine_settings->GetMutableMainExecutorSettings().SetAdvancedSettings(
    advanced_settings);

// 3. Create the main Engine object.
absl::StatusOr<std::unique_ptr<Engine>> engine = Engine::CreateEngine(
    *engine_settings);
CHECK_OK(engine);

// The same steps to create Conversation and send messages as above...

Multimodal data content

// To use multimodality, create the engine with the vision and/or audio
// backend that matches the modalities you plan to use.
auto engine_settings = EngineSettings::CreateDefault(
    model_assets,
    /*backend=*/litert::lm::Backend::CPU,
    /*vision_backend=*/litert::lm::Backend::GPU,
    /*audio_backend=*/litert::lm::Backend::CPU);

// The same steps to create Engine and Conversation as above...

// Send message to the LLM with image data.
absl::StatusOr<Message> model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", { // Now content must be an array.
          {
            {"type", "text"}, {"text", "Describe the following image: "}
          },
          {
            {"type", "image"}, {"path", "/file/path/to/image.jpg"}
          }
        }},
    });
CHECK_OK(model_message);

// Print the model message.
std::cout << *model_message << std::endl;

// Send message to the LLM with audio data.
model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", { // Now content must be an array.
          {
            {"type", "text"}, {"text", "Transcribe the audio: "}
          },
          {
            {"type", "audio"}, {"path", "/file/path/to/audio.wav"}
          }
        }},
    });
CHECK_OK(model_message);

// Print the model message.
std::cout << *model_message << std::endl;

// The content can include multiple image or audio data.
model_message = (*conversation)->SendMessage(
    JsonMessage{
        {"role", "user"},
        {"content", { // Now content must be an array.
          {
            {"type", "text"}, {"text", "First briefly describe the two images "}
          },
          {
            {"type", "image"}, {"path", "/file/path/to/image1.jpg"}
          },
          {
            {"type", "text"}, {"text", "and "}
          },
          {
            {"type", "image"}, {"path", "/file/path/to/image2.jpg"}
          },
          {
            {"type", "text"}, {"text", " then transcribe the content in the audio"}
          },
          {
            {"type", "audio"}, {"path", "/file/path/to/audio.wav"}
          }
        }},
    });
CHECK_OK(model_message);

// Print the model message.
std::cout << *model_message << std::endl;

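Use Conversation with tools

To learn how to use the Conversation API with tools, see Advanced usage.

Components in Conversation

Conversation can be regarded as a delegate that maintains the Session on the user's behalf and performs the complex data processing before the data is sent to the Session.

I/O types

The core input and output format of the Conversation API is Message. It is currently implemented as JsonMessage, a type alias for ordered_json, which is a flexible nested key-value data structure.

The Conversation API works on a message-in, message-out basis, mimicking a typical chat experience. The flexibility of Message lets users add arbitrary fields as required by a specific prompt template or LLM, which allows LiteRT-LM to support a wide variety of models.

Although there is no single strict standard, most prompt templates and models expect Message to follow conventions similar to the Gemini API Content or the OpenAI message structure.

A Message must contain role, which identifies who sent the message. content can be a plain text string.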

{
  "role": "model", // Represents who the message is sent from.
  "content": "Hello World!" // Plain text-only content.
}

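For multimodal input, content is a list of parts. Again, part is not a predefined data structure but an ordered key-value data type. The specific fields depend on what the prompt template and the model expect.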

{
  "role": "user",
  "content": [  // Multimodal content.
    // Now the content is composed of parts
    {
      "type": "text",
      "text": "Describe the image in details: "
    },
    {
      "type": "image",
      "path": "/path/to/image.jpg"
    }
  ]
}

For multimodal parts, the following formats are supported and are handled by data_utils.h:

{
  "type": "text",
  "text": "this is a text"
}

{
  "type": "image",
  "path": "/path/to/image.jpg"
}

{
  "type": "image",
  "blob": "base64 encoded image bytes as string"
}

{
  "type": "audio",
  "path": "/path/to/audio.wav"
}

{
  "type": "audio",
  "blob": "base64 encoded audio bytes as string"
}
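
The blob variants above carry the file bytes as a base64 string. As a minimal sketch of how an application might produce that string, the helper below implements standard base64 (RFC 4648 alphabet with '=' padding) using only the C++ standard library. The name Base64Encode is hypothetical and not part of LiteRT-LM, and whether data_utils.h expects exactly this padded, standard-alphabet variant is an assumption to verify against the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of LiteRT-LM): encodes raw bytes as standard
// base64 (RFC 4648 alphabet, '=' padding).
std::string Base64Encode(const std::string& bytes) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  // Reads byte k as an unsigned value so that bytes >= 0x80 are not
  // sign-extended.
  auto byte_at = [&bytes](std::size_t k) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[k]));
  };

  std::string out;
  out.reserve(((bytes.size() + 2) / 3) * 4);
  for (std::size_t i = 0; i < bytes.size(); i += 3) {
    const std::size_t remaining = bytes.size() - i;
    // Pack up to three bytes into 24 bits; missing bytes count as zero.
    std::uint32_t n = byte_at(i) << 16;
    if (remaining > 1) n |= byte_at(i + 1) << 8;
    if (remaining > 2) n |= byte_at(i + 2);
    // Emit four 6-bit symbols, replacing unused trailing symbols with '='.
    out.push_back(kAlphabet[(n >> 18) & 63]);
    out.push_back(kAlphabet[(n >> 12) & 63]);
    out.push_back(remaining > 1 ? kAlphabet[(n >> 6) & 63] : '=');
    out.push_back(remaining > 2 ? kAlphabet[n & 63] : '=');
  }
  assert(out.size() % 4 == 0);  // Padded base64 is always a multiple of 4.
  return out;
}
```

The returned string would then be placed in the "blob" field of an image or audio part, with the file contents read into a std::string beforehand.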

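Prompt template

To stay flexible across model variants, PromptTemplate is implemented as a thin wrapper around Minja. Minja is a C++ implementation of the Jinja template engine that processes JSON input to produce the formatted prompt.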

The Jinja template engine is a widely adopted format for LLM prompt templates.

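The Jinja template must strictly match the structure that the instruction-tuned model expects. Models are typically released with a standard Jinja template to ensure they are used correctly.

The Jinja template used by a model is provided in the model file's metadata.

Note: Incorrect formatting can introduce subtle changes to the prompt that significantly degrade model performance, as reported in "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting".

Preface

Preface sets the initial context for the conversation. It can include initial messages, tool definitions, and any other background information the LLM needs to start the interaction. It is similar to the Gemini API system instruction and Gemini API Tools.

Preface contains the following fields:

messages: The messages in the preface. They provide the initial background for the conversation, for example conversation history, prompt-engineering system instructions, or few-shot examples.
tools: The tools the model can use in the conversation. The tool format is likewise not fixed, but most follow the Gemini API FunctionDeclaration.
extra_context: Additional context that keeps the preface extensible, so that a model can define the custom context information it needs to start a conversation. For example:
- enable_thinking, for models with a thinking mode, such as Qwen3 or SmolLM3-3B.

Example preface that provides an initial system instruction and tools, and disables thinking mode: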

Preface preface = JsonPreface({
  .messages = {
      {"role", "system"},
      {"content", {"You are a model that can do function calling."}}
    },
  .tools = {
    {
      {"name", "get_weather"},
      {"description", "Returns the weather for a given location."},
      {"parameters", {
        {"type", "object"},
        {"properties", {
          {"location", {
            {"type", "string"},
            {"description", "The location to get the weather for."}
          }}
        }},
        {"required", {"location"}}
      }}
    },
    {
      {"name", "get_stock_price"},
      {"description", "Returns the stock price for a given stock symbol."},
      {"parameters", {
        {"type", "object"},
        {"properties", {
          {"stock_symbol", {
            {"type", "string"},
            {"description", "The stock symbol to get the price for."}
          }}
        }},
        {"required", {"stock_symbol"}}
      }}
    }
  },
  .extra_context = {
    {"enable_thinking", false}
  }
});

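History

Conversation maintains a list of all messages exchanged within the session. This history is essential for rendering the prompt template, because a Jinja prompt template usually needs the whole conversation history to generate the correct prompt for the LLM.

However, the LiteRT-LM Session is stateful, meaning that it processes input incrementally. To bridge this gap, Conversation renders the prompt template twice to generate the necessary incremental prompt: once with the history up to the previous turn, and once including the current message. It then compares the two rendered prompts and extracts the new portion to send to the Session.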
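
The double-rendering step can be illustrated with a small, self-contained sketch. It does not use Minja or the real PromptTemplate: RenderPrompt is a hypothetical stand-in that formats each turn with made-up <start> and <end> markers, and Turn and IncrementalPrompt are assumed names. It only shows the technique described above: render the history without and with the new message, check that the first rendering is a prefix of the second, and keep just the suffix.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type: a (role, text) pair.
using Turn = std::pair<std::string, std::string>;

// Hypothetical stand-in for a Jinja template: renders the whole history.
std::string RenderPrompt(const std::vector<Turn>& history) {
  std::string prompt;
  for (const Turn& turn : history) {
    prompt += "<start>" + turn.first + "\n" + turn.second + "<end>\n";
  }
  return prompt;
}

// Renders the template twice (without and with new_turn) and stores only the
// newly added text in *delta. Returns false, leaving *delta untouched, if the
// first rendering is not a prefix of the second.
bool IncrementalPrompt(const std::vector<Turn>& history, const Turn& new_turn,
                       std::string* delta) {
  assert(delta != nullptr);
  const std::string before = RenderPrompt(history);

  std::vector<Turn> extended = history;
  extended.push_back(new_turn);
  const std::string after = RenderPrompt(extended);

  // The stateful session has already consumed `before`, so only the suffix
  // of `after` that follows it needs to be sent.
  if (after.compare(0, before.size(), before) != 0) return false;
  *delta = after.substr(before.size());
  return true;
}
```

With this toy template the prefix check always succeeds. Real chat templates can append extra text, such as a generation prompt, so the check is not guaranteed to hold in general, and how Conversation handles that case is not covered by this sketch.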

ConversationConfig

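ConversationConfig is used to initialize a Conversation instance. You can create it in the following ways:

From an Engine: Uses the default SessionConfig associated with the engine.
From a specific SessionConfig: Gives finer-grained control over the session settings.

Beyond the session settings, you can further customize the Conversation behavior in ConversationConfig, including:

Providing a Preface.
Overriding the default PromptTemplate.
Overriding the default DataProcessorConfig.

These overrides are especially useful for fine-tuned models, which may need a different configuration or prompt template than the base model they were derived from.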

MessageCallback

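MessageCallback is the callback function that users should implement when using the asynchronous SendMessageAsync method.

The callback signature is absl::AnyInvocable<void(absl::StatusOr<Message>)>. It is triggered in the following cases:

When a new chunk of the Message is received from the model.
When an error occurs while LiteRT-LM processes the message.
When the LLM finishes inference, the callback is triggered with an empty Message (for example, JsonMessage()) to signal the end of the response.

For an example implementation, see the asynchronous call in step 6.

Note: The Message received by the callback contains only the latest chunk of the model output, not the entire message history.

For example, if the complete model response expected from a blocking SendMessage call is: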

{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "Hello World!"
    }
  ]
}

The callback in SendMessageAsync may be invoked multiple times, each time with the next piece of text:

// 1st Message
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "He"
    }
  ]
}

// 2nd Message
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "llo"
    }
  ]
}

// 3rd Message
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": " Wo"
    }
  ]
}

// 4th Message
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "rl"
    }
  ]
}

// 5th Message
{
  "role": "model",
  "content": [
    {
      "type": "text",
      "text": "d!"
    }
  ]
}

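If the complete response is needed during asynchronous streaming, the implementer is responsible for accumulating the chunks. Alternatively, the complete response is available as the last entry in the History once the asynchronous call completes.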
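
A minimal sketch of the accumulation approach follows, using only the standard library so that it stands alone. It does not use the real Message or MessageCallback types: chunks are modelled as plain std::string values, an empty string stands in for the empty Message that signals the end of the response, and the names ChunkAccumulator and MakeCallback are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical accumulator for streamed text chunks.
class ChunkAccumulator {
 public:
  // Appends one chunk. An empty chunk stands in for the empty Message that
  // marks the end of the response.
  void OnChunk(const std::string& chunk) {
    assert(!done_ && "no chunk expected after the end-of-response marker");
    if (chunk.empty()) {
      done_ = true;
      return;
    }
    text_ += chunk;
  }

  bool done() const { return done_; }
  const std::string& text() const { return text_; }

 private:
  std::string text_;
  bool done_ = false;
};

// Wraps the accumulator in a callable that a streaming API could invoke once
// per chunk. The shared_ptr keeps the accumulator alive for as long as the
// callback exists.
std::function<void(const std::string&)> MakeCallback(
    std::shared_ptr<ChunkAccumulator> accumulator) {
  return [accumulator](const std::string& chunk) {
    accumulator->OnChunk(chunk);
  };
}
```

Holding the accumulator through a shared_ptr is a design choice for this sketch: a callback that captures a local variable by reference is only safe while that variable outlives the asynchronous call.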

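Advanced usage

Constrained decoding

LiteRT-LM supports constrained decoding, which enforces a specific structure on the model output, such as a JSON schema, a regex pattern, or grammar rules.

To enable it, set EnableConstrainedDecoding(true) in ConversationConfig and provide a ConstraintProviderConfig (for example, LlGuidanceConfig, which supports regex, JSON, and grammar constraints). Then pass the constraint through OptionalArgs in SendMessage.

Example: regex constraint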

LlGuidanceConstraintArg constraint_arg;
constraint_arg.constraint_type = LlgConstraintType::kRegex;
constraint_arg.constraint_string = "a+b+"; // Force output to match this regex

auto response = conversation->SendMessage(
    user_message,
    {.decoding_constraint = constraint_arg}
);

For full details, including JSON schema and Lark grammar support, see the constrained decoding documentation.
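
To see which strings the pattern a+b+ from the regex example accepts, the sketch below checks candidate outputs with std::regex from the standard library. It is not how LiteRT-LM enforces the constraint during decoding: MatchesConstraint is a hypothetical helper, it assumes the constraint applies to the entire output (full-match semantics), and std::regex uses ECMAScript syntax, which may differ from the dialect the constraint provider accepts.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if the entire output matches the pattern.
bool MatchesConstraint(const std::string& output, const std::string& pattern) {
  const std::regex re(pattern);         // ECMAScript syntax by default.
  return std::regex_match(output, re);  // Whole-string match, not a search.
}
```

A check like this could serve as a sanity test on responses in application code: "aab" and "abbb" match a+b+, while "ba", "aabc", and the empty string do not.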

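Tool use

Tool calling lets the LLM request the execution of client-side functions. You define tools in the conversation's Preface, keyed by name. When the model outputs a tool call, you capture it, execute the corresponding function in your application, and return the result to the model.

High-level flow:

Declare tools: Define the tools (name, description, parameters) in the Preface JSON.
Detect calls: Check the response for model_message["tool_calls"].
Execute: Run your application logic for the requested tool.
Respond: Send a message with role: "tool" that contains the tool output back to the model.

For full details and a complete chat loop example, see the tool use documentation.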
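
The execute and respond steps can be sketched without the library. The example below is a minimal illustration, not the LiteRT-LM tool API: ToolCall, ToolReply, ToolRegistry, and RunTool are hypothetical names, arguments and results are passed as plain strings rather than JSON, and the error text for unknown tools is made up. It shows one way to map the tool name from a tool call to application code and to build the reply that would be sent back with role "tool".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical, simplified view of one entry in model_message["tool_calls"].
struct ToolCall {
  std::string name;
  std::string arguments;  // Would be a JSON object in practice.
};

// Hypothetical reply message; the role is always "tool".
struct ToolReply {
  std::string role;
  std::string content;
};

using ToolHandler = std::function<std::string(const std::string&)>;
using ToolRegistry = std::map<std::string, ToolHandler>;

// Looks up the requested tool by name, runs it, and wraps the result in a
// reply. Unknown tools produce an error reply instead of a crash, so the
// model can still be told what went wrong.
ToolReply RunTool(const ToolRegistry& registry, const ToolCall& call) {
  const ToolRegistry::const_iterator it = registry.find(call.name);
  if (it == registry.end()) {
    return ToolReply{"tool", "error: unknown tool '" + call.name + "'"};
  }
  assert(it->second && "registered handler must be callable");
  return ToolReply{"tool", it->second(call.arguments)};
}
```

An application would register one handler per tool declared in the Preface, for example a get_weather handler, and call RunTool for each detected tool call before sending the reply to the model.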