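LiteRT Compiler Plugins

When to Create a Compiler Plugin

A LiteRT compiler plugin is needed when you have to integrate a specific hardware accelerator, along with its compiler dependency, into the LiteRT framework.

Create a compiler plugin when:

You are targeting a new hardware backend that is not yet supported.
You want to offload specific model operations to that hardware accelerator for performance or power efficiency.
You need support for AOT compilation (on a workstation) or on-device compilation.

The plugin acts as a bridge: it takes parts of the ML model and, through calls to the backend's compiler, converts them into a format that the target hardware can execute. LiteRT bundles the custom bytecode generated by the plugin into the .tflite model so that it can be executed with the LiteRT runtime.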

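How Compiler Plugins Work

The LiteRT framework uses the compiler plugin during model loading or during an offline preprocessing phase to identify and prepare the model subgraphs that will run on the target hardware.

The process consists of two main phases, which the framework orchestrates through the plugin's exported functions:

Partitioning: The plugin inspects the whole model graph and identifies the subset of operations that it supports and that can be accelerated efficiently on the target hardware. These supported subgraphs are "partitioned" (marked) for compilation and outlined.
Compilation: The LiteRT framework passes the partitioned subgraphs back to the plugin. The plugin then uses its internal logic, and possibly an external toolchain (compiler), to generate one or more hardware-specific bytecode modules that implement the partitions. This bytecode is what the target hardware's runtime (HAL/driver) eventually loads and executes.

The framework replaces the original subgraphs with custom operations that invoke the hardware driver, passing along the compiled bytecode produced by the plugin.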
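
To make the partition-then-replace flow concrete, here is a minimal sketch on a toy model. It assumes the model is a flat sequence of ops rather than a DAG, and the names ToyOp, OutlineSupportedRuns, and the "dispatch_N" placeholder are hypothetical illustrations, not LiteRT types or APIs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a model op; NOT a LiteRT type.
struct ToyOp {
  std::string name;
  bool supported;  // whether the imaginary accelerator can run this op
};

// Replaces each maximal run of supported ops with one placeholder
// "dispatch_N" op and returns the rewritten op sequence. The op names of
// run N are appended to (*partitions)[N], standing in for the subgraph that
// would be handed to the compile step.
std::vector<std::string> OutlineSupportedRuns(
    const std::vector<ToyOp>& ops,
    std::vector<std::vector<std::string>>* partitions) {
  std::vector<std::string> rewritten;
  bool in_run = false;
  for (const ToyOp& op : ops) {
    if (!op.supported) {
      // Unsupported ops stay in the main graph unchanged.
      rewritten.push_back(op.name);
      in_run = false;
      continue;
    }
    if (!in_run) {
      // Start a new partition and leave a single placeholder in the graph.
      partitions->emplace_back();
      rewritten.push_back("dispatch_" +
                          std::to_string(partitions->size() - 1));
      in_run = true;
    }
    partitions->back().push_back(op.name);
  }
  return rewritten;
}
```

For the sequence mul, sub, softmax, mul, where only softmax is unsupported, this sketch yields two partitions and a rewritten graph of three ops. A real framework works on a graph and must also respect data dependencies, which this linear toy ignores.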

LiteRT Dispatch is the runtime analogue of the compiler plugin. It provides the means of calling into the HAL with the compiler output. For details, see the Dispatch documentation.

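AOT vs. On-Device

With compiler plugins, LiteRT can support both AOT compilation through tooling and compilation on the device. On-device compilation is more flexible: it is fully internalized within the LiteRT runtime APIs and only requires managing a single model. The AOT flow can unblock compilation when it is too resource-intensive to run on the device, which may be the case for many modern large models.

Fallback

LiteRT is built to support heterogeneous graphs. Any operation that the plugin does not select is left on the CPU or remains available for acceleration on another backend.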
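
The fallback behavior can be pictured with the sketch below. It assumes backends are consulted in a fixed priority order and that "cpu" is the default; that ordering, along with the names ToyBackend and AssignBackends, is a hypothetical simplification and not how LiteRT is documented to schedule backends.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of a backend: a name plus a predicate saying
// whether it would select a given op. NOT a LiteRT type.
struct ToyBackend {
  std::string name;
  std::function<bool(const std::string&)> supports;
};

// Assigns each op to the first backend (in the given order) that selects
// it. Ops that no backend selects fall back to "cpu".
std::vector<std::string> AssignBackends(
    const std::vector<std::string>& ops,
    const std::vector<ToyBackend>& backends) {
  std::vector<std::string> assignment;
  assignment.reserve(ops.size());
  for (const std::string& op : ops) {
    std::string chosen = "cpu";  // fallback when nothing selects the op
    for (const ToyBackend& backend : backends) {
      if (backend.supports(op)) {
        chosen = backend.name;
        break;
      }
    }
    assignment.push_back(chosen);
  }
  return assignment;
}
```

The point of the sketch is only that declining an op is safe: the op is never dropped, it simply lands on another backend or on the CPU.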

Implementing a Compiler Plugin

A LiteRT compiler plugin is implemented as a shared library that exports a specific set of C functions defined by the LiteRT C API.

Essential Interface Functions

The core functionality revolves around two key compilation steps: LiteRtCompilerPluginPartition and LiteRtCompilerPluginCompile.

Function	Purpose
LiteRtCompilerPluginPartition	Selects and marks all supported operations within a given model subgraph (the partition step).
LiteRtCompilerPluginCompile	Generates hardware-specific bytecode for the preselected partitions (the compile step).

C API Snippet

// Name associated with the manufacturer this plugin relates to.
LITERT_CAPI_EXPORT const char* LiteRtGetCompilerPluginSocManufacturer();

// Create and initialize the plugin instance.
LITERT_CAPI_EXPORT LiteRtStatus
LiteRtCreateCompilerPlugin(LiteRtCompilerPlugin* compiler_plugin,
                           LiteRtEnvironmentOptions env, LiteRtOptions options);

// Choose ops for compilation.
// This is the PARTITION step.
LITERT_CAPI_EXPORT LiteRtStatus LiteRtCompilerPluginPartition(
    LiteRtCompilerPlugin compiler_plugin, const char* soc_model,
    LiteRtSubgraph subgraph, LiteRtOpList selected_ops);

// Prepare result to pass to the runtime for given model containing partitioned
// subgraphs. This is the COMPILE step.
LITERT_CAPI_EXPORT LiteRtStatus LiteRtCompilerPluginCompile(
    LiteRtCompilerPlugin compiler_plugin, const char* soc_model,
    LiteRtModel partitions, LiteRtCompiledResult* compiled_result);

1. The Partition Function

The function signature is:

LITERT_CAPI_EXPORT LiteRtStatus LiteRtCompilerPluginPartition(
    LiteRtCompilerPlugin compiler_plugin, const char* soc_model,
    LiteRtSubgraph subgraph, LiteRtOpList selected_ops);

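What the partition function does: This is the selection phase. The plugin iterates over the operations in the input LiteRtSubgraph. For every operation that the target hardware supports and can accelerate, the plugin adds that operation to the LiteRtOpList provided in the selected_ops parameter. The LiteRT framework uses this list to define the boundaries of the partitions that are sent to the final compilation step.

By default, LiteRT groups all selected operations into the largest possible sub-DAGs. For finer-grained partitioning, you can associate an index with each op as you select it, which splits these subgraphs further.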
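
As a rough illustration of how a per-op index can split a selection, the sketch below groups selected ops by the index they were tagged with. The Selection struct and GroupByPartitionIndex are hypothetical names, and the sketch deliberately ignores graph connectivity, which the real framework also has to take into account when forming sub-DAGs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one selection: the op's name and the partition
// index the plugin associated with it. NOT a LiteRT type.
struct Selection {
  std::string op_name;
  int partition_index;
};

// Groups selected ops by partition index, preserving selection order within
// each group. Ops with different indices never share a group, even if they
// would otherwise be adjacent in the graph.
std::map<int, std::vector<std::string>> GroupByPartitionIndex(
    const std::vector<Selection>& selections) {
  std::map<int, std::vector<std::string>> groups;
  for (const Selection& s : selections) {
    groups[s.partition_index].push_back(s.op_name);
  }
  return groups;
}
```

Tagging every op with the same index (for example 0) leaves the default grouping untouched; using a second index forces at least one additional partition.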

2. The Compile Function

The function signature is:

LITERT_CAPI_EXPORT LiteRtStatus LiteRtCompilerPluginCompile(
    LiteRtCompilerPlugin compiler_plugin, const char* soc_model,
    LiteRtModel partitions, LiteRtCompiledResult* compiled_result);

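What the compile function does: This is the generation phase. The input partitions represents a model in which all of the selected subgraphs have been isolated. The plugin processes these partitions and invokes its specific toolchain to generate bytecode for the target hardware. The plugin's output is expected to provide an entry point for each subgraph passed in for compilation. In most cases this is either an individual bytecode module for each input subgraph, or a single bytecode module with multiple entry points.

What compile returns: The LiteRtCompilerPluginCompile function returns its output through the out-parameter LiteRtCompiledResult.

LiteRtCompiledResult is an opaque handle (with respect to LiteRT) to a structure managed by the plugin. It represents the output of compilation and contains two main pieces of information:

Bytecode modules: One or more raw memory buffers containing the hardware-specific executable bytecode (the compiled instructions).
Call info: Metadata for each partition. It provides the mapping from the i-th input subgraph to a result bytecode module and to an entry point identifier within that module.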
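
The two output layouts mentioned above (one module per partition, or one shared module with several entry points) can be sketched as follows. All names here (ToyCallInfo, ToyCompiledResult, the "main" and "partition_N" entry point labels) are hypothetical; the actual layout of a plugin's result structure is up to the plugin.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical call info for one partition. NOT a LiteRT type.
struct ToyCallInfo {
  std::size_t module_index;  // which bytecode module holds the partition
  std::string entry_point;   // identifier of the partition inside the module
};

// Hypothetical compiled result: raw modules plus per-partition call info.
struct ToyCompiledResult {
  std::vector<std::string> modules;
  std::vector<ToyCallInfo> call_info;  // call_info[i] describes partition i
};

// Layout A: every partition gets its own module with a fixed entry point.
ToyCompiledResult OneModulePerPartition(const std::vector<std::string>& code) {
  ToyCompiledResult result;
  for (std::size_t i = 0; i < code.size(); ++i) {
    result.modules.push_back(code[i]);
    result.call_info.push_back({i, "main"});
  }
  return result;
}

// Layout B: all partitions share module 0 and differ by entry point name.
ToyCompiledResult SingleSharedModule(const std::vector<std::string>& code) {
  ToyCompiledResult result;
  result.modules.emplace_back();
  for (std::size_t i = 0; i < code.size(); ++i) {
    result.modules[0] += code[i];
    result.call_info.push_back({0, "partition_" + std::to_string(i)});
  }
  return result;
}
```

In both layouts the call info has exactly one entry per input partition, which is what lets the runtime find the right code for the i-th subgraph.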

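Example Implementation

The following snippets show how a basic plugin might implement the core functions. The example is excerpted from the fully functional example in litert/vendors/examples/.

Plugin Identification and Setup

These functions provide the framework with basic information about the plugin and the hardware.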

// Define the plugin's internal state structure
struct LiteRtCompilerPluginT {};

// Identify the manufacturer
const char* LiteRtGetCompilerPluginSocManufacturer() {
  return "AcmeCorp"; // Example manufacturer name
}

// Specify the supported hardware (in this example, it supports kLiteRtHwAcceleratorNpu)
LiteRtStatus LiteRtGetCompilerPluginSupportedHardware(
    LiteRtCompilerPlugin compiler_plugin,
    LiteRtHwAccelerators* supported_hardware) {
  // ... argument checking ...
  *supported_hardware = kLiteRtHwAcceleratorNpu;
  return kLiteRtStatusOk;
}

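Partitioning Logic (`LiteRtCompilerPluginPartition`)

In this example, the plugin selects a limited set of operations (mul, sub, and a specific composite op) only when all inputs and outputs are 32-bit floats. Typically, deciding whether to select an operation involves calling a validation hook in the backend's compiler toolchain.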

LiteRtStatus LiteRtCompilerPluginPartition(LiteRtCompilerPlugin compiler_plugin,
                                          const char* soc_model,
                                          LiteRtSubgraph subgraph,
                                          LiteRtOpList selected_ops) {

  // Iterate over ops and check criteria for selection
  // (using a C++ wrapper namespace '::litert' for convenience).
  // `subgraph` is a single subgraph from the original model, as such
  // this function will be called for each subgraph in the original model.

  ::litert::Subgraph main_subgraph(subgraph);
  for (const auto& op : main_subgraph.Ops()) {
    // 1. Check a constraint: require all tensors to be Float32
    bool only_f32 = true;
    // ... logic to check input/output types ...
    if (!only_f32) {
      continue;
    }

    // 2. Check op codes and push to selected_ops list
    if (op.Code() == kLiteRtOpCodeTflMul) {
      LITERT_RETURN_IF_ERROR(LiteRtPushOp(selected_ops, op.Get(), 0));
    } else if (op.Code() == kLiteRtOpCodeTflSub) {
      LITERT_RETURN_IF_ERROR(LiteRtPushOp(selected_ops, op.Get(), 0));
    } else if (op.Code() == kLiteRtOpCodeShloComposite) {
      // Example of checking composite op options
      // ... logic to check for "odml.rms_norm" name ...
      LITERT_RETURN_IF_ERROR(LiteRtPushOp(selected_ops, op.Get(), 0));
    }
  }
  return kLiteRtStatusOk;
}
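
The "check input/output types" step is elided in the snippet above. One way such a check could look is sketched below on mock types; ToyElementType, ToyOp, and OnlyFloat32 are hypothetical stand-ins and do not use the LiteRT tensor API.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical element types and op description. NOT LiteRT types.
enum class ToyElementType { kFloat32, kInt8, kInt32 };

struct ToyOp {
  std::vector<ToyElementType> inputs;
  std::vector<ToyElementType> outputs;
};

// Returns true only if every input and every output tensor is Float32.
// Note that an op with no tensors at all trivially passes this check.
bool OnlyFloat32(const ToyOp& op) {
  auto is_f32 = [](ToyElementType t) {
    return t == ToyElementType::kFloat32;
  };
  return std::all_of(op.inputs.begin(), op.inputs.end(), is_f32) &&
         std::all_of(op.outputs.begin(), op.outputs.end(), is_f32);
}
```

A plugin would run a predicate like this before looking at the op code, so that unsupported data types are rejected early and the op falls back to another backend.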

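Before calling compile, LiteRT validates all selected operations and "outlines" them into new subgraphs in a new intermediate model. This intermediate model is what gets passed to compilation.

Compilation Logic (`LiteRtCompilerPluginCompile`)

This function takes the partitioned subgraphs and produces a custom LiteRtCompiledResult. This example generates a standalone bytecode module for each partition to compile. In a real plugin, this typically involves converting the LiteRT ops into types for the backend compiler library. The functional example plugin's "compilation" produces a human-readable string that encodes the graph.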

// Internal structure defining the compiled output
struct LiteRtCompiledResultT {
  std::vector<std::string> byte_code;   // The hardware bytecode buffers
  std::vector<std::string> per_op_data; // Per-call metadata (CallInfo)
};

LiteRtStatus LiteRtCompilerPluginCompile(
    LiteRtCompilerPlugin compiler_plugin, const char* soc_model,
    LiteRtModel partitions, LiteRtCompiledResult* compiled_result) {

  // 1. Create the internal result structure
  auto model = litert::Model::CreateFromNonOwnedHandle(partitions);
  const auto num_partitions = model.NumSubgraphs();
  auto result = std::make_unique<LiteRtCompiledResultT>();
  result->byte_code.resize(num_partitions);
  result->per_op_data.resize(num_partitions);

  // 2. Iterate and compile each partition
  for (size_t i = 0; i < num_partitions; ++i) {
    // CompileSinglePartition is an internal helper that converts the subgraph
    // into the target hardware's format and stores it in result->byte_code.
    // In the case of the example this is just a stringification of the graph.

    // ... internal call to CompileSinglePartition ...
    // Example: result->byte_code[i] = generated_hw_code;
    // Example: result->per_op_data[i] = absl::StrFormat("Partition_%d", i);

    // The "per_op_data" is a unique identifier associated with the i-th
    // partition. This is analogous to the name of a function in a library.
    // It is only meaningful when the plugin is preparing single modules with
    // multiple entry points.
  }

  // 3. Pass ownership of the result back to the framework
  *compiled_result = result.release();

  return kLiteRtStatusOk;
}

// Functions to expose the compiled result data to the framework
LiteRtStatus LiteRtGetCompiledResultByteCode(
    LiteRtCompiledResult compiled_result, LiteRtParamIndex byte_code_idx,
    const void** byte_code, size_t* byte_code_size) {
  // ... implementation reads from compiled_result->byte_code ...
  return kLiteRtStatusOk;
}
// ... other LiteRtGetCompiledResult* functions ...
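
The accessor body is elided above. A minimal sketch of what such an accessor might do, written against mock types, is shown below. ToyStatus, ToyCompiledResult, and ToyGetByteCode are hypothetical and only mirror the shape of the real signature; the error codes in particular are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status codes. NOT the LiteRtStatus enum.
enum ToyStatus {
  kToyStatusOk = 0,
  kToyStatusErrorInvalidArgument = 1,
  kToyStatusErrorIndexOutOfBounds = 2,
};

// Hypothetical plugin-owned result; each string holds one bytecode module.
struct ToyCompiledResult {
  std::vector<std::string> byte_code;
};

// Exposes a non-owning view of module `idx`. The returned pointer stays
// valid only as long as `result` is alive and its buffers are not modified.
ToyStatus ToyGetByteCode(const ToyCompiledResult* result, std::size_t idx,
                         const void** byte_code, std::size_t* byte_code_size) {
  if (result == nullptr || byte_code == nullptr || byte_code_size == nullptr) {
    return kToyStatusErrorInvalidArgument;
  }
  if (idx >= result->byte_code.size()) {
    return kToyStatusErrorIndexOutOfBounds;
  }
  *byte_code = result->byte_code[idx].data();
  *byte_code_size = result->byte_code[idx].size();
  return kToyStatusOk;
}
```

Returning a pointer into plugin-owned storage avoids a copy, but it ties the buffer's lifetime to the result object, so the result must outlive every use of the pointer.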

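Usage and Validation

LiteRT provides a variety of tools for applying a compiler plugin to a model file, running the result, and validating or benchmarking it. See the accelerator test suite documentation and the benchmarking and profiling documentation.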
