Hand landmarks detection guide for iOS

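The MediaPipe Hand Landmarker task lets you detect the landmarks of the hands in an image. These instructions show you how to use the Hand Landmarker with iOS apps. The code sample described in these instructions is available on GitHub.

For more information about the capabilities, models, and configuration options of this task, see the Overview.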

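Code example

The MediaPipe Tasks example code is a basic implementation of a Hand Landmarker app for iOS. The example uses the camera on a physical iOS device to detect hand landmarks in a continuous video stream. The app can also detect hand landmarks in images and videos from the device gallery.

You can use the app as a starting point for your own iOS app, or refer to it when modifying an existing app. The Hand Landmarker example code is hosted on GitHub.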

Download the code

The following instructions show you how to create a local copy of the example code using the git command line tool.

To download the example code:

Clone the git repository using the following command:
```
git clone https://github.com/google-ai-edge/mediapipe-samples
```
Optionally, configure your git instance to use sparse checkout, so that you have only the files for the Hand Landmarker example app:
```
cd mediapipe-samples
git sparse-checkout init --cone
git sparse-checkout set examples/hand_landmarker/ios/
```

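After you create a local version of the example code, you can install the MediaPipe task library, open the project using Xcode, and run the app. For instructions, see the Setup Guide for iOS.

Key components

The following files contain the crucial code for the Hand Landmarker example application:

HandLandmarkerService.swift: Initializes the Hand Landmarker, handles the model selection, and runs inference on the input data.
CameraViewController.swift: Implements the UI for the live camera feed input mode and visualizes the results.
MediaLibraryViewController.swift: Implements the UI for the still image and video file input mode and visualizes the results.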

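Setup

This section describes the key steps for setting up your development environment and code projects to use the Hand Landmarker. For general information on setting up your development environment for using MediaPipe tasks, including platform version requirements, see the Setup Guide for iOS.

Dependencies

Hand Landmarker uses the MediaPipeTasksVision library, which must be installed using CocoaPods. The library is compatible with both Swift and Objective-C apps and does not require any additional language-specific setup.

For instructions on installing CocoaPods on macOS, see the CocoaPods installation guide. For instructions on how to create a Podfile with the necessary pods for your app, see Using CocoaPods.

Add the MediaPipeTasksVision pod to the Podfile using the following code: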

target 'MyHandLandmarkerApp' do
  use_frameworks!
  pod 'MediaPipeTasksVision'
end

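If your app includes unit test targets, see the Setup Guide for iOS for additional information on setting up your Podfile.

Model

The MediaPipe Hand Landmarker task requires a trained model that is compatible with this task. For more information about the available trained models for Hand Landmarker, see the Models section of the task overview.

Select and download a model, and add it to your project directory using Xcode. For instructions on how to add files to your Xcode project, see Managing files and folders in your Xcode project.

Use the BaseOptions.modelAssetPath property to specify the path to the model in your app bundle. For a code example, see the next section.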
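
Because `Bundle.path(forResource:ofType:)` returns an optional string, it can help to unwrap the model path before assigning it to `modelAssetPath`. The following is a minimal sketch of one way to do that using only Foundation. The `modelPath(named:fileExtension:in:)` helper and the `ModelNotFoundError` type are hypothetical names introduced for illustration and are not part of MediaPipe.

```swift
import Foundation

// Hypothetical error type for this sketch; it is not part of MediaPipe.
struct ModelNotFoundError: Error {
  let name: String
  let fileExtension: String
}

// Returns the bundle path of a model file, or throws if the file is missing.
// `Bundle.path(forResource:ofType:)` returns an optional, so the value is
// unwrapped here before it is used as a model asset path.
func modelPath(named name: String,
               fileExtension: String,
               in bundle: Bundle = .main) throws -> String {
  guard let path = bundle.path(forResource: name, ofType: fileExtension) else {
    throw ModelNotFoundError(name: name, fileExtension: fileExtension)
  }
  return path
}

// Usage: "hand_landmarker" and "task" are the resource name and type used in
// this guide's examples.
do {
  let path = try modelPath(named: "hand_landmarker", fileExtension: "task")
  print("Model found at \(path)")
} catch {
  print("Model lookup failed: \(error)")
}
```

The returned string could then be assigned to `options.baseOptions.modelAssetPath` as shown in the next section.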

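Create the task

You can create the Hand Landmarker task by calling one of its initializers. The HandLandmarker(options:) initializer accepts values for the configuration options.

If you don't need a Hand Landmarker initialized with customized configuration options, you can use the HandLandmarker(modelPath:) initializer to create a Hand Landmarker with the default options. For more information about configuration options, see Configuration Overview.

The Hand Landmarker task supports 3 input data types: still images, video files, and live video streams. By default, HandLandmarker(modelPath:) initializes a task for still images. If you want your task to be initialized to process video files or live video streams, use HandLandmarker(options:) to specify the video or live stream running mode. The live stream mode also requires the additional handLandmarkerLiveStreamDelegate configuration option, which enables the Hand Landmarker to deliver hand landmarker results to the delegate asynchronously.

Choose the tab corresponding to your running mode to see how to create the task and run inference.

Swift

Image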

import MediaPipeTasksVision

let modelPath = Bundle.main.path(forResource: "hand_landmarker",
                                      ofType: "task")

let options = HandLandmarkerOptions()
options.baseOptions.modelAssetPath = modelPath
options.runningMode = .image
options.minHandDetectionConfidence = minHandDetectionConfidence
options.minHandPresenceConfidence = minHandPresenceConfidence
options.minTrackingConfidence = minHandTrackingConfidence
options.numHands = numHands

let handLandmarker = try HandLandmarker(options: options)

Video

import MediaPipeTasksVision

let modelPath = Bundle.main.path(forResource: "hand_landmarker",
                                      ofType: "task")

let options = HandLandmarkerOptions()
options.baseOptions.modelAssetPath = modelPath
options.runningMode = .video
options.minHandDetectionConfidence = minHandDetectionConfidence
options.minHandPresenceConfidence = minHandPresenceConfidence
options.minTrackingConfidence = minHandTrackingConfidence
options.numHands = numHands

let handLandmarker = try HandLandmarker(options: options)

Live stream

import MediaPipeTasksVision

// Class that conforms to the `HandLandmarkerLiveStreamDelegate` protocol and
// implements the method that the hand landmarker calls once it finishes
// performing landmarks detection in each input frame.
class HandLandmarkerResultProcessor: NSObject, HandLandmarkerLiveStreamDelegate {

  func handLandmarker(
    _ handLandmarker: HandLandmarker,
    didFinishDetection result: HandLandmarkerResult?,
    timestampInMilliseconds: Int,
    error: Error?) {

    // Process the hand landmarker result or errors here.

  }
}

let modelPath = Bundle.main.path(
  forResource: "hand_landmarker",
  ofType: "task")

let options = HandLandmarkerOptions()
options.baseOptions.modelAssetPath = modelPath
options.runningMode = .liveStream
options.minHandDetectionConfidence = minHandDetectionConfidence
options.minHandPresenceConfidence = minHandPresenceConfidence
options.minTrackingConfidence = minHandTrackingConfidence
options.numHands = numHands

// Assign an object of the class to the `handLandmarkerLiveStreamDelegate`
// property.
let processor = HandLandmarkerResultProcessor()
options.handLandmarkerLiveStreamDelegate = processor

let handLandmarker = try HandLandmarker(options: options)

Objective-C

Image

@import MediaPipeTasksVision;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"hand_landmarker"
                                                      ofType:@"task"];

MPPHandLandmarkerOptions *options = [[MPPHandLandmarkerOptions alloc] init];
options.baseOptions.modelAssetPath = modelPath;
options.runningMode = MPPRunningModeImage;
options.minHandDetectionConfidence = minHandDetectionConfidence;
options.minHandPresenceConfidence = minHandPresenceConfidence;
options.minTrackingConfidence = minHandTrackingConfidence;
options.numHands = numHands;

MPPHandLandmarker *handLandmarker =
  [[MPPHandLandmarker alloc] initWithOptions:options error:nil];

Video

@import MediaPipeTasksVision;

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"hand_landmarker"
                                                      ofType:@"task"];

MPPHandLandmarkerOptions *options = [[MPPHandLandmarkerOptions alloc] init];
options.baseOptions.modelAssetPath = modelPath;
options.runningMode = MPPRunningModeVideo;
options.minHandDetectionConfidence = minHandDetectionConfidence;
options.minHandPresenceConfidence = minHandPresenceConfidence;
options.minTrackingConfidence = minHandTrackingConfidence;
options.numHands = numHands;

MPPHandLandmarker *handLandmarker =
  [[MPPHandLandmarker alloc] initWithOptions:options error:nil];

Live stream

@import MediaPipeTasksVision;

// Class that conforms to the `MPPHandLandmarkerLiveStreamDelegate` protocol
// and implements the method that the hand landmarker calls once it finishes
// performing landmarks detection in each input frame.

@interface APPHandLandmarkerResultProcessor : NSObject <MPPHandLandmarkerLiveStreamDelegate>

@end

@implementation APPHandLandmarkerResultProcessor

- (void)handLandmarker:(MPPHandLandmarker *)handLandmarker
    didFinishDetectionWithResult:(MPPHandLandmarkerResult *)handLandmarkerResult
         timestampInMilliseconds:(NSInteger)timestampInMilliseconds
                           error:(NSError *)error {

    // Process the hand landmarker result or errors here.

}

@end

NSString *modelPath = [[NSBundle mainBundle] pathForResource:@"hand_landmarker"
                                                      ofType:@"task"];

MPPHandLandmarkerOptions *options = [[MPPHandLandmarkerOptions alloc] init];
options.baseOptions.modelAssetPath = modelPath;
options.runningMode = MPPRunningModeLiveStream;
options.minHandDetectionConfidence = minHandDetectionConfidence;
options.minHandPresenceConfidence = minHandPresenceConfidence;
options.minTrackingConfidence = minHandTrackingConfidence;
options.numHands = numHands;

// Assign an object of the class to the `handLandmarkerLiveStreamDelegate`
// property.
APPHandLandmarkerResultProcessor *processor =
  [APPHandLandmarkerResultProcessor new];
options.handLandmarkerLiveStreamDelegate = processor;

MPPHandLandmarker *handLandmarker =
  [[MPPHandLandmarker alloc] initWithOptions:options error:nil];

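Configuration options

This task has the following configuration options for iOS apps:

Option name	Description	Value range	Default value
`runningMode`	Sets the running mode for the task. There are three modes: IMAGE: the mode for single image inputs. VIDEO: the mode for decoded frames of a video. LIVE_STREAM: the mode for a live stream of input data, such as from a camera. In this mode, `handLandmarkerLiveStreamDelegate` must be set to an instance of a class that implements `HandLandmarkerLiveStreamDelegate` to receive the hand landmark detection results asynchronously.	{`RunningMode.image, RunningMode.video, RunningMode.liveStream`}	`RunningMode.image`
`numHands`	The maximum number of hands detected by the Hand Landmarker.	`Any integer > 0`	`1`
`minHandDetectionConfidence`	The minimum confidence score for the hand detection to be considered successful in the palm detection model.	`0.0 - 1.0`	`0.5`
`minHandPresenceConfidence`	The minimum confidence score for the hand presence score in the hand landmark detection model. In video mode and live stream mode, if the hand presence confidence score from the hand landmark model is below this threshold, the Hand Landmarker triggers the palm detection model. Otherwise, a lightweight hand tracking algorithm determines the location of the hands for subsequent landmark detections.	`0.0 - 1.0`	`0.5`
`minTrackingConfidence`	The minimum confidence score for the hand tracking to be considered successful. This is the bounding box IoU threshold between hands in the current frame and the last frame. In video mode and live stream mode, if the tracking fails, the Hand Landmarker triggers hand detection. Otherwise, it skips the hand detection.	`0.0 - 1.0`	`0.5`
`result_listener`	Sets the result listener to receive the detection results asynchronously when the Hand Landmarker is in live stream mode. Only applicable when the running mode is set to `LIVE_STREAM`.	N/A	N/A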
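
The value ranges in the table can be checked before the values are copied into `HandLandmarkerOptions`. The following is a minimal sketch under stated assumptions: `HandLandmarkerSettings` is a hypothetical container introduced here, not a MediaPipe type, and the `Int` and `Float` property types are assumptions. The defaults match the table above.

```swift
// Hypothetical container for the values that this guide assigns to
// `HandLandmarkerOptions`; it is not a MediaPipe type.
struct HandLandmarkerSettings {
  var numHands = 1
  var minHandDetectionConfidence: Float = 0.5
  var minHandPresenceConfidence: Float = 0.5
  var minTrackingConfidence: Float = 0.5

  // Returns one message for every value outside the documented range.
  func validationErrors() -> [String] {
    var errors: [String] = []
    if numHands <= 0 {
      errors.append("numHands must be an integer greater than 0")
    }
    let confidences: [(String, Float)] = [
      ("minHandDetectionConfidence", minHandDetectionConfidence),
      ("minHandPresenceConfidence", minHandPresenceConfidence),
      ("minTrackingConfidence", minTrackingConfidence),
    ]
    for (name, value) in confidences where !(0.0...1.0).contains(value) {
      errors.append("\(name) must be between 0.0 and 1.0")
    }
    return errors
  }
}

// Usage: one value is set out of range on purpose.
var settings = HandLandmarkerSettings()
settings.numHands = 2
settings.minHandPresenceConfidence = 1.2
for message in settings.validationErrors() {
  print(message)
}
```

Validating up front keeps configuration mistakes separate from errors thrown later when the task is created.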

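When the running mode is set to live stream, the Hand Landmarker requires the additional handLandmarkerLiveStreamDelegate configuration option, which enables the Hand Landmarker to deliver hand landmark detection results asynchronously. The delegate must implement the handLandmarker(_:didFinishDetection:timestampInMilliseconds:error:) method, which the Hand Landmarker calls after processing the hand landmark detection results for each frame.

Option name	Description	Value range	Default value
`handLandmarkerLiveStreamDelegate`	Enables the Hand Landmarker to receive hand landmark detection results asynchronously in live stream mode. The class whose instance is set to this property must implement the `handLandmarker(_:didFinishDetection:timestampInMilliseconds:error:)` method.	Not applicable	Not set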

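Prepare data

You need to convert the input image or frame to an MPImage object before passing it to the Hand Landmarker. MPImage supports different types of iOS image formats and can use them in any running mode for inference. For more information about MPImage, refer to the MPImage API.

Choose an iOS image format based on your use case and the running mode your application requires. MPImage accepts the UIImage, CVPixelBuffer, and CMSampleBuffer iOS image formats.

UIImage

The UIImage format is well-suited for the following running modes:

Images: images from an app bundle, user gallery, or file system formatted as UIImage images can be converted to an MPImage object.
Videos: use AVAssetImageGenerator to extract video frames to the CGImage format, then convert them to UIImage images.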

Swift

// Load an image on the user's device as an iOS `UIImage` object
// (named `uiImage` here).

// Convert the `UIImage` object to a MediaPipe's Image object having the default
// orientation `UIImage.Orientation.up`.
let image = try MPImage(uiImage: uiImage)

Objective-C

// Load an image on the user's device as an iOS `UIImage` object
// (named `uiImage` here).

// Convert the `UIImage` object to a MediaPipe's Image object having the default
// orientation `UIImageOrientationUp`.
MPPImage *image = [[MPPImage alloc] initWithUIImage:uiImage error:nil];

The example initializes an MPImage with the default UIImage.Orientation.up orientation. You can initialize an MPImage with any of the supported UIImage.Orientation values. Hand Landmarker does not support mirrored orientations such as .upMirrored, .downMirrored, .leftMirrored, and .rightMirrored.

For more information about UIImage, refer to the UIImage Apple Developer Documentation.
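
Since mirrored orientations are not supported, an app may want to check the orientation before creating an MPImage. The following is a minimal sketch of such a check. To stay runnable without UIKit, it uses a hypothetical stand-in enum, `ImageOrientation`, whose case names mirror those of `UIImage.Orientation`. In an app, the same check would be written against the UIKit type.

```swift
// Stand-in for `UIImage.Orientation` so that the sketch runs without UIKit.
// The case names match the UIKit enum; the type itself is hypothetical.
enum ImageOrientation: CaseIterable {
  case up, down, left, right
  case upMirrored, downMirrored, leftMirrored, rightMirrored

  // Mirrored orientations are the ones this guide lists as unsupported.
  var isMirrored: Bool {
    switch self {
    case .upMirrored, .downMirrored, .leftMirrored, .rightMirrored:
      return true
    case .up, .down, .left, .right:
      return false
    }
  }
}

// Usage: keep only the orientations that are not mirrored.
let supportedOrientations = ImageOrientation.allCases.filter { !$0.isMirrored }
print(supportedOrientations)
```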

CVPixelBuffer

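The CVPixelBuffer format is well-suited for applications that generate frames and use the iOS CoreImage framework for processing.

The CVPixelBuffer format is well-suited for the following running modes:

Images: apps that generate CVPixelBuffer images after some processing using the iOS CoreImage framework can send them to the Hand Landmarker in the image running mode.
Videos: video frames can be converted to the CVPixelBuffer format for processing, and then sent to the Hand Landmarker in video mode.
Live stream: apps that use an iOS camera to generate frames may convert them to the CVPixelBuffer format for processing before sending them to the Hand Landmarker in live stream mode.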

Swift

// Obtain a CVPixelBuffer.

// Convert the `CVPixelBuffer` object to a MediaPipe's Image object having the default
// orientation `UIImage.Orientation.up`.
let image = try MPImage(pixelBuffer: pixelBuffer)

Objective-C

// Obtain a CVPixelBuffer.

// Convert the `CVPixelBuffer` object to a MediaPipe's Image object having the
// default orientation `UIImageOrientationUp`.
MPPImage *image = [[MPPImage alloc] initWithPixelBuffer:pixelBuffer error:nil];

For more information about CVPixelBuffer, refer to the CVPixelBuffer Apple Developer Documentation.

CMSampleBuffer

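The CMSampleBuffer format stores media samples of a uniform media type and is well-suited for the live stream running mode. Live frames from iOS cameras are delivered asynchronously in the CMSampleBuffer format by the iOS AVCaptureVideoDataOutput.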

Swift

// Obtain a CMSampleBuffer.

// Convert the `CMSampleBuffer` object to a MediaPipe's Image object having the default
// orientation `UIImage.Orientation.up`.
let image = try MPImage(sampleBuffer: sampleBuffer)

Objective-C

// Obtain a `CMSampleBuffer`.

// Convert the `CMSampleBuffer` object to a MediaPipe's Image object having the
// default orientation `UIImageOrientationUp`.
MPPImage *image = [[MPPImage alloc] initWithSampleBuffer:sampleBuffer error:nil];

For more information about CMSampleBuffer, refer to the CMSampleBuffer Apple Developer Documentation.

Run the task

To run the Hand Landmarker, use the detect() method specific to the assigned running mode:

Still image: detect(image:)
Video: detect(videoFrame:timestampInMilliseconds:)
Live stream: detectAsync(image:timestampInMilliseconds:)

Swift

Image

let result = try handLandmarker.detect(image: image)

Video

let result = try handLandmarker.detect(
    videoFrame: image,
    timestampInMilliseconds: timestamp)

Live stream

try handLandmarker.detectAsync(
  image: image,
  timestampInMilliseconds: timestamp)

Objective-C

Image

MPPHandLandmarkerResult *result =
  [handLandmarker detectInImage:image error:nil];

Video

MPPHandLandmarkerResult *result =
  [handLandmarker detectInVideoFrame:image
             timestampInMilliseconds:timestamp
                               error:nil];

Live stream

BOOL success =
  [handLandmarker detectAsyncInImage:image
             timestampInMilliseconds:timestamp
                               error:nil];

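The Hand Landmarker code example shows the implementations of each of these modes in more detail. The example code allows the user to switch between processing modes, which may not be required for your use case.

Note the following:

When running in video mode or live stream mode, you must also provide the timestamp of the input frame to the Hand Landmarker task.
When running in image or video mode, the Hand Landmarker task blocks the current thread until it finishes processing the input image or frame. To avoid blocking the current thread, execute the processing in a background thread using the iOS Dispatch or NSOperation frameworks.
When running in live stream mode, the Hand Landmarker task returns immediately and doesn't block the current thread. It invokes the handLandmarker(_:didFinishDetection:timestampInMilliseconds:error:) method with the hand landmarker result after processing each input frame. The Hand Landmarker invokes this method asynchronously on a dedicated serial dispatch queue. To display results on the user interface, dispatch the results to the main queue after processing them. If the detectAsync function is called when the Hand Landmarker task is busy processing another frame, the Hand Landmarker ignores the new input frame.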
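
The timestamp requirement in the first note can be illustrated with a small helper. The following is a minimal sketch that derives a millisecond timestamp from a frame index, assuming a constant frame rate. The function name and the constant-rate assumption are illustrative only. A real app would more likely take the timestamp from the video asset or from the presentation time of each sample buffer.

```swift
// Converts a zero-based frame index into a timestamp in milliseconds,
// assuming a constant frame rate (an assumption made for this sketch).
func timestampInMilliseconds(frameIndex: Int, framesPerSecond: Double) -> Int {
  precondition(frameIndex >= 0 && framesPerSecond > 0)
  return Int((Double(frameIndex) * 1000.0 / framesPerSecond).rounded())
}

// Usage: timestamps for the first four frames of a 30 fps video.
let timestamps = (0..<4).map {
  timestampInMilliseconds(frameIndex: $0, framesPerSecond: 30)
}
print(timestamps)  // prints "[0, 33, 67, 100]"
```

Each value could then be passed as the `timestampInMilliseconds` argument of the video or live stream detection call shown above.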

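Handle and display results

Upon running inference, the Hand Landmarker task returns a HandLandmarkerResult, which contains hand landmarks in image coordinates, hand landmarks in world coordinates, and the handedness (left or right hand) of the detected hands.

The following shows an example of the output data from this task:

The HandLandmarkerResult output contains three components. Each component is an array, where each element contains the following results for a single detected hand:

Handedness

Handedness represents whether the detected hand is a left or right hand.

Landmarks

There are 21 hand landmarks, each composed of x, y, and z coordinates. The x and y coordinates are normalized to [0.0, 1.0] by the image width and height, respectively. The z coordinate represents the landmark depth, with the depth at the wrist being the origin. The smaller the value, the closer the landmark is to the camera. The magnitude of z uses roughly the same scale as x.

World Landmarks

The 21 hand landmarks are also presented in world coordinates. Each landmark is composed of x, y, and z, representing real-world 3D coordinates in meters with the origin at the hand's geometric center.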
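
Because the x and y coordinates of a landmark are normalized by the image width and height, drawing a landmark requires scaling it back to pixels. The following is a minimal sketch of that conversion. `NormalizedPoint` and `PixelPoint` are hypothetical stand-in types, not MediaPipe types, and the 1280 by 720 image size is an assumed example value. The coordinates are those of Landmark #0 in this page's sample output.

```swift
// Stand-in for one normalized landmark; not the MediaPipe type.
struct NormalizedPoint {
  let x: Double
  let y: Double
  let z: Double
}

// A position in image pixels.
struct PixelPoint {
  let x: Double
  let y: Double
}

// Scales the normalized x and y coordinates by the image dimensions.
// The z coordinate is a relative depth and is not converted here.
func pixelPosition(of landmark: NormalizedPoint,
                   imageWidth: Double,
                   imageHeight: Double) -> PixelPoint {
  PixelPoint(x: landmark.x * imageWidth, y: landmark.y * imageHeight)
}

// Usage: Landmark #0 from the sample output, in an assumed 1280x720 image.
let landmark0 = NormalizedPoint(x: 0.638852, y: 0.671197, z: -3.41e-7)
let position = pixelPosition(of: landmark0, imageWidth: 1280, imageHeight: 720)
print("x: \(position.x), y: \(position.y)")
```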

HandLandmarkerResult:
  Handedness:
    Categories #0:
      index        : 0
      score        : 0.98396
      categoryName : Left
  Landmarks:
    Landmark #0:
      x            : 0.638852
      y            : 0.671197
      z            : -3.41E-7
    Landmark #1:
      x            : 0.634599
      y            : 0.536441
      z            : -0.06984
    ... (21 landmarks for a hand)
  WorldLandmarks:
    Landmark #0:
      x            : 0.067485
      y            : 0.031084
      z            : 0.055223
    Landmark #1:
      x            : 0.063209
      y            : -0.00382
      z            : 0.020920
    ... (21 world landmarks for a hand)
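
Since world landmarks are expressed in meters, the distance between two of them is a physical length. The following is a minimal sketch that computes the Euclidean distance between World Landmark #0 and World Landmark #1 from the sample output above. `WorldPoint` is a hypothetical stand-in type rather than the MediaPipe landmark type.

```swift
// Stand-in for one world landmark, with coordinates in meters.
struct WorldPoint {
  let x, y, z: Double
}

// Returns the Euclidean distance between two world landmarks, in meters.
func distanceInMeters(_ a: WorldPoint, _ b: WorldPoint) -> Double {
  let dx = a.x - b.x
  let dy = a.y - b.y
  let dz = a.z - b.z
  return (dx * dx + dy * dy + dz * dz).squareRoot()
}

// Usage: World Landmark #0 and #1 from the sample output.
let world0 = WorldPoint(x: 0.067485, y: 0.031084, z: 0.055223)
let world1 = WorldPoint(x: 0.063209, y: -0.00382, z: 0.020920)
let distance = distanceInMeters(world0, world1)
print("Distance: \(distance) m")  // roughly 0.049 m
```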

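The following image shows a visualization of the task output:

[Figure: a hand giving a thumbs up, with the skeletal structure of the hand mapped out]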