The MediaPipe Hand Landmarker task lets you detect the landmarks of the hands in an image.
You can use this task to locate key points of hands and render visual effects on
them. This task operates on image data with a machine learning (ML) model, accepting
either static data or a continuous stream, and outputs hand landmarks in image
coordinates, hand landmarks in world coordinates, and the handedness (left/right hand)
of multiple detected hands.
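For a sense of the end-to-end flow, here is a minimal sketch using the MediaPipe Tasks Python API in image mode; the model bundle path and image file name are placeholder assumptions:

```python
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
HandLandmarker = mp.tasks.vision.HandLandmarker
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions

# Build the task from a downloaded model bundle (path is an assumption).
options = HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path='hand_landmarker.task'),
    num_hands=2)

with HandLandmarker.create_from_options(options) as landmarker:
    # Run detection on a still image (file name is an assumption).
    image = mp.Image.create_from_file('hand.jpg')
    result = landmarker.detect(image)
    print(result.handedness)            # left/right classification per hand
    print(result.hand_landmarks)        # 21 landmarks per hand, image coordinates
    print(result.hand_world_landmarks)  # 21 landmarks per hand, world coordinates
```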
Start using this task by following one of these implementation guides for your
target platform. These platform-specific guides walk you through a basic
implementation of this task, including a recommended model and a code example
with recommended configuration options:

- Android - [Code example](https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/hand_landmarker/android), [Guide](./android)
- Python - [Code example](https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/hand_landmarker/python/hand_landmarker.ipynb), [Guide](./python)
- Web - [Code example](https://codepen.io/mediapipe-preview/pen/gOKBGPN), [Guide](./web_js)
This section describes the capabilities, inputs, outputs, and configuration
options of this task.
Features
- Input image processing - Processing includes image rotation, resizing,
  normalization, and color space conversion.
- Score threshold - Filter results based on prediction scores.
Task inputs
The Hand Landmarker accepts an input of one of the following data types:
- Still images
- Decoded video frames
- Live video feed

Task outputs
The Hand Landmarker outputs the following results:
- Handedness of detected hands
- Landmarks of detected hands in image coordinates
- Landmarks of detected hands in world coordinates
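As a rough sketch of how these outputs look in the Python API, continuing from the `result` produced in the sketch above:

```python
# Inspect the result returned by landmarker.detect() in the sketch above.
for i, hand in enumerate(result.hand_landmarks):
    # Handedness is a classification with a label ('Left'/'Right') and a score.
    handedness = result.handedness[i][0]
    print(f'Hand {i}: {handedness.category_name} ({handedness.score:.2f})')

    # Image-coordinate landmarks are normalized to [0.0, 1.0] by the image size.
    wrist = hand[0]
    print(f'  wrist, image coords: {wrist.x:.3f}, {wrist.y:.3f}, {wrist.z:.3f}')

    # World-coordinate landmarks are approximate real-world 3D coordinates.
    wrist_world = result.hand_world_landmarks[i][0]
    print(f'  wrist, world coords: {wrist_world.x:.3f}, '
          f'{wrist_world.y:.3f}, {wrist_world.z:.3f}')
```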
Configuration options
This task has the following configuration options:
| Option Name | Description | Value Range | Default Value |
|---|---|---|---|
| `running_mode` | Sets the running mode for the task. There are three modes: <br />IMAGE: The mode for single image inputs. <br />VIDEO: The mode for decoded frames of a video. <br />LIVE_STREAM: The mode for a livestream of input data, such as from a camera. In this mode, resultListener must be called to set up a listener to receive results asynchronously. | {`IMAGE, VIDEO, LIVE_STREAM`} | `IMAGE` |
| `num_hands` | The maximum number of hands detected by the hand landmark detector. | Any integer > 0 | `1` |
| `min_hand_detection_confidence` | The minimum confidence score for the hand detection to be considered successful in the palm detection model. | `0.0 - 1.0` | `0.5` |
| `min_hand_presence_confidence` | The minimum confidence score for the hand presence score in the hand landmark detection model. In video mode and live stream mode, if the hand presence confidence score from the hand landmark model is below this threshold, Hand Landmarker triggers the palm detection model. Otherwise, a lightweight hand tracking algorithm determines the location of the hand(s) for subsequent landmark detections. | `0.0 - 1.0` | `0.5` |
| `min_tracking_confidence` | The minimum confidence score for the hand tracking to be considered successful. This is the bounding box IoU threshold between hands in the current frame and the last frame. In video mode and live stream mode, if the tracking fails, Hand Landmarker triggers hand detection. Otherwise, it skips the hand detection. | `0.0 - 1.0` | `0.5` |
| `result_callback` | Sets the result listener to receive the detection results asynchronously when the Hand Landmarker is in live stream mode. Only applicable when the running mode is set to `LIVE_STREAM`. | N/A | N/A |
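To illustrate these options, the sketch below configures live stream mode in Python with a result callback; the camera capture loop uses OpenCV, and the model path, camera index, and frame rate are assumptions:

```python
import cv2
import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
HandLandmarker = mp.tasks.vision.HandLandmarker
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode

def handle_result(result, output_image, timestamp_ms):
    # Called asynchronously with each frame's detection result.
    print(f'{timestamp_ms} ms: {len(result.hand_landmarks)} hand(s) detected')

options = HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path='hand_landmarker.task'),
    running_mode=VisionRunningMode.LIVE_STREAM,
    num_hands=2,
    min_hand_detection_confidence=0.5,
    min_hand_presence_confidence=0.5,
    min_tracking_confidence=0.5,
    result_callback=handle_result)

with HandLandmarker.create_from_options(options) as landmarker:
    cap = cv2.VideoCapture(0)          # camera index is an assumption
    timestamp_ms = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Wrap the BGR OpenCV frame as an SRGB mp.Image.
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB,
                            data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        timestamp_ms += 33             # must increase monotonically (~30 fps)
        landmarker.detect_async(mp_image, timestamp_ms)
    cap.release()
```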
Models
The Hand Landmarker uses a model bundle with two packaged models: a palm detection
model and a hand landmarks detection model. You need a model bundle that
contains both these models to run this task.

| Model name | Input shape | Quantization type | Model Card | Versions |
|---|---|---|---|---|
| [HandLandmarker (full)](https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task) | 192 x 192, 224 x 224 | float 16 | [info](https://storage.googleapis.com/mediapipe-assets/Model%20Card%20Hand%20Tracking%20(Lite_Full)%20with%20Fairness%20Oct%202021.pdf) | [Latest](https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task) |
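For instance, the full model bundle can be fetched directly from the URL in the table above and saved to the `model_asset_path` assumed in the earlier sketches:

```python
import urllib.request

# Download the packaged model bundle (palm detection + hand landmarks models).
urllib.request.urlretrieve(
    'https://storage.googleapis.com/mediapipe-models/hand_landmarker/'
    'hand_landmarker/float16/latest/hand_landmarker.task',
    'hand_landmarker.task')
```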
The hand landmark model bundle localizes 21 hand-knuckle keypoints within the
detected hand regions. The model was trained on approximately 30K real-world
images, as well as several rendered synthetic hand models superimposed over
various backgrounds.
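Because the image-coordinate landmarks are normalized, rendering effects typically means scaling them back to pixels. A minimal sketch, continuing from the Python `result` and `image` above and assuming the standard MediaPipe hand landmark indices (0 = wrist, 4 = thumb tip, 8 = index finger tip, 12 = middle finger tip, 16 = ring finger tip, 20 = pinky tip):

```python
INDEX_FINGER_TIP = 8   # standard MediaPipe hand landmark index

def to_pixels(landmark, image_width, image_height):
    # Image-coordinate landmarks are normalized to [0.0, 1.0] by width/height.
    return int(landmark.x * image_width), int(landmark.y * image_height)

image_height, image_width, _ = image.numpy_view().shape
for hand in result.hand_landmarks:
    x_px, y_px = to_pixels(hand[INDEX_FINGER_TIP], image_width, image_height)
    print(f'Index fingertip at pixel ({x_px}, {y_px})')
```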
The hand landmarker model bundle contains a palm detection model and a hand
landmarks detection model. The palm detection model locates hands within the
input image, and the hand landmarks detection model identifies specific hand
landmarks on the cropped hand image defined by the palm detection model.

Because running the palm detection model is time consuming, in video or live
stream running mode Hand Landmarker uses the bounding box defined by the hand
landmarks model in one frame to localize the region of hands for subsequent
frames. Hand Landmarker only re-triggers the palm detection model if the hand
landmarks model no longer identifies the presence of hands or fails to track the
hands within the frame. This reduces the number of times Hand Landmarker triggers
the palm detection model.
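The exact pipeline is internal to MediaPipe, but the re-triggering behavior can be sketched roughly as follows; `palm_detector` and `landmark_model` are hypothetical callables standing in for the two packaged models, not a real API:

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # normalized hand ROI (x, y, w, h)

def track_hands(frames,
                palm_detector: Callable[[object], List[Box]],
                landmark_model: Callable[[object, Box], Tuple[list, float, Box]],
                min_presence: float = 0.5):
    """Yields per-frame landmarks, re-running palm detection only when
    no hand is currently being tracked (illustrative sketch only)."""
    rois: List[Box] = []
    for frame in frames:
        if not rois:
            rois = palm_detector(frame)        # expensive; run sparingly
        results, next_rois = [], []
        for roi in rois:
            landmarks, presence, new_roi = landmark_model(frame, roi)
            if presence >= min_presence:       # hand still present in this ROI
                results.append(landmarks)
                next_rois.append(new_roi)      # reuse landmark-derived box
        rois = next_rois                       # empty list => re-detect next frame
        yield results
```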
Task benchmarks
Here are the task benchmarks for the whole pipeline, based on the above pre-trained
models. The latency result is the average latency on a Pixel 6 using CPU / GPU.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-01-13 UTC."],[],[],null,["# Hand landmarks detection guide\n\nThe MediaPipe Hand Landmarker task lets you detect the landmarks of the hands in an image.\nYou can use this task to locate key points of hands and render visual effects on\nthem. This task operates on image data with a machine learning (ML) model as\nstatic data or a continuous stream and outputs hand landmarks in image\ncoordinates, hand landmarks in world coordinates and handedness(left/right hand)\nof multiple detected hands.\n\n[Try it!arrow_forward](https://mediapipe-studio.webapps.google.com/demo/hand_landmarker)\n\nGet Started\n-----------\n\nStart using this task by following one of these implementation guides for your\ntarget platform. These platform-specific guides walk you through a basic\nimplementation of this task, including a recommended model, and code example\nwith recommended configuration options:\n\n- **Android** - [Code\n example](https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/hand_landmarker/android)\n - [Guide](./android)\n- **Python** - [Code\n example](https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/hand_landmarker/python/hand_landmarker.ipynb)\n - [Guide](./python)\n- **Web** - [Code example](https://codepen.io/mediapipe-preview/pen/gOKBGPN) - [Guide](./web_js)\n\nTask details\n------------\n\nThis section describes the capabilities, inputs, outputs, and configuration\noptions of this task.\n\n### Features\n\n- **Input image processing** - Processing includes image rotation, resizing, normalization, and color space conversion.\n- **Score threshold** - Filter results based on prediction scores.\n\n| Task inputs | Task outputs |\n|----------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| The Hand Landmarker accepts an input of one of the following data types: - Still images \u003c!-- --\u003e - Decoded video frames \u003c!-- --\u003e - Live video feed | The Hand Landmarker outputs the following results: - Handedness of detected hands \u003c!-- --\u003e - Landmarks of detected hands in image coordinates \u003c!-- --\u003e - Landmarks of detected hands in world coordinates |\n\n### Configurations options\n\nThis task has the following configuration options:\n\n| Option Name | Description | Value Range | Default Value |\n|---------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|---------------|\n| `running_mode` | 
Sets the running mode for the task. There are three modes: \u003cbr /\u003e IMAGE: The mode for single image inputs. \u003cbr /\u003e VIDEO: The mode for decoded frames of a video. \u003cbr /\u003e LIVE_STREAM: The mode for a livestream of input data, such as from a camera. In this mode, resultListener must be called to set up a listener to receive results asynchronously. | {`IMAGE, VIDEO, LIVE_STREAM`} | `IMAGE` |\n| `num_hands` | The maximum number of hands detected by the Hand landmark detector. | `Any integer \u003e 0` | `1` |\n| `min_hand_detection_confidence` | The minimum confidence score for the hand detection to be considered successful in palm detection model. | `0.0 - 1.0` | `0.5` |\n| `min_hand_presence_confidence` | The minimum confidence score for the hand presence score in the hand landmark detection model. In Video mode and Live stream mode, if the hand presence confidence score from the hand landmark model is below this threshold, Hand Landmarker triggers the palm detection model. Otherwise, a lightweight hand tracking algorithm determines the location of the hand(s) for subsequent landmark detections. | `0.0 - 1.0` | `0.5` |\n| `min_tracking_confidence` | The minimum confidence score for the hand tracking to be considered successful. This is the bounding box IoU threshold between hands in the current frame and the last frame. In Video mode and Stream mode of Hand Landmarker, if the tracking fails, Hand Landmarker triggers hand detection. Otherwise, it skips the hand detection. | `0.0 - 1.0` | `0.5` |\n| `result_callback` | Sets the result listener to receive the detection results asynchronously when the hand landmarker is in live stream mode. Only applicable when running mode is set to `LIVE_STREAM` | N/A | N/A |\n\nModels\n------\n\nThe Hand Landmarker uses a model bundle with two packaged models: a palm detection\nmodel and a hand landmarks detection model. You need a model bundle that\ncontains both these models to run this task.\n| **Attention:** This MediaPipe Solutions Preview is an early release. [Learn more](/edge/mediapipe/solutions/about#notice).\n\n| Model name | Input shape | Quantization type | Model Card | Versions |\n|----------------------------------------------------------------------------------------------------------------------------------------------|----------------------|-------------------|-----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|\n| [HandLandmarker (full)](https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task) | 192 x 192, 224 x 224 | float 16 | [info](https://storage.googleapis.com/mediapipe-assets/Model%20Card%20Hand%20Tracking%20(Lite_Full)%20with%20Fairness%20Oct%202021.pdf) | [Latest](https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/latest/hand_landmarker.task) |\n\nThe hand landmark model bundle detects the keypoint localization of 21\nhand-knuckle coordinates within the detected hand regions. The model was trained\non approximately 30K real-world images, as well as several rendered synthetic\nhand models imposed over various backgrounds.\n\nThe hand landmarker model bundle contains a palm detection model and a hand\nlandmarks detection model. 
The Palm detection model locates hands within the\ninput image, and the hand landmarks detection model identifies specific hand\nlandmarks on the cropped hand image defined by the palm detection model.\n\nSince running the palm detection model is time consuming, when in video or live\nstream running mode, Hand Landmarker uses the bounding box defined by the hand\nlandmarks model in one frame to localize the region of hands for subsequent\nframes. Hand Landmarker only re-triggers the palm detection model if the hand\nlandmarks model no longer identifies the presence of hands or fails to track the\nhands within the frame. This reduces the number of times Hand Landmarker tiggers\nthe palm detection model.\n\nTask benchmarks\n---------------\n\nHere's the task benchmarks for the whole pipeline based on the above pre-trained\nmodels. The latency result is the average latency on Pixel 6 using CPU / GPU.\n\n| Model Name | CPU Latency | GPU Latency |\n|-----------------------|-------------|-------------|\n| HandLandmarker (full) | 17.12ms | 12.27ms |"]]