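Object detection guide for Python

The MediaPipe Object Detector task lets you detect the presence and location of multiple classes of objects. These instructions show you how to use the Object Detector task in Python. The code example described in these instructions is available on GitHub.

You can see this task in action by viewing the Web demo. For more information about the capabilities, models, and configuration options of this task, see the Overview.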

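Code example

The Object Detector example code provides a complete implementation of this task in Python for your reference. This code helps you test the task and get started on building your own object detection app. You can view, run, and edit the Object Detector example code using just your web browser.

If you are implementing the Object Detector for Raspberry Pi, see the Raspberry Pi example app.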

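Setup

This section describes the key steps for setting up your development environment and code projects specifically to use Object Detector. For general information on setting up your development environment for MediaPipe tasks, including platform version requirements, see the Setup guide for Python.

Packages

The Object Detector task requires the mediapipe pip package. You can install it with the following command: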

$ python -m pip install mediapipe
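To confirm that the package is visible to the interpreter you plan to use, you can look it up without importing it. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of MediaPipe.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if this interpreter can locate the top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Print a hint instead of failing later with an ImportError.
if not is_installed("mediapipe"):
    print("mediapipe not found; run: python -m pip install mediapipe")
```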

Imports

Import the following classes to access the Object Detector task functions:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

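Model

The MediaPipe Object Detector task requires a trained model that is compatible with this task. For more information on the trained models available for Object Detector, see the Models section of the task overview.

Select and download a model, and then store it in a local directory: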

model_path = '/absolute/path/to/lite-model_efficientdet_lite0_detection_metadata_1.tflite'

Use the model_asset_path parameter of the BaseOptions object to specify the path of the model to use. For a code example, see the next section.
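Because the model is loaded from a local path, it can help to check the path before passing it on. The sketch below is one way to do that with the standard library. The helper name `resolve_model_path` is hypothetical, and the `.tflite` suffix check is an assumption based on the model file name shown above.

```python
from pathlib import Path


def resolve_model_path(path: str) -> str:
    """Return an absolute model path, or raise if the file looks wrong."""
    model_file = Path(path).expanduser().resolve()
    if not model_file.is_file():
        raise FileNotFoundError(f"Model file not found: {model_file}")
    # Assumption: models for this task are distributed as .tflite files.
    if model_file.suffix != ".tflite":
        raise ValueError(f"Expected a .tflite file, got: {model_file.name}")
    return str(model_file)
```

The returned string can then be passed as `model_asset_path` when constructing `BaseOptions`.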

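Create the task

Use the create_from_options function to create the task. The create_from_options function accepts configuration options including the running mode, display names locale, maximum number of results, confidence threshold, category allowlist, and category denylist. If you do not set a configuration option, the task uses its default value. For more information, see the Configuration options section.

The Object Detector task supports several input data types: still images, video files, and live video streams. Find the example below that corresponds to your input data type to see how to create the task.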

Image

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
ObjectDetector = mp.tasks.vision.ObjectDetector
ObjectDetectorOptions = mp.tasks.vision.ObjectDetectorOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = ObjectDetectorOptions(
    base_options=BaseOptions(model_asset_path='/path/to/model.tflite'),
    max_results=5,
    running_mode=VisionRunningMode.IMAGE)

with ObjectDetector.create_from_options(options) as detector:
  # The detector is initialized. Use it here.
  ...  # Placeholder statement; call detector.detect(mp_image) here.

Video

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
ObjectDetector = mp.tasks.vision.ObjectDetector
ObjectDetectorOptions = mp.tasks.vision.ObjectDetectorOptions
VisionRunningMode = mp.tasks.vision.RunningMode

options = ObjectDetectorOptions(
    base_options=BaseOptions(model_asset_path='/path/to/model.tflite'),
    max_results=5,
    running_mode=VisionRunningMode.VIDEO)

with ObjectDetector.create_from_options(options) as detector:
  # The detector is initialized. Use it here.
  ...  # Placeholder statement; call detector.detect_for_video(...) here.

Live stream

import mediapipe as mp

BaseOptions = mp.tasks.BaseOptions
DetectionResult = mp.tasks.components.containers.detections.DetectionResult
ObjectDetector = mp.tasks.vision.ObjectDetector
ObjectDetectorOptions = mp.tasks.vision.ObjectDetectorOptions
VisionRunningMode = mp.tasks.vision.RunningMode

def print_result(result: DetectionResult, output_image: mp.Image, timestamp_ms: int):
    print('detection result: {}'.format(result))

options = ObjectDetectorOptions(
    base_options=BaseOptions(model_asset_path='/path/to/model.tflite'),
    running_mode=VisionRunningMode.LIVE_STREAM,
    max_results=5,
    result_callback=print_result)

with ObjectDetector.create_from_options(options) as detector:
  # The detector is initialized. Use it here.
  ...  # Placeholder statement; call detector.detect_async(...) here.

For a complete example of creating an Object Detector for use with an image, see the code example.
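The three examples above differ only in the running mode and in whether a callback is supplied. The sketch below gathers that choice in one place. The helper names `build_option_kwargs` and `create_detector` are hypothetical. The examples above only show `result_callback` together with `LIVE_STREAM`, so rejecting it in the other modes is an assumption of this sketch.

```python
from typing import Any, Callable, Dict, Optional

VALID_MODES = ("IMAGE", "VIDEO", "LIVE_STREAM")


def build_option_kwargs(
    model_path: str,
    mode: str = "IMAGE",
    max_results: int = 5,
    result_callback: Optional[Callable[..., None]] = None,
) -> Dict[str, Any]:
    """Collect settings for ObjectDetectorOptions and check the mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown running mode: {mode}")
    if mode == "LIVE_STREAM" and result_callback is None:
        raise ValueError("LIVE_STREAM mode needs a result_callback.")
    if mode != "LIVE_STREAM" and result_callback is not None:
        raise ValueError("result_callback is only used in LIVE_STREAM mode.")
    return {
        "model_path": model_path,
        "mode": mode,
        "max_results": max_results,
        "result_callback": result_callback,
    }


def create_detector(settings: Dict[str, Any]):
    """Create an ObjectDetector from settings built by build_option_kwargs."""
    import mediapipe as mp  # Imported here so the checks above run without it.

    vision = mp.tasks.vision
    options = vision.ObjectDetectorOptions(
        base_options=mp.tasks.BaseOptions(
            model_asset_path=settings["model_path"]),
        running_mode=getattr(vision.RunningMode, settings["mode"]),
        max_results=settings["max_results"],
        result_callback=settings["result_callback"],
    )
    return vision.ObjectDetector.create_from_options(options)
```

Checking the mode before touching MediaPipe keeps configuration mistakes separate from model loading errors.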

Configuration options

This task has the following configuration options for Python applications:

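Option name	Description	Value range	Default value
`running_mode`	Sets the running mode for the task. There are three modes: IMAGE: the mode for single image inputs. VIDEO: the mode for decoded frames of a video. LIVE_STREAM: the mode for a live stream of input data, such as from a camera. In this mode, `result_callback` must be set to a function that receives the results asynchronously.	{`IMAGE, VIDEO, LIVE_STREAM`}	`IMAGE`
`display_names_locale`	Sets the language of labels to use for display names provided in the metadata of the task's model, if available. Default is `en` for English. You can add localized labels to the metadata of a custom model using the TensorFlow Lite Metadata Writer API.	Locale code	en
`max_results`	Sets the optional maximum number of top-scored detection results to return.	Any positive number	-1 (all results are returned)
`score_threshold`	Sets the prediction score threshold that overrides the one provided in the model metadata (if any). Results below this value are rejected.	Any float	Not set
`category_allowlist`	Sets the optional list of allowed category names. If non-empty, detection results whose category name is not in this set are filtered out. Duplicate or unknown category names are ignored. This option is mutually exclusive with `category_denylist`; using both results in an error.	Any strings	Not set
`category_denylist`	Sets the optional list of category names that are not allowed. If non-empty, detection results whose category name is in this set are filtered out. Duplicate or unknown category names are ignored. This option is mutually exclusive with `category_allowlist`; using both results in an error.	Any strings	Not set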
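To make the interaction of these options concrete, the sketch below re-creates the filtering rules described in the table on plain `(category name, score)` tuples. It only illustrates the documented behavior and is not MediaPipe's implementation. The function name `filter_detections` and the tuple representation are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

ScoredLabel = Tuple[str, float]  # (category name, score); a stand-in type


def filter_detections(
    detections: Sequence[ScoredLabel],
    max_results: int = -1,
    score_threshold: Optional[float] = None,
    category_allowlist: Optional[Sequence[str]] = None,
    category_denylist: Optional[Sequence[str]] = None,
) -> List[ScoredLabel]:
    """Apply the table's filtering rules to simple (name, score) pairs."""
    if category_allowlist and category_denylist:
        raise ValueError(
            "category_allowlist and category_denylist are mutually exclusive.")
    kept = list(detections)
    if category_allowlist:
        allowed = set(category_allowlist)  # Duplicates collapse in the set.
        kept = [d for d in kept if d[0] in allowed]
    if category_denylist:
        denied = set(category_denylist)
        kept = [d for d in kept if d[0] not in denied]
    if score_threshold is not None:
        # Results below the threshold are rejected.
        kept = [d for d in kept if d[1] >= score_threshold]
    kept.sort(key=lambda d: d[1], reverse=True)  # Highest scores first.
    return kept if max_results < 0 else kept[:max_results]
```

For example, with an allowlist of `["dog"]` only detections named `dog` survive, and with `max_results=2` only the two highest-scoring remaining detections are returned.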

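Prepare data

Prepare your input as an image file or a NumPy array, and then convert it to a mediapipe.Image object. If your input is a video file or a live stream from a webcam, you can use an external library such as OpenCV to load your input frames as NumPy arrays.

The following examples show how to prepare data for processing for each of the available data types: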

Image

import mediapipe as mp

# Load the input image from an image file.
mp_image = mp.Image.create_from_file('/path/to/image')

# Load the input image from a numpy array.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=numpy_image)

Video

import mediapipe as mp

# Use OpenCV's VideoCapture to load the input video.

# Read the frame rate of the video with OpenCV's cv2.CAP_PROP_FPS property.
# You'll need it to calculate the timestamp for each frame.

# Loop through each frame in the video using VideoCapture.read().

# Convert the frame received from OpenCV to a MediaPipe Image object.
# OpenCV returns frames in BGR order, so convert them to RGB first.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=numpy_frame_from_opencv)

Live stream

import mediapipe as mp

# Use OpenCV's VideoCapture to start capturing from the webcam.

# Create a loop to read the latest frame from the camera using VideoCapture.read().

# Convert the frame received from OpenCV to a MediaPipe Image object.
# OpenCV returns frames in BGR order, so convert them to RGB first.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=numpy_frame_from_opencv)
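For the video case, the commented steps can be filled in roughly as follows. This is a sketch that assumes the `opencv-python` package is installed. The helper names `effective_fps` and `iter_video_frames` are hypothetical, and the 30 fps fallback is an arbitrary assumption for files whose frame rate OpenCV cannot report.

```python
from typing import Iterator, Tuple


def effective_fps(reported_fps: float, fallback_fps: float = 30.0) -> float:
    """Return a usable frame rate, falling back when none is reported."""
    # The comparison is also False for NaN, so NaN uses the fallback too.
    return reported_fps if reported_fps > 0 else fallback_fps


def iter_video_frames(video_path: str) -> Iterator[Tuple[object, int]]:
    """Yield (mp.Image, timestamp_ms) pairs for each frame of a video file."""
    import cv2  # Imported lazily: only needed when frames are read.
    import mediapipe as mp

    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"Could not open video: {video_path}")
    fps = effective_fps(capture.get(cv2.CAP_PROP_FPS))
    frame_index = 0
    try:
        while True:
            ok, bgr_frame = capture.read()
            if not ok:
                break  # End of the video or a read error.
            # OpenCV decodes to BGR; ImageFormat.SRGB expects RGB order.
            rgb_frame = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB)
            mp_image = mp.Image(
                image_format=mp.ImageFormat.SRGB, data=rgb_frame)
            yield mp_image, int(1000 * frame_index / fps)
            frame_index += 1
    finally:
        capture.release()
```

Each yielded pair can be passed directly to a detector created in video mode.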

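Run the task

You can call one of the detect functions to trigger inference. The Object Detector task returns the objects detected within the input image or frame.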

Image

# Perform object detection on the provided single image.
detection_result = detector.detect(mp_image)

Video

# Calculate the timestamp of the current frame as a whole number of milliseconds.
frame_timestamp_ms = int(1000 * frame_index / video_file_fps)

# Perform object detection on the video frame.
detection_result = detector.detect_for_video(mp_image, frame_timestamp_ms)

Live stream

# Send the latest frame to perform object detection.
# Results are sent to the `result_callback` provided in the `ObjectDetectorOptions`.
detector.detect_async(mp_image, frame_timestamp_ms)

For a complete example of running an Object Detector on an image, see the code example.
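The live stream example uses a `frame_timestamp_ms` value without showing where it comes from. One possible source is a monotonic clock, as in the sketch below. The class name `LiveTimestamps` is hypothetical, and the sketch assumes that the task expects integer timestamps that increase from one frame to the next.

```python
import time


class LiveTimestamps:
    """Produce strictly increasing millisecond timestamps for live frames."""

    def __init__(self) -> None:
        self._start = time.monotonic()
        self._last_ms = -1

    def next_ms(self) -> int:
        """Return milliseconds since creation, never repeating a value."""
        now_ms = int((time.monotonic() - self._start) * 1000)
        # Two frames in the same millisecond still get distinct values.
        self._last_ms = max(now_ms, self._last_ms + 1)
        return self._last_ms
```

A capture loop could then call `detector.detect_async(mp_image, timestamps.next_ms())` for each frame.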

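Note the following:

When running in video mode or live stream mode, you must also provide the Object Detector task with the timestamp of the input frame.
When running in image mode or video mode, the Object Detector task blocks the current thread until it finishes processing the input image or frame.
When running in live stream mode, the Object Detector task does not block the current thread but returns immediately. It invokes its result callback with the detection result each time it finishes processing an input frame. If detect_async is called while the Object Detector task is busy processing another frame, the new input frame is ignored.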
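Because live stream results arrive through a callback rather than as a return value, a common pattern is to hand them to the main loop through a queue. The sketch below shows one way to do that. The class name `ResultCollector` is hypothetical, and it assumes that the callback may be invoked on a thread other than the one that submits frames.

```python
import queue
from typing import Any, List, Tuple


class ResultCollector:
    """Callable that stores live stream results for the main loop to read."""

    def __init__(self, max_pending: int = 32) -> None:
        self._results: "queue.Queue[Tuple[int, Any]]" = queue.Queue(
            maxsize=max_pending)

    def __call__(self, result: Any, output_image: Any,
                 timestamp_ms: int) -> None:
        """Match the result_callback signature used in the example above."""
        try:
            self._results.put_nowait((timestamp_ms, result))
        except queue.Full:
            pass  # Drop the result rather than block the callback.

    def drain(self) -> List[Tuple[int, Any]]:
        """Return all pending (timestamp_ms, result) pairs, oldest first."""
        items: List[Tuple[int, Any]] = []
        while True:
            try:
                items.append(self._results.get_nowait())
            except queue.Empty:
                return items
```

An instance can be passed as `result_callback` in `ObjectDetectorOptions`, and the capture loop can call `drain()` after each submitted frame.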

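Handle and display results

Upon running inference, the Object Detector task returns a DetectionResult object that describes the objects it has found in the input image.

The following shows an example of the output data from this task: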

ObjectDetectorResult:
 Detection #0:
  Box: (x: 355, y: 133, w: 190, h: 206)
  Categories:
   index       : 17
   score       : 0.73828
   class name  : dog
 Detection #1:
  Box: (x: 103, y: 15, w: 138, h: 369)
  Categories:
   index       : 17
   score       : 0.73047
   class name  : dog

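The following image shows a visualization of the task output:

[Image: two dogs highlighted with bounding boxes]

The Object Detector example code demonstrates how to display the detection results returned from the task. See the code example for details.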
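Before drawing anything, it is often enough to read the label, score, and box of each detection. The sketch below does that and converts each box to corner coordinates, which is the form most drawing functions expect. The helper name `summarize_detections` is hypothetical, and the `SimpleNamespace` objects are stand-ins that mimic the sample output shown earlier; real results come from the detector.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple

Summary = Tuple[str, float, Tuple[int, int, int, int]]


def summarize_detections(result: Any) -> List[Summary]:
    """Return (label, score, (left, top, right, bottom)) per detection."""
    summary: List[Summary] = []
    for detection in result.detections:
        box = detection.bounding_box
        first = detection.categories[0]  # First category listed for the box.
        corners = (box.origin_x, box.origin_y,
                   box.origin_x + box.width, box.origin_y + box.height)
        summary.append((first.category_name, first.score, corners))
    return summary


def _fake_detection(x: int, y: int, w: int, h: int, score: float):
    """Build a stand-in detection with the attribute names used above."""
    return SimpleNamespace(
        bounding_box=SimpleNamespace(origin_x=x, origin_y=y, width=w, height=h),
        categories=[SimpleNamespace(index=17, score=score, category_name="dog")],
    )


sample = SimpleNamespace(detections=[
    _fake_detection(355, 133, 190, 206, 0.73828),
    _fake_detection(103, 15, 138, 369, 0.73047),
])

for label, score, corners in summarize_detections(sample):
    print(f"{label} {score:.2f} {corners}")
```

The corner tuples can be passed to a drawing routine such as OpenCV's rectangle function to reproduce a visualization like the one described above.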