The MediaPipe Python framework grants direct access to the core components of the MediaPipe C++ framework such as Timestamp, Packet, and CalculatorGraph, whereas the ready-to-use Python solutions hide the technical details of the framework and simply return the readable model inference results back to the callers.
MediaPipe framework sits on top of the pybind11 library. The C++ core framework is exposed in Python via a C++/Python language binding. The content below assumes that the reader already has a basic understanding of the MediaPipe C++ framework. Otherwise, you can find useful information in Framework Concepts.
Packet
The packet is the basic data flow unit in MediaPipe. A packet consists of a
numeric timestamp and a shared pointer to an immutable payload. In Python, a
MediaPipe packet can be created by calling one of the packet creator methods in
the
mp.packet_creator
module. Correspondingly, the packet payload can be retrieved by using one of the
packet getter methods in the
mp.packet_getter
module. Note that the packet payload becomes immutable after packet
creation. Thus, the modification of the retrieved packet content doesn't affect
the actual payload in the packet. MediaPipe framework Python API supports the
most commonly used data types of MediaPipe (e.g., ImageFrame, Matrix, Protocol
Buffers, and the primitive data types) in the core binding. The comprehensive
table below shows the type mappings between the Python and the C++ data type
along with the packet creator and the content getter method for each data type
supported by the MediaPipe Python framework API.
Python Data Type | C++ Data Type | Packet Creator | Content Getter |
---|---|---|---|
bool | bool | create_bool(True) | get_bool(packet) |
int or np.intc | int_t | create_int(1) | get_int(packet) |
int or np.int8 | int8_t | create_int8(2**7-1) | get_int(packet) |
int or np.int16 | int16_t | create_int16(2**15-1) | get_int(packet) |
int or np.int32 | int32_t | create_int32(2**31-1) | get_int(packet) |
int or np.int64 | int64_t | create_int64(2**63-1) | get_int(packet) |
int or np.uint8 | uint8_t | create_uint8(2**8-1) | get_uint(packet) |
int or np.uint16 | uint16_t | create_uint16(2**16-1) | get_uint(packet) |
int or np.uint32 | uint32_t | create_uint32(2**32-1) | get_uint(packet) |
int or np.uint64 | uint64_t | create_uint64(2**64-1) | get_uint(packet) |
float or np.float32 | float | create_float(1.1) | get_float(packet) |
float or np.double | double | create_double(1.1) | get_float(packet) |
str (UTF-8) | std::string | create_string('abc') | get_str(packet) |
bytes | std::string | create_string(b'\xd0\xd0\xd0') | get_bytes(packet) |
mp.Packet | mp::Packet | create_packet(p) | get_packet(packet) |
List[bool] | std::vector<bool> | create_bool_vector([True, False]) | get_bool_list(packet) |
List[int] or List[np.intc] | int[] | create_int_array([1, 2, 3]) | get_int_list(packet, size=10) |
List[int] or List[np.intc] | std::vector<int> | create_int_vector([1, 2, 3]) | get_int_list(packet) |
List[float] or List[np.float] | float[] | create_float_arrary([0.1, 0.2]) | get_float_list(packet, size=10) |
List[float] or List[np.float] | std::vector<float> | create_float_vector([0.1, 0.2]) | get_float_list(packet, size=10) |
List[str] | std::vector<std::string> | create_string_vector(['a']) | get_str_list(packet) |
List[mp.Packet] | std::vector<mp::Packet> | create_packet_vector( [packet1, packet2]) |
get_packet_list(p) |
Mapping[str, Packet] | std::map |
create_string_to_packet_map( {'a': packet1, 'b': packet2}) |
get_str_to_packet_dict(packet) |
np.ndarray (cv.mat and PIL.Image) |
mp::ImageFrame | create_image_frame( format=ImageFormat.SRGB, data=mat) |
get_image_frame(packet) |
np.ndarray | mp::Matrix | create_matrix(data) | get_matrix(packet) |
Google Proto Message | Google Proto Message | create_proto(proto) | get_proto(packet) |
List[Proto] | std::vector<Proto> | n/a | get_proto_list(packet) |
It's not uncommon that users create custom C++ classes and send those into the graphs and calculators. To allow the custom classes to be used in Python with MediaPipe Framework, you may extend the Packet API for a new data type in the following steps:
Write the pybind11 class binding code or a custom type caster for the custom type in a cc file.
#include "path/to/my_type/header/file.h" #include "pybind11/pybind11.h" namespace py = pybind11; PYBIND11_MODULE(my_type_binding, m) { // Write binding code or a custom type caster for MyType. py::class_<MyType>(m, "MyType") .def(py::init<>()) .def(...); }
Create a new packet creator and getter method of the custom type in a separate cc file.
#include "path/to/my_type/header/file.h" #include "mediapipe/framework/packet.h" #include "pybind11/pybind11.h" namespace mediapipe { namespace py = pybind11; PYBIND11_MODULE(my_packet_methods, m) { m.def( "create_my_type", [](const MyType& my_type) { return MakePacket<MyType>(my_type); }); m.def( "get_my_type", [](const Packet& packet) { if(!packet.ValidateAsType<MyType>().ok()) { PyErr_SetString(PyExc_ValueError, "Packet data type mismatch."); return py::error_already_set(); } return packet.Get<MyType>(); }); } } // namespace mediapipe
Add two bazel build rules for the custom type binding and the new packet methods in the BUILD file.
load("@pybind11_bazel//:build_defs.bzl", "pybind_extension") pybind_extension( name = "my_type_binding", srcs = ["my_type_binding.cc"], deps = [":my_type"], ) pybind_extension( name = "my_packet_methods", srcs = ["my_packet_methods.cc"], deps = [ ":my_type", "//mediapipe/framework:packet" ], )
Build the pybind extension targets (with the suffix .so) by Bazel and move the generated dynamic libraries into one of the $LD_LIBRARY_PATH dirs.
Use the binding modules in Python.
import my_type_binding import my_packet_methods packet = my_packet_methods.create_my_type(my_type_binding.MyType()) my_type = my_packet_methods.get_my_type(packet)
Timestamp
Each packet contains a timestamp that is in units of microseconds. In Python,
the Packet API provides a convenience method packet.at()
to define the numeric
timestamp of a packet. More generally, packet.timestamp
is the packet class
property for accessing the underlying timestamp. To convert an Unix epoch to a
MediaPipe timestamp,
the Timestamp API
offers a method mp.Timestamp.from_seconds()
for this purpose.
ImageFrame
ImageFrame is the container for storing an image or a video frame. Formats
supported by ImageFrame are listed in
the ImageFormat enum.
Pixels are encoded row-major with interleaved color components, and ImageFrame
supports uint8, uint16, and float as its data types. MediaPipe provides
an ImageFrame Python API
to access the ImageFrame C++ class. In Python, the easiest way to retrieve the
pixel data is to call image_frame.numpy_view()
to get a numpy ndarray. Note
that the returned numpy ndarray, a reference to the internal pixel data, is
unwritable. If the callers need to modify the numpy ndarray, it's required to
explicitly call a copy operation to obtain a copy. When MediaPipe takes a numpy
ndarray to make an ImageFrame, it assumes that the data is stored contiguously.
Correspondingly, the pixel data of an ImageFrame will be realigned to be
contiguous when it's returned to the Python side.
Graph
In the MediaPipe Framework, all processing takes places within the context of a CalculatorGraph. The CalculatorGraph Python API is a direct binding to the C++ CalculatorGraph class. The major difference is the CalculatorGraph Python API raises a Python error instead of returning a non-OK Status when an error occurs. Therefore, as a Python user, you can handle the exceptions as you normally do. The life cycle of a CalculatorGraph contains three stages: initialization and setup, graph run, and graph shutdown.
Initialize a CalculatorGraph with a CalculatorGraphConfig protobuf or binary protobuf file, and provide callback method(s) to observe the output stream(s).
Option 1. Initialize a CalculatorGraph with a CalculatorGraphConfig protobuf or its text representation, and observe the output stream(s):
import mediapipe as mp config_text = """ input_stream: 'in_stream' output_stream: 'out_stream' node { calculator: 'PassThroughCalculator' input_stream: 'in_stream' output_stream: 'out_stream' } """ graph = mp.CalculatorGraph(graph_config=config_text) output_packets = [] graph.observe_output_stream( 'out_stream', lambda stream_name, packet: output_packets.append(mp.packet_getter.get_str(packet)))
Option 2. Initialize a CalculatorGraph with a binary protobuf file, and observe the output stream(s).
import mediapipe as mp # resources dependency graph = mp.CalculatorGraph( binary_graph=os.path.join( resources.GetRunfilesDir(), 'path/to/your/graph.binarypb')) graph.observe_output_stream( 'out_stream', lambda stream_name, packet: print(f'Get {packet} from {stream_name}'))
Start the graph run and feed packets into the graph.
graph.start_run() graph.add_packet_to_input_stream( 'in_stream', mp.packet_creator.create_string('abc').at(0)) rgb_img = cv2.cvtColor(cv2.imread('/path/to/your/image.png'), cv2.COLOR_BGR2RGB) graph.add_packet_to_input_stream( 'in_stream', mp.packet_creator.create_image_frame(image_format=mp.ImageFormat.SRGB, data=rgb_img).at(1))
Close the graph after finish. You may restart the graph for another graph run after the call to
close()
.graph.close()
The Python script can be run by your local Python runtime.