Mediapipe integration
Introduction
Mediapipe is a cross-platform framework for creating complex Computer Vision pipelines for both offline and real-time applications. It leverages popular frameworks such as OpenCV and Tensorflow to process audio and video and to run deep learning models. By integrating Lluvia into Mediapipe, it is possible to speed up some of those computations by creating a GPU compute pipeline.
Difference 1: project scope
Mediapipe is a more general framework than Lluvia. At its core, Mediapipe is a compute graph scheduler where each node can contain arbitrary processing logic. The integration of third-party frameworks (e.g. OpenCV, Tensorflow, Lluvia) gives the framework its power for developing complex Computer Vision pipelines.
Lluvia, on the other hand, specializes in creating compute pipelines that run efficiently on the GPU. Bringing the project to Mediapipe will enable easier integration with other frameworks and increase the runtime performance of Computer Vision applications.
On Graphs, Calculators and Packets
Mediapipe uses Directed Acyclic Graphs to describe the compute pipeline to be run by the framework. Each node in the graph is called a Calculator. Each calculator declares a contract for its inputs and outputs, establishing the types of packets it can handle, and defines a function to process those packets.
Graphs are described as Protobuffer messages that include the configuration for each calculator. Mediapipe takes this data at runtime, instantiates each calculator, and connects it to its upstream and downstream neighbors according to the supplied contracts.
Packets enter the graph through input streams and leave it through output streams. When a new packet arrives, Mediapipe schedules its processing by the corresponding calculator, or enqueues the packet if the calculator is busy.
The figure below illustrates a mediapipe graph for performing edge detection on the GPU. Each calculator receives GPU image packets and schedules execution on the available device.
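As an illustration, a minimal GPU edge-detection graph in Mediapipe's text format could look like the sketch below, loosely based on the LuminanceCalculator and SobelEdgesCalculator that ship with Mediapipe's edge-detection example; treat the stream names and the exact calculator set as assumptions rather than the graph shown in the figure.

```
# Sketch of a GPU edge-detection graph (illustrative only).
input_stream: "input_video"
output_stream: "output_video"

# Converts the input frame to a luminance image on the GPU.
node {
  calculator: "LuminanceCalculator"
  input_stream: "input_video"
  output_stream: "luma_video"
}

# Applies a Sobel filter to the luminance image to detect edges.
node {
  calculator: "SobelEdgesCalculator"
  input_stream: "luma_video"
  output_stream: "output_video"
}
```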
Difference 2: packets and graph scheduling
A packet in Mediapipe is an independent piece of data that travels through the calculator graph. This enables Mediapipe to schedule several calculators to run concurrently, potentially increasing performance.
In Lluvia, nodes connected through inputs and outputs do not allocate new memory on each run of the node. Instead, all the memory is allocated at node initialization time and exposed through the node's ports. Then, the whole graph is scheduled to run on the GPU device in one go. This reduces the delay in computations, as it avoids cross-talk between the host CPU and the GPU to synchronize individual node execution.
Lluvia as a mediapipe dependency
Both Mediapipe and Lluvia are built using Bazel. As a consequence, Lluvia can be integrated by including the project as a Bazel dependency in the Mediapipe repository. The current approach is to use an auxiliary repository, lluvia-mediapipe, which contains the LluviaCalculator node to run GPU compute pipelines as a Mediapipe calculator. The build instructions are available in the mediapipe integration guide. The process is as follows:
- Clone the Mediapipe repository alongside Lluvia.
- Configure Mediapipe's Bazel workspace to build on your host machine.
- Include Lluvia as a dependency of Mediapipe (see the sketch after this list).
- Clone the lluvia-mediapipe repository inside Mediapipe to enable building its targets.
- Run the tests included in the repository to validate the build.
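As a rough sketch of the dependency step, Lluvia could be registered in Mediapipe's WORKSPACE with a local_repository rule pointing at the sibling checkout. The repository name and any additional workspace setup Lluvia requires are assumptions here; follow the mediapipe integration guide for the exact snippet.

```
# Hypothetical addition to mediapipe's WORKSPACE file:
# registers the sibling lluvia checkout as an external Bazel repository.
local_repository(
    name = "lluvia",     # assumed repository name
    path = "../lluvia",  # lluvia cloned next to the mediapipe repository
)
```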
The directory structure of the three projects should look like this:
lluvia                      <-- lluvia repository
mediapipe                   <-- mediapipe repository
├── BUILD.bazel
├── LICENSE
├── ...
├── mediapipe               <-- inner mediapipe source folder
│   ├── BUILD
│   ├── calculators
│   ├── examples
│   ├── framework
│   ├── gpu
│   ├── ...
│   ├── lluvia-mediapipe    <-- lluvia-mediapipe repository
├── ...
├── .bazelrc
└── WORKSPACE
Once Mediapipe builds correctly, it is possible to create graphs that include the LluviaCalculator.
The LluviaCalculator
The LluviaCalculator is in charge of initializing Lluvia, binding input and output streams from Mediapipe to Lluvia ports, and running a given compute pipeline.
The figure below illustrates a basic mediapipe graph utilizing lluvia, while the code below shows the graph description using Protobuffer text syntax:
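The sketch below illustrates what such a graph description might look like, built from the attributes described next; the options type name and the exact shape of input_port_binding are assumptions, so consult the lluvia-mediapipe repository for the authoritative graph definition.

```
# Hypothetical graph.pbtxt sketch: a single LluviaCalculator receives frames on
# tag IN_0 and emits results on tag OUT_0. Field names follow the attributes
# described below; the options type name is an assumption.
input_stream: "input_frame"
output_stream: "output_frame"

node {
  calculator: "LluviaCalculator"
  input_stream: "IN_0:input_frame"
  output_stream: "OUT_0:output_frame"
  node_options: {
    [type.googleapis.com/lluvia.LluviaCalculatorOptions] {  # assumed type name
      enable_debug: true
      library_path: "path/to/lluvia_node_library.zip"       # placeholder path
      script_path: "path/to/script.lua"                      # placeholder path
      # Assumed field layout: binds mediapipe tag IN_0 to lluvia port in_image.
      input_port_binding {
        mediapipe_tag: "IN_0"
        lluvia_port: "in_image"
      }
    }
  }
}
```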
where:
- The enable_debug flag tells whether the Vulkan debug extensions used by Lluvia should be loaded during session creation. This flag might be set to false in production applications to improve runtime performance.
- The library_path declares the path to a node library (a .zip file) containing Lluvia nodes (Container and Compute). This attribute can be repeated several times.
- The script_path is the path to a Lua script declaring the ContainerNode that Lluvia will instantiate as the "main" node to run inside the calculator.
- The input_port_binding maps a Mediapipe input tag to a port of the main ContainerNode. In the example above, Mediapipe's input tag IN_0 is mapped to Lluvia's in_image port.
Examples
lluvia-mediapipe includes two applications, single_image and webcam, to run on the host system. The single_image app, as the name suggests, reads the content of a single image and feeds it to a Mediapipe graph.
The command below executes the binary with a graph configured to run the lluvia/color/BGRA2Gray compute node to convert the BGRA input to grayscale:
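A hypothetical invocation is sketched below, assuming a single_image Bazel target under lluvia-mediapipe's desktop examples and a BGRA2Gray graph directory; the target label, flag names, and paths are assumptions, so check the lluvia-mediapipe examples for the exact command.

```
# Hypothetical command (target label, flags, and paths are assumptions):
cd ${HOME}/git/mediapipe
bazel run //mediapipe/lluvia-mediapipe/examples/desktop/single_image:single_image -- \
    --graph_file=${HOME}/git/mediapipe/mediapipe/lluvia-mediapipe/examples/desktop/graphs/BGRA2Gray/graph.pbtxt \
    --script_file=${HOME}/git/mediapipe/mediapipe/lluvia-mediapipe/examples/desktop/graphs/BGRA2Gray/script.lua
```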
where ${HOME}/git is the base folder where Lluvia and Mediapipe are cloned. Change this according to your setup.
A more sophisticated example is running the Horn and Schunck optical flow algorithm inside of Mediapipe. The webcam binary opens the default capture device using OpenCV and transfers the captured frames to the compute graph. The graph is a single LluviaCalculator running several nodes:
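A sketch of the invocation is shown below; the --graph_file and --script_file values come from the explanation that follows, while the Bazel target label is an assumption.

```
# Hypothetical command (the target label is an assumption; the flags are
# described in the text below):
cd ${HOME}/git/mediapipe
bazel run //mediapipe/lluvia-mediapipe/examples/desktop/webcam:webcam -- \
    --graph_file=${HOME}/git/mediapipe/mediapipe/lluvia-mediapipe/examples/desktop/graphs/HornSchunck/graph.pbtxt \
    --script_file=${HOME}/git/mediapipe/mediapipe/lluvia-mediapipe/examples/desktop/graphs/HornSchunck/script.lua
```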
where --graph_file=${HOME}/git/mediapipe/mediapipe/lluvia-mediapipe/examples/desktop/graphs/HornSchunck/graph.pbtxt is the path to the Mediapipe graph to be run by the app, and --script_file=${HOME}/git/mediapipe/mediapipe/lluvia-mediapipe/examples/desktop/graphs/HornSchunck/script.lua points to a Lua script defining the Container node to run inside of the LluviaCalculator.
[Figure: LluviaCalculator running the mediapipe/examples/HornSchunck container node. The IN_0 input stream feeds the container's in_image port, which flows through BGRA2Gray, HornSchunck, Flow2RGBA, and RGBA2BGRA before reaching the out_image port and the OUT_0 output stream.]
First, the input image is transformed from BGRA color space to grayscale. Next, the images are fed to the HornSchunck container node to compute optical flow. The estimated flow is then converted to color using the Flow2RGBA compute node, and finally, the RGBA output is converted to BGRA for proper rendering in the window opened by OpenCV.
Difference 3: calculators as code vs. nodes as data
In Mediapipe, every Calculator must be compiled and integrated into the binary at build time, thus requiring a rebuild every time a Calculator is added or modified.
Lluvia describes nodes as a pair of Lua and GLSL (for ComputeNode) files that are compiled and packaged into a node library as a .zip file. Once packaged, the library can be imported into any runtime where Lluvia runs. This eases the developer experience, as one can develop nodes in a higher-level environment, for instance using Python in a Jupyter notebook, package them into a node library, and then use them in any environment (Mediapipe, for instance).
Discussion
This article presented the integration of Lluvia into the Mediapipe project. By adding the project to Mediapipe, it is possible to leverage the GPU compute-pipeline capabilities of Lluvia to speed up parts of complex Computer Vision applications.
The integration between the two projects is achieved through the LluviaCalculator, which runs any arbitrary ContainerNode. This calculator is in the early stages of development, and feedback is very welcome. Some immediate improvements include:
- Support GPU ImageFrame input and output packets. Currently, the calculator only accepts CPU ImageFrame packets, thus introducing some latency while copying data from CPU memory space to the GPU.
- Support Mediapipe side packets to send configuration updates to the calculator.
- Include more configuration attributes (e.g. node parameters) in the Protobuffer type.
And finally, test the integration on other platforms such as Android.