When should I implement the Dispatch API?
A LiteRT Dispatch API implementation is necessary when you need to integrate a specific hardware accelerator runtime into the LiteRT framework. It works with the Compiler Plugin to execute the compiled subgraphs that the plugin emits.
The Dispatch API is a set of vendor-implemented functions that LiteRT uses to manage a session with an accelerator's runtime. These functions cover device sessions, subgraph execution, and buffer movement between the host and device. Implementations of the Dispatch API, in conjunction with a compiler plugin, allow LiteRT to fully utilize an accelerator's capabilities for efficient inference.
More detail on when to implement the Dispatch API (and compiler plugin) can be found on the LiteRT Compiler Plugin page.
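The idea of a set of vendor-implemented functions that the runtime calls can be sketched as a function table. The following is only an illustration under stated assumptions: ToyDispatchApi, FakeSession, kFakeVendorApi, RunOnAccelerator, and every function in the sketch are hypothetical names invented here, and the "accelerator" is simulated by a host-side vector that doubles each value. None of these are the declarations found in the real Dispatch API header.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical function table (NOT the real LiteRT interface). It groups the
// three responsibilities named above: sessions, buffer movement, execution.
struct ToyDispatchApi {
  bool (*open_session)(void** session);
  void (*close_session)(void* session);
  bool (*copy_to_device)(void* session, const float* host, std::size_t count);
  bool (*copy_to_host)(void* session, float* host, std::size_t count);
  bool (*execute)(void* session);
};

// ---- "Vendor" side: a fake accelerator that doubles every element. ----
struct FakeSession {
  std::vector<float> device_memory;  // Stands in for on-device memory.
};

bool FakeOpen(void** session) {
  *session = new FakeSession();
  return true;
}

void FakeClose(void* session) { delete static_cast<FakeSession*>(session); }

bool FakeCopyToDevice(void* session, const float* host, std::size_t count) {
  static_cast<FakeSession*>(session)->device_memory.assign(host, host + count);
  return true;
}

bool FakeCopyToHost(void* session, float* host, std::size_t count) {
  const auto& mem = static_cast<FakeSession*>(session)->device_memory;
  if (count > mem.size()) return false;  // Refuse to read past device memory.
  std::copy(mem.begin(), mem.begin() + count, host);
  return true;
}

bool FakeExecute(void* session) {
  for (float& v : static_cast<FakeSession*>(session)->device_memory) v *= 2.0f;
  return true;
}

// The vendor exposes its implementation as a filled-in table.
const ToyDispatchApi kFakeVendorApi = {FakeOpen, FakeClose, FakeCopyToDevice,
                                       FakeCopyToHost, FakeExecute};

// ---- "Runtime" side: reaches the accelerator only through the table. ----
bool RunOnAccelerator(const ToyDispatchApi& api, std::vector<float>& data) {
  void* session = nullptr;
  if (!api.open_session(&session)) return false;
  const bool ok = api.copy_to_device(session, data.data(), data.size()) &&
                  api.execute(session) &&
                  api.copy_to_host(session, data.data(), data.size());
  api.close_session(session);  // Always release the session.
  return ok;
}
```

Calling RunOnAccelerator(kFakeVendorApi, data) runs the fake accelerator over data in place. The point of the table is that the runtime code never names the vendor's functions directly, so a different accelerator can be supported by supplying a different table.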
How Does the Dispatch API Work?
The Dispatch API is used by CompiledModel when it runs with the NpuAccelerator.
Internally, this creates a DispatchDelegate, and it is the DispatchDelegate that
calls the Dispatch API to engage the NPU embedded in the running hardware.
Dispatch API Data Types
In the Dispatch API, the following data types are used to execute a model on NPUs.
DispatchDeviceContext: Manages the buffers used by NPU inference.
DispatchInvocationContext: The data structure used to execute the model. It works by associating the actual input and output memory registered in DispatchDeviceContext with the generated DispatchGraph.
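The relationship between the two types can be sketched with a toy model. This is a minimal illustration, not the real API: ToyDeviceContext, ToyInvocationContext, ToyGraph, BufferHandle, and all of their methods are hypothetical names, and the "graph" is assumed to be a single operation that adds a constant to each input element.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical handle type; the real API defines its own handle types.
using BufferHandle = int;

// Toy stand-in for a device context: it owns the registered buffers.
class ToyDeviceContext {
 public:
  // Registers a buffer and returns a handle that refers to it.
  BufferHandle RegisterBuffer(std::vector<float> storage) {
    const BufferHandle handle = next_handle_++;
    buffers_.emplace(handle, std::move(storage));
    return handle;
  }

  // Looks up a registered buffer; throws std::out_of_range if unknown.
  std::vector<float>& Get(BufferHandle handle) { return buffers_.at(handle); }

 private:
  BufferHandle next_handle_ = 0;
  std::map<BufferHandle, std::vector<float>> buffers_;
};

// Toy stand-in for a compiled graph: adds `bias` to every input element.
struct ToyGraph {
  float bias;
};

// Toy stand-in for an invocation context: it ties registered buffers to the
// graph's input and output, then runs the graph.
class ToyInvocationContext {
 public:
  ToyInvocationContext(ToyDeviceContext& device, ToyGraph graph)
      : device_(device), graph_(graph) {}

  void AttachInput(BufferHandle handle) {
    input_ = handle;
    has_input_ = true;
  }

  void AttachOutput(BufferHandle handle) {
    output_ = handle;
    has_output_ = true;
  }

  void Invoke() {
    if (!has_input_ || !has_output_) {
      throw std::logic_error("input and output buffers must be attached");
    }
    const std::vector<float>& in = device_.Get(input_);
    std::vector<float>& out = device_.Get(output_);
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] + graph_.bias;
  }

 private:
  ToyDeviceContext& device_;
  ToyGraph graph_;
  BufferHandle input_ = 0;
  BufferHandle output_ = 0;
  bool has_input_ = false;
  bool has_output_ = false;
};
```

In this sketch a caller first registers an input and an output buffer with ToyDeviceContext, then constructs a ToyInvocationContext for a graph, attaches the two handles, and calls Invoke(). Keeping buffer ownership in the device context and the buffer-to-graph binding in the invocation context mirrors the split of responsibilities described above.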
Dispatch APIs
For the full definition of the Dispatch API, please refer to the vendors/c/litert_dispatch.h file.