Image generation guide for Android

The MediaPipe Image Generator task lets you generate images based on a text prompt. This task uses a text-to-image model to generate images using diffusion techniques.

The task accepts a text prompt as input, along with an optional condition image that the model can augment and use as a reference for generation. Image Generator can also generate images based on specific concepts provided to the model during training or retraining. For more information, see customize with LoRA.

The code sample described in these instructions is available on GitHub. For more information about the capabilities, models, and configuration options of this task, see the Overview.

Code example

The MediaPipe Tasks example code is a basic implementation of an Image Generator app for Android. You can use the app as a starting point for your own Android app, or refer to it when modifying an existing app. The Image Generator example code is hosted on GitHub.

Download the code

The following instructions show you how to create a local copy of the example code using the git command line tool.

To download the example code:

  1. Clone the git repository using the following command:
    git clone https://github.com/google-ai-edge/mediapipe-samples
    
  2. Optionally, configure your git instance to use sparse checkout, so you have only the files for the Image Generator example app:
    cd mediapipe-samples
    git sparse-checkout init --cone
    git sparse-checkout set examples/image_generation/android
    

After creating a local version of the example code, you can import the project into Android Studio and run the app. For instructions, see the Setup Guide for Android.

Key components

The following files contain the crucial code for this image generation example application:

  • ImageGenerationHelper.kt: Initializes the task and handles the image generation.
  • DiffusionActivity.kt: Generates images when plugins or LoRA weights are not enabled.
  • PluginActivity.kt: Implements the plugin models, which enables users to provide a condition image as an input.
  • LoRAWeightActivity.kt: Accesses and handles the LoRA weights, which are used to customize foundation models and enable them to generate images of specific concepts.

Setup

This section describes key steps for setting up your development environment and code projects specifically to use Image Generator. For general information on setting up your development environment for using MediaPipe tasks, including platform version requirements, see the Setup guide for Android.

Dependencies

The Image Generator task uses the com.google.mediapipe:tasks-vision-image-generator library. Add this dependency to the build.gradle file of your Android app:

dependencies {
    implementation 'com.google.mediapipe:tasks-vision-image-generator:latest.release'
}

For devices with Android 12 (API 31) or higher, add the native OpenCL library dependency. For more information, see the documentation on the uses-native-library tag.

Add the following uses-native-library tags to the AndroidManifest.xml file:

<uses-native-library android:name="libOpenCL.so" android:required="false" />
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false" />

Model

The MediaPipe Image Generator task requires a trained foundation model that is compatible with this task. After downloading a model, install the required dependencies and convert the model into a suitable format. Then, push the converted model to the Android device.

For more information on available trained models for Image Generator, see the task overview Models section.

Download foundation model

The Image Generator requires a foundation model in the runwayml/stable-diffusion-v1-5 EMA-only model format, based on the following model: runwayml/stable-diffusion-v1-5.

Install dependencies and convert the model

Install the required Python packages:

$ pip install torch typing_extensions numpy Pillow requests pytorch_lightning absl-py

Run the convert.py script:

$ python3 convert.py --ckpt_path <ckpt_path> --output_path <output_path>

Push converted model to the device

Push the contents of the <output_path> folder to the Android device.

$ adb shell rm -r /data/local/tmp/image_generator/ # Remove any previously loaded weights
$ adb shell mkdir -p /data/local/tmp/image_generator/
$ adb push <output_path>/. /data/local/tmp/image_generator/bins

Download Plugin models and add LoRA weights (Optional)

If you intend to use a plugin model, check whether the model must be downloaded. For plugins that require an additional model, the plugin models must be either bundled in the APK or downloaded on demand. Plugin models are lightweight (~23MB) and can be bundled directly in the APK. However, we recommend downloading plugin models on demand.

If you have customized a model with LoRA, download the LoRA weights on demand. For more information, see LoRA weights plugin model.
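
As a rough illustration of the on-demand approach, the following sketch downloads a file only when no local copy exists. The helper name fetchIfMissing, the ".part" suffix, and any URL you pass in are assumptions for illustration and are not part of the MediaPipe API. The sketch uses only standard JVM classes.

```kotlin
import java.io.File
import java.io.IOException
import java.net.URI

// Hypothetical helper (not part of MediaPipe): downloads `sourceUrl` to
// `target` unless a non-empty copy is already present.
fun fetchIfMissing(sourceUrl: String, target: File): File {
    if (target.isFile && target.length() > 0L) return target

    val dir = target.absoluteFile.parentFile
    dir?.mkdirs()

    // Write to a temporary file first so that an interrupted transfer never
    // leaves a truncated model at the final path.
    val partial = File(dir, target.name + ".part")
    URI.create(sourceUrl).toURL().openStream().use { input ->
        partial.outputStream().use { output -> input.copyTo(output) }
    }

    if (!partial.renameTo(target)) {
        partial.delete()
        throw IOException("Could not move ${partial.path} to ${target.path}")
    }
    return target
}
```

In an Android app, a call like this would typically run on a background thread and write to app-private storage. How the resulting path is passed to the task depends on your setup.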

Create the task

The MediaPipe Image Generator task uses the createFromOptions() function to set up the task. The createFromOptions() function accepts values for the configuration options. For more information on configuration options, see Configuration options.

Configuration options

This task has the following configuration options for Android apps:

Option Name | Description | Value Range
imageGeneratorModelDirectory | The image generator model directory storing the model weights. | PATH
loraWeightsFilePath | Sets the path to the LoRA weights file. Optional and only applicable if the model was customized with LoRA. | PATH
errorListener | Sets an optional error listener. | N/A

The task also supports plugin models, which let users include condition images in the task input. The foundation model can augment these images and use them as a reference for generation. Condition images can be face landmarks, edge outlines, or depth estimates, which the model uses as additional context to generate images.

When adding a plugin model to the foundation model, also configure the plugin options. The Face landmark plugin uses faceConditionOptions, the Canny edge plugin uses edgeConditionOptions, and the Depth plugin uses depthConditionOptions.

Canny edge options

Configure the following options in edgeConditionOptions.

Option Name | Description | Value Range | Default Value
threshold1 | First threshold for the hysteresis procedure. | Float | 100
threshold2 | Second threshold for the hysteresis procedure. | Float | 200
apertureSize | Aperture size for the Sobel operator. The typical range is 3 to 7. | Integer | 3
l2Gradient | Whether the L2 norm is used to calculate the image gradient magnitude, instead of the default L1 norm. | BOOLEAN | False
EdgePluginModelBaseOptions | The BaseOptions object that sets the path for the plugin model. | BaseOptions object | N/A

For more information on how these configuration options work, see Canny edge detector.
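
To illustrate how the two thresholds interact, the following is a simplified one-dimensional sketch of hysteresis thresholding. It is not the plugin's implementation: a real Canny detector works on 2D gradient images and applies further steps, such as non-maximum suppression, before this stage. The function name hysteresis1d and the use of inclusive comparisons are assumptions made for illustration.

```kotlin
// Simplified 1D hysteresis: a sample is kept if it reaches `high`, or if it
// reaches `low` and is connected to a kept sample through neighbors that
// also reach `low`.
fun hysteresis1d(magnitudes: FloatArray, low: Float, high: Float): BooleanArray {
    require(low <= high) { "low must not exceed high" }

    // Strong samples are kept unconditionally.
    val keep = BooleanArray(magnitudes.size) { magnitudes[it] >= high }

    // Left-to-right pass: extend kept runs to the right through weak samples.
    for (i in 1 until magnitudes.size) {
        if (!keep[i] && magnitudes[i] >= low && keep[i - 1]) keep[i] = true
    }
    // Right-to-left pass: extend kept runs to the left through weak samples.
    for (i in magnitudes.size - 2 downTo 0) {
        if (!keep[i] && magnitudes[i] >= low && keep[i + 1]) keep[i] = true
    }
    return keep
}
```

For example, with thresholds 100 and 200, the magnitudes 50, 120, 150, 250, 130, 90, 110 keep only the samples at indices 1 through 4. The final value of 110 exceeds the lower threshold, but it is dropped because the 90 before it separates it from the strong sample.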

Face landmark options

Configure the following options in faceConditionOptions.

Option Name | Description | Value Range | Default Value
minFaceDetectionConfidence | The minimum confidence score for the face detection to be considered successful. | Float [0.0,1.0] | 0.5
minFacePresenceConfidence | The minimum confidence score for face presence in face landmark detection. | Float [0.0,1.0] | 0.5
faceModelBaseOptions | The BaseOptions object that sets the path for the model that creates the condition image. | BaseOptions object | N/A
FacePluginModelBaseOptions | The BaseOptions object that sets the path for the plugin model. | BaseOptions object | N/A

For more information on how these configuration options work, see the Face Landmarker task.

Depth options

Configure the following options in depthConditionOptions.

Option Name | Description | Value Range | Default Value
depthModelBaseOptions | The BaseOptions object that sets the path for the model that creates the condition image. | BaseOptions object | N/A
depthPluginModelBaseOptions | The BaseOptions object that sets the path for the plugin model. | BaseOptions object | N/A

Create with only the foundation model

val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(modelPath)
    .build()

imageGenerator = ImageGenerator.createFromOptions(context, options)

Create with plugins

If you are applying an optional plugin model, set the base options for the plugin model with setPluginModelBaseOptions. If the plugin model requires an additional downloaded model to create the condition image, specify the path in BaseOptions.

Face landmark

val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(modelPath)
    .build()

val faceModelBaseOptions = BaseOptions.builder()
    .setModelAssetPath("face_landmarker.task")
    .build()

val facePluginModelBaseOptions = BaseOptions.builder()
    .setModelAssetPath("face_landmark_plugin.tflite")
    .build()

val faceConditionOptions = FaceConditionOptions.builder()
    .setFaceModelBaseOptions(faceModelBaseOptions)
    .setPluginModelBaseOptions(facePluginModelBaseOptions)
    .setMinFaceDetectionConfidence(0.3f)
    .setMinFacePresenceConfidence(0.3f)
    .build()

val conditionOptions = ConditionOptions.builder()
    .setFaceConditionOptions(faceConditionOptions)
    .build()

imageGenerator =
    ImageGenerator.createFromOptions(context, options, conditionOptions)
    

Canny Edge

val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(modelPath)
    .build()

val edgePluginModelBaseOptions = BaseOptions.builder()
    .setModelAssetPath("canny_edge_plugin.tflite")
    .build()

val edgeConditionOptions = EdgeConditionOptions.builder()
    .setThreshold1(100.0f)
    .setThreshold2(100.0f)
    .setApertureSize(3)
    .setL2Gradient(false)
    .setPluginModelBaseOptions(edgePluginModelBaseOptions)
    .build()

val conditionOptions = ConditionOptions.builder()
    .setEdgeConditionOptions(edgeConditionOptions)
    .build()

imageGenerator =
    ImageGenerator.createFromOptions(context, options, conditionOptions)
    

Depth

val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(modelPath)
    .build()

val depthModelBaseOptions = BaseOptions.builder()
    .setModelAssetPath("depth_model.tflite")
    .build()

val depthPluginModelBaseOptions = BaseOptions.builder()
    .setModelAssetPath("depth_plugin.tflite")
    .build()

val depthConditionOptions =
    ConditionOptions.DepthConditionOptions.builder()
        .setDepthModelBaseOptions(depthModelBaseOptions)
        .setPluginModelBaseOptions(depthPluginModelBaseOptions)
        .build()

val conditionOptions = ConditionOptions.builder()
    .setDepthConditionOptions(depthConditionOptions)
    .build()

imageGenerator =
    ImageGenerator.createFromOptions(context, options, conditionOptions)
    

Create with LoRA weights

If you are including LoRA weights, use the loraWeightsFilePath parameter to specify the path to the weights file.

val options = ImageGeneratorOptions.builder()
    .setLoraWeightsFilePath(weightsPath)
    .setImageGeneratorModelDirectory(modelPath)
    .build()

imageGenerator = ImageGenerator.createFromOptions(context, options)

Prepare data

The Image Generator accepts the following inputs:

  • prompt (required): The text prompt describing the image to be generated.
  • iterations (required): The total number of iterations used to generate the image. A good starting point is 20.
  • seed (required): The random seed used during image generation.
  • condition image (optional): The image the model uses as a reference for generation. Only applicable when using a plugin model.
  • condition type (optional): The type of plugin model used with the task. Only applicable when using a plugin model.
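
One way to keep the three required inputs together in app code is a small value class, sketched below. The class name GenerationInputs and its validation rules are assumptions made for illustration. They are app-level checks, not requirements documented for the task. The default of 20 iterations follows the starting point suggested above.

```kotlin
import kotlin.random.Random

// Hypothetical container for the required inputs (not part of MediaPipe).
data class GenerationInputs(
    val prompt: String,
    // 20 is the suggested starting point for the number of iterations.
    val iterations: Int = 20,
    // A random non-negative seed by default; pass a fixed value to reuse a seed.
    val seed: Int = Random.nextInt(0, Int.MAX_VALUE),
) {
    init {
        // App-level sanity checks (assumptions, not documented API rules).
        require(prompt.isNotBlank()) { "prompt must not be blank" }
        require(iterations > 0) { "iterations must be positive" }
    }
}
```

An instance can then be unpacked into the task call, for example imageGenerator.setInputs(inputs.prompt, inputs.iterations, inputs.seed).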

Inputs with only the foundation model

fun setInput(prompt: String, iteration: Int, seed: Int) {
    imageGenerator.setInputs(prompt, iteration, seed)
}

Inputs with plugins

If you are applying an optional plugin model, also use the conditionType parameter to choose the plugin model and the sourceConditionImage parameter to generate the condition image.

Option Name | Description | Value
conditionType | The plugin model applied to the foundation model. | {"FACE", "EDGE", "DEPTH"}
sourceConditionImage | The source image used to create the condition image. | MPImage object

If you are using a plugin model, use the createConditionImage() method to create the condition image:

fun createConditionImage(
    inputImage: MPImage,
    conditionType: ConditionType
): Bitmap {
    val result =
        imageGenerator.createConditionImage(inputImage, conditionType)
    return BitmapExtractor.extract(result)
}

After creating the condition image, include it as an input along with the prompt, seed, and number of iterations.

imageGenerator.setInputs(
    prompt,
    conditionalImage,
    conditionType,
    iteration,
    seed
)

Inputs with LoRA weights

If you are using LoRA weights, ensure that the token is in the text prompt if you intend to generate an image with the specific concept represented by the weights.

fun setInput(prompt: String, iteration: Int, seed: Int) {
    imageGenerator.setInputs(prompt, iteration, seed)
}
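
Because a missing token is easy to overlook, an app can check the prompt before passing it to the task. The following is a minimal sketch. The function name requireToken is hypothetical, and the case-sensitive substring match is an assumption about how you want to compare the token.

```kotlin
// Hypothetical guard (not part of MediaPipe): returns the prompt unchanged
// if it contains the LoRA token, and fails early otherwise.
fun requireToken(prompt: String, token: String): String {
    require(token.isNotBlank()) { "token must not be blank" }
    // Case-sensitive substring match; adjust if your token needs different rules.
    require(prompt.contains(token)) {
        "Prompt does not contain the LoRA token \"$token\""
    }
    return prompt
}
```

The returned prompt can be passed directly to setInputs() in place of the unchecked string.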

Run the task

Use the generate() method to generate an image using the inputs provided in the previous section. This produces a single generated image.

Generate with only the foundation model

fun generate(prompt: String, iteration: Int, seed: Int): Bitmap {
    val result = imageGenerator.generate(prompt, iteration, seed)
    val bitmap = BitmapExtractor.extract(result?.generatedImage())
    return bitmap
}

Generate with plugins

fun generate(
    prompt: String,
    inputImage: MPImage,
    conditionType: ConditionType,
    iteration: Int,
    seed: Int
): Bitmap {
    val result = imageGenerator.generate(
        prompt,
        inputImage,
        conditionType,
        iteration,
        seed
    )
    val bitmap = BitmapExtractor.extract(result?.generatedImage())
    return bitmap
}

Generate with LoRA weights

The process of generating images with a model customized with LoRA weights is similar to the process with a standard foundation model. Ensure that the token is included in the prompt and run the same code.

fun generate(prompt: String, iteration: Int, seed: Int): Bitmap {
    val result = imageGenerator.generate(prompt, iteration, seed)
    val bitmap = BitmapExtractor.extract(result?.generatedImage())
    return bitmap
}

Iterative generation

The Image Generator can also output the intermediate images generated at each iteration, as defined by the iterations input parameter. To view these intermediate results, call the setInputs() method, then call execute() once for each step. Set the showResult parameter to true to display the intermediate results.

fun execute(showResult: Boolean): Bitmap {
    val result = imageGenerator.execute(showResult)

    val bitmap =
        BitmapExtractor.extract(result.generatedImage())

    return bitmap
}
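
The per-step calls can be driven by a loop. The following sketch shows one possible shape for that loop, written against a plain function parameter so that it does not depend on Android classes. The names runIterations, displayEvery, executeStep, and onIntermediate are hypothetical. In an app, executeStep would wrap a call such as the execute() function above, made after setInputs(). The sketch also assumes that a step may return no image when showResult is false.

```kotlin
// Hypothetical driver (not part of MediaPipe): runs `iterations` steps and
// requests an intermediate image every `displayEvery` steps.
fun <T> runIterations(
    iterations: Int,
    displayEvery: Int,
    executeStep: (showResult: Boolean) -> T?,
    onIntermediate: (step: Int, image: T) -> Unit,
): T? {
    require(iterations > 0) { "iterations must be positive" }
    var finalImage: T? = null
    for (step in 1..iterations) {
        val isFinal = step == iterations
        // Always request the image on the last step; otherwise only on
        // display steps. A displayEvery of 0 disables intermediate output.
        val show = isFinal || (displayEvery > 0 && step % displayEvery == 0)
        val image = executeStep(show)
        if (isFinal) {
            finalImage = image
        } else if (show && image != null) {
            onIntermediate(step, image)
        }
    }
    return finalImage
}
```

Requesting an image only every few steps reduces how often intermediate results are extracted. The right interval depends on your app.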

Handle and display results

The Image Generator returns an ImageGeneratorResult, which includes the generated image, a timestamp of the time of completion, and the condition image if one was provided as an input.

val bitmap = BitmapExtractor.extract(result.generatedImage())

The image below was generated from the following inputs, using only a foundation model.

Inputs:

  • Prompt: "a colorful cartoon raccoon wearing a floppy wide brimmed hat holding a stick walking through the forest, animated, three-quarter view, painting"
  • Seed: 312687592
  • Iterations: 20

Generated image: