Model Garden provides a collection of state-of-the-art machine learning (ML) models for vision, text, and audio tasks. These models are optimized for use with the Google Tensor SDK, so you can run AI features on-device on Pixel devices. The tables below group the models by task and list the license for each.
Depth Estimation
| Model | License |
|---|---|
| depth_anything_v2 | Apache-2.0 |
| midas_v2_1 | BSD-3-Clause |
Face Reconstruction
| Model | License |
|---|---|
| facemap_3dmm | BSD-3-Clause |
Image and Text Understanding
| Model | License |
|---|---|
| clip | MIT |
| mobileclip_image_encoder | MIT |
| mobileclip_text_encoder | MIT |
| tinyclip | MIT |
Image Classification
| Model | License |
|---|---|
| alexnet | BSD-3-Clause |
| beit | BSD-3-Clause |
| convnext_base | BSD-3-Clause |
| convnext_tiny | BSD-3-Clause |
| densenet121 | BSD-3-Clause |
| efficientformer_l1 | Apache-2.0 |
| efficientformerv2_s0 | Apache-2.0 |
| efficientnet_b0 | BSD-3-Clause |
| efficientnet_b1 | BSD-3-Clause |
| efficientnet_b2 | BSD-3-Clause |
| efficientnet_b3 | BSD-3-Clause |
| efficientnet_b4 | BSD-3-Clause |
| efficientnet_b5 | BSD-3-Clause |
| efficientnet_b6 | BSD-3-Clause |
| efficientnet_b7 | BSD-3-Clause |
| efficientnet_v2_s | BSD-3-Clause |
| efficientnetv2_m | Apache-2.0 |
| efficientvit_cls_b2 | BSD-3-Clause |
| efficientvit_cls_l2 | BSD-3-Clause |
| efficientvit_seg_l2 | Apache-2.0 |
| googlenet | BSD-3-Clause |
| inception_v3 | BSD-3-Clause |
| levit | Apache-2.0 |
| maxvit_t | BSD-3-Clause |
| mnasnet0 | BSD-3-Clause |
| mobile_vit | BSD-3-Clause |
| mobilenet_v2 | Apache-2.0 |
| mobilenet_v3_large | BSD-3-Clause |
| mobilenet_v3_small | BSD-3-Clause |
| mobilenetv4_conv_l | BSD-3-Clause |
| mobilenetv4_conv_m | BSD-3-Clause |
| mobilenetv4_conv_s | BSD-3-Clause |
| mobilenetv4_hybrid_l | BSD-3-Clause |
| mobilenetv4_hybrid_medium | Apache-2.0 |
| nfnet | BSD-3-Clause |
| pvt_v2_b1 | BSD-3-Clause |
| pvt_v2_b3 | BSD-3-Clause |
| regnety | Apache-2.0 |
| resnest14d | BSD-3-Clause |
| resnet101 | BSD-3-Clause |
| resnet152 | BSD-3-Clause |
| resnet18 | BSD-3-Clause |
| resnet50 | BSD-3-Clause |
| resnext101 | AI-HUB-MODELS |
| resnext50 | BSD-3-Clause |
| shufflenet_v2 | BSD-3-Clause |
| squeezenet1 | BSD-3-Clause |
| swin_small | BSD-3-Clause |
| swin_tiny | BSD-3-Clause |
| tf_efficientnetv2_m | Apache-2.0 |
| vgg16 | BSD-3-Clause |
| vit_base_patch16 | Apache-2.0 |
| vit_small_patch16 | BSD-3-Clause |
| wide_resnet101 | BSD-3-Clause |
| wide_resnet50 | BSD-3-Clause |
Image Segmentation
| Model | License |
|---|---|
| hrnet_w48_ocr | MIT |
| mediapipe_selfie | Apache-2.0 |
| unet_segmentation | GPL-3.0 |
Image Super Resolution
| Model | License |
|---|---|
| esrgan | Apache-2.0 |
Object Detection
| Model | License |
|---|---|
| 3d_deep_box | MIT |
| conditional_detr_resnet50 | Apache-2.0 |
| detr_resnet50 | Apache-2.0 |
| detr_resnet50_dc5 | Apache-2.0 |
| detr_resnet101 | Apache-2.0 |
| detr_resnet101_dc5 | Apache-2.0 |
| faceattribnet | AI-HUB-MODELS |
| lightweight_face_detection | AI-HUB-MODELS |
| mediapipe_hand_detection | Apache-2.0 |
| person_foot_detection | AI-HUB-MODELS |
| ppe_detection | AI-HUB-MODELS |
| yolo_v4 | Apache-2.0 |
| yolo_v6 | GPL-3.0 |
| yolo_v7 | GPL-3.0 |
| yolos_tiny | Apache-2.0 |
| yolox_tiny | Apache-2.0 |
Pose Estimation
| Model | License |
|---|---|
| hrnet_pose | MIT |
| lite_hrnet_pose | Apache-2.0 |
| mediapipe_pose | Apache-2.0 |
| movenet | MIT |
Question Answering
| Model | License |
|---|---|
| tinyroberta | CC-BY-4.0 |
Semantic Segmentation
| Model | License |
|---|---|
| bgnet | Apache-2.0 |
| bisenet | No license file |
| ddrnet23_slim | MIT |
| deeplabv3_mobilenet_v3_large | BSD-3-Clause |
| deeplabv3_plus_mobilenet | MIT |
| deeplabv3_resnet101 | BSD-3-Clause |
| deeplabv3_resnet50 | BSD-3-Clause |
| fcn_resnet50 | BSD-3-Clause |
| ffnet_122ns_lowres | BSD-3-Clause |
| ffnet_40s | BSD-3-Clause |
| ffnet_54s | BSD-3-Clause |
| ffnet_78s_lowres | BSD-3-Clause |
| isnet | Apache-2.0 |
| lraspp_mobilenet_v3_large | BSD-3-Clause |
| sam_vit_b | Apache-2.0 |
| sam_vit_l | Apache-2.0 |
| segformer | NVIDIA-SCSL |
| segment_anything_model | Apache-2.0 |
| u2net_full | Apache-2.0 |
| u2net_lite | Apache-2.0 |
Speech Recognition
| Model | License |
|---|---|
| deepspeech | BSD-2-Clause |
| torchaudio_emformer_rnnt_base | BSD-2-Clause |
| wav2vec2_base_960h | Apache-2.0 |
Super Resolution
| Model | License |
|---|---|
| quicksrnet_large | BSD-3-Clause |
| quicksrnet_small | BSD-3-Clause |
| real_esrgan_general_x4v3 | BSD-3-Clause |
| real_esrgan_x4plus | BSD-3-Clause |
| xlsr | BSD-3-Clause |
Text Classification
| Model | License |
|---|---|
| distilbert | Apache-2.0 |
| mobilebert | Apache-2.0 |