The MediaPipe Language Detector task lets you identify the language of a piece of text. This
task operates on text data with a machine learning (ML) model and outputs a list
of predictions, where each prediction consists of an
ISO 639-1 language code
and a probability.
Start using this task by following one of these implementation guides for your
target platform. These platform-specific guides walk you through a basic
implementation of this task, including a recommended model, and code example
with recommended configuration options:
This section describes the capabilities, inputs, outputs, and configuration
options of this task.
Features
Score threshold - Filter results based on prediction scores
Label allowlist and denylist - Specify the categories detected
Task inputs
Task outputs
Language Detector accepts the following input data type:
String
Language Detector outputs a list of predictions containing:
Language code: An ISO 639-1 (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)
language / locale code (e.g. "en" for English, "uz" for Uzbek, "ja-Latn” for
Japanese (romaji)) as a string.
Probability: the confidence score for this prediction, expressed as a
probability between zero and one as floating point value.
Configurations options
This task has the following configuration options:
Option Name
Description
Value Range
Default Value
max_results
Sets the optional maximum number of top-scored language predictions to
return. If this value is less than zero, all available results are returned.
Any positive numbers
-1
score_threshold
Sets the prediction score threshold that overrides the one provided in
the model metadata (if any). Results below this value are rejected.
Any float
Not set
category_allowlist
Sets the optional list of allowed language codes. If non-empty,
language predictions whose language code is not in this set will be
filtered out. This option is mutually exclusive with
category_denylist and using both results in an error.
Any strings
Not set
category_denylist
Sets the optional list of language codes that are not allowed. If
non-empty, language predictions whose language code is in this set will be filtered
out. This option is mutually exclusive with category_allowlist and
using both results in an error.
Any strings
Not set
Models
We offer a default, recommended model when you start developing with this task.
Language detector model (recommended)
This model is built to be lightweight (315 KB) and uses embedding-based, neural
network classification architecture. The model identifies language using an
ISO 639-1 language
code, and can identify 110 languages. For a list of languages supported by the
model, see the
label file,
which lists languages by their ISO 639-1 code.
Here's the task benchmarks for the whole pipeline based on the above
pre-trained models. The latency result is the average latency on Pixel 6 using
CPU / GPU.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-01-13 UTC."],[],[],null,["# Language detection guide\n\nThe MediaPipe Language Detector task lets you identify the language of a piece of text. This\ntask operates on text data with a machine learning (ML) model and outputs a list\nof predictions, where each prediction consists of an\n[ISO 639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) language code\nand a probability.\n\n[Try it!arrow_forward](https://mediapipe-studio.webapps.google.com/demo/language_detector)\n\nGet Started\n-----------\n\nStart using this task by following one of these implementation guides for your\ntarget platform. These platform-specific guides walk you through a basic\nimplementation of this task, including a recommended model, and code example\nwith recommended configuration options:\n\n- **Android** - [Code example](https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/language_detector/android) - [Guide](./android)\n- **Python** - [Code example](https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/language_detector/python/%5BMediaPipe_Python_Tasks%5D_Language_Detector.ipynb) - [Guide](./python)\n- **Web** - [Code example](https://codepen.io/mediapipe-preview/pen/RweLdpK) - [Guide](./web_js)\n\nTask details\n------------\n\nThis section describes the capabilities, inputs, outputs, and configuration\noptions of this task.\n\n### Features\n\n- **Score threshold** - Filter results based on prediction scores\n- **Label allowlist and denylist** - Specify the categories detected\n\n| Task inputs | Task outputs |\n|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| Language Detector accepts the following input data type: - String | Language Detector outputs a list of predictions containing: - Language code: An ISO 639-1 (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) language / locale code (e.g. \"en\" for English, \"uz\" for Uzbek, \"ja-Latn\" for Japanese (romaji)) as a string. \u003c!-- --\u003e - Probability: the confidence score for this prediction, expressed as a probability between zero and one as floating point value. |\n\n### Configurations options\n\nThis task has the following configuration options:\n\n| Option Name | Description | Value Range | Default Value |\n|----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|---------------|\n| `max_results` | Sets the optional maximum number of top-scored language predictions to return. If this value is less than zero, all available results are returned. | Any positive numbers | `-1` |\n| `score_threshold` | Sets the prediction score threshold that overrides the one provided in the model metadata (if any). Results below this value are rejected. | Any float | Not set |\n| `category_allowlist` | Sets the optional list of allowed language codes. If non-empty, language predictions whose language code is not in this set will be filtered out. This option is mutually exclusive with `category_denylist` and using both results in an error. | Any strings | Not set |\n| `category_denylist` | Sets the optional list of language codes that are not allowed. If non-empty, language predictions whose language code is in this set will be filtered out. This option is mutually exclusive with `category_allowlist` and using both results in an error. | Any strings | Not set |\n\nModels\n------\n\nWe offer a default, recommended model when you start developing with this task.\n| **Attention:** This MediaPipe Solutions Preview is an early release. [Learn more](/edge/mediapipe/solutions/about#notice).\n\n### Language detector model (recommended)\n\nThis model is built to be lightweight (315 KB) and uses embedding-based, neural\nnetwork classification architecture. The model identifies language using an\n[ISO 639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) language\ncode, and can identify 110 languages. For a list of languages supported by the\nmodel, see the\n[label file](https://storage.googleapis.com/mediapipe-tasks/language_detector/labels.txt),\nwhich lists languages by their ISO 639-1 code.\n\n| Model name | Input shape | Quantization type | Model card | Versions |\n|---------------------------------------------------------------------------------------------------------------------------------------------|--------------|-------------------|---------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|\n| [Language Detector](https://storage.googleapis.com/mediapipe-models/language_detector/language_detector/float32/1/language_detector.tflite) | string UTF-8 | none (float32) | [info](https://storage.googleapis.com/mediapipe-assets/LanguageDetector%20Model%20Card.pdf) | [Latest](https://storage.googleapis.com/mediapipe-models/language_detector/language_detector/float32/1/language_detector.tflite) |\n\nTask benchmarks\n---------------\n\nHere's the task benchmarks for the whole pipeline based on the above\npre-trained models. The latency result is the average latency on Pixel 6 using\nCPU / GPU.\n\n| Model Name | CPU Latency | GPU Latency |\n|-------------------|-------------|-------------|\n| Language Detector | 0.31ms | - |"]]