# Integrate text embedders

Text embedders allow embedding text into a high-dimensional feature vector
representing its semantic meaning, which can then be compared with the feature
vector of other texts to evaluate their semantic similarity.
As opposed to
[text search](./text_searcher),
the text embedder allows computing the similarity between texts on-the-fly
instead of searching through a predefined index built from a corpus.
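The distinction can be sketched in plain Python (a toy, hypothetical `embed` stand-in, not the Task Library API): a text searcher ranks a query against an index built once from a corpus, while a text embedder scores any two texts directly.

```python
# Toy stand-in for a real embedder: maps text to a short L2-normalized vector.
# A real model would produce a high-dimensional semantic feature vector.
def embed(text):
    raw = [float(len(text)), float(text.count(" ") + 1)]
    norm = sum(x * x for x in raw) ** 0.5
    return [x / norm for x in raw]

def score(u, v):
    # Dot product of L2-normalized vectors equals their cosine similarity.
    return sum(a * b for a, b in zip(u, v))

# Text search: embed the whole corpus up front, then look queries up in it.
corpus = ["a cat", "dogs bark loudly", "hello"]
index = [(doc, embed(doc)) for doc in corpus]
best_doc, _ = max(index, key=lambda entry: score(embed("a cat"), entry[1]))

# Text embedding: compare any two texts on the fly, with no index at all.
on_the_fly = score(embed("a cat"), embed("hello there"))
```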
Use the Task Library `TextEmbedder` API to deploy your custom text embedder into
your mobile apps.
Key features of the TextEmbedder API
------------------------------------
- Input text processing, including in-graph or out-of-graph
  [Wordpiece](https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/cc/text/tokenizers/bert_tokenizer.h)
  or
  [Sentencepiece](https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/cc/text/tokenizers/sentencepiece_tokenizer.h)
  tokenizations on input text.
- Built-in utility function to compute the
  [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) between
  feature vectors.
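For intuition, the metric that utility computes can be sketched as a standalone function (a plain-Python illustration, not the Task Library implementation):

```python
def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

# Vectors pointing the same way score near 1; orthogonal ones score 0.
same = cosine_similarity([0.5, 0.5], [1.0, 1.0])       # approximately 1.0
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0
```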
Supported text embedder models
------------------------------

The following models are guaranteed to be compatible with the `TextEmbedder`
API.

- The
  [Universal Sentence Encoder TFLite model from TensorFlow Hub](https://www.kaggle.com/models/google/universal-sentence-encoder-qa-ondevice/tfLite/universal-sentence-encoder-qa-ondevice/1)
- Custom models that meet the
  [model compatibility requirements](#model-compatibility-requirements).
Run inference in C++
--------------------

```cpp
// Initialization.
TextEmbedderOptions options;
options.mutable_base_options()->mutable_model_file()->set_file_name(model_path);
std::unique_ptr<TextEmbedder> text_embedder =
    TextEmbedder::CreateFromOptions(options).value();

// Run inference with your two inputs, `input_text1` and `input_text2`.
const EmbeddingResult result_1 = text_embedder->Embed(input_text1).value();
const EmbeddingResult result_2 = text_embedder->Embed(input_text2).value();

// Compute cosine similarity.
double similarity = TextEmbedder::CosineSimilarity(
    result_1.embeddings[0].feature_vector(),
    result_2.embeddings[0].feature_vector());
```
See the
[source code](https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/cc/task/text/text_embedder.h)
for more options to configure `TextEmbedder`.
Run inference in Python
-----------------------
### Step 1: Install the TensorFlow Lite Support Pypi package

You can install the TensorFlow Lite Support Pypi package using the following
command:

```shell
pip install tflite-support
```
### Step 2: Using the model

```python
from tflite_support.task import text

# Initialization.
text_embedder = text.TextEmbedder.create_from_file(model_path)

# Run inference on two texts.
result_1 = text_embedder.embed(text_1)
result_2 = text_embedder.embed(text_2)

# Compute cosine similarity.
feature_vector_1 = result_1.embeddings[0].feature_vector
feature_vector_2 = result_2.embeddings[0].feature_vector
similarity = text_embedder.cosine_similarity(feature_vector_1, feature_vector_2)
```
See the
[source code](https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/python/task/text/text_embedder.py)
for more options to configure `TextEmbedder`.
Example results
---------------

Cosine similarity between normalized feature vectors returns a score between -1
and 1. Higher is better; a cosine similarity of 1 means the two vectors are
identical.

```
Cosine similarity: 0.954312
```

Try out the simple
[CLI demo tool for TextEmbedder](https://github.com/tensorflow/tflite-support/tree/master/tensorflow_lite_support/examples/task/text/desktop#textembedder)
with your own model and test data.
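A quick sanity check of that range (plain Python, not the Task Library API): for L2-normalized vectors, cosine similarity reduces to a dot product, which is bounded by -1 and 1.

```python
def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

a = l2_normalize([3.0, 4.0])
identical = dot(a, a)                           # ~1.0: identical vectors
opposite = dot(a, [-x for x in a])              # ~-1.0: opposite direction
orthogonal = dot(a, l2_normalize([-4.0, 3.0]))  # ~0.0: orthogonal vectors
```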
Model compatibility requirements
--------------------------------

The `TextEmbedder` API expects a TFLite model with mandatory
[TFLite Model Metadata](../../models/metadata).

Three main types of models are supported:

- BERT-based models (see
  [source code](https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/cc/task/text/utils/bert_utils.h)
  for more details):

  - Exactly 3 input tensors (kTfLiteString)
    - IDs tensor, with metadata name "ids".
    - Mask tensor, with metadata name "mask".
    - Segment IDs tensor, with metadata name "segment_ids".
  - Exactly one output tensor (kTfLiteUInt8/kTfLiteFloat32)
    - with `N` components corresponding to the `N` dimensions of the returned
      feature vector for this output layer.
    - Either 2 or 4 dimensions, i.e. `[1 x N]` or `[1 x 1 x 1 x N]`.
  - An input_process_units for Wordpiece/Sentencepiece Tokenizer

- Universal Sentence Encoder-based models (see
  [source code](https://github.com/tensorflow/tflite-support/blob/master/tensorflow_lite_support/cc/task/text/utils/universal_sentence_encoder_utils.h)
  for more details):

  - Exactly 3 input tensors (kTfLiteString)
    - Query text tensor, with metadata name "inp_text".
    - Response context tensor, with metadata name "res_context".
    - Response text tensor, with metadata name "res_text".
  - Exactly 2 output tensors (kTfLiteUInt8/kTfLiteFloat32)
    - Query encoding tensor, with metadata name "query_encoding".
    - Response encoding tensor, with metadata name "response_encoding".
    - Both with `N` components corresponding to the `N` dimensions of the
      returned feature vector for this output layer.
    - Both with either 2 or 4 dimensions, i.e. `[1 x N]` or `[1 x 1 x 1 x N]`.

- Any text embedder model with:

  - An input text tensor (kTfLiteString)
  - At least one output embedding tensor (kTfLiteUInt8/kTfLiteFloat32)
    - with `N` components corresponding to the `N` dimensions of the returned
      feature vector for this output layer.
    - Either 2 or 4 dimensions, i.e. `[1 x N]` or `[1 x 1 x 1 x N]`.
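The output-tensor dimension rule above is mechanical enough to check in a few lines of Python. This is an illustrative, hypothetical helper; the Task Library performs this validation internally.

```python
def is_valid_embedding_output_shape(shape):
    """True if `shape` is [1, N] or [1, 1, 1, N] with N >= 1."""
    if len(shape) == 2:
        return shape[0] == 1 and shape[1] >= 1
    if len(shape) == 4:
        return shape[:3] == [1, 1, 1] and shape[3] >= 1
    return False

def embedding_dim(shape):
    """The embedding dimension N is always the last axis."""
    if not is_valid_embedding_output_shape(shape):
        raise ValueError(f"unsupported output tensor shape: {shape}")
    return shape[-1]
```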