LiteRT Next is a new set of APIs that improves upon LiteRT, particularly in terms of hardware acceleration and performance for on-device ML and AI applications. The APIs are an alpha release and available in Kotlin and C++.

The LiteRT Next Compiled Model API builds on the TensorFlow Lite Interpreter API and simplifies the model loading and execution process for on-device machine learning. The new APIs provide a streamlined way to use hardware acceleration, removing the need to deal with model FlatBuffers, I/O buffer interoperability, and delegates. The LiteRT Next APIs are not compatible with the LiteRT APIs. To use features from LiteRT Next, see the Get Started guide.
For example implementations of LiteRT Next, refer to the following demo applications:
Image segmentation with Kotlin (https://github.com/google-ai-edge/LiteRT/tree/main/litert/samples/image_segmentation/kotlin_cpu_gpu/android)
Asynchronous segmentation with C++ (https://github.com/google-ai-edge/LiteRT/tree/main/litert/samples/async_segmentation)
Quickstart

Running inference with the LiteRT Next APIs involves the following key steps:
1. Load a compatible model.
2. Allocate the input and output tensor buffers.
3. Invoke the compiled model.
4. Read the inferences into an output buffer.
The following code snippets show a basic implementation of the entire process in Kotlin and C++.
C++
    // Load model and initialize runtime
    LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile("mymodel.tflite"));
    LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));
    LITERT_ASSIGN_OR_RETURN(auto compiled_model,
                            CompiledModel::Create(env, model, kLiteRtHwAcceleratorCpu));

    // Preallocate input/output buffers
    LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());
    LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());

    // Fill the first input
    float input_values[] = { /* your data */ };
    input_buffers[0].Write<float>(absl::MakeConstSpan(input_values, /*size*/));

    // Invoke
    compiled_model.Run(input_buffers, output_buffers);

    // Read the output
    std::vector<float> data(output_data_size);
    output_buffers[0].Read<float>(absl::MakeSpan(data));
Kotlin
    // Load model and initialize runtime
    val model =
        CompiledModel.create(
            context.assets,
            "mymodel.tflite",
            CompiledModel.Options(Accelerator.CPU)
        )

    // Preallocate input/output buffers
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    // Fill the first input
    inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })

    // Invoke
    model.run(inputBuffers, outputBuffers)

    // Read the output
    val outputFloatArray = outputBuffers[0].readFloat()

For more information, see the Get Started with Kotlin and Get Started with C++ guides.
Key features

LiteRT Next contains the following key benefits and features:

New LiteRT API: Streamlines development with automated accelerator selection, true async execution, and efficient I/O buffer handling.
Best-in-class GPU performance: Uses state-of-the-art GPU acceleration for on-device ML. The new buffer interoperability enables zero-copy and minimizes latency across various GPU buffer types.
Superior generative AI inference: Enables the simplest integration with the best performance for GenAI models.
Unified NPU acceleration: Offers seamless access to NPUs from major chipset providers with a consistent developer experience. LiteRT NPU acceleration is available through an Early Access Program.
Key improvements
LiteRT Next (Compiled Model API) contains the following key improvements over LiteRT (TFLite Interpreter API). For a comprehensive guide to setting up your application with LiteRT Next, see the Get Started guide.
Accelerator usage: Running models on GPU with LiteRT requires explicit delegate creation, function calls, and graph modifications. With LiteRT Next, you just specify the accelerator, as shown in the Kotlin sketch after this list.
Native hardware buffer interoperability: LiteRT does not provide buffer options and forces all data through CPU memory. With LiteRT Next, you can pass in Android Hardware Buffers (AHWB), OpenCL buffers, OpenGL buffers, or other specialized buffers.
Async execution: LiteRT Next comes with a redesigned async API that provides a true asynchronous mechanism based on sync fences. This enables faster overall execution times by using diverse hardware, such as CPUs, GPUs, and NPUs, for different tasks.
Model loading: LiteRT Next does not require a separate builder step when loading a model.
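As referenced in the accelerator usage item above, the following is a minimal Kotlin sketch of what selecting a different accelerator looks like with the Compiled Model API. It reuses only the calls shown in the quickstart (CompiledModel.create, createInputBuffers, createOutputBuffers, writeFloat, run, and readFloat); the Accelerator.GPU value and the inputSize example are assumptions for illustration, since this page only shows Accelerator.CPU.

    // Minimal sketch, not an official sample: the only change needed to target a
    // different accelerator is the option passed to CompiledModel.create.
    // Accelerator.GPU is assumed to exist alongside the Accelerator.CPU value used
    // in the quickstart; actual availability depends on the device and runtime.
    val gpuModel = CompiledModel.create(
        context.assets,
        "mymodel.tflite",
        CompiledModel.Options(Accelerator.GPU)
    )

    // Buffer allocation and invocation are identical regardless of the accelerator.
    val inputSize = 224 * 224 * 3  // example element count; depends on your model
    val inputBuffers = gpuModel.createInputBuffers()
    val outputBuffers = gpuModel.createOutputBuffers()
    inputBuffers[0].writeFloat(FloatArray(inputSize) { 0f })  // replace with your data
    gpuModel.run(inputBuffers, outputBuffers)
    val result = outputBuffers[0].readFloat()

The point of the sketch is that no delegate objects, extra function calls, or graph modifications appear anywhere; the accelerator choice is a single constructor option.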
[[["Mudah dipahami","easyToUnderstand","thumb-up"],["Memecahkan masalah saya","solvedMyProblem","thumb-up"],["Lainnya","otherUp","thumb-up"]],[["Informasi yang saya butuhkan tidak ada","missingTheInformationINeed","thumb-down"],["Terlalu rumit/langkahnya terlalu banyak","tooComplicatedTooManySteps","thumb-down"],["Sudah usang","outOfDate","thumb-down"],["Masalah terjemahan","translationIssue","thumb-down"],["Masalah kode / contoh","samplesCodeIssue","thumb-down"],["Lainnya","otherDown","thumb-down"]],["Terakhir diperbarui pada 2025-09-03 UTC."],[],[],null,["# LiteRT Next Overview\n\n| **Experimental:** LiteRT Next is an alpha release and under active development.\n\nLiteRT Next is a new set of APIs that improves upon LiteRT, particularly in\nterms of hardware acceleration and performance for on-device ML and AI\napplications. The APIs are an alpha release and available in Kotlin and C++.\n\nThe LiteRT Next Compiled Model API builds on the TensorFlow Lite Interpreter\nAPI, and simplifies the model loading and execution process for on-device\nmachine learning. The new APIs provide a new streamlined way to use hardware\nacceleration, removing the need to deal with model FlatBuffers, I/O buffer\ninteroperability, and delegates. The LiteRT Next APIs are not compatible with\nthe LiteRT APIs. In order to use features from LiteRT Next, see the [Get\nStarted](./get_started) guide.\n\nFor example implementations of LiteRT Next, refer to the following demo\napplications:\n\n- [Image segmentation with Kotlin](https://github.com/google-ai-edge/LiteRT/tree/main/litert/samples/image_segmentation/kotlin_cpu_gpu/android)\n- [Asynchronous segmentation with C++](https://github.com/google-ai-edge/LiteRT/tree/main/litert/samples/async_segmentation)\n\nQuickstart\n----------\n\nRunning inference with the LiteRT Next APIs involves the following key steps:\n\n1. Load a compatible model.\n2. Allocate the input and output tensor buffers.\n3. Invoke the compiled model.\n4. Read the inferences into an output buffer.\n\nThe following code snippets show a basic implementation of the entire process in\nKotlin and C++. 
\n\n### C++\n\n // Load model and initialize runtime\n LITERT_ASSIGN_OR_RETURN(auto model, Model::CreateFromFile(\"mymodel.tflite\"));\n LITERT_ASSIGN_OR_RETURN(auto env, Environment::Create({}));\n LITERT_ASSIGN_OR_RETURN(auto compiled_model,\n CompiledModel::Create(env, model, kLiteRtHwAcceleratorCpu));\n\n // Preallocate input/output buffers\n LITERT_ASSIGN_OR_RETURN(auto input_buffers, compiled_model.CreateInputBuffers());\n LITERT_ASSIGN_OR_RETURN(auto output_buffers, compiled_model.CreateOutputBuffers());\n\n // Fill the first input\n float input_values[] = { /* your data */ };\n input_buffers[0].Write\u003cfloat\u003e(absl::MakeConstSpan(input_values, /*size*/));\n\n // Invoke\n compiled_model.Run(input_buffers, output_buffers);\n\n // Read the output\n std::vector\u003cfloat\u003e data(output_data_size);\n output_buffers[0].Read\u003cfloat\u003e(absl::MakeSpan(data));\n\n### Kotlin\n\n // Load model and initialize runtime\n val model =\n CompiledModel.create(\n context.assets,\n \"mymodel.tflite\",\n CompiledModel.Options(Accelerator.CPU)\n )\n\n // Preallocate input/output buffers\n val inputBuffers = model.createInputBuffers()\n val outputBuffers = model.createOutputBuffers()\n\n // Fill the first input\n inputBuffers[0].writeFloat(FloatArray(data_size) { data_value /* your data */ })\n\n // Invoke\n model.run(inputBuffers, outputBuffers)\n\n // Read the output\n val outputFloatArray = outputBuffers[0].readFloat()\n\nFor more information, see the [Get Started with Kotlin](./android_kotlin) and\n[Get Started with C++](./android_cpp) guides.\n\nKey features\n------------\n\nLiteRT Next contains the following key benefits and features:\n\n- **New LiteRT API**: Streamline development with automated accelerator selection, true async execution, and efficient I/O buffer handling.\n- **Best-in-class GPU Performance**: Use state-of-the-art GPU acceleration for on-device ML. The new buffer interoperability enables zero-copy and minimizes latency across various GPU buffer types.\n- **Superior Generative AI inference**: Enable the simplest integration with the best performance for GenAI models.\n- **Unified NPU Acceleration** : Offer seamless access to NPUs from major chipset providers with a consistent developer experience. LiteRT NPU acceleration is available through an [Early Access\n Program](https://forms.gle/CoH4jpLwxiEYvDvF6).\n\nKey improvements\n----------------\n\nLiteRT Next (Compiled Model API) contains the following key improvements on\nLiteRT (TFLite Interpreter API). For a comprehensive guide to setting up your\napplication with LiteRT Next, see the [Get Started](./get_started) guide.\n\n- **Accelerator usage**: Running models on GPU with LiteRT requires explicit delegate creation, function calls, and graph modifications. With LiteRT Next, just specify the accelerator.\n- **Native hardware buffer interoperability**: LiteRT does not provide the option of buffers, and forces all data through CPU memory. With LiteRT Next, you can pass in Android Hardware Buffers (AHWB), OpenCL buffers, OpenGL buffers, or other specialized buffers.\n- **Async execution**: LiteRT Next comes with a redesigned async API, providing a true async mechanism based on sync fences. This enables faster overall execution times through the use of diverse hardware -- like CPUs, GPUs, CPUs, and NPUs -- for different tasks.\n- **Model loading**: LiteRT Next does not require a separate builder step when loading a model."]]