
LiteRT CompiledModel Python API

LiteRT's CompiledModel API is available in Python and provides a high-level interface for compiling and running TFLite models with the LiteRT runtime.

The following guide shows basic CPU inference with the CompiledModel Python API.

Install the pip package

Install the LiteRT pip package in your Python environment:

pip install ai-edge-litert
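To confirm that the package installed into the active environment, you can query its distribution metadata with the standard library's `importlib.metadata`. This is a minimal sketch; `installed_version` is a hypothetical helper, not part of LiteRT.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this Python environment.
        return None


if __name__ == "__main__":
    # Prints a version string if `pip install ai-edge-litert` succeeded, else None.
    print(installed_version("ai-edge-litert"))
```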

Basic inference

Create a `CompiledModel`

Create a compiled model from a .tflite file. The current Python wrapper compiles for the CPU by default.

from ai_edge_litert.compiled_model import CompiledModel

model = CompiledModel.from_file("mymodel.tflite")

You can also create a compiled model from an in-memory buffer:

from ai_edge_litert.compiled_model import CompiledModel

with open("mymodel.tflite", "rb") as f:
  model = CompiledModel.from_buffer(f.read())
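Both entry points expect a TFLite FlatBuffer. If the bytes come from a user-supplied path, a cheap sanity check before calling `from_buffer` can catch the wrong file early: the TFLite schema declares the FlatBuffers file identifier `TFL3`, stored at byte offset 4. The helpers below (`check_tflite_bytes`, `load_tflite_bytes`) are hypothetical, and the check only rules out obviously wrong files; it does not validate the model itself.

```python
from pathlib import Path

# FlatBuffers file identifier declared in the TFLite schema.
TFLITE_FILE_IDENTIFIER = b"TFL3"


def check_tflite_bytes(data: bytes) -> bytes:
    """Raise ValueError unless the bytes carry the TFLite file identifier."""
    # Bytes 0-3 hold the root table offset; bytes 4-7 hold the file identifier.
    if len(data) < 8 or data[4:8] != TFLITE_FILE_IDENTIFIER:
        raise ValueError("data does not look like a TFLite FlatBuffer")
    return data


def load_tflite_bytes(path: str) -> bytes:
    """Read a .tflite file and sanity-check it before handing it to LiteRT."""
    return check_tflite_bytes(Path(path).read_bytes())


# Usage (requires ai-edge-litert):
# model = CompiledModel.from_buffer(load_tflite_bytes("mymodel.tflite"))
```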

Create input and output buffers

Create the data structures (buffers) that hold the input data you feed to the model for inference and the output data the model produces after inference runs.

signature_index = 0
input_buffers = model.create_input_buffers(signature_index)
output_buffers = model.create_output_buffers(signature_index)

A signature_index of 0 selects the first signature in the model.

If the buffers use CPU memory, fill the inputs by writing NumPy arrays directly into the input buffers:

import numpy as np

input_data = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffers[0].write(input_data)
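The input buffer expects data laid out for the tensor's shape and dtype. NumPy creates float64 arrays from Python floats unless you pass `dtype`, which is why the example above sets `dtype=np.float32`. A small conversion helper can make this explicit; `to_input_array` is hypothetical, and the shape `(1, 4)` is an assumption that should match your model's actual input tensor.

```python
import numpy as np


def to_input_array(values, shape, dtype=np.float32) -> np.ndarray:
    """Convert Python data to a dense array with the expected shape and dtype."""
    arr = np.asarray(values, dtype=dtype)
    expected = int(np.prod(shape))
    if arr.size != expected:
        raise ValueError(f"got {arr.size} elements, expected {expected} for shape {shape}")
    # Reshape, then force a row-major (C-contiguous) layout.
    return np.ascontiguousarray(arr.reshape(shape))


# Usage (requires the input_buffers created above):
# input_buffers[0].write(to_input_array([1, 2, 3, 4], (1, 4)))
```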

Run the model

Pass the input and output buffers and run the model:

model.run_by_index(signature_index, input_buffers, output_buffers)

Retrieve outputs

Retrieve the results by reading the model's outputs directly from memory:

import numpy as np

# Replace num_elements with the size of your model's output tensor.
num_elements = 4
output_array = output_buffers[0].read(num_elements, np.float32)
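`read` takes an element count rather than a shape, so num_elements is the product of the output tensor's dimensions. The sketch below computes that count and restores the shape afterwards. It assumes `read` returns a flat 1-D NumPy array; the helper names and the `(1, 4)` shape are hypothetical.

```python
import math

import numpy as np


def output_num_elements(shape) -> int:
    """Element count to pass to read() for a tensor with this shape."""
    return math.prod(shape)


def reshape_output(flat, shape) -> np.ndarray:
    """Restore the tensor shape, assuming read() returned a flat 1-D array."""
    return np.asarray(flat).reshape(shape)


# Usage (requires the output_buffers created above):
# shape = (1, 4)  # hypothetical; use your model's real output shape
# flat = output_buffers[0].read(output_num_elements(shape), np.float32)
# output = reshape_output(flat, shape)
```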

Use `TensorBuffer`

LiteRT provides I/O buffer interoperability through the TensorBuffer API, which supports writing NumPy arrays (write) and reading NumPy arrays (read). The supported dtypes are np.float32, np.int32, and np.int8.

You can also create a buffer backed by existing host memory:

import numpy as np
from ai_edge_litert.tensor_buffer import TensorBuffer

input_array = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffer = TensorBuffer.create_from_host_memory(input_array)
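Before wrapping existing host memory, two properties are worth checking: the array's dtype should be one of those listed above, and its memory layout should be dense. Both checks are precautions under an assumption, namely that a host-backed buffer wraps one contiguous block of the array's memory; if it aliases the array rather than copying it, the array must also stay alive while the buffer is in use. `host_buffer_source` is a hypothetical helper. It relies on `np.ascontiguousarray`, which leaves an already C-contiguous array uncopied and copies strided views (such as `arr[:, ::2]`) into a dense array.

```python
import numpy as np

# dtypes this guide lists as supported by TensorBuffer.
SUPPORTED_DTYPES = (np.float32, np.int32, np.int8)


def host_buffer_source(arr: np.ndarray) -> np.ndarray:
    """Validate the dtype and return a C-contiguous array to back a host buffer."""
    if not any(arr.dtype == np.dtype(t) for t in SUPPORTED_DTYPES):
        names = ", ".join(np.dtype(t).name for t in SUPPORTED_DTYPES)
        raise TypeError(f"unsupported dtype {arr.dtype}; cast to one of: {names}")
    # Strided views are copied; already-contiguous arrays are not copied.
    return np.ascontiguousarray(arr)


# Usage (requires ai-edge-litert):
# source = host_buffer_source(input_array)
# input_buffer = TensorBuffer.create_from_host_memory(source)
```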

To run by signature name, first inspect the model's signatures, then pass maps from input/output names to TensorBuffer instances:

import numpy as np
from ai_edge_litert.tensor_buffer import TensorBuffer

# Inspect the model's signatures (model is the CompiledModel created earlier).
signatures = model.get_signature_list()
# Example signature structure:
# {"serving_default": {"inputs": ["input_0"], "outputs": ["output_0"]}}

input_array = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
input_buffer = TensorBuffer.create_from_host_memory(input_array)
output_buffer = model.create_output_buffer_by_name("serving_default", "output_0")

model.run_by_name(
  "serving_default",
  {"input_0": input_buffer},
  {"output_0": output_buffer},
)
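A mismatched name in either map is an easy mistake to make. A check like the one below compares the maps against the signature list before calling `run_by_name`. It assumes `get_signature_list()` returns the structure shown in the comment above; `check_io_maps` is a hypothetical helper.

```python
def check_io_maps(signatures, signature_key, input_map, output_map):
    """Raise if the name->buffer maps do not match the named signature."""
    if signature_key not in signatures:
        raise KeyError(f"unknown signature {signature_key!r}; available: {sorted(signatures)}")
    signature = signatures[signature_key]
    for kind, given in (("inputs", input_map), ("outputs", output_map)):
        expected = set(signature[kind])
        missing = sorted(expected - set(given))
        unexpected = sorted(set(given) - expected)
        if missing or unexpected:
            raise ValueError(f"{kind}: missing {missing}, unexpected {unexpected}")


# Usage (requires the model and buffers above):
# check_io_maps(model.get_signature_list(), "serving_default",
#               {"input_0": input_buffer}, {"output_0": output_buffer})
```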

For a fuller picture of how the TensorBuffer API is implemented, see the TensorBuffer source code.

Use the GPU accelerator

If a GPU is available, you can use it simply by passing the HardwareAccelerator.GPU option when creating the CompiledModel:

from ai_edge_litert.compiled_model import CompiledModel
from ai_edge_litert.compiled_model import HardwareAccelerator

model = CompiledModel.from_file("mymodel.tflite", HardwareAccelerator.GPU)
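Not every machine has a usable GPU, so code that runs in several environments may want to fall back to the default CPU path. The sketch below tries factories in order and keeps the first that succeeds. It assumes that requesting an unavailable accelerator raises an exception; if LiteRT instead falls back internally, the wrapper is unnecessary. `first_successful` is a hypothetical helper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_successful(factories: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Call each (label, factory) in order and return the first result that succeeds."""
    errors: List[str] = []
    for label, factory in factories:
        try:
            return label, factory()
        except Exception as exc:  # assumption: failures surface as exceptions
            errors.append(f"{label}: {exc}")
    raise RuntimeError("all options failed: " + "; ".join(errors))


# Usage (requires ai-edge-litert):
# label, model = first_successful([
#     ("gpu", lambda: CompiledModel.from_file("mymodel.tflite", HardwareAccelerator.GPU)),
#     ("cpu", lambda: CompiledModel.from_file("mymodel.tflite")),
# ])
```

The returned label records which path was actually taken, which is useful when logging or comparing results across devices.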

Check the LiteRT documentation to see which backends your platform supports.