Introducing Google AI Edge Portal: Benchmark Edge AI at scale. Sign-up to request access during private preview.

LiteRT-LM Web API

The Web API of LiteRT-LM for JavaScript and TypeScript in the browser. This is an early preview that supports text-in / text-out running in WebGPU.

Introduction

Here is a sample REPL chat app built with the JavaScript API:

<div id="out" style="white-space: pre-wrap; font-family: monospace;"></div>
<input id="in" onkeydown="if(event.key === 'Enter') repl(this)">

<script type="module">
  import { Engine } from 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm';
  const engine = await Engine.create({ model: '/path/to/model.litertlm' });
  const chat = await engine.createConversation();

  window.repl = async (el) => {
    const text = el.value;
    el.value = ''; // Clear immediately
    out.append(`\n>>> ${text}\nAI: `);

    for await (const chunk of chat.sendMessageStreaming(text)) {
      out.append(chunk.content[0].text);
    }
  };
</script>

Getting Started

LiteRT-LM is available as an npm package. You can install the latest version from npm or directly import it from a CDN:

# From npm
npm i --save @litert-lm/core

# From a CDN (in your JavaScript file)
import * as litertlm from 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm';

Initialize the Engine

The Engine is the entry point to the API. It handles model loading, session creation, and resource management. Remember to delete the engine to release resources when the model is no longer needed.

Note: Initializing the engine can take several seconds to load the model.

import {Engine, EngineSettings} from '@litert-lm/core';

const engineSettings = {
  model: 'url/path/to/model.litertlm', // or a ReadableStream, or a Blob
} satisfies EngineSettings;

const engine = await Engine.create(engineSettings);

// ... Use the engine to create a conversation ...

// Delete the engine when done.
await engine.delete();

Create a Conversation

Once the engine is initialized, create a Conversation instance. You can provide a ConversationConfig to customize its behavior.

const conversation = await engine.createConversation({
  preface: {
    messages: [
      {role: 'system', content: 'You are a helpful assistant'}
    ]
  }
});

conversation.sendMessage({
  role: 'user',
  content: 'Write a poem',
});

Send Messages

You can send messages with or without streaming.

Non-Streaming Example

// Simple string input
let response = await conversation.sendMessage("What is the capital of France?");
console.log(response.content[0].text);

// Or with full message structure
response = await conversation.sendMessage({role: 'user', content: '...'});

Streaming Example

// sendMessageStreaming returns a ReadableStream of response chunks
const stream = conversation.sendMessageStreaming('Tell me a long story.');

for await (const chunk of stream) {
  // Chunks are Records containing pieces of the response
  for (const item of chunk.content) {
    if (item.type === 'text') {
      console.log(item.text);
    }
  }
}

Cancel Generation

You can cancel an ongoing generation explicitly by calling cancel() on the Conversation instance:

// Cancel any ongoing generation
conversation.cancel();

If you are streaming the response, exiting the for await...of loop early (such as with break) will also automatically cancel the ongoing generation:

for await (const chunk of stream) {
  if (shouldStop()) {
    break; // Cancels the stream and underlying generation
  }
}