LlmInference.LlmInferenceOptions

public static abstract class LlmInference.LlmInferenceOptions

Options for setting up an LlmInference.

Nested Classes

class LlmInference.LlmInferenceOptions.Builder
    Builder for LlmInference.LlmInferenceOptions.

Public Constructors

LlmInferenceOptions()

Public Methods

static LlmInference.LlmInferenceOptions.Builder builder()
    Instantiates a new LlmInferenceOptions builder.

abstract Optional<ErrorListener> errorListener()
    The error listener to use for the LlmInference#generateAsync API.

abstract Optional<String> loraPath()
    The absolute path to the LoRA model asset bundle stored locally on the device.

abstract int maxTokens()
    The total length of the kv-cache.

abstract String modelPath()
    The path that points to the TFLite model file.

abstract int randomSeed()
    Random seed for sampling tokens.

abstract Optional<ProgressListener<String>> resultListener()
    The result listener to use for the LlmInference#generateAsync API.

abstract float temperature()
    Randomness when decoding the next token.

abstract int topK()
    Top K number of tokens to be sampled from for each decoding step.

Public Constructors

public LlmInferenceOptions ()

Public Methods

public static LlmInference.LlmInferenceOptions.Builder builder ()

Instantiates a new LlmInferenceOptions builder.
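
As a usage sketch, assuming the nested Builder exposes setters mirroring the getters on this page (setModelPath, setMaxTokens, and so on), which is the usual convention for this builder pattern; the model path below is a placeholder:

    import com.google.mediapipe.tasks.genai.llminference.LlmInference;

    // Sketch only: setter names are assumed to mirror this page's getters.
    LlmInference.LlmInferenceOptions options =
        LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/model.bin") // placeholder path
            .setMaxTokens(1024) // total kv-cache length: input + output tokens
            .build();

The built options object is then handed to LlmInference at creation time (for example via LlmInference.createFromOptions, which is documented on LlmInference itself rather than on this class).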

public abstract Optional<ErrorListener> errorListener ()

The error listener to use for the LlmInference#generateAsync API.

public abstract Optional<String> loraPath ()

The absolute path to the LoRA model asset bundle stored locally on the device. This is only compatible with GPU models.

public abstract int maxTokens ()

The total length of the kv-cache. In other words, this is the total number of input + output tokens the model needs to handle.
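
Because this single budget covers both directions, one way to size it is to add the longest prompt you expect to the longest response you want; the numbers below are purely illustrative:

    // Illustrative sizing: both budgets are made-up numbers.
    int promptBudget = 512;    // longest expected input, in tokens
    int responseBudget = 256;  // longest desired output, in tokens
    int maxTokens = promptBudget + responseBudget; // 768-token kv-cache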

public abstract String modelPath ()

The path that points to the TFLite model file.

public abstract int randomSeed ()

Random seed for sampling tokens.

public abstract Optional<ProgressListener<String>> resultListener ()

The result listener to use for the LlmInference#generateAsync API.
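
A sketch of wiring up both listeners for asynchronous generation, assuming the listener interfaces accept lambdas and that the Builder exposes setResultListener and setErrorListener (setter names are assumptions, as above):

    // Streaming sketch: listener setter names are assumed, path is a placeholder.
    LlmInference.LlmInferenceOptions streaming =
        LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/model.bin")
            // Receives each partial result; 'done' is true on the final one.
            .setResultListener((partialResult, done) -> System.out.print(partialResult))
            // Receives any error raised by the async API.
            .setErrorListener(e -> e.printStackTrace())
            .build();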

public abstract float temperature ()

Randomness when decoding the next token. A value of 0.0f means greedy decoding.

public abstract int topK ()

Top K number of tokens to be sampled from for each decoding step. A value of 1 means greedy decoding.
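
Putting the sampling knobs together (setter names again assumed): setting topK to 1 or temperature to 0.0f each reduces decoding to greedy, while a configuration like the sketch below stays stochastic but reproducible:

    // Sampling sketch: setter names assumed, path is a placeholder.
    LlmInference.LlmInferenceOptions sampled =
        LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/model.bin")
            .setTopK(40)          // sample from the 40 most likely tokens
            .setTemperature(0.8f) // > 0.0f keeps decoding stochastic
            .setRandomSeed(42)    // fix the seed for repeatable samples
            .build();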