Options for setting up an LlmInference.
Nested Classes
class | LlmInference.LlmInferenceOptions.Builder | Builder for LlmInference.LlmInferenceOptions. |
Public Constructors
Public Methods
static LlmInference.LlmInferenceOptions.Builder | builder() | Instantiates a new LlmInferenceOptions builder. |
abstract Optional<ErrorListener> | errorListener() | The error listener to use for the LlmInference#generateAsync API. |
abstract Optional<String> | loraPath() | The absolute path to the LoRA model asset bundle stored locally on the device. |
abstract int | maxTokens() | The total length of the kv-cache. |
abstract String | modelPath() | The path that points to the tflite model file. |
abstract int | randomSeed() | Random seed for sampling tokens. |
abstract Optional<ProgressListener<String>> | resultListener() | The result listener to use for the LlmInference#generateAsync API. |
abstract float | temperature() | Randomness when decoding the next token. |
abstract int | topK() | Top K number of tokens to be sampled from for each decoding step. |
Public Constructors
public LlmInferenceOptions ()
Public Methods
public static LlmInference.LlmInferenceOptions.Builder builder ()
Instantiates a new LlmInferenceOptions builder.
public abstract Optional<ErrorListener> errorListener ()
The error listener to use for the LlmInference#generateAsync API.
public abstract Optional<String> loraPath ()
The absolute path to the LoRA model asset bundle stored locally on the device. This is only compatible with GPU models.
public abstract int maxTokens ()
The total length of the kv-cache. In other words, this is the total number of input + output tokens the model needs to handle.
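Because maxTokens covers input and output tokens together, a longer prompt leaves less room for the response. The arithmetic can be sketched as below. This is only an illustration: `TokenBudget` and `remainingOutputTokens` are hypothetical names that are not part of the LlmInference API, and clamping to zero when the prompt exceeds the budget is an assumption, since this page does not say how the runtime handles that case.

```java
/** Hypothetical helper, not part of the LlmInference API. */
final class TokenBudget {
    private TokenBudget() {}

    /**
     * Returns how many tokens are left for output once the prompt tokens
     * have been counted against the shared maxTokens budget.
     * Clamping to 0 is an assumption made for this sketch.
     */
    static int remainingOutputTokens(int maxTokens, int promptTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        if (promptTokens < 0) {
            throw new IllegalArgumentException("promptTokens must not be negative");
        }
        return Math.max(0, maxTokens - promptTokens);
    }

    public static void main(String[] args) {
        // A 384-token prompt against a 512-token budget leaves 128 tokens.
        System.out.println(remainingOutputTokens(512, 384)); // prints 128
    }
}
```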
public abstract int randomSeed ()
Random seed for sampling tokens.
public abstract Optional<ProgressListener<String>> resultListener ()
The result listener to use for the LlmInference#generateAsync API.
public abstract float temperature ()
Randomness when decoding the next token. A value of 0.0f means greedy decoding.
public abstract int topK ()
Top K number of tokens to be sampled from for each decoding step. A value of 1 means greedy decoding.
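To show how randomSeed, temperature, and topK interact, here is a generic sketch of top-K sampling with temperature scaling. It is not the MediaPipe implementation, and `TopKSampler` and `sample` are hypothetical names. It assumes the common formulation: keep the K highest-scoring tokens, divide their logits by the temperature, apply a softmax, and draw from the result with a seeded random generator. The sketch treats a temperature of 0.0f or a topK of 1 as greedy decoding, matching the descriptions above.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of top-K sampling; not the library's code. */
final class TopKSampler {
    private TopKSampler() {}

    /** Returns the index of the token chosen for the next decoding step. */
    static int sample(float[] logits, int topK, float temperature, Random rng) {
        if (logits.length == 0) {
            throw new IllegalArgumentException("logits must not be empty");
        }
        if (topK < 1) {
            throw new IllegalArgumentException("topK must be at least 1");
        }
        int k = Math.min(topK, logits.length);

        // Order token indices by logit, highest first.
        Integer[] order = new Integer[logits.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Float.compare(logits[b], logits[a]));

        // Greedy decoding: always take the single most likely token.
        if (k == 1 || temperature <= 0f) {
            return order[0];
        }

        // Temperature-scaled softmax weights over the top K candidates.
        // Subtracting the maximum logit keeps exp() numerically stable.
        double max = logits[order[0]];
        double[] weights = new double[k];
        double sum = 0.0;
        for (int i = 0; i < k; i++) {
            weights[i] = Math.exp((logits[order[i]] - max) / temperature);
            sum += weights[i];
        }

        // Draw one candidate in proportion to its weight.
        double r = rng.nextDouble() * sum;
        for (int i = 0; i < k; i++) {
            r -= weights[i];
            if (r < 0) {
                return order[i];
            }
        }
        return order[k - 1]; // Guard against floating-point rounding.
    }

    public static void main(String[] args) {
        float[] logits = {0.1f, 2.5f, -1.0f};
        // topK = 1 is greedy, so the highest logit (index 1) is chosen.
        System.out.println(sample(logits, 1, 0.8f, new Random(0))); // prints 1
    }
}
```

In this sketch, passing a `Random` built from the same seed reproduces the same choice for the same inputs, which is the usual reason for exposing a seed option.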