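Tutorial: Get started with the Gemini API

This tutorial demonstrates how to access the Gemini API directly from your Android app using the Google AI client SDK for Android. You can use this client SDK if you don't want to work directly with REST APIs or server-side code (like Python) to access Gemini models in your Android app.

In this tutorial, you'll learn how to do the following:

Set up your project, including your API key
Generate text from text-only input
Generate text from text-and-image input (multimodal)
Build multi-turn conversations (chat)
Use streaming for faster interactions

In addition, this tutorial includes sections about advanced use cases (like counting tokens) as well as options for controlling content generation.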

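Consider accessing Gemini on-device

The client SDK for Android described in this tutorial lets you access the Gemini Pro models, which run on Google's servers. For use cases that involve processing sensitive data, offline availability, or cost savings for frequently used user flows, you may want to consider accessing Gemini Nano, which runs on-device. For more details, refer to the Android (on-device) tutorial.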

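Prerequisites

This tutorial assumes that you're familiar with using Android Studio to develop Android apps.

To complete this tutorial, make sure that your development environment and Android app meet the following requirements:

Android Studio (latest version)
Your Android app must target API level 21 or higher.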

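Set up your project

Before calling the Gemini API, you need to set up your Android project, which includes setting up your API key, adding the SDK dependencies to your Android project, and initializing the model.

Set up your API key

To use the Gemini API, you need an API key. If you don't already have one, create a key in Google AI Studio.

Get an API key

Secure your API key

It's strongly recommended that you do not check an API key into your version control system. Instead, store it in a local.properties file (which is located in your project's root directory, but excluded from version control), and then use the Secrets Gradle plugin for Android to read your API key as a Build Configuration variable.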

Kotlin

// Access your API key as a Build Configuration variable
val apiKey = BuildConfig.apiKey

Java

// Access your API key as a Build Configuration variable
String apiKey = BuildConfig.apiKey;

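All the snippets in this tutorial follow this best practice. If you want to see an implementation of the Secrets Gradle plugin, you can review the sample app for this SDK, or use the latest preview of Android Studio Iguana, which has a Gemini API Starter template (it includes the local.properties file to get you started).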
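
To make the local.properties step more concrete, the sketch below parses text in that file format and looks up an entry named apiKey, which is roughly the lookup a build-time tool has to perform before the value can surface as BuildConfig.apiKey. It is a plain-Java illustration and not the Secrets Gradle plugin's implementation; the class name ApiKeyProperties and the sample values are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not part of the SDK or the Secrets Gradle plugin.
final class ApiKeyProperties {

    // Parses text in local.properties format and returns the value stored under "apiKey".
    static String readApiKey(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String key = props.getProperty("apiKey");
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("apiKey is missing from local.properties");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        // Sample file contents; the key value is a placeholder, not a real key.
        String sample = "sdk.dir=/path/to/android/sdk\napiKey=YOUR_API_KEY\n";
        System.out.println(readApiKey(sample));
    }
}
```

Failing fast when the entry is missing is a design choice of this sketch: a missing key then shows up at startup rather than as a rejected API request later.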

Add the SDK dependency to your project

In your module (app-level) Gradle configuration file (like <project>/<app-module>/build.gradle.kts), add the dependency for the Google AI SDK for Android:

Kotlin

dependencies {
  // ... other androidx dependencies

  // add the dependency for the Google AI client SDK for Android
  implementation("com.google.ai.client.generativeai:generativeai:0.9.0")
}

Java

For Java, you need to add two additional libraries.

dependencies {
    // ... other androidx dependencies

    // add the dependency for the Google AI client SDK for Android
    implementation("com.google.ai.client.generativeai:generativeai:0.9.0")

    // Required for one-shot operations (to use `ListenableFuture` from Guava Android)
    implementation("com.google.guava:guava:31.0.1-android")

    // Required for streaming operations (to use `Publisher` from Reactive Streams)
    implementation("org.reactivestreams:reactive-streams:1.0.4")
}

Sync your Android project with Gradle files.

Initialize the generative model

Before you can make any API calls, you need to initialize the generative model:

Kotlin

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with most use cases
    modelName = "gemini-1.5-flash",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.apiKey
)

Java

For Java, you also need to initialize the GenerativeModelFutures object.

// Use a model that's applicable for your use case
// The Gemini 1.5 models are versatile and work with most use cases
GenerativeModel gm = new GenerativeModel(/* modelName */ "gemini-1.5-flash",
// Access your API key as a Build Configuration variable (see "Set up your API key" above)
    /* apiKey */ BuildConfig.apiKey);

// Use the GenerativeModelFutures Java compatibility layer which offers
// support for ListenableFuture and Publisher APIs
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

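When specifying a model, note the following:

Use a model that's specific to your use case (for example, gemini-1.5-flash is for multimodal input). Within this guide, the instructions for each implementation list the recommended model for each use case.

Note: For detailed information about the available models, including their capabilities and rate limits, see Gemini models. If the default rate limits aren't enough, there are also options for requesting a rate limit increase.

Implement common use cases

Now that your project is set up, you can explore using the Gemini API to implement different use cases:

Generate text from text-only input
Generate text from text-and-image input (multimodal)
Build multi-turn conversations (chat)
Use streaming for faster interactions

Generate text from text-only input

When the prompt input includes only text, use a Gemini 1.5 model or the Gemini 1.0 Pro model with generateContent to generate text output: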

Kotlin

Note that generateContent() is a suspend function and needs to be called from a coroutine scope. If you're unfamiliar with coroutines, read Kotlin Coroutines on Android.

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
    modelName = "gemini-1.5-flash",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.apiKey
)

val prompt = "Write a story about a magic backpack."
val response = generativeModel.generateContent(prompt)
print(response.text)

Java

Note that generateContent() returns a ListenableFuture. If you're unfamiliar with this API, see the Android documentation about using a ListenableFuture.

// The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
GenerativeModel gm = new GenerativeModel(/* modelName */ "gemini-1.5-flash",
// Access your API key as a Build Configuration variable (see "Set up your API key" above)
    /* apiKey */ BuildConfig.apiKey);
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

Content content = new Content.Builder()
    .addText("Write a story about a magic backpack.")
    .build();

Executor executor = // ...

ListenableFuture<GenerateContentResponse> response = model.generateContent(content);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);
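
For readers new to the callback-on-an-executor pattern used in the Java sample, here is a rough analogue built only on the JDK. It swaps Guava's ListenableFuture and Futures.addCallback for CompletableFuture.handleAsync, so it shows the shape of the pattern and not the SDK's or Guava's API. The class name CallbackSketch, the helper addCallback, and the single-thread executor are assumptions made for illustration; an Android app would choose an executor that fits its own threading needs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// JDK-only stand-in for the "future plus callback plus executor" pattern.
final class CallbackSketch {

    // Runs exactly one of the two handlers on the given executor once the future settles.
    static <T> CompletableFuture<Void> addCallback(
            CompletableFuture<T> future,
            Consumer<? super T> onSuccess,
            Consumer<? super Throwable> onFailure,
            Executor executor) {
        return future.handleAsync((result, error) -> {
            if (error != null) {
                onFailure.accept(error);
            } else {
                onSuccess.accept(result);
            }
            return null;
        }, executor);
    }

    public static void main(String[] args) {
        // Hypothetical executor choice for this demo only.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> response =
                CompletableFuture.supplyAsync(() -> "generated text", executor);
            addCallback(
                response,
                text -> System.out.println("success: " + text),
                error -> System.out.println("failure: " + error),
                executor
            ).join();
        } finally {
            executor.shutdown();
        }
    }
}
```

The executor decides which thread the handlers run on, which is why the tutorial's sample passes one to Futures.addCallback.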

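Generate text from text-and-image input (multimodal)

Gemini provides various models that can handle multimodal input (the Gemini 1.5 models), so you can input both text and images. Make sure to review the image requirements for prompts.

When the prompt input includes both text and images, use a Gemini 1.5 model with generateContent to generate text output: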

Kotlin

Note that generateContent() is a suspend function and needs to be called from a coroutine scope. If you're unfamiliar with coroutines, read Kotlin Coroutines on Android.

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
    modelName = "gemini-1.5-flash",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.apiKey
)

val image1: Bitmap = // ...
val image2: Bitmap = // ...

val inputContent = content {
    image(image1)
    image(image2)
    text("What's different between these pictures?")
}

val response = generativeModel.generateContent(inputContent)
print(response.text)

Java

Note that generateContent() returns a ListenableFuture. If you're unfamiliar with this API, see the Android documentation about using a ListenableFuture.

// The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
GenerativeModel gm = new GenerativeModel(/* modelName */ "gemini-1.5-flash",
// Access your API key as a Build Configuration variable (see "Set up your API key" above)
    /* apiKey */ BuildConfig.apiKey);
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

Bitmap image1 = // ...
Bitmap image2 = // ...

Content content = new Content.Builder()
    .addText("What's different between these pictures?")
    .addImage(image1)
    .addImage(image2)
    .build();

Executor executor = // ...

ListenableFuture<GenerateContentResponse> response = model.generateContent(content);
Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

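Build multi-turn conversations (chat)

Using Gemini, you can build freeform conversations across multiple turns. The SDK simplifies the process by managing the state of the conversation, so unlike with generateContent, you don't have to store the conversation history yourself.

To build a multi-turn conversation (like chat), use a Gemini 1.5 model or the Gemini 1.0 Pro model, and initialize the chat by calling startChat(). Then use sendMessage() to send a new user message, which also appends the message and the response to the chat history.

There are two possible options for the role associated with the content in a conversation:

user: the role that provides the prompts. This value is the default for sendMessage calls.
model: the role that provides the responses. This role can be used when calling startChat() with existing history.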

Kotlin

Note that sendMessage() is a suspend function and needs to be called from a coroutine scope. If you're unfamiliar with coroutines, read Kotlin Coroutines on Android.

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with multi-turn conversations (like chat)
    modelName = "gemini-1.5-flash",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.apiKey
)

val chat = generativeModel.startChat(
    history = listOf(
        content(role = "user") { text("Hello, I have 2 dogs in my house.") },
        content(role = "model") { text("Great to meet you. What would you like to know?") }
    )
)

chat.sendMessage("How many paws are in my house?")

Java

Note that sendMessage() returns a ListenableFuture. If you're unfamiliar with this API, see the Android documentation about using a ListenableFuture.

// The Gemini 1.5 models are versatile and work with multi-turn conversations (like chat)
GenerativeModel gm = new GenerativeModel(/* modelName */ "gemini-1.5-flash",
// Access your API key as a Build Configuration variable (see "Set up your API key" above)
    /* apiKey */ BuildConfig.apiKey);
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

// (optional) Create previous chat history for context
Content.Builder userContentBuilder = new Content.Builder();
userContentBuilder.setRole("user");
userContentBuilder.addText("Hello, I have 2 dogs in my house.");
Content userContent = userContentBuilder.build();

Content.Builder modelContentBuilder = new Content.Builder();
modelContentBuilder.setRole("model");
modelContentBuilder.addText("Great to meet you. What would you like to know?");
Content modelContent = modelContentBuilder.build();

List<Content> history = Arrays.asList(userContent, modelContent);

// Initialize the chat
ChatFutures chat = model.startChat(history);

// Create a new user message
Content.Builder userMessageBuilder = new Content.Builder();
userMessageBuilder.setRole("user");
userMessageBuilder.addText("How many paws are in my house?");
Content userMessage = userMessageBuilder.build();

Executor executor = // ...

// Send the message
ListenableFuture<GenerateContentResponse> response = chat.sendMessage(userMessage);

Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        String resultText = result.getText();
        System.out.println(resultText);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);
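
The history bookkeeping that the SDK takes off your hands can be pictured with a small stand-in. The sketch below is not the SDK's chat class: ChatHistorySketch, Turn, and the fake model function are hypothetical names, and the "model" is just a local function. It only illustrates the behavior described above, where sending a message appends both the user turn and the model turn to the history.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a chat session; not the SDK's implementation.
final class ChatHistorySketch {

    // One turn of the conversation: a role ("user" or "model") plus its text.
    static final class Turn {
        final String role;
        final String text;

        Turn(String role, String text) {
            this.role = role;
            this.text = text;
        }
    }

    private final List<Turn> history = new ArrayList<>();
    private final Function<List<Turn>, String> model;

    // `model` stands in for the remote call: it sees the full history and returns a reply.
    ChatHistorySketch(List<Turn> initialHistory, Function<List<Turn>, String> model) {
        this.history.addAll(initialHistory);
        this.model = model;
    }

    // Appends the user message, asks the model, then appends the model's reply.
    String sendMessage(String userText) {
        history.add(new Turn("user", userText));
        String reply = model.apply(Collections.unmodifiableList(history));
        history.add(new Turn("model", reply));
        return reply;
    }

    List<Turn> getHistory() {
        return Collections.unmodifiableList(history);
    }

    public static void main(String[] args) {
        List<Turn> seed = new ArrayList<>();
        seed.add(new Turn("user", "Hello, I have 2 dogs in my house."));
        seed.add(new Turn("model", "Great to meet you. What would you like to know?"));

        // Fake model that only reports how many turns it was shown.
        ChatHistorySketch chat =
            new ChatHistorySketch(seed, turns -> "I can see " + turns.size() + " turns.");
        System.out.println(chat.sendMessage("How many paws are in my house?"));
        System.out.println(chat.getHistory().size());
    }
}
```

Because every call sees the accumulated turns, a follow-up question such as the paws example can rely on facts stated earlier in the conversation.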

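Use streaming for faster interactions

By default, the model returns a response after completing the entire generation process. You can achieve faster interactions by not waiting for the entire result and instead using streaming to handle partial results.

The following example shows how to implement streaming with generateContentStream to generate text from a text-and-image input prompt.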

Kotlin

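Note that generateContentStream() returns a Kotlin Flow; collecting it with collect() is a suspending call and needs to happen inside a coroutine scope. If you're unfamiliar with coroutines, read Kotlin Coroutines on Android.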

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
    modelName = "gemini-1.5-flash",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.apiKey
)

val image1: Bitmap = // ...
val image2: Bitmap = // ...

val inputContent = content {
    image(image1)
    image(image2)
    text("What's the difference between these pictures?")
}

var fullResponse = ""
generativeModel.generateContentStream(inputContent).collect { chunk ->
    print(chunk.text)
    fullResponse += chunk.text
}

Java

The Java streaming methods in this SDK return a Publisher type from the Reactive Streams library.

// The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
GenerativeModel gm = new GenerativeModel(/* modelName */ "gemini-1.5-flash",
// Access your API key as a Build Configuration variable (see "Set up your API key" above)
    /* apiKey */ BuildConfig.apiKey);
GenerativeModelFutures model = GenerativeModelFutures.from(gm);

Bitmap image1 = // ...
Bitmap image2 = // ...

Content content = new Content.Builder()
    .addText("What's different between these pictures?")
    .addImage(image1)
    .addImage(image2)
    .build();

Publisher<GenerateContentResponse> streamingResponse =
    model.generateContentStream(content);

StringBuilder outputContent = new StringBuilder();

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        outputContent.append(chunk);
    }

    @Override
    public void onComplete() {
        System.out.println(outputContent);
    }

    @Override
    public void onError(Throwable t) {
        t.printStackTrace();
    }

    @Override
    public void onSubscribe(Subscription s) {
      s.request(Long.MAX_VALUE);
    }
});
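
The subscribe, request, and onNext handshake in the sample can be tried without the SDK. The sketch below uses the JDK's java.util.concurrent.Flow interfaces (Java 9+), which mirror the Reactive Streams Publisher, Subscriber, and Subscription types; the SDK itself uses the org.reactivestreams versions. ChunkPublisher is a hypothetical, synchronous publisher of fixed text chunks, and it skips parts of the Reactive Streams rules (for example, validating non-positive requests) to stay short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Flow;

// Illustrative only: partial results arrive chunk by chunk and are accumulated.
final class StreamingSketch {

    // A tiny synchronous publisher that emits a fixed list of text chunks on demand.
    static final class ChunkPublisher implements Flow.Publisher<String> {
        private final List<String> chunks;

        ChunkPublisher(List<String> chunks) {
            this.chunks = chunks;
        }

        @Override
        public void subscribe(Flow.Subscriber<? super String> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 0;
                private boolean done = false;

                @Override
                public void request(long n) {
                    // Emit at most n chunks, then signal completion once all are sent.
                    for (long i = 0; i < n && next < chunks.size() && !done; i++) {
                        subscriber.onNext(chunks.get(next++));
                    }
                    if (next >= chunks.size() && !done) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() {
                    done = true;
                }
            });
        }
    }

    // Subscribes, requests everything, and returns the concatenated chunks.
    // Returning right after subscribe() is only valid because the publisher is synchronous.
    static String collect(Flow.Publisher<String> publisher) {
        StringBuilder output = new StringBuilder();
        publisher.subscribe(new Flow.Subscriber<String>() {
            @Override
            public void onSubscribe(Flow.Subscription s) {
                s.request(Long.MAX_VALUE);
            }

            @Override
            public void onNext(String chunk) {
                System.out.print(chunk); // show each partial result as it arrives
                output.append(chunk);
            }

            @Override
            public void onError(Throwable t) {
                t.printStackTrace();
            }

            @Override
            public void onComplete() {
                System.out.println();
            }
        });
        return output.toString();
    }

    public static void main(String[] args) {
        collect(new ChunkPublisher(Arrays.asList("Once ", "upon ", "a time")));
    }
}
```

Requesting Long.MAX_VALUE, as the tutorial's sample does, effectively asks for an unbounded stream; a subscriber that needs backpressure would request smaller batches instead.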

You can use a similar approach for text-only input and chat use cases:

Kotlin

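Note that generateContentStream() and sendMessageStream() return a Kotlin Flow; collecting it with collect() is a suspending call and needs to happen inside a coroutine scope. If you're unfamiliar with coroutines, read Kotlin Coroutines on Android.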

// Use streaming with text-only input
generativeModel.generateContentStream(inputContent).collect { chunk ->
    print(chunk.text)
}

// Use streaming with multi-turn conversations (like chat)
val chat = generativeModel.startChat()
chat.sendMessageStream(inputContent).collect { chunk ->
    print(chunk.text)
}

Java

The Java streaming methods in this SDK return a Publisher type from the Reactive Streams library.

// Use streaming with text-only input
Publisher<GenerateContentResponse> streamingResponse =
    model.generateContentStream(inputContent);

StringBuilder outputContent = new StringBuilder();

streamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        outputContent.append(chunk);
    }

    @Override
    public void onComplete() {
        System.out.println(outputContent);
    }

    @Override
    public void onSubscribe(Subscription s) {
      s.request(Long.MAX_VALUE);
    }

    // ... other methods omitted for brevity
});

// Use streaming with multi-turn conversations (like chat)
ChatFutures chat = model.startChat(history);

Publisher<GenerateContentResponse> chatStreamingResponse =
    chat.sendMessageStream(inputContent);

// Separate names so the chat example does not redeclare the variables used above
StringBuilder chatOutputContent = new StringBuilder();

chatStreamingResponse.subscribe(new Subscriber<GenerateContentResponse>() {
    @Override
    public void onNext(GenerateContentResponse generateContentResponse) {
        String chunk = generateContentResponse.getText();
        chatOutputContent.append(chunk);
    }

    @Override
    public void onComplete() {
        System.out.println(chatOutputContent);
    }

    @Override
    public void onSubscribe(Subscription s) {
      s.request(Long.MAX_VALUE);
    }

    // ... other methods omitted for brevity
});

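Implement advanced use cases

The common use cases described in the previous section of this tutorial help you become comfortable with using the Gemini API. This section describes some use cases that might be considered more advanced.

Function calling

Function calling makes it easier for you to get structured data outputs from generative models. You can then use these outputs to call other APIs and return the relevant response data to the model. In other words, function calling helps you connect generative models to external systems so that the generated content includes the most up-to-date and accurate information. Learn more in the function calling tutorial.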
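
The app-side half of that loop can be sketched as a lookup table from function names to handlers. Everything here is an assumption made for illustration: the call is modeled as a name plus a map of string arguments, and FunctionDispatchSketch and getOrderStatus are hypothetical names. The SDK's actual types for declaring functions and reading the model's function call are covered in the function calling tutorial.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: maps a function name requested by the model to local code.
final class FunctionDispatchSketch {

    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    void register(String name, Function<Map<String, String>, String> handler) {
        handlers.put(name, handler);
    }

    // Runs the requested handler and returns the text that would be sent back to the model.
    String dispatch(String name, Map<String, String> callArgs) {
        Function<Map<String, String>, String> handler = handlers.get(name);
        if (handler == null) {
            return "error: unknown function " + name;
        }
        return handler.apply(callArgs);
    }

    public static void main(String[] args) {
        FunctionDispatchSketch dispatcher = new FunctionDispatchSketch();

        // Hypothetical function backed by an in-memory table instead of a real API.
        Map<String, String> orders = new HashMap<>();
        orders.put("A-100", "shipped");
        dispatcher.register(
            "getOrderStatus",
            a -> orders.getOrDefault(a.get("orderId"), "unknown"));

        Map<String, String> callArgs = new HashMap<>();
        callArgs.put("orderId", "A-100");
        System.out.println(dispatcher.dispatch("getOrderStatus", callArgs));
    }
}
```

Returning an error string for an unknown name, rather than throwing, lets the app report the problem back to the model in the same channel as a normal result.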

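Count tokens

When using long prompts, it might be useful to count tokens before sending any content to the model. The following examples show how to use countTokens() for various use cases: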

Kotlin

Note that countTokens() is a suspend function and needs to be called from a coroutine scope. If you're unfamiliar with coroutines, read Kotlin Coroutines on Android.

// Each result gets its own name so the three examples can share one scope

// For text-only input
val (textTokens) = generativeModel.countTokens("Write a story about a magic backpack.")

// For text-and-image input (multi-modal)
val multiModalContent = content {
    image(image1)
    image(image2)
    text("What's the difference between these pictures?")
}

val (multiModalTokens) = generativeModel.countTokens(multiModalContent)

// For multi-turn conversations (like chat)
val history = chat.history
val messageContent = content { text("This is the message I intend to send") }
val (chatTokens) = generativeModel.countTokens(*history.toTypedArray(), messageContent)

Java

Note that countTokens() returns a ListenableFuture. If you're unfamiliar with this API, see the Android documentation about using a ListenableFuture.

Content text = new Content.Builder()
    .addText("Write a story about a magic backpack.")
    .build();

Executor executor = // ...

// For text-only input
ListenableFuture<CountTokensResponse> countTokensResponse = model.countTokens(text);

Futures.addCallback(countTokensResponse, new FutureCallback<CountTokensResponse>() {
    @Override
    public void onSuccess(CountTokensResponse result) {
        int totalTokens = result.getTotalTokens();
        System.out.println("TotalTokens = " + totalTokens);
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

// For text-and-image input
Bitmap image1 = // ...
Bitmap image2 = // ...

Content multiModalContent = new Content.Builder()
    .addImage(image1)
    .addImage(image2)
    .addText("What's different between these pictures?")
    .build();

ListenableFuture<CountTokensResponse> multiModalCountResponse = model.countTokens(multiModalContent);

// For multi-turn conversations (like chat)
// Copy the history so the pending message is not added to the chat's own list
List<Content> history = new ArrayList<>(chat.getChat().getHistory());

Content messageContent = new Content.Builder()
    .addText("This is the message I intend to send")
    .build();

history.add(messageContent);

ListenableFuture<CountTokensResponse> chatCountResponse = model.countTokens(history.toArray(new Content[0]));
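
One possible use of a token count before sending is to drop the oldest messages until the prompt fits a budget. The sketch below assumes a hypothetical limit and takes the counter as a parameter; in a real app the count would come from countTokens(), whereas the countWords stand-in here simply counts whitespace-separated words and does not match real token counts. TokenBudgetSketch and trimToBudget are illustrative names, and the sketch ignores chat-specific concerns such as keeping user and model turns paired.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper that trims a prompt to a token budget before it is sent.
final class TokenBudgetSketch {

    // Drops the oldest messages until the counter reports a total within maxTokens.
    // The newest message is always kept, even if it alone exceeds the budget.
    static List<String> trimToBudget(
            List<String> messages, int maxTokens, ToIntFunction<List<String>> counter) {
        List<String> kept = new ArrayList<>(messages);
        while (kept.size() > 1 && counter.applyAsInt(kept) > maxTokens) {
            kept.remove(0);
        }
        return kept;
    }

    // Crude stand-in counter: whitespace-separated words. Real token counts differ.
    static int countWords(List<String> messages) {
        int total = 0;
        for (String message : messages) {
            String trimmed = message.trim();
            if (!trimmed.isEmpty()) {
                total += trimmed.split("\\s+").length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList(
            "Hello, I have 2 dogs in my house.",
            "Great to meet you. What would you like to know?",
            "How many paws are in my house?");
        // The budget of 18 is an arbitrary value chosen for this demo.
        List<String> kept = trimToBudget(messages, 18, TokenBudgetSketch::countWords);
        System.out.println(kept.size() + " message(s) kept");
    }
}
```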

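Options to control content generation

You can control content generation by configuring model parameters and by using safety settings.

Configure model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. Learn more about model parameters.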

Kotlin

val config = generationConfig {
    temperature = 0.9f
    topK = 16
    topP = 0.1f
    maxOutputTokens = 200
    stopSequences = listOf("red")
}

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with most use cases
    modelName = "gemini-1.5-flash",
    apiKey = BuildConfig.apiKey,
    generationConfig = config
)

Java

GenerationConfig.Builder configBuilder = new GenerationConfig.Builder();
configBuilder.temperature = 0.9f;
configBuilder.topK = 16;
configBuilder.topP = 0.1f;
configBuilder.maxOutputTokens = 200;
configBuilder.stopSequences = Arrays.asList("red");

GenerationConfig generationConfig = configBuilder.build();

// The Gemini 1.5 models are versatile and work with most use cases
GenerativeModel gm = new GenerativeModel(
    "gemini-1.5-flash",
    BuildConfig.apiKey,
    generationConfig
);

GenerativeModelFutures model = GenerativeModelFutures.from(gm);
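
For intuition about temperature, topK, and topP, the sketch below shows how these parameters are commonly defined for sampling in general: temperature rescales scores before they are turned into probabilities, top-K keeps only the K most probable candidates, and top-P keeps the smallest leading group whose cumulative probability reaches P. This is a generic illustration on a toy distribution, not the Gemini service's implementation, and the service may order or combine these steps differently. SamplingSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Generic illustration of temperature scaling and top-K / top-P filtering.
final class SamplingSketch {

    // Converts raw scores to probabilities; a lower temperature sharpens the distribution.
    static double[] softmax(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) {
            max = Math.max(max, logit);
        }
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the maximum keeps exp() numerically stable.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Returns the indices that survive top-K and top-P filtering, most probable first.
    static List<Integer> candidates(double[] probs, int topK, double topP) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(probs[b], probs[a]));

        List<Integer> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (int index : order) {
            if (kept.size() >= topK) {
                break; // top-K limit reached
            }
            kept.add(index);
            cumulative += probs[index];
            if (cumulative >= topP) {
                break; // top-P mass reached
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy scores for four candidate tokens; the values are made up.
        double[] probs = softmax(new double[] {2.0, 1.0, 0.5, 0.1}, 0.9);
        // Same topK and topP values as the configuration sample above.
        System.out.println(candidates(probs, 16, 0.1));
    }
}
```

With a small topP, the filter tends to keep very few candidates, which is why low topP values generally make output less varied.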

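Use safety settings

You can use safety settings to adjust the likelihood of getting responses that may be considered harmful. By default, safety settings block content with a medium and/or high probability of being unsafe across all dimensions. Learn more about safety settings.

Here's how to set one safety setting: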

Kotlin

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with most use cases
    modelName = "gemini-1.5-flash",
    apiKey = BuildConfig.apiKey,
    safetySettings = listOf(
        SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.ONLY_HIGH)
    )
)

Java

SafetySetting harassmentSafety = new SafetySetting(HarmCategory.HARASSMENT,
    BlockThreshold.ONLY_HIGH);

// The Gemini 1.5 models are versatile and work with most use cases
GenerativeModel gm = new GenerativeModel(
    "gemini-1.5-flash",
    BuildConfig.apiKey,
    null, // generation config is optional
    Collections.singletonList(harassmentSafety)
);

GenerativeModelFutures model = GenerativeModelFutures.from(gm);

You can also set more than one safety setting:

Kotlin

val harassmentSafety = SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.ONLY_HIGH)

val hateSpeechSafety = SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE)

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with most use cases
    modelName = "gemini-1.5-flash",
    apiKey = BuildConfig.apiKey,
    safetySettings = listOf(harassmentSafety, hateSpeechSafety)
)

Java

SafetySetting harassmentSafety = new SafetySetting(HarmCategory.HARASSMENT,
    BlockThreshold.ONLY_HIGH);

SafetySetting hateSpeechSafety = new SafetySetting(HarmCategory.HATE_SPEECH,
    BlockThreshold.MEDIUM_AND_ABOVE);

// The Gemini 1.5 models are versatile and work with most use cases
GenerativeModel gm = new GenerativeModel(
    "gemini-1.5-flash",
    BuildConfig.apiKey,
    null, // generation config is optional
    Arrays.asList(harassmentSafety, hateSpeechSafety)
);

GenerativeModelFutures model = GenerativeModelFutures.from(gm);
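
To make the threshold names easier to read, the sketch below models how a block threshold could be compared with a probability rating. It is a local illustration, not the SDK's types or the service's logic: the enums are redefined here, ONLY_HIGH and MEDIUM_AND_ABOVE come from the samples above, and the remaining names (NONE, LOW_AND_ABOVE, and the HarmProbability levels) are assumptions.

```java
// Local model of threshold semantics; the enums here are not the SDK's classes.
final class SafetyThresholdSketch {

    // Probability levels a response could be rated with, lowest to highest (assumed names).
    enum HarmProbability { NEGLIGIBLE, LOW, MEDIUM, HIGH }

    // Threshold names; NONE and LOW_AND_ABOVE are assumed, the other two appear above.
    enum BlockThreshold { NONE, ONLY_HIGH, MEDIUM_AND_ABOVE, LOW_AND_ABOVE }

    // Returns true when content with the given rating would be blocked at this threshold.
    static boolean isBlocked(HarmProbability rating, BlockThreshold threshold) {
        switch (threshold) {
            case ONLY_HIGH:
                return rating.compareTo(HarmProbability.HIGH) >= 0;
            case MEDIUM_AND_ABOVE:
                return rating.compareTo(HarmProbability.MEDIUM) >= 0;
            case LOW_AND_ABOVE:
                return rating.compareTo(HarmProbability.LOW) >= 0;
            default:
                return false; // NONE: nothing is blocked
        }
    }

    public static void main(String[] args) {
        for (HarmProbability rating : HarmProbability.values()) {
            System.out.println(rating
                + " blocked at ONLY_HIGH: " + isBlocked(rating, BlockThreshold.ONLY_HIGH)
                + ", at MEDIUM_AND_ABOVE: " + isBlocked(rating, BlockThreshold.MEDIUM_AND_ABOVE));
        }
    }
}
```

Read this way, switching a category from MEDIUM_AND_ABOVE to ONLY_HIGH loosens the filter: content rated MEDIUM is no longer blocked for that category.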

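What's next

Prompt design is the process of creating prompts that elicit the desired response from language models. Writing well-structured prompts is an essential part of ensuring accurate, high-quality responses from a language model. Learn about best practices for prompt writing.
Gemini offers several model variations to meet the needs of different use cases, such as input types and complexity, implementations for chat or other dialog language tasks, and size constraints. Learn about the available Gemini models.
The client SDK for Android described in this tutorial lets you access the Gemini Pro models, which run on Google's servers. For use cases that involve processing sensitive data, offline availability, or cost savings for frequently used user flows, you may want to consider accessing Gemini Nano, which runs on-device. For more details, refer to the Android (on-device) tutorial.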