Explore document processing capabilities with the Gemini API

The Gemini API can process and run inference on PDF documents passed to it. When a PDF is uploaded, the Gemini API can:

  • Describe or answer questions about the content
  • Summarize the content
  • Extrapolate from the content
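To make the three tasks above concrete, here is a minimal sketch of a generateContent request body that pairs a PDF with a text prompt. It sends the PDF as base64 "inline_data" with the "application/pdf" MIME type and assumes the v1beta REST endpoint. The model name, the GEMINI_API_KEY environment variable, the helper names (build_pdf_request, send_request), and the prompt wording are illustrative assumptions. Check the current Gemini API reference before relying on them.

```python
import base64
import json
import os
import urllib.request

# Assumption: model name and v1beta endpoint; confirm against the current API reference.
MODEL = "gemini-1.5-flash"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"


def build_pdf_request(pdf_bytes: bytes, prompt: str) -> dict:
    """Build a generateContent body: the PDF as inline base64 data, then a text prompt."""
    return {
        "contents": [
            {
                "parts": [
                    {
                        "inline_data": {
                            "mime_type": "application/pdf",
                            "data": base64.b64encode(pdf_bytes).decode("ascii"),
                        }
                    },
                    {"text": prompt},
                ]
            }
        ]
    }


def send_request(body: dict) -> dict:
    """POST the body and return the parsed JSON response.

    Needs network access and an API key in GEMINI_API_KEY (hypothetical variable name).
    """
    req = urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical prompts, one per task listed above.
PROMPTS = {
    "describe": "What is this document about, and who is its intended audience?",
    "summarize": "Summarize this document in three bullet points.",
    "extrapolate": "Based on this document, what follow-up questions might a reader have?",
}
```

Usage, with a hypothetical file name: read the file with open("report.pdf", "rb").read(), pass the bytes and PROMPTS["summarize"] to build_pdf_request, then pass the result to send_request. Building the body separately from sending it makes the request easy to inspect or log before any network call.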

This tutorial demonstrates several ways to prompt the Gemini API with PDF documents. All output is text-only.
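Because the output is text-only, reading a result comes down to collecting the "text" fields from the response. The following is a minimal sketch that assumes the REST JSON response shape (candidates, then content, then parts). The helper name extract_text and the sample response are hypothetical; the sample is hand-written, not real model output.

```python
def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a generateContent response.

    Returns an empty string when there are no candidates or no content
    (for example, if a candidate was blocked and carries no parts).
    """
    candidates = response.get("candidates", [])
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


# Hand-written sample shaped like a generateContent response (not real output).
sample = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [{"text": "The document "}, {"text": "describes X."}],
            }
        }
    ]
}

print(extract_text(sample))  # prints: The document describes X.
```

Using .get at each level keeps the helper from raising a KeyError on responses that omit a field.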

What's next

This guide shows how to use generateContent to generate text outputs from processed documents. To learn more, see the following resources:

  • Prompting with media files: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
  • System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
  • Safety guidance: Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs.