The Gemini API can process and run inference on PDF documents passed to it. When a PDF is uploaded, the Gemini API can:
- Describe or answer questions about the content
- Summarize the content
- Extrapolate from the content
This tutorial demonstrates some possible ways to prompt the Gemini API with provided PDF documents. All output is text-only.
Before you begin: Set up your project and API key
Before calling the Gemini API, you need to set up your project and configure your API key.
Get and secure your API key
You need an API key to call the Gemini API. If you don't already have one, create a key in Google AI Studio.
It's strongly recommended that you do not check an API key into your version control system.
This tutorial assumes that you're accessing your API key as an environment variable.
Technical details
Gemini 1.5 Pro and 1.5 Flash support a maximum of 3,600 document pages. Document pages must be in one of the following text data MIME types:
- PDF -
application/pdf
- JavaScript -
application/x-javascript
,text/javascript
- Python -
application/x-python
,text/x-python
- TXT -
text/plain
- HTML -
text/html
- CSS -
text/css
- Markdown -
text/md
- CSV -
text/csv
- XML -
text/xml
- RTF -
text/rtf
Each document page is equivalent to 258 tokens.
While there are no specific limits to the number of pixels in a document besides the model's context window, larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, while smaller pages are scaled up to 768x768 pixels. There is no cost reduction for pages at lower sizes, other than bandwidth, or performance improvement for pages at higher resolution.
For best results:
- Rotate pages to the correct orientation before uploading.
- Avoid blurry pages.
- If using a single page, place the text prompt after the page.
Upload a document and generate content
You can use the File API to upload a document of any size. Always use the File API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB.
Call media.upload
to upload a file using the
File API. The following code uploads a document file and then uses the file in a
call to
models.generateContent
.
NUM_BYTES=$(wc -c < "${PDF_PATH}")
DISPLAY_NAME=TEXT
tmp_header_file=upload-header.tmp
# Initial resumable request defining metadata.
# The upload url is in the response headers dump them to a file.
curl "${BASE_URL}/upload/v1beta/files?key=${GOOGLE_API_KEY}" \
-D upload-header.tmp \
-H "X-Goog-Upload-Protocol: resumable" \
-H "X-Goog-Upload-Command: start" \
-H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Header-Content-Type: application/pdf" \
-H "Content-Type: application/json" \
-d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null
upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"
# Upload the actual bytes.
curl "${upload_url}" \
-H "Content-Length: ${NUM_BYTES}" \
-H "X-Goog-Upload-Offset: 0" \
-H "X-Goog-Upload-Command: upload, finalize" \
--data-binary "@${PDF_PATH}" 2> /dev/null > file_info.json
file_uri=$(jq ".file.uri" file_info.json)
echo file_uri=$file_uri
# Now generate content using that file
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=$GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
-X POST \
-d '{
"contents": [{
"parts":[
{"text": "Can you add a few more lines to this poem?"},
{"file_data":{"mime_type": "application/pdf", "file_uri": '$file_uri'}}]
}]
}' 2> /dev/null > response.json
cat response.json
echo
jq ".candidates[].content.parts[].text" response.json
Get metadata for a file
You can verify the API successfully stored the uploaded file and get its
metadata by calling files.get
. Only the name
(and by extension, the uri
) are unique.
name=$(jq ".file.name" file_info.json)
# Get the file of interest to check state
curl https://generativelanguage.googleapis.com/v1beta/files/$name > file_info.json
# Print some information about the file you got
name=$(jq ".file.name" file_info.json)
echo name=$name
file_uri=$(jq ".file.uri" file_info.json)
echo file_uri=$file_uri
List files
You can list all files uploaded using the File API and their URIs using
files.list
.
echo "My files: "
curl "https://generativelanguage.googleapis.com/v1beta/files?key=$GOOGLE_API_KEY"
Delete files
Files uploaded using the File API are automatically deleted after 2 days. You
can also manually delete them using
files.delete
.
curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/$name?key=$GOOGLE_API_KEY
What's next
This guide shows how to use
generateContent
and
to generate text outputs from processed documents. To learn more,
see the following resources:
- File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
- System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases.
- Safety guidance: Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs.