Gemini CLI: Comprehensive Reference & Cheat Sheet

This document is a high-density, comprehensive reference for interacting with Google's Gemini models via the command line. It is intended for users who are already familiar with the basics and need a quick way to look up specific commands, API parameters, and advanced configurations.

This guide is structured in two parts:

  1. Interactive gemini-cli Reference: Covers the commands and features of the official, open-source gemini-cli tool.
  2. Vertex AI API Reference: Details the raw API endpoints, request/response bodies, and parameters for direct access using tools like curl.

Part 1: Interactive gemini-cli Reference

This section covers the official gemini-cli tool, which provides a rich, conversational experience in the terminal.

Installation & Execution

The tool is run directly using npx, which requires Node.js v20+.

# Run the latest version of the Gemini CLI
npx https://github.com/google-gemini/gemini-cli

Authentication

Authentication is handled via the gcloud CLI and Application Default Credentials (ADC).

# Log in and set up ADC for your machine
gcloud auth application-default login

# Set your active Google Cloud project
gcloud config set project YOUR_PROJECT_ID

In-Tool Commands

Once the CLI is running, you can use these commands inside the > prompt.

Command    Description
/help      Displays a list of available commands and tools.
/history   Shows the conversation history for the current session.
/clear     Clears the current terminal screen and conversation history.
/auth      Restarts the authentication flow to switch Google Cloud projects or authentication methods.
/quit      Exits the Gemini CLI application. Ctrl+C also works.

Built-in Tools

The interactive CLI comes with powerful tools that give it context about your local environment and the web.

Tool     Usage                        Description
@file    @file path/to/your/file.js   Reads the content of a local file and adds it to the context of your prompt. You can reference multiple files; this is essential for asking questions about your code.
@web     @web "latest news on AI"     Performs a web search and adds the results to the context, letting the model answer questions about current events or topics outside its training data.

Example using tools:

> @file src/api.ts @file src/database.ts Based on these files, what could be causing the latency issue?

Configuration

The interactive gemini-cli is designed to be zero-config. Configuration, such as the Google Cloud project and region, is handled through the initial prompts on first run. To change these settings, you can use the /auth command to re-initialize the configuration.


Part 2: Vertex AI Gemini API Reference (curl)

This section provides a detailed reference for interacting directly with the Vertex AI Gemini API endpoint. This method is ideal for scripting, automation, and integration into other applications.

API Endpoint Structure

The generic endpoint for the Gemini API is: https://{region}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{region}/publishers/google/models/{model_id}:{method}

  • {region}: The Google Cloud region for your request (e.g., us-central1).
  • {project_id}: Your Google Cloud Project ID.
  • {model_id}: The specific model you want to use.
    • gemini-1.5-pro-preview-0409 (Latest Pro model)
    • gemini-1.5-flash-preview-0514 (Fastest, most cost-efficient model)
    • gemini-1.0-pro-vision (For multimodal prompts)
    • gemini-1.0-pro (General purpose)
  • {method}: The API method to call.
    • generateContent: For single-turn, non-streaming responses.
    • streamGenerateContent: For streaming responses.
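
Putting the pieces together, the full endpoint URL can be assembled from these components; the values below are placeholders, not real projects:

```shell
# Assemble the Vertex AI endpoint URL from its components (placeholder values).
REGION="us-central1"
PROJECT_ID="your-project-id"
MODEL_ID="gemini-1.0-pro"
METHOD="generateContent"

URL="https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${MODEL_ID}:${METHOD}"
echo "${URL}"
```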

Authentication Token

Use gcloud to print a short-lived access token for the Authorization header.

# Command to generate the bearer token
gcloud auth application-default print-access-token

Master API Request Body

Below is a comprehensive example of a JSON request body, demonstrating most of the available top-level objects.

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {"text": "What is the weather like in Boston?"}
      ]
    }
  ],
  "tools": [
    {
      "function_declarations": [
        {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "OBJECT",
            "properties": {
              "location": {"type": "STRING", "description": "The city and state, e.g. San Francisco, CA"}
            },
            "required": ["location"]
          }
        }
      ]
    }
  ],
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_LOW_AND_ABOVE"
    }
  ],
  "generationConfig": {
    "temperature": 0.4,
    "topP": 1.0,
    "maxOutputTokens": 2048,
    "response_mime_type": "application/json"
  }
}
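
A minimal end-to-end call using a trimmed version of this body might look like the following sketch. The project, region, and model are placeholders; the curl line is commented out because it requires live gcloud credentials:

```shell
# Write a minimal request body to a file, then POST it with curl.
cat > request.json <<'EOF'
{
  "contents": [
    {"role": "user", "parts": [{"text": "What is the weather like in Boston?"}]}
  ]
}
EOF

# The actual call (uncomment with a real project and region):
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
#   -H "Content-Type: application/json" \
#   "https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.0-pro:generateContent" \
#   -d @request.json

echo "request body written"
```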

generationConfig Parameters

This object controls the generative output of the model.

Parameter           Type              Description
temperature         number            Controls randomness. Lower values (e.g., 0.2) are more deterministic; higher values (e.g., 1.0) are more creative. Range: [0.0, 2.0]
topP                number            Nucleus sampling: the cumulative probability of tokens to consider. Range: [0.0, 1.0]
topK                integer           Top-k sampling: the number of most likely tokens to consider.
maxOutputTokens     integer           The maximum number of tokens to generate in the response.
stopSequences       array of strings  A list of sequences that will cause the model to stop generating, e.g. ["\n\n"]
response_mime_type  string            Sets the output format. Use "application/json" to force the model to generate a valid JSON object.
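
As a worked illustration (the values here are arbitrary, not recommendations), a near-deterministic configuration for structured JSON extraction might combine these parameters like so:

```json
{
  "generationConfig": {
    "temperature": 0.0,
    "topP": 0.1,
    "maxOutputTokens": 1024,
    "stopSequences": ["\n\n"],
    "response_mime_type": "application/json"
  }
}
```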

safetySettings Parameters

This object allows you to adjust the content safety filters.

Categories (category):

  • HARM_CATEGORY_HARASSMENT
  • HARM_CATEGORY_HATE_SPEECH
  • HARM_CATEGORY_SEXUALLY_EXPLICIT
  • HARM_CATEGORY_DANGEROUS_CONTENT

Thresholds (threshold):

  • BLOCK_NONE: Blocks nothing (with exceptions for severe harm).
  • BLOCK_ONLY_HIGH: Blocks content with a high probability of being harmful.
  • BLOCK_MEDIUM_AND_ABOVE: (Default) Blocks medium and high probability.
  • BLOCK_LOW_AND_ABOVE: Blocks low, medium, and high probability.

tools and Function Calling

To enable function calling, provide a tools object containing function_declarations. The model will not execute the function, but will return a functionCall object in its response, which your code can then use to execute the function.
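
When the model opts to call a declared function, the candidate's part contains a functionCall object instead of text. An illustrative response shape, matching the get_current_weather declaration above (the argument value is made up):

```json
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "functionCall": {
              "name": "get_current_weather",
              "args": {"location": "Boston, MA"}
            }
          }
        ]
      },
      "finishReason": "STOP"
    }
  ]
}
```

Your code executes the named function itself, then sends the result back to the model in a functionResponse part on the next turn.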

API Response Body Structure

A successful response from the generateContent endpoint will look like this:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {"text": "The model's response text goes here."}
        ]
      },
      "finishReason": "STOP",
      "safetyRatings": [
        {"category": "HARM_CATEGORY_...", "probability": "NEGLIGIBLE"}
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 15,
    "candidatesTokenCount": 25,
    "totalTokenCount": 40
  }
}
  • candidates: An array of possible responses. Usually contains one.
  • finishReason: Why the model stopped. STOP is a normal completion. MAX_TOKENS means it hit the limit. SAFETY means it was blocked.
  • safetyRatings: A report on the safety assessment of the response.
  • usageMetadata: The number of tokens used for the prompt and response.
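
A quick way to pull just the reply text out of a saved response body, shown here with python3 invoked from the shell (jq works equally well):

```shell
# Save a sample response, then extract the first candidate's text.
cat > response.json <<'EOF'
{"candidates":[{"content":{"role":"model","parts":[{"text":"Hello from Gemini."}]},"finishReason":"STOP"}]}
EOF

python3 -c '
import json
resp = json.load(open("response.json"))
print(resp["candidates"][0]["content"]["parts"][0]["text"])
'
```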

Multimodality Request Payloads

To send images or other non-text data, add more objects to the parts array.

Image via Base64:

{
  "parts": [
    {"text": "Describe this image:"},
    {
      "inline_data": {
        "mime_type": "image/jpeg",
        "data": "/9j/4AAQSkZJRgABAQ..."
      }
    }
  ]
}
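
To produce the base64 payload for inline_data, something like the following works. A stand-in file is created here so the snippet is self-contained; in practice you would encode your real JPEG instead:

```shell
# Base64-encode a file for use as inline_data. The printf creates a
# placeholder file; substitute your actual image in real use.
printf 'not-a-real-jpeg' > photo.jpg
IMG_B64=$(base64 photo.jpg | tr -d '\n')  # tr strips newlines portably (macOS base64 lacks -w0)
echo "${IMG_B64}"
```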

File via Google Cloud Storage:

{
  "parts": [
    {"text": "Summarize this PDF document:"},
    {
      "file_data": {
        "mime_type": "application/pdf",
        "file_uri": "gs://your-bucket-name/document.pdf"
      }
    }
  ]
}

This reference provides the core details needed for advanced and scripted interactions with the Gemini API. For the most current list of models and parameters, always consult the official Google Cloud Vertex AI documentation.