Part 2

Connecting to Gemini

This page explains how the image transformer works — how an uploaded photo gets turned into AI-generated art using Google's Gemini API.

[Upload Image] → [Vercel Blob] → [Gemini Analyze] → [Gemini Generate] → [AI Art]
   Browser         Storage          Describe image      Create new image     Result

The Three Services

Vercel Blob

Cloud storage for uploaded images. When you drop an image onto bleau.ai, it is sent to Vercel Blob and stored under a public URL. The transform route later fetches the image from this URL, encodes it as base64, and passes it to Gemini.

Gemini 2.0 Flash (Analysis)

Google's fast vision model. It looks at the uploaded image and writes a detailed description — subjects, colors, composition, lighting — everything an artist would need to recreate it.

Gemini 2.5 Flash (Image Generation)

Google's image generation model. It takes the description plus your chosen style (e.g. "Studio Ghibli") and generates a brand new image in that style.

Step by Step

Step	What Happens	Where
1	User drops an image onto the page	Browser (ImageUploader component)
2	Image uploads to Vercel Blob storage	API route: /api/upload
3	User picks a style and clicks Transform	Browser
4	Server fetches image, encodes it as base64	API route: /api/transform
5	Gemini 2.0 Flash analyzes the image	Google Gemini API
6	Gemini 2.5 Flash generates new art from description + style	Google Gemini API
7	Generated image sent back and displayed	Browser
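
The server-side part of the table (steps 4 to 6) can be sketched as a single function that runs three operations in order. This is a minimal sketch rather than the actual route code: `TransformDeps`, `transformImage`, and the three injected functions are hypothetical names, and the real route may be structured differently.

```typescript
// Hypothetical interface: each dependency stands in for one step of the table.
// Injecting them keeps the flow readable and testable without network access.
interface TransformDeps {
  // Step 4: fetch the image from its Blob URL and encode it as base64.
  fetchImage(url: string): Promise<{ base64: string; mimeType: string }>;
  // Step 5: ask the vision model for a detailed description.
  analyze(image: { base64: string; mimeType: string }): Promise<string>;
  // Step 6: generate new art from the description plus the chosen style.
  generate(description: string, style: string): Promise<string>;
}

// Runs steps 4 to 6 in order and resolves with whatever `generate` returns
// (for example, the generated image encoded as a string).
async function transformImage(
  blobUrl: string,
  style: string,
  deps: TransformDeps
): Promise<string> {
  const image = await deps.fetchImage(blobUrl);
  const description = await deps.analyze(image);
  return deps.generate(description, style);
}
```

Each step awaits the previous one because its input is the previous step's output: there is nothing to describe until the image is fetched, and nothing to generate until the description exists.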

How It All Connects

┌─────────────────────────────────────────────────────────────────┐
│                         BROWSER                                  │
│                                                                  │
│  ImageUploader component                                        │
│  1. Drag & drop image                                           │
│  2. XHR upload with progress bar                                │
│  3. Select style → click Transform                              │
└─────────────────────────────────────────────────────────────────┘
          │ POST /api/upload              │ POST /api/transform
          ▼                               ▼
┌──────────────────────────┐  ┌────────────────────────────────────┐
│      /api/upload         │  │         /api/transform             │
│                          │  │                                    │
│  Validates file          │  │  1. Fetches image from Blob URL   │
│  Uploads to Vercel Blob  │  │  2. Encodes as base64             │
│  Returns public URL      │  │  3. Sends to Gemini for analysis  │
└──────────────────────────┘  │  4. Sends to Gemini for generation│
          │                   │  5. Returns generated image        │
          ▼                   └────────────────────────────────────┘
┌──────────────────────────┐            │           │
│     VERCEL BLOB          │            ▼           ▼
│                          │  ┌──────────────┐ ┌─────────────────┐
│  Stores uploaded images  │  │ Gemini 2.0   │ │ Gemini 2.5      │
│  with public URLs        │  │ Flash        │ │ Flash           │
│                          │  │              │ │                 │
│  blob.vercel-storage.com │  │ "Describe    │ │ "Generate this  │
│                          │  │  this image" │ │  in Ghibli      │
└──────────────────────────┘  │              │ │  style"         │
                              └──────────────┘ └─────────────────┘

Files That Make It Work

src/
├── app/
│   ├── transform/
│   │   └── page.tsx            ← The transform page UI
│   └── api/
│       ├── upload/
│       │   └── route.ts        ← Handles file → Vercel Blob
│       └── transform/
│           └── route.ts        ← Handles Blob → Gemini → AI art
│
├── components/
│   └── ImageUploader.tsx       ← Drag & drop, progress bar, display
│
├── lib/
│   ├── constants.ts            ← API endpoints, Gemini config, styles
│   └── validations.ts          ← File size/type checks, URL validation
│
└── types/
    └── index.ts                ← TypeScript types for state & API
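
To give a feel for what validations.ts might contain, here is a rough sketch of the file and URL checks. Everything in it is an assumption made for illustration: the 10 MB limit, the list of allowed types, and the function names `validateFile` and `isBlobUrl` are hypothetical, and the host check simply reuses the blob.vercel-storage.com domain shown in the diagram.

```typescript
// Assumed limits for illustration only; the real values are not shown on this page.
const MAX_FILE_BYTES = 10 * 1024 * 1024; // hypothetical 10 MB cap
const ALLOWED_TYPES = ["image/jpeg", "image/png", "image/webp"]; // hypothetical list

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Checks the size and MIME type reported for an uploaded file.
function validateFile(file: { size: number; type: string }): ValidationResult {
  if (file.size === 0) return { ok: false, reason: "File is empty" };
  if (file.size > MAX_FILE_BYTES) return { ok: false, reason: "File is too large" };
  if (!ALLOWED_TYPES.includes(file.type)) {
    return { ok: false, reason: `Unsupported type: ${file.type}` };
  }
  return { ok: true };
}

// Accepts only HTTPS URLs on the Vercel Blob host, so the transform route
// never fetches an arbitrary address supplied by the client.
function isBlobUrl(value: string): boolean {
  let url: URL;
  try {
    url = new URL(value);
  } catch {
    return false; // not a parseable URL
  }
  const host = url.hostname;
  return (
    url.protocol === "https:" &&
    (host === "blob.vercel-storage.com" || host.endsWith(".blob.vercel-storage.com"))
  );
}
```

Checking the hostname rather than searching the whole string matters here: a URL such as https://evil.example.com/blob.vercel-storage.com contains the expected domain in its path but points somewhere else entirely.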

The Two Gemini API Calls

Call 1: Analyze the Image

We send the uploaded image to Gemini 2.0 Flash and ask it to describe the image in detail — like telling an artist what to paint.

POST /models/gemini-2.0-flash:generateContent

{
  "contents": [{
    "parts": [
      { "inlineData": { "mimeType": "image/jpeg", "data": "<base64>" } },
      { "text": "Describe this image in detail for an artist..." }
    ]
  }]
}
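
In TypeScript, Call 1 might look roughly like the sketch below. The helper names (`geminiUrl`, `buildAnalyzeBody`, `extractText`, `analyzeImage`) are hypothetical and not taken from the project. The base URL is Google's public Gemini REST endpoint, the API key is passed in as a parameter, and the prompt is left as an argument because the full prompt text is not shown here.

```typescript
const GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta";

// Builds the full generateContent URL for a given model ID.
function geminiUrl(model: string): string {
  return `${GEMINI_BASE}/models/${model}:generateContent`;
}

// Minimal response shape: only the fields this sketch reads.
type Part = { text?: string; inlineData?: { mimeType: string; data: string } };
interface GeminiResponse {
  candidates?: { content?: { parts?: Part[] } }[];
}

// Mirrors the JSON body above: one image part followed by one text part.
function buildAnalyzeBody(base64: string, mimeType: string, prompt: string) {
  return {
    contents: [{ parts: [{ inlineData: { mimeType, data: base64 } }, { text: prompt }] }],
  };
}

// Joins the text parts of the first candidate into a single description.
function extractText(response: GeminiResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const text = parts.map((p) => p.text ?? "").join("").trim();
  if (!text) throw new Error("Gemini returned no description");
  return text;
}

// Sends the request and returns the description (performs a network call).
async function analyzeImage(
  apiKey: string,
  base64: string,
  mimeType: string,
  prompt: string
): Promise<string> {
  const res = await fetch(geminiUrl("gemini-2.0-flash"), {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildAnalyzeBody(base64, mimeType, prompt)),
  });
  if (!res.ok) throw new Error(`Gemini analysis failed: ${res.status}`);
  return extractText((await res.json()) as GeminiResponse);
}
```

Keeping the body builder and the response parser separate from the `fetch` call means both can be checked without touching the network.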

Call 2: Generate New Art

We take the description from Call 1, combine it with the user's chosen style, and send it to Gemini 2.5 Flash to generate a new image.

POST /models/gemini-2.5-flash-image:generateContent

{
  "contents": [{
    "parts": [{
      "text": "Create an image in Studio Ghibli style: <description>"
    }]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
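
A sketch of how the server might build this request and pull the image out of the response is shown below. The function names are hypothetical, and returning the result as a data URL is an assumption, since this page does not say how the generated image is sent back to the browser. The response shape reflects how Gemini returns image output: as a base64 `inlineData` part, possibly alongside text parts.

```typescript
// Minimal response shape: only the fields this sketch reads.
type ResponsePart = { text?: string; inlineData?: { mimeType: string; data: string } };
interface GenerateResponse {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
}

// Mirrors the JSON body above: the style and description are merged into one
// text prompt, and the config asks for image output as well as text.
function buildGenerateBody(style: string, description: string) {
  return {
    contents: [{ parts: [{ text: `Create an image in ${style} style: ${description}` }] }],
    generationConfig: { responseModalities: ["TEXT", "IMAGE"] },
  };
}

// Finds the first image part and converts it to a data URL that an <img>
// tag can display directly (assumed delivery format, see the note above).
function extractImageDataUrl(response: GenerateResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const image = parts.find((p) => p.inlineData !== undefined)?.inlineData;
  if (!image) throw new Error("Gemini returned no image");
  return `data:${image.mimeType};base64,${image.data}`;
}
```

Because the response can mix text and image parts, the parser searches for the image part instead of assuming it comes first.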

Why Two Steps?

In this pipeline, the image generation call receives only text, so the model never "sees" the uploaded image directly. Instead, we use a two-step approach:

1. Analyze: a vision model describes what's in the image
2. Generate: an image model creates something new from that description + your chosen style

This is like showing a photo to one artist and having them describe it to another artist who paints in a completely different style.
