Part 2
Connecting to Gemini
This page explains how the image transformer works — how an uploaded photo gets turned into AI-generated art using Google's Gemini API.
[Upload Image] → [Vercel Blob] → [Gemini Analyze] → [Gemini Generate] → [AI Art]
   Browser          Storage       Describe image    Create new image     Result
The Three Services
Vercel Blob
Cloud storage for uploaded images. When you drop an image onto bleau.ai, it is sent to Vercel Blob and stored under a public URL. The server later fetches the image from this URL, encodes it as base64, and passes it to Gemini.
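Because the server fetches whatever URL the browser hands back, it is reasonable to check that the URL really points at Blob storage first. The sketch below is one way such a check might look. The function name and the exact rules are assumptions for illustration, not the project's actual validations.ts; the host suffix is the one shown in the architecture diagram.

```typescript
// Hypothetical helper: name and rules are illustrative assumptions.
const BLOB_HOST_SUFFIX = "blob.vercel-storage.com";

function isVercelBlobUrl(candidate: string): boolean {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return false; // not a parseable absolute URL
  }
  const host = url.hostname.toLowerCase();
  // Require HTTPS and a hostname that is the Blob domain or a subdomain of it.
  return (
    url.protocol === "https:" &&
    (host === BLOB_HOST_SUFFIX || host.endsWith("." + BLOB_HOST_SUFFIX))
  );
}
```

Matching on a leading dot before the suffix keeps look-alike hosts such as evilblob.vercel-storage.com from passing.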
Gemini 2.0 Flash (Analysis)
Google's fast vision model. It looks at the uploaded image and writes a detailed description — subjects, colors, composition, lighting — everything an artist would need to recreate it.
Gemini 2.5 Flash (Image Generation)
Google's image generation model (gemini-2.5-flash-image in the API call). It takes the description plus your chosen style (e.g. "Studio Ghibli") and generates a brand-new image in that style.
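A constants module could pin down which model plays which role. The following is a minimal sketch under assumptions: the identifier names are invented here, and the base URL is the public Generative Language REST endpoint as I understand it, so it is worth checking against current Google documentation. The two model IDs are the ones used in the API calls later on this page.

```typescript
// Assumed layout for a constants module; identifiers are illustrative.
const GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta";

const GEMINI_MODELS = {
  analysis: "gemini-2.0-flash", // describes the uploaded image
  generation: "gemini-2.5-flash-image", // creates the new image
} as const;

type GeminiRole = keyof typeof GEMINI_MODELS;

// Build the generateContent URL for the model that fills the given role.
function generateContentUrl(role: GeminiRole): string {
  return `${GEMINI_BASE_URL}/models/${GEMINI_MODELS[role]}:generateContent`;
}
```

Keeping the role-to-model mapping in one place means a model swap touches a single line.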
Step by Step
| Step | What Happens | Where |
|---|---|---|
| 1 | User drops an image onto the page | Browser (ImageUploader component) |
| 2 | Image uploads to Vercel Blob storage | API route: /api/upload |
| 3 | User picks a style and clicks Transform | Browser |
| 4 | Server fetches image, encodes it as base64 | API route: /api/transform |
| 5 | Gemini 2.0 Flash analyzes the image | Google Gemini API |
| 6 | Gemini 2.5 Flash generates new art from description + style | Google Gemini API |
| 7 | Generated image sent back and displayed | Browser |
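
Steps 4 to 6 in the table run on the server, one after another. A rough sketch of that sequencing is shown here with the three network operations passed in as functions, so the order is visible without any real API calls. The interface and function names are hypothetical and are not taken from the project's route.ts.

```typescript
// Each dependency is a hypothetical stand-in for a real network call.
interface EncodedImage {
  mimeType: string;
  data: string; // base64-encoded bytes
}

interface PipelineDeps {
  fetchImageAsBase64(blobUrl: string): Promise<EncodedImage>;
  analyzeImage(image: EncodedImage): Promise<string>;
  generateArt(description: string, style: string): Promise<string>;
}

// Runs the server-side part of the pipeline and resolves to the generated image.
async function transformImage(
  deps: PipelineDeps,
  blobUrl: string,
  style: string
): Promise<string> {
  const image = await deps.fetchImageAsBase64(blobUrl); // step 4
  const description = await deps.analyzeImage(image); // step 5
  return deps.generateArt(description, style); // step 6
}
```

Each step awaits the previous one because its input is the previous step's output; nothing here can run in parallel.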
How It All Connects
┌─────────────────────────────────────────────────────────────────┐
│  BROWSER                                                        │
│                                                                 │
│  ImageUploader component                                        │
│  1. Drag & drop image                                           │
│  2. XHR upload with progress bar                                │
│  3. Select style → click Transform                              │
└─────────────────────────────────────────────────────────────────┘
        │ POST /api/upload          │ POST /api/transform
        ▼                           ▼
┌──────────────────────────┐ ┌────────────────────────────────────┐
│  /api/upload             │ │  /api/transform                    │
│                          │ │                                    │
│  Validates file          │ │  1. Fetches image from Blob URL    │
│  Uploads to Vercel Blob  │ │  2. Encodes as base64              │
│  Returns public URL      │ │  3. Sends to Gemini for analysis   │
└──────────────────────────┘ │  4. Sends to Gemini for generation │
        │                    │  5. Returns generated image        │
        ▼                    └────────────────────────────────────┘
┌──────────────────────────┐        │                  │
│  VERCEL BLOB             │        ▼                  ▼
│                          │ ┌──────────────┐ ┌─────────────────┐
│  Stores uploaded images  │ │ Gemini 2.0   │ │ Gemini 2.5      │
│  with public URLs        │ │ Flash        │ │ Flash           │
│                          │ │              │ │                 │
│  blob.vercel-storage.com │ │ "Describe    │ │ "Generate this  │
│                          │ │  this image" │ │  in Ghibli      │
└──────────────────────────┘ │              │ │  style"         │
                             └──────────────┘ └─────────────────┘
Files That Make It Work
src/
├── app/
│   ├── transform/
│   │   └── page.tsx          ← The transform page UI
│   └── api/
│       ├── upload/
│       │   └── route.ts      ← Handles file → Vercel Blob
│       └── transform/
│           └── route.ts      ← Handles Blob → Gemini → AI art
│
├── components/
│   └── ImageUploader.tsx     ← Drag & drop, progress bar, display
│
├── lib/
│   ├── constants.ts          ← API endpoints, Gemini config, styles
│   └── validations.ts        ← File size/type checks, URL validation
│
└── types/
    └── index.ts              ← TypeScript types for state & API
The Two Gemini API Calls
Call 1: Analyze the Image
We send the uploaded image to Gemini 2.0 Flash and ask it to describe the image in detail — like telling an artist what to paint.
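In TypeScript, the body of this request might be assembled as in the sketch below; the raw request it produces is the one shown next. The type and function names are assumptions made for illustration. The prompt is taken as a parameter because this page only shows it in abbreviated form.

```typescript
// Illustrative types covering only the fields this request uses.
type RequestPart =
  | { inlineData: { mimeType: string; data: string } }
  | { text: string };

interface AnalysisRequest {
  contents: { parts: RequestPart[] }[];
}

// Hypothetical builder: pairs the base64 image with the text instruction.
function buildAnalysisRequest(
  mimeType: string,
  base64Data: string,
  prompt: string
): AnalysisRequest {
  return {
    contents: [
      {
        parts: [
          { inlineData: { mimeType, data: base64Data } }, // the uploaded image
          { text: prompt }, // what we want Gemini to do with it
        ],
      },
    ],
  };
}
```

The image and the instruction travel as two parts of a single content entry, so the model reads them together.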
POST /models/gemini-2.0-flash:generateContent
{
  "contents": [{
    "parts": [
      { "inlineData": { "mimeType": "image/jpeg", "data": "<base64>" } },
      { "text": "Describe this image in detail for an artist..." }
    ]
  }]
}
Call 2: Generate New Art
We take the description from Call 1, combine it with the user's chosen style, and send it to Gemini 2.5 Flash to generate a new image.
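Combining the two inputs is plain string templating. The sketch below mirrors the request shown next, with a hypothetical function name; the prompt wording follows the example on this page, and a real implementation may phrase it differently.

```typescript
// Hypothetical builder for the generation request body.
function buildGenerationRequest(description: string, style: string) {
  return {
    contents: [
      {
        // Style comes first, then the description produced by Call 1.
        parts: [{ text: `Create an image in ${style} style: ${description}` }],
      },
    ],
    generationConfig: {
      // Ask for image output alongside any text the model returns.
      responseModalities: ["TEXT", "IMAGE"],
    },
  };
}
```

For example, `buildGenerationRequest("A cat on a windowsill", "Studio Ghibli")` yields the prompt text `Create an image in Studio Ghibli style: A cat on a windowsill`.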
POST /models/gemini-2.5-flash-image:generateContent
{
  "contents": [{
    "parts": [{
      "text": "Create an image in Studio Ghibli style: <description>"
    }]
  }],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"]
  }
}
Why Two Steps?
The generation request in this pipeline carries only text, so the image model never "sees" the uploaded photo directly. To bridge that gap, we use a two-step approach:
- Analyze — A vision model describes what's in the image
- Generate — An image model creates something new from that description + your chosen style
This is like showing a photo to one artist and having them describe it to another artist who paints in a completely different style.
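The hand-off between the two "artists" comes down to reading the right part out of each response. This page does not show the response format, so the shape below is my understanding of what generateContent returns (a list of candidates whose content holds text or inlineData parts) and should be treated as an assumption. The function names are hypothetical.

```typescript
// Assumed minimal response shape; only the fields read below.
interface ResponsePart {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

interface GenerateContentResponse {
  candidates?: { content?: { parts?: ResponsePart[] } }[];
}

// Parts of the first candidate, or an empty list if anything is missing.
function partsOf(res: GenerateContentResponse): ResponsePart[] {
  return res.candidates?.[0]?.content?.parts ?? [];
}

// Analyze step: join all text parts into the description.
function extractDescription(res: GenerateContentResponse): string {
  const text = partsOf(res)
    .map((part) => part.text ?? "")
    .join("")
    .trim();
  if (!text) throw new Error("Analysis response contained no text");
  return text;
}

// Generate step: turn the first image part into a data URL for the browser.
function extractImageDataUrl(res: GenerateContentResponse): string {
  const image = partsOf(res).find((part) => part.inlineData)?.inlineData;
  if (!image) throw new Error("Generation response contained no image");
  return `data:${image.mimeType};base64,${image.data}`;
}
```

Since the generation request asks for both TEXT and IMAGE, the image may not be the first part, which is why the second helper searches for it instead of indexing.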