Grailed Image Search
A visual search engine for Grailed that finds similar clothing items using AI-powered image and text embeddings.
Background
I'm a fashion enthusiast who shops on Grailed, a secondhand clothing marketplace. The platform has a search problem: you can only find items if you know the brand name or item name. For vintage pieces without clear labels, you're stuck. Even worse, if you like a piece but the condition or price isn't right, there's no way to find similar styles.
Grailed listings aren't standardized—the same item can look completely different depending on angles, lighting, and backgrounds. I wanted to search by what actually matters: visuals. Patterns, colors, silhouettes, details.
Solution
What I Built
A full-stack visual search engine with four components:
- Backend API — Scrapes listings, generates CLIP embeddings, handles similarity search
- Admin Dashboard — Manages scraping and embedding processes
- User Frontend — Displays search results
- Chrome Extension — Main entry point; click on any Grailed listing to find similar items
How It Works
I scraped ~2 million listings from Grailed (images, titles, metadata). Each image and title gets processed through OpenAI's CLIP model to generate embeddings—vector representations that capture visual and semantic meaning. Embeddings are stored in Pinecone for fast similarity search.
When you open the Chrome extension on a Grailed listing, it sends the image to my backend, searches the vector database, and returns visually similar items in the frontend.
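To make the similarity step concrete, here is a minimal brute-force sketch of what a vector search does. It assumes cosine similarity as the metric (this README doesn't state which metric the Pinecone index uses), and the listing IDs and 3-dimensional vectors are made-up stand-ins for real 512-dimensional CLIP embeddings. Pinecone does this at scale with an index rather than a full scan.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_top_k(query, index, k=3):
    """Return the k (listing_id, score) pairs most similar to the query.

    `index` maps a hypothetical listing ID to its embedding.
    """
    q = normalize(query)
    scored = (
        (listing_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for listing_id, vec in index.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors standing in for CLIP embeddings (assumption, not real data).
toy_index = {
    "striped-knit": [0.9, 0.1, 0.0],
    "plain-denim": [0.0, 1.0, 0.0],
    "striped-tee": [0.8, 0.2, 0.1],
}
results = cosine_top_k([1.0, 0.0, 0.0], toy_index, k=2)
print([listing_id for listing_id, _ in results])
```

Because image and text embeddings from CLIP live in the same vector space, the same lookup works whether the query vector came from a photo or a title.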
The Challenge
This wasn't a single app; it was a complete data pipeline. The hardest part was choosing a tech stack and architecture that could handle:
- Scraping 2M listings with proper concurrency and checkpointing
- Cleaning and organizing massive datasets in MongoDB
- Batch processing images through CLIP efficiently
- Setting up Pinecone with the right indexing strategy
- Making the flow seamless (extension → backend → frontend)
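The concurrency and checkpointing items above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual scraper: `fetch_listing` is a hypothetical stand-in for the Playwright scrape, and the JSON checkpoint file and `max_concurrency` value are invented for the example.

```python
import asyncio
import json
import tempfile
from pathlib import Path


def load_checkpoint(path):
    """Return the set of listing IDs already processed, or an empty set."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()


def save_checkpoint(path, done):
    """Write via a temp file so a crash cannot leave the checkpoint half-written."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    tmp.replace(path)


async def fetch_listing(listing_id):
    """Hypothetical stand-in for the real Playwright scrape of one listing."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"id": listing_id, "title": f"listing {listing_id}"}


async def scrape_all(listing_ids, checkpoint_path, max_concurrency=5):
    """Scrape every ID not in the checkpoint, at most max_concurrency at a time."""
    done = load_checkpoint(checkpoint_path)
    semaphore = asyncio.Semaphore(max_concurrency)
    scraped = []

    async def worker(listing_id):
        async with semaphore:
            scraped.append(await fetch_listing(listing_id))
            done.add(listing_id)
            save_checkpoint(checkpoint_path, done)

    pending = [i for i in listing_ids if i not in done]
    await asyncio.gather(*(worker(i) for i in pending))
    return scraped


checkpoint = Path(tempfile.mkdtemp()) / "scrape_checkpoint.json"
first_run = asyncio.run(scrape_all([1, 2, 3], checkpoint))
# A second run skips IDs 1-3 and only scrapes the new one.
second_run = asyncio.run(scrape_all([1, 2, 3, 4], checkpoint))
```

Saving after every listing keeps the sketch simple. At millions of listings, writing the checkpoint every N items (or storing progress in the database) would be a more realistic trade-off.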
I've discussed this project in multiple interviews because it touches on full-stack development, ML systems, and working with data at scale.
Tech Stack
Backend
- Python 3.12, Quart (async, Flask-compatible framework) — API server
- PyTorch + Transformers — CLIP model for embeddings
- Playwright — Web scraping
- MongoDB — Raw data storage
- Pinecone — Vector similarity search
- Docker, Google Cloud Secret Manager — Deployment and secrets management
Frontend
- Next.js 13+, React, Tailwind CSS — Dashboard and user interface
Browser Extension
- Chrome Extension API, JavaScript — Grailed integration
Pipeline
- ~2M listings scraped and processed
- 512-dimensional CLIP embeddings for images and text
- Checkpoint system for resumable long-running processes
- RESTful API with Server-Sent Events for real-time logs
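For the real-time logs, a Server-Sent Events stream is plain text in a simple wire format. The sketch below shows only that encoding; the `format_sse` and `stream_logs` helpers and the `log` event name are hypothetical, and wiring the generator into a Quart response is left out.

```python
def format_sse(data, event=None):
    """Encode one message in the Server-Sent Events wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload needs one "data:" field per line.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def stream_logs(log_lines):
    """Yield each log line as an SSE message ('log' is a made-up event name)."""
    for line in log_lines:
        yield format_sse(line, event="log")


messages = list(stream_logs(["scraped 100 listings", "embedded batch 1"]))
print(messages[0], end="")
```

On the dashboard side, a browser `EventSource` can subscribe to such a stream and append each message to the log view as it arrives.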
