Grailed Search
A visual search engine for Grailed that finds similar clothing items using AI-powered image and text embeddings.
Background
One day, while shopping on Grailed, a secondhand clothing marketplace, I found a khaki army jacket that I loved. Unfortunately, it was in poor condition. When I tried to find something similar, I ran into a problem: Grailed only lets you find items if you already know the brand or item name. But that jacket was a vintage piece, meaning it had no clear labels, no name, nothing. I had walked into a dead end.
Inspiration
The first idea that came to mind was: what if shoppers could search by how the clothing looks? I took inspiration from Taobao, one of the biggest Chinese e-commerce platforms, which has a feature that lets users long-press to scan a product image and see other products that look similar. I wanted to create the same experience.
The Challenge
After some research, I found that Taobao built its image search engine from the ground up, with an architecture close to a traditional index-based engine, which was clearly out of scope for me. In addition, the product I was building was a third-party app, which means both the searching and the display of results had to happen outside of Grailed.
Design
To "cheat" my way around building a search engine from scratch, I immediately thought of vector search. I found OpenAI's CLIP, a model that maps both text and images into the same embedding space, which unlocks an additional benefit: users can search not only with an image, but also with descriptive language. To work within the third-party constraint, my only option was to build a standalone platform to display the search results, which also meant bringing the data from Grailed over to my own system.
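Because CLIP places text and images in one shared space, "how similar is this photo to that phrase" reduces to comparing two vectors, typically with cosine similarity. The sketch below shows just that comparison step; the toy 4-dimensional vectors are made-up stand-ins for real 512-dimensional CLIP embeddings, which would come from running the actual model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for real CLIP embeddings (hypothetical values).
image_vec = [0.9, 0.1, 0.0, 0.4]   # e.g. a photo of a khaki jacket
text_vec  = [0.8, 0.2, 0.1, 0.5]   # e.g. the query "khaki army jacket"
other_vec = [0.0, 0.9, 0.8, 0.1]   # e.g. an unrelated sneaker photo

# The jacket photo scores higher against the matching text than
# against the unrelated photo.
assert cosine_similarity(image_vec, text_vec) > cosine_similarity(image_vec, other_vec)
```

This is also why a text query and an image query can share one search pipeline: both end up as a vector compared against the same index.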
From there, I broke the project down into four components:
- Backend API — Scrapes listings, generates CLIP embeddings, handles similarity search
- Admin Dashboard — Visualizes and manages scraping and embedding processes
- Chrome Extension — Main entry point
- User Frontend — Displays search results
Process
I built an autonomous scraping system using Playwright that collected ~2 million listings from Grailed (images, titles, metadata). Each record is stored in MongoDB first for checkpointing, then passed through the CLIP model to generate embeddings: vector representations that capture visual and semantic meaning. The embeddings are stored in Pinecone for fast similarity search.
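The ingest-then-query shape of this pipeline can be sketched in miniature. Here a tiny brute-force in-memory index stands in for Pinecone, and a hash-like stub stands in for the CLIP encoder; both are illustrative assumptions, not the real system's code.

```python
import math

def embed(listing):
    """Stand-in for the CLIP encoder: maps a listing to a unit vector.
    The real pipeline runs the listing's image through CLIP instead."""
    vec = [0.0] * 8
    for i, ch in enumerate(listing["title"].lower()):
        vec[i % 8] += ord(ch) / 1000.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class BruteForceIndex:
    """Tiny in-memory stand-in for Pinecone: upsert vectors, query by cosine."""
    def __init__(self):
        self.vectors = {}

    def upsert(self, listing_id, vector):
        self.vectors[listing_id] = vector

    def query(self, vector, top_k=3):
        # Vectors are pre-normalized, so the dot product is cosine similarity.
        def cosine(a, b):
            return sum(x * y for x, y in zip(a, b))
        scored = [(cosine(vector, v), lid) for lid, v in self.vectors.items()]
        return [lid for _, lid in sorted(scored, reverse=True)[:top_k]]

# Checkpointed listings (MongoDB in the real system) flow into the index.
listings = [
    {"id": "1", "title": "khaki army jacket"},
    {"id": "2", "title": "vintage field jacket"},
    {"id": "3", "title": "white leather sneakers"},
]
index = BruteForceIndex()
for l in listings:
    index.upsert(l["id"], embed(l))

results = index.query(embed({"title": "khaki army jacket"}), top_k=2)
```

A real vector database replaces the brute-force scan with an approximate nearest-neighbor structure, which is what makes the search fast at the ~2 million-listing scale.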
When a user opens the Chrome extension on a Grailed listing, it retrieves the images from the listing URL and sends them to the backend, which searches the vector database and returns visually similar items in the frontend.
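The backend's request flow (fetch image, embed, query the index) can be summarized as one function with its dependencies injected. The names `fetch_image`, `embed`, and `search_similar` are hypothetical, chosen for the sketch; the stubs below stand in for the network fetch, the CLIP model, and Pinecone.

```python
def search_similar(image_url, fetch_image, embed, index, top_k=10):
    """Backend search flow for one extension request:
    download the listing image, embed it, query the vector index."""
    image_bytes = fetch_image(image_url)   # HTTP fetch in the real system
    vector = embed(image_bytes)            # CLIP in the real system
    return index.query(vector, top_k=top_k)

# Stubbed usage, to show the data flow end to end:
class FakeIndex:
    def query(self, vector, top_k):
        return ["item-1", "item-2"][:top_k]

hits = search_similar(
    "https://example.com/jacket.jpg",
    fetch_image=lambda url: b"raw-image-bytes",
    embed=lambda image_bytes: [0.1, 0.2],
    index=FakeIndex(),
    top_k=2,
)
```

Keeping the fetch, model, and index behind simple interfaces like this also makes each stage testable in isolation.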
Tech Stack
Backend
- Python 3.12, Quart (async, Flask-compatible framework) — API server
- PyTorch + Transformers — CLIP model for embeddings
- Playwright — Web scraping
- MongoDB — Raw data storage
- Pinecone — Vector similarity search
- Docker, Google Cloud Secret Manager — Deployment
Frontend
- Next.js 13+, React, Tailwind CSS — Dashboard and user interface
- Chrome Extension API, JavaScript — Grailed integration