Grailed Search

A visual search engine for Grailed that finds similar clothing items using AI-powered image and text embeddings.

2024 · Web App · GitHub

Background

One day, I was shopping on Grailed, a secondhand clothing marketplace, and found a khaki army jacket that I loved. Unfortunately, it was in poor condition. I tried to find something similar, and that was when I ran into a problem: Grailed only lets you find items if you know the brand or item name. That jacket was a vintage piece, which meant no clear label, no name, nothing. I had hit a dead end.

Inspiration

The first idea that came to mind was: what if shoppers could search by how the clothing looks? I took inspiration from Taobao, one of the biggest Chinese e-commerce platforms, which has a feature that lets users long-press a product image to scan it and see other products that look similar. I wanted to create the same experience.

The Challenge

After some research, I found that Taobao built its image search engine from the ground up. Its architecture is very close to a traditional index-based engine, which was clearly beyond the scope of what I could build. In addition, my product was a third-party app, which meant the searching and displaying had to happen outside of Grailed.

Design

To "cheat" on building the search engine mechanism, I immediately thought of using vector search. I found OpenAI's CLIP, a model that maps both text and images into the same vector space. This brought an additional benefit: users can search not only with an image but also with descriptive language. To handle the third-party constraint, my only option was to build a standalone platform to show the search results, which also meant bringing data from Grailed onto my own platform.
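To make the shared-vector-space idea concrete, here is a minimal sketch in plain Python. The short vectors are made-up stand-ins for CLIP outputs (real embeddings have far more dimensions and come from the model), and the helper names are hypothetical. It only illustrates why a text query and an image can be compared directly once both live in the same space.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must share the same dimension")
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional stand-ins for CLIP outputs (hypothetical values).
image_khaki_jacket = [0.9, 0.1, 0.3, 0.0]
image_black_sneaker = [0.0, 0.8, 0.1, 0.6]
text_query = [0.8, 0.2, 0.4, 0.1]  # imagine this encodes "khaki army jacket"

jacket_score = cosine_similarity(text_query, image_khaki_jacket)
sneaker_score = cosine_similarity(text_query, image_black_sneaker)

# The text query lands closer to the jacket image than to the sneaker image.
print(jacket_score > sneaker_score)
```

Because text and image vectors share one space, the same comparison works for image-to-image and text-to-image search without separate code paths.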

From there, I broke the project down into four components:

Backend API — Scrapes listings, generates CLIP embeddings, handles similarity search
Admin Dashboard — Visualizes and manages scraping and embedding processes
Chrome Extension — Main entry point
User Frontend — Displays search results

Process

I built an autonomous scraping system using Playwright, which scraped ~2 million listings from Grailed (images, titles, metadata). Each listing is first stored in MongoDB for checkpointing, then passed through the CLIP model to generate embeddings: vector representations that capture visual and semantic meaning. The embeddings are stored in Pinecone for fast similarity search.
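The checkpointing idea can be sketched roughly as follows. This is not the project's actual code: the in-memory classes stand in for MongoDB and Pinecone, `fake_embed` stands in for a CLIP forward pass, and all names are hypothetical. The point is that each listing carries a flag saying whether it has been embedded, so an interrupted run can resume without redoing work.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    listing_id: str
    title: str
    image_url: str
    embedded: bool = False  # checkpoint flag: has this listing been embedded?


class InMemoryStore:
    """Stand-in for the raw-data store (MongoDB in the real project)."""

    def __init__(self):
        self.docs = {}

    def upsert(self, listing):
        # Keep an existing record so a re-scrape does not reset its checkpoint flag.
        self.docs.setdefault(listing.listing_id, listing)

    def pending(self):
        return [doc for doc in self.docs.values() if not doc.embedded]


class InMemoryIndex:
    """Stand-in for the vector index (Pinecone in the real project)."""

    def __init__(self):
        self.vectors = {}

    def upsert(self, listing_id, vector):
        self.vectors[listing_id] = vector


def fake_embed(listing):
    """Placeholder for a CLIP forward pass; returns a deterministic toy vector."""
    return [float(len(listing.title)), float(len(listing.image_url))]


def embed_pending(store, index, embed=fake_embed):
    """Embed every listing not yet marked done; return how many were processed."""
    processed = 0
    for listing in store.pending():
        index.upsert(listing.listing_id, embed(listing))
        listing.embedded = True  # mark only after the vector is stored
        processed += 1
    return processed


store, index = InMemoryStore(), InMemoryIndex()
store.upsert(Listing("a1", "Khaki army jacket", "https://example.com/a1.jpg"))
store.upsert(Listing("b2", "Black sneakers", "https://example.com/b2.jpg"))

first_run = embed_pending(store, index)   # embeds both listings
second_run = embed_pending(store, index)  # nothing left to do
```

Marking a listing as done only after its vector is written means a crash between the two steps causes a repeated embedding at worst, never a missing one.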

When a user opens the Chrome extension on a Grailed listing, it retrieves the listing's images from the URL and sends them to the backend, which searches the vector database and returns visually similar items for display in the frontend.
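The search step can be illustrated with a brute-force nearest-neighbour lookup. This is only a sketch under simplifying assumptions: a plain dictionary stands in for the vector database (which would normally use an approximate index rather than scanning every vector), the vectors are toy values, and `find_similar` is a hypothetical helper. Excluding the listing the user is currently viewing is also an assumption about the desired behaviour.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(query_vector, index, top_k=3, exclude_id=None):
    """Return up to top_k (listing_id, score) pairs, best match first."""
    scored = (
        (listing_id, cosine(query_vector, vector))
        for listing_id, vector in index.items()
        if listing_id != exclude_id
    )
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy index: listing ID -> embedding (hypothetical values).
index = {
    "jacket-1": [0.9, 0.1, 0.3],
    "jacket-2": [0.8, 0.2, 0.4],
    "sneaker-1": [0.1, 0.9, 0.2],
    "jeans-1": [0.3, 0.3, 0.9],
}

# Search with the embedding of the listing being viewed, skipping that listing.
results = find_similar(index["jacket-1"], index, top_k=2, exclude_id="jacket-1")
for listing_id, score in results:
    print(listing_id, round(score, 3))
```

In this toy data the second jacket ranks first, followed by the jeans, because their vectors point in a direction closest to the query.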

Tech Stack

Backend

Python 3.12, Quart (async, Flask-compatible) — API server
PyTorch + Transformers — CLIP model for embeddings
Playwright — Web scraping
MongoDB — Raw data storage
Pinecone — Vector similarity search
Docker, Google Cloud Secret Manager — Deployment

Frontend

Next.js 13+, React, Tailwind CSS — Dashboard and user interface
Chrome Extension API, JavaScript — Grailed integration

®AYMOND