This guide covers the core workflow: discovering product listings via
POST /v1/crawl, browsing discovered data via GET /v1/vendors, POST /v1/collections, and POST /v1/listings, and extracting products via POST /v1/products.

Step 1: Crawl a Vendor

Start by crawling a vendor website to discover all of its collections and product listings. Use POST /v1/crawl with the vendor URL. The crawl:
- Discovers collections from the vendor (e.g., “New In”, “Sale”, “Shoes”)
- Extracts product listings from each collection
- Start a crawl by calling POST /v1/crawl with the vendor URL
- Receive an execution_id immediately (format: crawl-{hostname}-{uuid})
- Poll GET /v1/crawl/{execution_id} to check progress:
  - Status will be "pending", "running", "completed", or "failed"
  - When status === "completed", you’ll see total_listings_found indicating how many products were discovered
Billing Requirement: Crawl requests require auto top-up to be enabled in your billing settings. This ensures you have sufficient credits to complete the crawl operation.
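The poll step above can be sketched as a small helper. This is a minimal sketch, not the official client: the HTTP call is abstracted behind an injected fetch_status callable (a placeholder for your own GET /v1/crawl/{execution_id} request), and the response shape shown in the stub follows the fields described above.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}

def poll_crawl(fetch_status, interval=5.0, max_polls=120):
    """Poll a crawl until it reaches a terminal status.

    fetch_status is any zero-argument callable that performs
    GET /v1/crawl/{execution_id} and returns the parsed JSON body,
    e.g. {"status": "running"} or
    {"status": "completed", "total_listings_found": 1234}.
    """
    for _ in range(max_polls):
        body = fetch_status()
        if body["status"] in TERMINAL_STATUSES:
            return body
        time.sleep(interval)
    raise TimeoutError("crawl did not finish within the polling budget")

# Example with a stubbed status sequence instead of real HTTP:
responses = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "completed", "total_listings_found": 1234},
])
result = poll_crawl(lambda: next(responses), interval=0)
```

Injecting the fetcher keeps the retry logic independent of your HTTP library and makes the loop easy to test with canned responses.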
Step 2: Browse Discovered Data
After crawling, use the three browse endpoints to explore and curate which products you want to extract. These endpoints return data from vendors you have crawled.

List your vendors

Use GET /v1/vendors to see all vendors you have crawled and their product counts:
- Review which vendors you have indexed
- Check product_count to see how many listings were discovered
- Use latest_product_update_by_catalog to see when data was last refreshed
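One common use of this data is deciding which vendors are due for a re-crawl. A sketch, assuming each vendor record carries a hostname identifier and ISO 8601 timestamps under latest_product_update_by_catalog (the exact field shapes are assumptions based on the names above):

```python
from datetime import datetime, timedelta, timezone

def stale_vendors(vendors, max_age_days=7, now=None):
    """Return identifiers of vendors whose most recent catalog update
    is older than max_age_days -- candidates for re-crawling."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for vendor in vendors:
        updates = vendor.get("latest_product_update_by_catalog") or {}
        newest = max(
            (datetime.fromisoformat(ts) for ts in updates.values()),
            default=None,
        )
        if newest is None or newest < cutoff:
            stale.append(vendor["hostname"])
    return stale

# Shaped like a GET /v1/vendors response (field names are assumptions):
vendors = [
    {"hostname": "shop-a.example", "product_count": 5120,
     "latest_product_update_by_catalog": {"default": "2024-01-01T00:00:00+00:00"}},
    {"hostname": "shop-b.example", "product_count": 310,
     "latest_product_update_by_catalog": {"default": "2024-01-30T00:00:00+00:00"}},
]
now = datetime(2024, 2, 1, tzinfo=timezone.utc)
to_recrawl = stale_vendors(vendors, max_age_days=7, now=now)
```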
Browse collections
Use POST /v1/collections to explore a vendor’s collections:
- Retrieve collections (e.g., “New In”, “Sale”, “Shoes”)
- Decide which collections to include in your index (e.g., only “New Arrivals” or “Top Sellers”)
Curate product listings
Use POST /v1/listings to page through product listings for a vendor or collection. As you browse:
- Review the lightweight listing data (title, URL, collection, timestamps)
- Curate which products you want to extract full data for
- Store the canonical product URLs for listings you want to index
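The curation step above amounts to walking listing pages and keeping the URLs you care about. A minimal sketch over pre-fetched pages, assuming each POST /v1/listings response carries its items under a "listings" key with "url", "title", and "collection" fields (field names are assumptions based on the listing data described above):

```python
def collect_listing_urls(pages, keep=lambda listing: True):
    """Walk pre-fetched pages of listing results and collect the
    canonical URL of every listing that passes the keep predicate."""
    urls = []
    for page in pages:
        for listing in page.get("listings", []):
            if keep(listing):
                urls.append(listing["url"])
    return urls

# Two fake pages, keeping only the "Shoes" collection:
pages = [
    {"listings": [
        {"title": "Runner Sneaker", "url": "https://shop-a.example/p/1",
         "collection": "Shoes"},
        {"title": "Gift Card", "url": "https://shop-a.example/p/2",
         "collection": "Other"},
    ]},
    {"listings": [
        {"title": "Trail Boot", "url": "https://shop-a.example/p/3",
         "collection": "Shoes"},
    ]},
]
shoe_urls = collect_listing_urls(pages, keep=lambda l: l["collection"] == "Shoes")
```

The predicate is where your curation rules live: filter by collection, recency, title keywords, or any other listing field.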
Step 3: Extract Full Product Data
Once you have curated your list of product URLs, use POST /v1/products to get full product data with AI enrichment, reviews, and image tags.
Typical flow:
- Pull a batch of URLs from your curated list (up to 1000 URLs per request)
- Call POST /v1/products with your URLs
- Receive an execution_id (format: products-batch-{uuid}) and poll GET /v1/products/{execution_id}
- Upsert the extracted products into your index (search engine, DB, vector store, etc.)
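Because each request accepts at most 1000 URLs, a large curated list needs to be split into batches first. A simple chunking helper:

```python
def batch_urls(urls, batch_size=1000):
    """Split a curated URL list into batches no larger than the
    POST /v1/products per-request limit of 1000 URLs."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

urls = [f"https://shop-a.example/p/{i}" for i in range(2500)]
batches = batch_urls(urls)  # 3 batches: 1000, 1000, 500
```

Each batch then becomes one POST /v1/products call, giving you one execution_id per batch to poll.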
Extracting from Any URL Source
The POST /v1/products endpoint accepts product URLs from any source, not just URLs discovered through crawling. This gives you the flexibility to build your index from multiple sources:
Common use cases:
- Affiliate feeds: Extract products from affiliate network URLs
- Merchant feeds: Process product URLs from partner data feeds
- Internal catalogs: Index products from your own product database
- Hand-curated lists: Extract specific products you’ve manually selected
- Competitor monitoring: Track products from URLs you’ve collected
Whatever the source, the flow is the same: submit your URLs, receive an execution_id, and poll for results.
Keeping Your Index Fresh
To maintain a high-quality product index:

- Schedule re-crawling: Periodically re-crawl vendor websites to discover new product listings and collections.
- Schedule re-extraction: Re-run extraction to capture price, availability, and content changes for existing products.
- Monitor failures: Use the success and outcome fields to detect:
- Non-product URLs
- Unsupported vendors
- Products that have been removed
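A triage sketch for that monitoring step, assuming each extraction result carries the success and outcome fields mentioned above (the example outcome values such as "not_a_product" and "product_removed" are illustrative, not the API's actual vocabulary):

```python
from collections import defaultdict

def triage_results(results):
    """Split extraction results into successes and failures,
    grouping failures by their outcome field so non-product URLs,
    unsupported vendors, and removed products can be handled
    separately (e.g. dropped from the curated URL list)."""
    ok, failed = [], defaultdict(list)
    for result in results:
        if result.get("success"):
            ok.append(result)
        else:
            failed[result.get("outcome", "unknown")].append(result)
    return ok, dict(failed)

# Illustrative results; outcome values are assumptions:
results = [
    {"url": "https://shop-a.example/p/1", "success": True},
    {"url": "https://shop-a.example/about", "success": False,
     "outcome": "not_a_product"},
    {"url": "https://shop-a.example/p/9", "success": False,
     "outcome": "product_removed"},
]
ok, failed = triage_results(results)
```

Feeding the failure groups back into your curation step keeps dead URLs from being re-submitted on the next extraction run.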