This guide covers the core workflow: discovering product listings via
POST /v2/crawl, browsing discovered data via GET /v2/vendors, POST /v2/collections, and POST /v2/listings, and extracting products via POST /v2/extract.

Step 1: Crawl a Vendor
Start by crawling a vendor website to discover all of its collections and product listings. Use POST /v2/crawl with the vendor URL:
- Discovers collections from the vendor (e.g., “New In”, “Sale”, “Shoes”)
- Extracts product listings from each collection
Typical flow:
- Start a crawl by calling POST /v2/crawl with the vendor URL
- Receive an execution_id immediately (format: crawl-{hostname}-{uuid})
- Poll GET /v2/crawl/{execution_id} to check progress:
  - Status will be "pending", "running", "completed", or "failed"
  - When status === "completed", you’ll see total_listings_found indicating how many products were discovered
Billing Requirement: Crawl requests require auto top-up to be enabled in your billing settings. This ensures you have sufficient credits to complete the crawl operation.
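The flow above can be sketched in TypeScript. This is a minimal sketch, not official client code: the endpoint paths, status values, and the execution_id field come from this guide, while the base URL, auth header, and request/response shapes are assumptions.

```typescript
// Minimal sketch of the crawl flow. Endpoint paths and status values come
// from this guide; BASE_URL, the auth header, and the exact request/response
// shapes are assumptions.
const BASE_URL = "https://api.example.com"; // hypothetical base URL

type CrawlStatus = "pending" | "running" | "completed" | "failed";

interface CrawlProgress {
  status: CrawlStatus;
  total_listings_found?: number; // present once the crawl completes
}

// Start a crawl and return its execution_id (format: crawl-{hostname}-{uuid}).
async function startCrawl(vendorUrl: string, apiKey: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/v2/crawl`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: vendorUrl }),
  });
  const { execution_id } = await res.json();
  return execution_id;
}

// Poll a status-returning function until the crawl reaches a terminal state.
async function pollUntilDone(
  getStatus: () => Promise<CrawlProgress>,
  intervalMs = 5000,
): Promise<CrawlProgress> {
  while (true) {
    const progress = await getStatus();
    if (progress.status === "completed" || progress.status === "failed") {
      return progress;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In practice you would pass pollUntilDone a closure that fetches GET /v2/crawl/{execution_id} and returns the parsed JSON.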
Step 2: Browse Discovered Data
After crawling, use the three browse endpoints to explore and curate which products you want to extract. These endpoints return data from vendors you have crawled.

List your vendors

Use GET /v2/vendors to see all vendors you have crawled and their product counts:
- Review which vendors you have indexed
- Check product_count to see how many listings were discovered
- Use latest_product_update_by_catalog to see when data was last refreshed
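As a sketch of how you might act on this data, the helper below flags vendors whose catalog has not been refreshed recently as re-crawl candidates. Only product_count and latest_product_update_by_catalog are named in this guide; the overall entry shape and the ISO timestamp format are assumptions.

```typescript
// Assumed shape of one entry in a GET /v2/vendors response; only
// product_count and latest_product_update_by_catalog are named in this guide.
interface VendorSummary {
  vendor: string;
  product_count: number;
  latest_product_update_by_catalog: string; // assumed ISO-8601 timestamp
}

// Return vendors whose data is older than maxAgeDays: candidates for re-crawling.
function staleVendors(
  vendors: VendorSummary[],
  maxAgeDays: number,
  now: Date = new Date(),
): VendorSummary[] {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return vendors.filter(
    (v) => new Date(v.latest_product_update_by_catalog).getTime() < cutoff,
  );
}
```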
Browse collections
Use POST /v2/collections to explore a vendor’s collections:
- Retrieve collections (e.g., “New In”, “Sale”, “Shoes”)
- Decide which collections to include in your index (e.g., only “New Arrivals” or “Top Sellers”)
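The decision step can be as simple as the helper below, which keeps only the collections you have chosen to index. The collection names echo the guide's examples; the response item shape is an assumption.

```typescript
// Assumed shape of a collection entry returned by POST /v2/collections.
interface Collection {
  name: string;
  url: string;
}

// Keep only the collections chosen for indexing (case-insensitive by name).
function selectCollections(all: Collection[], wanted: string[]): Collection[] {
  const wantedNames = new Set(wanted.map((name) => name.toLowerCase()));
  return all.filter((c) => wantedNames.has(c.name.toLowerCase()));
}
```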
Curate product listings
Use POST /v2/listings to page through product listings for a vendor or collection. As you browse:
- Review the lightweight listing data (title, URL, collection, timestamps)
- Curate which products you want to extract full data for
- Store the canonical product URLs for listings you want to index
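The curation step might look like the sketch below: walk the paged listing responses, keep the listings that pass your filter, and store a deduplicated set of canonical product URLs. The fields mirror the lightweight listing data described above, but their exact names are assumptions.

```typescript
// Assumed shape of one listing from POST /v2/listings (lightweight data).
interface Listing {
  title: string;
  url: string; // canonical product URL
  collection?: string;
}

// Collect canonical URLs across pages, keeping listings that pass a filter
// and dropping duplicates (the same product can appear in several collections).
function curateUrls(
  pages: Listing[][],
  keep: (listing: Listing) => boolean = () => true,
): string[] {
  const urls = new Set<string>();
  for (const page of pages) {
    for (const listing of page) {
      if (keep(listing)) urls.add(listing.url);
    }
  }
  return Array.from(urls);
}
```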
Step 3: Extract Full Product Data
Once you have curated your list of product URLs, use POST /v2/extract to get full product data with AI enrichment, reviews, and image tags.
Typical flow:
- Pull a batch of URLs from your curated list (up to 1000 URLs per request)
- Call POST /v2/extract with your URLs
- Receive an execution_id (format: extract-urls-{uuid}) and poll GET /v2/extract/{execution_id}
- Upsert the extracted products into your index (search engine, DB, vector store, etc.)
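Since each request accepts at most 1000 URLs, large curated lists need to be split into request-sized batches. The 1000-URL cap is from this guide; the { urls: [...] } body shape is an assumption.

```typescript
// Split a curated URL list into POST /v2/extract request bodies, respecting
// the 1000-URLs-per-request limit mentioned above. The { urls } body shape
// is an assumption.
function toExtractRequests(
  urls: string[],
  batchSize = 1000,
): { urls: string[] }[] {
  const requests: { urls: string[] }[] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    requests.push({ urls: urls.slice(i, i + batchSize) });
  }
  return requests;
}
```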
Shortcut: Extract All Products from a Vendor
If you want to extract all products from a vendor without browsing and curating, you can use POST /v2/extract with the vendor parameter instead of urls. This skips the browse step entirely.
Typical flow:
- Start extraction by calling POST /v2/extract with the vendor
- Optionally provide a crawl_id to wait for crawl completion before starting extraction
- Receive an execution_id immediately (format: extract-{vendor}-{uuid})
- Poll GET /v2/extract/{execution_id}:
  - Check status and meta.progress for real-time progress
  - When status === "completed", results are available with pagination
  - Use page and page_size query parameters to retrieve results in chunks
- Upsert the extracted products into your index (search engine, DB, vector store, etc.)
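As a sketch, the helpers below build the request body and the paginated results path for vendor-based extraction. The vendor, crawl_id, page, and page_size parameters come from this guide; treating the body as plain JSON and the exact path layout are assumptions.

```typescript
// Build a POST /v2/extract body for vendor-based extraction; crawl_id is
// optional and, per the guide above, makes extraction wait for that crawl
// to finish before starting.
function vendorExtractBody(
  vendor: string,
  crawlId?: string,
): Record<string, string> {
  return crawlId ? { vendor, crawl_id: crawlId } : { vendor };
}

// Build the results path with page and page_size query parameters, used to
// retrieve completed results in chunks.
function extractResultsPath(
  executionId: string,
  page: number,
  pageSize: number,
): string {
  return `/v2/extract/${executionId}?page=${page}&page_size=${pageSize}`;
}
```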
When to use vendor-based extraction: This shortcut is ideal when you want a complete catalog from a vendor. If you only need specific collections or a curated subset of products, use the browse + URL extraction flow instead.
Extracting from Any URL Source
The POST /v2/extract endpoint accepts product URLs from any source, not just URLs discovered through crawling. This gives you flexibility to build your index from multiple sources:
Common use cases:
- Affiliate feeds: Extract products from affiliate network URLs
- Merchant feeds: Process product URLs from partner data feeds
- Internal catalogs: Index products from your own product database
- Hand-curated lists: Extract specific products you’ve manually selected
- Competitor monitoring: Track products from URLs you’ve collected
Regardless of where the URLs come from, the flow is the same: submit them to POST /v2/extract, receive an execution_id, and poll for results.
Keeping Your Index Fresh
To maintain a high-quality product index:

- Schedule re-crawling: Periodically re-crawl vendor websites to discover new product listings and collections.
- Schedule re-extraction: Re-run extraction to capture price, availability, and content changes for existing products.
- Monitor failures: Use the success and outcome fields to detect:
- Non-product URLs
- Unsupported vendors
- Products that have been removed
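A monitoring sketch: group failed results by their outcome so the failure classes above surface in one place. The success and outcome fields come from this guide; the per-result shape and the specific outcome strings shown in comments are assumptions.

```typescript
// Assumed per-URL result shape; only success and outcome are named in this guide.
interface ExtractResult {
  url: string;
  success: boolean;
  outcome: string; // e.g. "not_a_product", "unsupported_vendor" (illustrative values)
}

// Group failed URLs by outcome so non-product URLs, unsupported vendors, and
// removed products can each be handled separately.
function failuresByOutcome(results: ExtractResult[]): Map<string, string[]> {
  const buckets = new Map<string, string[]>();
  for (const result of results) {
    if (result.success) continue;
    const urls = buckets.get(result.outcome) ?? [];
    urls.push(result.url);
    buckets.set(result.outcome, urls);
  }
  return buckets;
}
```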