Build and maintain your own product index using Catalog’s crawl and products endpoints. The typical workflow is: crawl a vendor to discover their product listings, browse the discovered data to curate which products you want, then extract full product data for your selected listings.
This guide covers the core workflow: discovering product listings via POST /v1/crawl, browsing discovered data via GET /v1/vendors, POST /v1/collections, and POST /v1/listings, and extracting products via POST /v1/products.

Step 1: Crawl a Vendor

Start by crawling a vendor website to discover all their collections and product listings. Use POST /v1/crawl with the vendor URL:
{
  "url": "https://www.example.com"
}
The crawl process:
  • Discovers collections from the vendor (e.g., “New In”, “Sale”, “Shoes”)
  • Extracts product listings from each collection
Typical flow:
  1. Start a crawl by calling POST /v1/crawl with the vendor URL
  2. Receive an execution_id immediately (format: crawl-{hostname}-{uuid})
  3. Poll GET /v1/crawl/{execution_id} to check progress (a polling sketch follows this list):
    • Status will be "pending", "running", "completed", or "failed"
    • When status is "completed", you’ll see total_listings_found indicating how many products were discovered
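A minimal version of this loop in Python, assuming a placeholder base URL and a Bearer-token Authorization header (substitute the real values from your Catalog account):

import os
import time
import requests

# Placeholder base URL and assumed auth scheme; check your Catalog account
# for the real values.
BASE_URL = "https://api.catalog.example"
HEADERS = {"Authorization": f"Bearer {os.environ['CATALOG_API_KEY']}"}

# Start the crawl; the execution_id comes back immediately
# (format: crawl-{hostname}-{uuid}).
resp = requests.post(f"{BASE_URL}/v1/crawl",
                     json={"url": "https://www.example.com"},
                     headers=HEADERS)
resp.raise_for_status()
execution_id = resp.json()["execution_id"]

# Poll until the crawl reaches a terminal status.
while True:
    state = requests.get(f"{BASE_URL}/v1/crawl/{execution_id}",
                         headers=HEADERS).json()
    if state["status"] in ("completed", "failed"):
        break
    time.sleep(10)  # polling interval is a judgment call

if state["status"] == "completed":
    print("Listings discovered:", state["total_listings_found"])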
Billing Requirement: Crawl requests require auto top-up to be enabled in your billing settings. This ensures you have sufficient credits to complete the crawl operation.

Step 2: Browse Discovered Data

After crawling, use the three browse endpoints to explore and curate which products you want to extract. These endpoints return data from vendors you have crawled.

List your vendors

Use GET /v1/vendors to see all vendors you have crawled and their product counts:
  • Review which vendors you have indexed
  • Check product_count to see how many listings were discovered
  • Use latest_product_update_by_catalog to see when data was last refreshed
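For example, a quick sketch reusing the BASE_URL and HEADERS from the Step 1 sketch; the response shape and the vendor identifier field are assumptions, so adjust to match the actual payload:

import requests  # BASE_URL and HEADERS as defined in the Step 1 sketch

resp = requests.get(f"{BASE_URL}/v1/vendors", headers=HEADERS)
resp.raise_for_status()
for vendor in resp.json().get("vendors", []):  # top-level "vendors" key is assumed
    print(vendor.get("url"),                   # vendor identifier field is assumed
          vendor["product_count"],
          vendor["latest_product_update_by_catalog"])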

Browse collections

Use POST /v1/collections to explore a vendor’s collections:
  • Retrieve collections (e.g., “New In”, “Sale”, “Shoes”)
  • Decide which collections to include in your index (e.g., only “New Arrivals” or “Top Sellers”)
This helps you build a more structured index (vendor → collection → products).
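For instance, a sketch that assumes the endpoint takes the vendor URL you crawled (check the API reference for the exact request shape):

import requests  # BASE_URL and HEADERS as defined in the Step 1 sketch

resp = requests.post(f"{BASE_URL}/v1/collections",
                     json={"url": "https://www.example.com"},  # request shape assumed
                     headers=HEADERS)
resp.raise_for_status()
for collection in resp.json().get("collections", []):  # response shape assumed
    print(collection.get("name"))  # e.g., “New In”, “Sale”, “Shoes”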

Curate product listings

Use POST /v1/listings to page through product listings for a vendor or collection. As you browse:
  • Review the lightweight listing data (title, URL, collection, timestamps)
  • Curate which products you want to extract full data for
  • Store the canonical product URLs for listings you want to index
This step lets you select a subset of products rather than extracting everything—useful when you only need certain collections, price ranges, or product types.
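One way this curation step might look in Python, with an assumed request/response shape and an illustrative rule that keeps only the “Sale” collection:

import requests  # BASE_URL and HEADERS as defined in the Step 1 sketch

# Request shape is assumed; check the API reference for collection filters
# and pagination parameters.
resp = requests.post(f"{BASE_URL}/v1/listings",
                     json={"url": "https://www.example.com"},
                     headers=HEADERS)
resp.raise_for_status()

# Example curation rule: keep only “Sale” listings and store their
# canonical product URLs for Step 3.
curated_urls = [
    listing["url"]
    for listing in resp.json().get("listings", [])  # response shape assumed
    if (listing.get("collection") or "").lower() == "sale"
]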

Step 3: Extract Full Product Data

Once you have curated your list of product URLs, use POST /v1/products to get full product data with AI enrichment, reviews, and image tags. Typical flow:
  1. Pull a batch of URLs from your curated list (up to 1000 URLs per request)
  2. Call POST /v1/products with your URLs:
    {
      "urls": [
        "https://www.example.com/product/1",
        "https://www.example.com/product/2"
      ],
      "enable_enrichment": true,
      "enable_reviews": true,
      "enable_image_tags": true,
      "country_code": "us"
    }
    
  3. Receive an execution_id (format: products-batch-{uuid}) and poll GET /v1/products/{execution_id}
  4. Upsert the extracted products into your index (search engine, DB, vector store, etc.), as in the sketch below
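Putting the flow together, a minimal sketch that batches, polls, and upserts; the results shape and the my_index client are illustrative stand-ins, not part of the API:

import time
import requests  # BASE_URL, HEADERS, and curated_urls as in the earlier sketches

for i in range(0, len(curated_urls), 1000):  # up to 1000 URLs per request
    resp = requests.post(f"{BASE_URL}/v1/products", json={
        "urls": curated_urls[i:i + 1000],
        "enable_enrichment": True,
        "enable_reviews": True,
        "enable_image_tags": True,
        "country_code": "us",
    }, headers=HEADERS)
    resp.raise_for_status()
    execution_id = resp.json()["execution_id"]  # format: products-batch-{uuid}

    # Poll for results; status values are assumed to mirror the crawl endpoint.
    while True:
        result = requests.get(f"{BASE_URL}/v1/products/{execution_id}",
                              headers=HEADERS).json()
        if result["status"] in ("completed", "failed"):
            break
        time.sleep(10)

    for product in result.get("products", []):  # response shape assumed
        my_index.upsert(product)  # stand-in for your search engine, DB, or vector store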

Extracting from Any URL Source

The POST /v1/products endpoint accepts product URLs from any source—not just URLs discovered through crawling. This gives you the flexibility to build your index from multiple sources. Common use cases:
  • Affiliate feeds: Extract products from affiliate network URLs
  • Merchant feeds: Process product URLs from partner data feeds
  • Internal catalogs: Index products from your own product database
  • Hand-curated lists: Extract specific products you’ve manually selected
  • Competitor monitoring: Track products from URLs you’ve collected
Example:
{
  "urls": [
    "https://www.nike.com/t/air-force-1-07-mens-shoes-5QFp5Z/CW2288-111",
    "https://www.adidas.com/us/gazelle-shoes/BB5476.html",
    "https://www.newbalance.com/pd/574-core/ML574EVG.html"
  ],
  "enable_enrichment": true,
  "country_code": "us"
}
This works the same as extracting crawled URLs—you receive an execution_id and poll for results.

Keeping Your Index Fresh

To maintain a high-quality product index:
  • Schedule re-crawling: Periodically re-crawl vendor websites to discover new product listings and collections.
  • Schedule re-extraction: Re-run extraction to capture price, availability, and content changes for existing products.
  • Monitor failures: Use the success and outcome fields (see the sketch after this list) to detect:
    • Non-product URLs
    • Unsupported vendors
    • Products that have been removed
  • Prune stale products: Remove (or downgrade) products that consistently fail to process or are no longer available.
  • Use URLs for targeted updates: For specific products that need frequent updates (e.g., featured items), use the Products endpoint to refresh them more frequently than a full re-extraction.
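One way to wire the success and outcome checks into a pruning pass; failure_counts, the retry threshold, and my_index are illustrative stand-ins:

from collections import defaultdict

failure_counts = defaultdict(int)  # persist this between runs in practice

for product in result.get("products", []):  # result from a Step 3 poll; shape assumed
    url = product["url"]
    if product.get("success"):
        failure_counts.pop(url, None)
        my_index.upsert(product)       # stand-in index client
    else:
        failure_counts[url] += 1
        # The outcome field explains the failure: a non-product URL, an
        # unsupported vendor, or a product that has been removed.
        if failure_counts[url] >= 3:   # threshold is a judgment call
            my_index.remove(url)       # or downgrade instead of deleting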