cURL
Extract
Extract
Async extraction of product data from URLs or a vendor. Returns immediately with an execution_id to check status and retrieve results.
POST
cURL
You can extract products using one of two input sources:
- URLs: Provide an array of product URLs to extract (up to 1000 URLs per request)
- Vendor: Provide a vendor to extract products from previously discovered listings
Async Processing: This endpoint starts processing asynchronously and returns an
execution_id immediately. Use GET /v2/extract/{execution_id} to check progress and retrieve results when processing completes.Request
Your API key for authentication
Request Body
Choose one of the following input sources:Input Source: URLs
Array of product URLs to extract (up to 1000 URLs per request). Use this when you have specific product URLs to extract.Requirements:Note: You can send a single URL in an array:
- Must be a non-empty array
- Each entry must be a non-empty string
- Maximum 1000 URLs per request
["https://example.com/product"]URL sanitization: URLs are sanitized and normalized on the backend; client-side sanitization is not required.Input Source: Vendor
The vendor to extract products from (e.g., “example.com” or “https://example.com”). Use this when you want to extract products from a vendor’s previously discovered listings.Examples:
"nike.com""https://www.adidas.com""example-store.com"
POST /v2/crawl, or you must provide a crawl_id to wait for an active crawl to complete. Extraction requires product listings to have been discovered via crawl.Vendor-Specific Parameters
These parameters only apply when using thevendor input source:
Maximum number of products to process. If not provided, all available products will be processed.
Optional crawl execution ID to wait for before starting extraction. If provided, extraction will be queued and will start automatically when the crawl execution completes.Use Case: Use this when you want to ensure product listings are discovered via crawl before extraction begins. The extraction will remain in
pending status with waiting_for in the meta until the crawl completes.Format: crawl-{vendor}-{uuid}Note: If you haven’t started a crawl yet, use POST /v2/crawl first, then use the returned execution_id as the crawl_id parameter here.Shared Parameters
These parameters apply to both input sources:ISO 2-letter country code for localization (e.g., “us”, “ca”, “gb”, “de”)Supported Country Codes:
nl, ca, si, co, by, ee, no, br, us, de, es, vn, il, th, gb, ph, kg, in, ru, fr, hk, ua, jp, at, mm, my, pl, au, nz, ro, tw, mx, id, dk, ng, ch, hu, sg, sa, ae, cn, ar, cl, it, tr, lv, gh, sk, gr, eg, lu, bg, se, lt, lk, kz, hr, bd, kr, cz, fi, ie, be, pt, za
Whether to enable AI enrichment for products. When enabled, products are enhanced with additional attributes and categorization.
Whether to enable product reviews processing
Whether to enable AI-generated image tags and descriptions
Whether to enable similar products discovery to find related items
Response
Unique execution identifier for this extraction job. Use this ID with
GET /v2/extract/{execution_id} to check progress and retrieve results.Format:- When using
urls:extract-urls-{uuid} - When using
vendor:extract-{vendor}-{uuid}(dots in vendor replaced with dashes)
Execution status. Will be
"pending" in the POST response. When using vendor with crawl_id, status will be "pending" with waiting_for in meta until the crawl completes.Response metadata
HTTP Status Code: This endpoint returns
202 Accepted to indicate the request has been accepted for processing. The response body contains the execution details you need to track the extraction progress.Response Schema and Enable Flags
The/v2/extract endpoint maintains a consistent response schema regardless of enable_* flag values. All fields are always present in product objects returned in the data array, but will be null when the corresponding flag is false.
Field Mappings:
| Flag | Affected Fields |
|---|---|
enable_reviews | reviews |
enable_enrichment | attributes, product_type, google_product_category_id, google_product_category_path |
enable_image_tags | images[].attributes |
enable_similar_products | similar_products |
cURL
Workflow
Using URLs
- Start Extraction: Send a POST request with your
urlsarray - Get Execution ID: Receive an
execution_idimmediately in the response (HTTP 202) - Check Status: Poll
GET /v2/extract/{execution_id}using theexecution_id - Retrieve Results: When
statusis"completed", results are available with pagination support
Using Vendor
- Crawl the Domain (if not already done): First, use
POST /v2/crawlto discover product listings from the domain. Wait for the crawl to complete or use thecrawl_idparameter in step 2. - Start Extraction: Send a POST request with your
vendordomain. If the domain hasn’t been crawled yet, provide thecrawl_idfrom step 1 to wait for crawl completion. - Get Execution ID: Receive an
execution_idimmediately in the response - Check Status: Poll
GET /v2/extract/{execution_id}. If waiting for a crawl, the status will showpendingwithwaiting_forin the meta. - Retrieve Results: When
statusis"completed", results are available with pagination support