You can extract products using one of two input sources:
URLs: Provide an array of product URLs to extract (up to 1000 URLs per request)
Vendor: Provide a vendor to extract products from previously discovered listings
Async Processing: This endpoint starts processing asynchronously and returns an execution_id immediately. Use GET /v2/extract/{execution_id} to check progress and retrieve results when processing completes.
Managing Executions:
View All Executions: You can view all your extraction executions using GET /v2/extract to see all processing jobs, their statuses, and when they were created.
Note: You can send a single URL in an array: ["https://example.com/product"]URL sanitization: URLs are sanitized and normalized on the backend; client-side sanitization is not required.
The vendor to extract products from (e.g., “example.com” or “https://example.com”). Use this when you want to extract products from a vendor’s previously discovered listings.Examples:
"nike.com"
"https://www.adidas.com"
"example-store.com"
Note: The domain will be normalized automatically. You can include or omit the protocol and www prefix.Important: This vendor must have been crawled previously using POST /v2/crawl, or you must provide a crawl_id to wait for an active crawl to complete. Extraction requires product listings to have been discovered via crawl.
Optional crawl execution ID to wait for before starting extraction. If provided, extraction will be queued and will start automatically when the crawl execution completes.Use Case: Use this when you want to ensure product listings are discovered via crawl before extraction begins. The extraction will remain in pending status with waiting_for in the meta until the crawl completes.Format:crawl-{vendor}-{uuid}Note: If you haven’t started a crawl yet, use POST /v2/crawl first, then use the returned execution_id as the crawl_id parameter here.
Execution status. Will be "pending" in the POST response. When using vendor with crawl_id, status will be "pending" with waiting_for in meta until the crawl completes.
Number of URLs that will actually be processed. When using urls, this equals url_count. When using vendor, this respects the max_products limit if set.
Crawl execution ID that this extraction is waiting for (only present when crawl_id was provided with vendor)
HTTP Status Code: This endpoint returns 202 Accepted to indicate the request has been accepted for processing. The response body contains the execution details you need to track the extraction progress.
The /v2/extract endpoint maintains a consistent response schema regardless of enable_* flag values. All fields are always present in product objects returned in the data array, but will be null when the corresponding flag is false.Field Mappings:
Crawl the Domain (if not already done): First, use POST /v2/crawl to discover product listings from the domain. Wait for the crawl to complete or use the crawl_id parameter in step 2.
Start Extraction: Send a POST request with your vendor domain. If the domain hasn’t been crawled yet, provide the crawl_id from step 1 to wait for crawl completion.
Get Execution ID: Receive an execution_id immediately in the response
Check Status: Poll GET /v2/extract/{execution_id}. If waiting for a crawl, the status will show pending with waiting_for in the meta.
Retrieve Results: When status is "completed", results are available with pagination support