Authorizations
API key must be provided in the Authorization header
Body
URL to index (documentation or website)
"https://docs.example.com"
URL patterns to include in crawling (supports wildcards)
[
"https://docs.example.com/api/*",
"https://docs.example.com/guides/*"
]
URL patterns to exclude from crawling
["/blog/*", "/changelog/*"]
Optional project ID to associate with
Maximum age of cached content in seconds (for fast scraping)
Content formats to return
["markdown", "html"]
Extract only main content (removes nav, ads, etc.)
Maximum number of pages to crawl
Maximum crawl depth
Whether to crawl the entire domain
Time to wait for page to load in milliseconds
Include full page screenshot
Check for llms.txt file for curated documentation URLs
How to use llms.txt if found:
- prefer: Start with llms.txt URLs, then crawl additional pages if under limit
- only: Only index URLs listed in llms.txt
- ignore: Skip llms.txt check (traditional behavior)
prefer
, only
, ignore
Response
Data source indexing started successfully
Unique identifier for the data source
The indexed URL
File name for text sources
Current indexing status
pending
, processing
, completed
, failed
, error
Number of pages indexed
Number of chunks/embeddings created
Associated project ID if any
web
, text
Custom display name for the data source
Error message if status is 'error' or 'failed'
Error code for programmatic error handling