POST /data-sources

Index a new data source
Example request:

curl --request POST \
  --url https://apigcp.trynia.ai/v2/data-sources \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://docs.example.com",
  "url_patterns": [
    "https://docs.example.com/api/*",
    "https://docs.example.com/guides/*"
  ],
  "exclude_patterns": [
    "/blog/*",
    "/changelog/*"
  ],
  "project_id": "<string>",
  "max_age": 123,
  "formats": [
    "markdown",
    "html"
  ],
  "only_main_content": true,
  "limit": 10000,
  "max_depth": 20,
  "crawl_entire_domain": true,
  "wait_for": 2000,
  "include_screenshot": true,
  "check_llms_txt": true,
  "llms_txt_strategy": "prefer"
}'

Example response:

{
  "id": "<string>",
  "url": "<string>",
  "file_name": "<string>",
  "status": "pending",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z",
  "page_count": 0,
  "chunk_count": 0,
  "project_id": "<string>",
  "source_type": "web",
  "is_active": true,
  "display_name": "<string>",
  "error": "<string>",
  "error_code": "<string>"
}

Authorizations

Authorization
string
header
required

API key must be provided in the Authorization header as a Bearer token
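
For example, an authenticated request in Python (a minimal sketch using the requests library; the URL and header format are taken from the curl example above, and the API key value is a placeholder):

import requests

API_KEY = "your-api-key"  # placeholder -- use your actual API key

response = requests.post(
    "https://apigcp.trynia.ai/v2/data-sources",
    headers={
        # The API key goes in the Authorization header as a Bearer token.
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={"url": "https://docs.example.com"},
)
response.raise_for_status()
print(response.json()["id"])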

Body

application/json
url
string<uri>
required

URL to index (documentation or website)

Example:

"https://docs.example.com"

url_patterns
string[]

URL patterns to include in crawling (supports wildcards)

Example:
[
"https://docs.example.com/api/*",
"https://docs.example.com/guides/*"
]
exclude_patterns
string[]

URL patterns to exclude from crawling

Example:
["/blog/*", "/changelog/*"]
project_id
string

Optional project ID to associate the data source with

max_age
integer

Maximum age of cached content in seconds (for fast scraping)

formats
string[]

Content formats to return

Example:
["markdown", "html"]
only_main_content
boolean
default:true

Extract only main content (removes nav, ads, etc.)

limit
integer
default:10000

Maximum number of pages to crawl

max_depth
integer
default:20

Maximum crawl depth

crawl_entire_domain
boolean
default:true

Whether to crawl the entire domain

wait_for
integer
default:2000

Time to wait for each page to load, in milliseconds

include_screenshot
boolean
default:true

Include a full-page screenshot

check_llms_txt
boolean
default:true

Check for an llms.txt file that lists curated documentation URLs

llms_txt_strategy
enum<string>
default:prefer

How to use llms.txt if found:

  • prefer: Start with llms.txt URLs, then crawl additional pages if under limit
  • only: Only index URLs listed in llms.txt
  • ignore: Skip llms.txt check (traditional behavior)
Available options: prefer, only, ignore
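
For instance, to index only the pages curated by a site's llms.txt, a request could look like this (a sketch reusing the endpoint and auth shown above; the API key is a placeholder):

import requests

payload = {
    "url": "https://docs.example.com",
    "check_llms_txt": True,
    # "only" restricts indexing to URLs listed in llms.txt;
    # "prefer" (the default) starts there and keeps crawling while under the limit.
    "llms_txt_strategy": "only",
}
resp = requests.post(
    "https://apigcp.trynia.ai/v2/data-sources",
    headers={"Authorization": "Bearer your-api-key"},
    json=payload,
)
print(resp.json()["status"])  # "pending" while indexing starts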

Response

Data source indexing started successfully

id
string

Unique identifier for the data source

url
string

The indexed URL

file_name
string

File name for text sources

status
enum<string>

Current indexing status

Available options: pending, processing, completed, failed, error
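
Since indexing is asynchronous (the response comes back with status "pending"), a client typically polls until a terminal status is reached. A sketch, assuming a retrieval endpoint such as GET /v2/data-sources/{id} exists (that endpoint name is hypothetical and not documented in this section):

import time
import requests

BASE_URL = "https://apigcp.trynia.ai/v2"
HEADERS = {"Authorization": "Bearer your-api-key"}  # placeholder key

def wait_until_indexed(source_id: str, interval_seconds: float = 5.0) -> dict:
    # Poll the data source until it leaves the pending/processing states.
    while True:
        # Hypothetical retrieval endpoint -- not documented in this section.
        resp = requests.get(f"{BASE_URL}/data-sources/{source_id}", headers=HEADERS)
        resp.raise_for_status()
        source = resp.json()
        if source["status"] in ("completed", "failed", "error"):
            return source
        time.sleep(interval_seconds)
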
created_at
string<date-time>
updated_at
string<date-time>
page_count
integer
default:0

Number of pages indexed

chunk_count
integer
default:0

Number of chunks/embeddings created

project_id
string

Associated project ID, if any

source_type
enum<string>
default:web
Available options: web, text
is_active
boolean
default:true
display_name
string

Custom display name for the data source

error
string

Error message if status is 'error' or 'failed'

error_code
string

Error code for programmatic error handling
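
Putting the error fields together, a caller might branch on error_code and fall back to the human-readable error message (a sketch building on the polling example above; the specific error code value is hypothetical, for illustration only):

result = wait_until_indexed(source_id)  # from the polling sketch above
if result["status"] in ("failed", "error"):
    # error_code is intended for programmatic handling; error is human-readable.
    code = result.get("error_code")
    if code == "CRAWL_LIMIT_EXCEEDED":  # hypothetical value, for illustration only
        print("Retry with a higher limit or narrower url_patterns")
    else:
        print(f"Indexing failed ({code}): {result.get('error')}")
else:
    print(f"Indexed {result['page_count']} pages into {result['chunk_count']} chunks")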