POST /huggingface-datasets
curl --request POST \
  --url https://apigcp.trynia.ai/v2/huggingface-datasets \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "url": "dair-ai/emotion"
}
'
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "dataset_id": "emotion",
  "url": "https://huggingface.co/datasets/dair-ai/emotion",
  "status": "processing",
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:00Z",
  "owner": "dair-ai",
  "description": "Emotion is a dataset of English Twitter messages with six basic emotions.",
  "splits": [
    "train",
    "test",
    "validation"
  ],
  "columns": [
    {
      "name": "text",
      "dtype": "string"
    },
    {
      "name": "label",
      "dtype": "int64"
    }
  ],
  "row_count": 20000,
  "indexed_row_count": 0,
  "chunk_count": 0
}

Authorizations

Authorization
string
header
required

API key, sent as a Bearer token in the Authorization header (Authorization: Bearer <token>)

Body

application/json
url
string
required

HuggingFace dataset URL or identifier. Multiple formats are supported.

Example:

"dair-ai/emotion"

config
string | null

Dataset configuration name (for multi-config datasets)

add_as_global_source
boolean
default:true

Whether to add the dataset to the global shared pool. Defaults to true; set to false for private indexing.
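As a rough Python counterpart to the curl example, the sketch below assembles the same POST request with the standard library. The endpoint URL, header names, and body fields are taken from this page; the function name `build_index_request` is hypothetical. The function only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Optional

# Endpoint as shown in the curl example on this page.
ENDPOINT = "https://apigcp.trynia.ai/v2/huggingface-datasets"


def build_index_request(
    token: str,
    url: str,
    config: Optional[str] = None,
    add_as_global_source: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts indexing.

    `build_index_request` is a hypothetical helper name, not part of any SDK.
    """
    body = {"url": url, "add_as_global_source": add_as_global_source}
    # `config` is optional; include it only for multi-config datasets.
    if config is not None:
        body["config"] = config
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send the request, pass the result to `urllib.request.urlopen` and decode the JSON body of the response.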

Response

HuggingFace dataset indexing started or completed successfully

id
string

Unique identifier for the data source

dataset_id
string

Dataset identifier (e.g., "squad", "emotion")

url
string

Canonical HuggingFace dataset URL

status
enum<string>

Current indexing status

Available options: pending, processing, completed, failed, error
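A client will usually want to tell in-progress states from finished ones. The sketch below models the five documented status values as an enum. Treating completed, failed, and error as terminal is an assumption made here, not something this page states, and the names `IndexStatus` and `is_terminal` are hypothetical.

```python
from enum import Enum


class IndexStatus(str, Enum):
    """The status values listed on this page."""
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    ERROR = "error"


# Assumption: these states do not change again once reached.
TERMINAL_STATUSES = frozenset(
    {IndexStatus.COMPLETED, IndexStatus.FAILED, IndexStatus.ERROR}
)


def is_terminal(status: str) -> bool:
    """Return True if indexing is assumed finished, successfully or not.

    Raises ValueError for a status string that is not one of the five above.
    """
    return IndexStatus(status) in TERMINAL_STATUSES
```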
created_at
string<date-time>
updated_at
string<date-time>
owner
string | null

Dataset owner/organization

description
string | null

Dataset description

splits
string[]

Available dataset splits (e.g., ["train", "test", "validation"])

columns
object[]

Dataset columns with names and data types
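In the sample response, each entry in columns is an object with a name and a dtype. Assuming that shape holds in general, a small hypothetical helper can flatten the list into a name-to-dtype mapping:

```python
from typing import Dict, List


def column_dtypes(columns: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each column name to its dtype.

    Assumes every entry has "name" and "dtype" keys, as in the sample response.
    """
    return {col["name"]: col["dtype"] for col in columns}


# Columns copied from the sample response on this page.
sample_columns = [
    {"name": "text", "dtype": "string"},
    {"name": "label", "dtype": "int64"},
]
print(column_dtypes(sample_columns))  # {'text': 'string', 'label': 'int64'}
```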

row_count
integer
default:0

Total number of rows in the dataset

indexed_row_count
integer
default:0

Number of rows actually indexed (may differ from row_count when the dataset is sampled)

chunk_count
integer
default:0

Number of text chunks created

sampling_strategy
string | null

Sampling strategy used (full or sampled)

license
string | null

Dataset license

error
string | null

Error message if status is 'failed'
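The response fields above can be collected into a typed record. The sketch below is one way to do that, using the documented defaults (0 for the three counters, null for the nullable strings). Treating id, dataset_id, url, and status as always present is an assumption, since this page does not mark response fields as required. The names `DatasetSource`, `parse_dataset_source`, and `indexed_fraction` are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class DatasetSource:
    # Assumed to be always present (not stated on this page).
    id: str
    dataset_id: str
    url: str
    status: str
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
    owner: Optional[str] = None
    description: Optional[str] = None
    splits: List[str] = field(default_factory=list)
    columns: List[Dict[str, Any]] = field(default_factory=list)
    # Documented defaults for the counters are 0.
    row_count: int = 0
    indexed_row_count: int = 0
    chunk_count: int = 0
    sampling_strategy: Optional[str] = None
    license: Optional[str] = None
    error: Optional[str] = None

    def indexed_fraction(self) -> float:
        """Share of rows indexed so far; 0.0 when row_count is not positive."""
        if self.row_count <= 0:
            return 0.0
        return self.indexed_row_count / self.row_count


def parse_dataset_source(payload: Dict[str, Any]) -> DatasetSource:
    """Build a DatasetSource from a decoded JSON response, ignoring unknown keys."""
    known = {f.name for f in fields(DatasetSource)}
    return DatasetSource(**{k: v for k, v in payload.items() if k in known})
```

Ignoring unknown keys keeps the parser tolerant of fields added to the response later. Because sampling can leave indexed_row_count below row_count, the fraction may stay under 1.0 even after indexing completes.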