Index a research paper from arXiv. The paper’s PDF is extracted using Firecrawl, enriched with metadata from the arXiv API (title, authors, abstract, categories), and indexed into the vector store for semantic search.
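This section does not show how the arXiv metadata is obtained. As a rough illustration only, the public arXiv API (for example http://export.arxiv.org/api/query?id_list=2312.00752) returns an Atom feed, and the fields listed here (title, authors, abstract, categories, primary category) can be read from it with Python's standard library. This is a sketch, not this service's implementation. The function name parse_arxiv_entry is hypothetical, and SAMPLE is a hand-written feed with placeholder values.

```python
import xml.etree.ElementTree as ET

# Namespaces used in arXiv API Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def parse_arxiv_entry(feed_xml: str) -> dict:
    """Extract basic metadata from the first <entry> of an arXiv Atom feed (hypothetical helper)."""
    root = ET.fromstring(feed_xml)
    entry = root.find("atom:entry", NS)
    if entry is None:
        raise ValueError("feed contains no entry")
    primary = entry.find("arxiv:primary_category", NS)
    return {
        # Titles in the feed may contain line breaks; collapse all whitespace.
        "title": " ".join(entry.findtext("atom:title", "", NS).split()),
        "authors": [a.findtext("atom:name", "", NS) for a in entry.findall("atom:author", NS)],
        "abstract": entry.findtext("atom:summary", "", NS).strip(),
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
    }


# Hand-written sample feed with placeholder values (not real paper data).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <title>An Example
      Title</title>
    <summary>  An example abstract.  </summary>
    <author><name>Ada Example</name></author>
    <author><name>Bob Example</name></author>
    <arxiv:primary_category term="cs.CL" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.CL" scheme="http://arxiv.org/schemas/atom"/>
    <category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>"""

metadata = parse_arxiv_entry(SAMPLE)
```

Only the standard library is used here, so the sketch runs without network access; fetching the live feed would be a separate HTTP GET.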
Accepts either an arXiv URL or a raw arXiv ID as input.
Papers are globally deduplicated: if another user has already indexed a paper, you get immediate access to the existing index instead of waiting for it to be indexed again.
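One way to picture global deduplication is a shared registry keyed by arXiv ID: the first request creates an index, and later requests for the same ID get the existing one back. The class below is a hypothetical in-memory model of that behavior. PaperIndexRegistry and the "ds-N" identifiers are made up for illustration, and a real service would more likely rely on a database uniqueness constraint than on a Python dict.

```python
import threading


class PaperIndexRegistry:
    """Hypothetical model of global deduplication keyed by arXiv ID."""

    def __init__(self) -> None:
        self._by_arxiv_id: dict[str, str] = {}
        self._lock = threading.Lock()  # makes check-and-insert atomic across threads
        self._next_id = 1

    def index(self, arxiv_id: str) -> tuple[str, bool]:
        """Return (data_source_id, created); created is False when an existing index is reused."""
        with self._lock:
            existing = self._by_arxiv_id.get(arxiv_id)
            if existing is not None:
                return existing, False
            source_id = f"ds-{self._next_id}"  # made-up identifier format
            self._next_id += 1
            self._by_arxiv_id[arxiv_id] = source_id
            return source_id, True
```

A second call with the same ID returns the same identifier with created set to False, which corresponds to the "instant access" case described above.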
The API key must be provided in the Authorization header
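A minimal sketch of building the request with Python's urllib, under several assumptions not stated in this section: the endpoint URL is a placeholder, the HTTP method is assumed to be POST, the JSON body field name "url" is hypothetical, and the "Bearer" scheme is a guess (the docs only say the key goes in the Authorization header). The request is constructed but not sent.

```python
import json
import urllib.request

# Placeholder endpoint: the real path is not given in this section.
ENDPOINT = "https://api.example.com/v1/research-papers"


def build_index_request(api_key: str, arxiv: str) -> urllib.request.Request:
    """Build (but do not send) an indexing request; field and scheme names are assumptions."""
    body = json.dumps({"url": arxiv}).encode("utf-8")  # hypothetical body field name
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            # Assumption: Bearer scheme for the API key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",  # assumed HTTP method
    )
```

Sending it would be a call to urllib.request.urlopen(build_index_request(key, "2312.00752")) once the real endpoint and body shape are known.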
arXiv URL or raw arXiv ID, for example "2312.00752"
Research paper indexing started or completed successfully
Unique identifier for the data source
The arXiv identifier (e.g., "2312.00752")
Paper title extracted from arXiv
List of paper authors
Paper abstract
arXiv categories (e.g., ["cs.CL", "cs.AI"])
Primary arXiv category
Current indexing status (aligned with DataSourceResponse): pending, processing, completed, failed, or error
Number of text chunks created from the paper
DOI if available
Publication date from arXiv
Direct URL to the PDF
URL to the arXiv abstract page
Error message if status is 'failed'
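When the response reports pending or processing, a client may want to wait for a terminal status. This section does not describe a status endpoint, so the sketch below takes a caller-supplied fetch_status function (hypothetical) that returns the current response as a dict. The key names "status" and "error_message" are also assumptions, since the field names are not shown here, and treating "error" as terminal alongside "failed" is a guess.

```python
import time
from typing import Callable, Mapping

# Status values listed in the response schema above.
IN_PROGRESS = {"pending", "processing"}
FAILED = {"failed", "error"}  # assumption: both are terminal failures


def wait_for_indexing(
    fetch_status: Callable[[], Mapping],
    poll_seconds: float = 2.0,
    timeout: float = 300.0,
) -> Mapping:
    """Poll until indexing completes; raise on failure or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        resp = fetch_status()
        status = resp["status"]  # hypothetical key name
        if status == "completed":
            return resp
        if status in FAILED:
            raise RuntimeError(f"indexing {status}: {resp.get('error_message')}")
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("indexing did not finish before the timeout")
        time.sleep(poll_seconds)
```

Injecting fetch_status keeps the loop independent of any particular HTTP client and makes it easy to test with canned responses.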