Data Extraction

Extract structured records from PDFs such as financial filings, invoices, spec sheets, and engineering drawings. You can use custom JSON schemas (table extraction), visual element detection (detect extraction), or purpose-built engineering extraction.

Three Extraction Modes

Table Extraction

Define a JSON schema describing the fields you need, and Nia extracts matching structured records from the PDF. Ideal for financial data, line items, tabular content, and repeating structures.

Detect Extraction

Detect and locate visual elements — tables, figures, charts, diagrams — in PDF pages. Returns bounding boxes, classifications, and annotated page images.

Engineering Extraction

Purpose-built for technical documents — engineering drawings, P&IDs, schematics, and spec sheets. Extracts structured metadata with optional follow-up queries for deeper analysis.

How It Works

Submit a Document

Provide a PDF URL or an existing Nia source ID along with an optional page range. For table extraction, include a JSON schema defining the fields to extract.

Processing

Nia parses the document and identifies relevant structures. It then does one of three things: extracts data according to your schema (table mode), locates and classifies visual elements (detect mode), or applies built-in engineering models (engineering mode).

Retrieve Results

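Poll the extraction job until its status is `completed` (or `failed`). Each mode returns a different result:
- Table extraction returns an array of structured records.
- Detect extraction returns the elements found on each page, with bounding boxes.
- Engineering extraction returns a result object you can query further.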
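The polling step can be sketched in Python as below. This is a minimal sketch, not an official client. It assumes only what the status responses on this page show: a `status` field that ends at `completed` or `failed`.

The names `poll_until_done` and `fetch_status` are hypothetical, and so are the interval and timeout defaults. In practice, `fetch_status` would send an authenticated GET to the job's status endpoint and return the parsed JSON.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> Dict[str, Any]:
    """Call fetch_status() until the job is completed, failed, or timed out.

    fetch_status is a hypothetical callable returning the parsed JSON body of
    a status request, e.g. GET https://apigcp.trynia.ai/v2/extract/{id}.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job  # results are now available on the job object
        if status == "failed":
            raise RuntimeError(f"extraction {job.get('id')} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"extraction {job.get('id')} still {status!r}")
        time.sleep(interval_s)  # queued or processing: wait, then ask again
```

Injecting `fetch_status` keeps the loop independent of any HTTP library. It also makes the loop easy to test with canned responses.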

Table Extraction

Define a JSON schema that describes a single record, and Nia returns an array of records matching your specification. This is ideal for pulling repeating data out of dense documents like SEC filings, invoices, or product catalogs.

Start an Extraction Job

curl -X POST https://apigcp.trynia.ai/v2/extract \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.sec.gov/Archives/edgar/data/1326801/000132680124000006/meta-20231231.htm",
    "page_range": "60-80",
    "json_schema": {
      "type": "object",
      "properties": {
        "line_item": {
          "type": "string",
          "description": "Name of the financial line item"
        },
        "fiscal_year_2023": {
          "type": "number",
          "description": "Value for fiscal year 2023 in millions USD"
        },
        "fiscal_year_2022": {
          "type": "number",
          "description": "Value for fiscal year 2022 in millions USD"
        },
        "yoy_change_pct": {
          "type": "number",
          "description": "Year-over-year change as a percentage"
        }
      },
      "required": ["line_item", "fiscal_year_2023"]
    }
  }'

Response:

{
  "id": "ext_abc123",
  "status": "queued"
}
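The same request can be built with Python's standard library. This is a minimal sketch: `build_extract_request` is a hypothetical helper that only constructs the request, and sending it is left to `urllib.request.urlopen`. The endpoint, headers, and body fields are the ones from the curl request above.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://apigcp.trynia.ai/v2"


def build_extract_request(
    url: str,
    json_schema: dict,
    page_range: Optional[str] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/extract table-extraction request."""
    payload = {"url": url, "json_schema": json_schema}
    if page_range is not None:
        payload["page_range"] = page_range
    # Fall back to the same environment variable the curl examples use.
    key = api_key if api_key is not None else os.environ["NIA_API_KEY"]
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )
```

To submit, run `with urllib.request.urlopen(req) as resp: job = json.load(resp)`. The returned `job["id"]` is the ID to poll.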

Check Extraction Status

curl https://apigcp.trynia.ai/v2/extract/ext_abc123 \
  -H "Authorization: Bearer $NIA_API_KEY"

Response when completed:

{
  "id": "ext_abc123",
  "status": "completed",
  "progress": 100,
  "record_count": 24,
  "page_count": 20,
  "records": [
    {
      "line_item": "Total revenue",
      "fiscal_year_2023": 134902,
      "fiscal_year_2022": 116609,
      "yoy_change_pct": 15.69
    },
    {
      "line_item": "Cost of revenue",
      "fiscal_year_2023": 25959,
      "fiscal_year_2022": 25249,
      "yoy_change_pct": 2.81
    }
  ]
}
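The example schema asks for both fiscal-year values and the percentage change. That lets you sanity-check extracted records by recomputing the change locally.

This is a hypothetical post-processing step, not part of the API. The 0.01 tolerance assumes percentages are reported to two decimal places.

```python
from typing import Iterable, List, Optional


def yoy_pct(current: float, prior: float) -> Optional[float]:
    """Year-over-year change in percent; None when the prior value is zero."""
    if prior == 0:
        return None
    return (current - prior) / abs(prior) * 100


def check_yoy(records: Iterable[dict], tolerance: float = 0.01) -> List[str]:
    """Return line items whose reported yoy_change_pct disagrees with the recomputed one."""
    mismatches = []
    for record in records:
        current = record.get("fiscal_year_2023")
        prior = record.get("fiscal_year_2022")
        reported = record.get("yoy_change_pct")
        if current is None or prior is None or reported is None:
            continue  # only line_item and fiscal_year_2023 are required by the schema
        expected = yoy_pct(current, prior)
        if expected is None:
            continue  # change from zero is undefined; nothing to compare
        if abs(round(expected, 2) - reported) > tolerance:
            mismatches.append(record["line_item"])
    return mismatches
```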

JSON Schema Tips

Use descriptions. Add a `description` to each field in your schema. Nia uses these to understand what data to look for, especially when column headers in the PDF are ambiguous.

Narrow the page range. If you know which pages contain the data, set `page_range` to speed up extraction and improve accuracy.
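One way to apply the description tip consistently is to generate record schemas from a small field table and reject any field that lacks a description. This is a minimal sketch: `record_schema` is a hypothetical helper that produces the same shape as the schema in the extraction example above.

```python
from typing import Dict, List, Tuple


def record_schema(fields: Dict[str, Tuple[str, str]], required: List[str]) -> dict:
    """Build an object schema from {field_name: (json_type, description)}."""
    undefined = [name for name in required if name not in fields]
    if undefined:
        raise ValueError(f"required fields are not defined: {undefined}")
    for name, (_, description) in fields.items():
        if not description.strip():
            raise ValueError(f"field {name!r} needs a description")
    return {
        "type": "object",
        "properties": {
            name: {"type": json_type, "description": description}
            for name, (json_type, description) in fields.items()
        },
        "required": list(required),
    }
```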

Detect Extraction

Detect and locate visual elements within PDF pages — tables, figures, charts, and diagrams. Detect mode returns bounding boxes and classifications for each element found, and can render annotated page images with the detections overlaid.

Start a Detect Extraction Job

curl -X POST https://apigcp.trynia.ai/v2/extract/detect \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/annual-report-2024.pdf",
    "page_range": "1-10",
    "include_symbols": false
  }'

Parameter	Description
`url`	URL of the PDF to process (provide either `url` or `source_id`)
`source_id`	Source ID of an already-indexed document
`page_range`	Pages to process (e.g. `"1-10"`, `"5,8,12"`)
`include_symbols`	Enable symbol-level detection for technical documents (default `false`)
`filter_pattern`	Regex to filter detected element types

Response:

{
  "id": "det_abc123",
  "status": "queued",
  "type": "detect"
}
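Before submitting a detect job, a client can check the parameters locally. This minimal sketch makes several stated assumptions:
- It treats `url` and `source_id` as mutually exclusive.
- It accepts only the two `page_range` forms shown in the parameter table (`"1-10"` and `"5,8,12"`).
- It compiles `filter_pattern` with Python's `re`, whose dialect may differ from the regex engine Nia uses.

`detect_payload` is a hypothetical name.

```python
import re
from typing import Optional

# Only the two forms shown in the parameter table; mixed forms such as
# "1-3,7" are not assumed to be supported.
_PAGE_RANGE = re.compile(r"\d+-\d+|\d+(?:,\d+)*")


def detect_payload(
    url: Optional[str] = None,
    source_id: Optional[str] = None,
    page_range: Optional[str] = None,
    include_symbols: bool = False,
    filter_pattern: Optional[str] = None,
) -> dict:
    """Build a JSON body for POST /v2/extract/detect, validating it locally."""
    if (url is None) == (source_id is None):
        raise ValueError("provide exactly one of url or source_id")
    payload: dict = {"url": url} if url is not None else {"source_id": source_id}
    if page_range is not None:
        if not _PAGE_RANGE.fullmatch(page_range):
            raise ValueError(f"unsupported page_range: {page_range!r}")
        if "-" in page_range:
            first, last = (int(part) for part in page_range.split("-"))
            if first > last:
                raise ValueError(f"page_range start exceeds end: {page_range!r}")
        payload["page_range"] = page_range
    payload["include_symbols"] = include_symbols
    if filter_pattern is not None:
        re.compile(filter_pattern)  # raises re.error early if the pattern is malformed
        payload["filter_pattern"] = filter_pattern
    return payload
```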

Check Detect Extraction Status

curl https://apigcp.trynia.ai/v2/extract/detect/det_abc123 \
  -H "Authorization: Bearer $NIA_API_KEY"

Response when completed:

{
  "id": "det_abc123",
  "status": "completed",
  "progress": 100,
  "type": "detect",
  "page_count": 10,
  "result": {
    "pages": [
      {
        "page_number": 1,
        "elements": [
          {
            "type": "table",
            "bbox": [72, 200, 540, 450],
            "confidence": 0.97
          },
          {
            "type": "figure",
            "bbox": [72, 500, 400, 700],
            "confidence": 0.93
          }
        ]
      }
    ]
  }
}

Get Annotated Page Image

Retrieve a page image with bounding boxes drawn over detected elements:

curl https://apigcp.trynia.ai/v2/extract/detect/det_abc123/page/1/image \
  -H "Authorization: Bearer $NIA_API_KEY" \
  --output page-1-annotated.png

This returns a PNG image with detection bounding boxes overlaid on the original page.

Use detect before table extraction. Run detect first to identify which pages contain tables, then target only those pages with table extraction for faster, more accurate results.
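The detect-then-extract workflow can be sketched as a small helper that turns a completed detect job into a `page_range` for table extraction. This sketch rests on a few assumptions:
- `table_page_range` is a hypothetical name.
- The 0.9 confidence cutoff is an arbitrary starting point.
- The output uses the comma-separated form (`"5,8,12"`) shown in the detect parameter table.

```python
def table_page_range(detect_job: dict, min_confidence: float = 0.9) -> str:
    """Return e.g. "1,4" for pages with at least one confident table detection."""
    pages = sorted(
        {
            page["page_number"]
            for page in detect_job["result"]["pages"]
            if any(
                element["type"] == "table" and element["confidence"] >= min_confidence
                for element in page["elements"]
            )
        }
    )
    return ",".join(str(number) for number in pages)
```

An empty string means no page qualified, so there is nothing to send to table extraction.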

Engineering Extraction

Extract structured information from technical documents — engineering drawings, P&IDs, schematics, datasheets, and construction specifications. Engineering mode uses specialized models tuned for technical content.

Start an Engineering Extraction

curl -X POST https://apigcp.trynia.ai/v2/extract/engineering \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/piping-diagram-rev3.pdf",
    "page_range": "1-5",
    "accuracy_mode": "precise"
  }'

The accuracy_mode parameter controls the speed/accuracy tradeoff:

Mode	Description
`fast`	Optimized for speed. Good for initial scans and high-volume processing.
`precise`	Maximum accuracy. Best for critical documents where every detail matters.

Response:

{
  "id": "eng_xyz789",
  "status": "queued"
}

Check Engineering Extraction Status

curl https://apigcp.trynia.ai/v2/extract/engineering/eng_xyz789 \
  -H "Authorization: Bearer $NIA_API_KEY"

Response when completed:

{
  "id": "eng_xyz789",
  "status": "completed",
  "result": {
    "document_type": "P&ID",
    "title": "Process Flow - Unit 400 Cooling System",
    "revision": "Rev 3",
    "components": [
      {
        "tag": "P-401A",
        "type": "Centrifugal Pump",
        "specifications": "250 GPM, 150 PSI"
      },
      {
        "tag": "HX-402",
        "type": "Shell and Tube Heat Exchanger",
        "specifications": "500 sq ft, 150 PSI design"
      }
    ]
  }
}
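For simple lookups, you can filter the `components` list of a completed engineering result locally instead of asking a follow-up question. This minimal sketch assumes each component carries the `tag` and `type` fields shown in the example. `tags_by_type` is a hypothetical helper, and its plain substring match is a deliberately crude heuristic.

```python
from typing import List


def tags_by_type(result: dict, keyword: str) -> List[str]:
    """Return tags of components whose type contains keyword, case-insensitively."""
    needle = keyword.lower()
    return [
        component["tag"]
        for component in result.get("components", [])
        if needle in component.get("type", "").lower()
    ]
```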

Follow-Up Queries

After an engineering extraction completes, you can ask follow-up questions about the results without re-processing the document:

curl -X POST https://apigcp.trynia.ai/v2/extract/engineering/eng_xyz789/query \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the design pressure rating for all heat exchangers in this diagram?"
  }'

Response:

{
  "id": "eng_xyz789",
  "chat_messages": [
    {
      "role": "user",
      "content": "What is the design pressure rating for all heat exchangers in this diagram?"
    },
    {
      "role": "assistant",
      "content": "Based on the extraction results, there is one heat exchanger in this diagram:\n\n- **HX-402** (Shell and Tube Heat Exchanger): Design pressure is **150 PSI**."
    }
  ]
}

Follow-up queries use the already-extracted context, so they are fast and do not consume additional extraction credits.
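Here is a Python sketch of a follow-up query, using the endpoint and `message` field from the curl example. `build_query_request` and `last_assistant_reply` are hypothetical helpers. The second one assumes, as in the example response, that the answer is the latest `assistant` entry in `chat_messages`.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://apigcp.trynia.ai/v2"


def build_query_request(job_id: str, message: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a follow-up query for a completed engineering job."""
    return urllib.request.Request(
        f"{BASE_URL}/extract/engineering/{quote(job_id, safe='')}/query",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def last_assistant_reply(response: dict) -> Optional[str]:
    """Return the content of the most recent assistant message, if any."""
    for message in reversed(response.get("chat_messages", [])):
        if message.get("role") == "assistant":
            return message.get("content")
    return None
```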

List All Extractions

Retrieve all your extraction jobs, optionally filtered by type:

# List all extractions
curl https://apigcp.trynia.ai/v2/extractions \
  -H "Authorization: Bearer $NIA_API_KEY"

# Filter by type
curl "https://apigcp.trynia.ai/v2/extractions?type=table" \
  -H "Authorization: Bearer $NIA_API_KEY"

curl "https://apigcp.trynia.ai/v2/extractions?type=engineering" \
  -H "Authorization: Bearer $NIA_API_KEY"

Extraction Statuses

All three extraction modes (table, detect, and engineering) follow the same status lifecycle:

Status	Description
`queued`	Job received and waiting to be processed
`processing`	Extraction is actively running
`completed`	Extraction finished successfully — results are available
`failed`	Extraction encountered an error
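Each mode has its own status endpoint, so a small lookup keeps polling code mode-agnostic. In this minimal sketch, the mode keys and `status_url` are hypothetical names. The paths are the three status endpoints documented on this page.

```python
from urllib.parse import quote

BASE_URL = "https://apigcp.trynia.ai/v2"

# Status endpoints from this page, keyed by a hypothetical mode name.
STATUS_PATHS = {
    "table": "/extract/{job_id}",
    "detect": "/extract/detect/{job_id}",
    "engineering": "/extract/engineering/{job_id}",
}


def status_url(mode: str, job_id: str) -> str:
    """Return the GET URL used to check a job's status."""
    if mode not in STATUS_PATHS:
        raise ValueError(f"unknown extraction mode: {mode!r}")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return BASE_URL + STATUS_PATHS[mode].format(job_id=quote(job_id, safe=""))
```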

Use Cases

Financial Analysis

Extract line items, revenue figures, and balance sheet data from SEC filings (10-K, 10-Q) into structured records for analysis and comparison.

Engineering Review

Parse P&IDs, wiring diagrams, and spec sheets to catalog components, materials, and specifications. Ask follow-up questions about extracted details.

Invoice Processing

Pull vendor names, line items, quantities, and totals from invoices using a custom JSON schema tailored to your format.

Technical Due Diligence

Extract equipment lists, compliance data, and specifications from engineering documents during M&A or audits.
