Extract structured records from PDFs such as financial filings, invoices, spec sheets, and engineering drawings, using either custom JSON schemas or purpose-built engineering extraction.

Two Extraction Modes

Table Extraction

Define a JSON schema describing the fields you need, and Nia extracts structured records from any PDF. Ideal for financial data, line items, tabular content, and repeating structures.

Engineering Extraction

Purpose-built for technical documents — engineering drawings, P&IDs, schematics, and spec sheets. Extracts structured metadata with optional follow-up queries for deeper analysis.

How It Works

1. Submit a Document

Provide a PDF URL or an existing Nia source ID along with an optional page range. For table extraction, include a JSON schema defining the fields to extract.

2. Processing

Nia parses the document, identifies relevant structures, and extracts data according to your schema (table mode) or its built-in engineering models (engineering mode).

3. Retrieve Results

Poll the extraction job until it completes or fails. Table extraction returns an array of structured records; engineering extraction returns a result object you can query further.

Table Extraction

Define a JSON schema and Nia returns structured records matching your specification. This is ideal for pulling repeating data out of dense documents like SEC filings, invoices, or product catalogs.

Start an Extraction Job

curl -X POST https://apigcp.trynia.ai/v2/extract \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.sec.gov/Archives/edgar/data/1326801/000132680124000006/meta-20231231.htm",
    "page_range": "60-80",
    "json_schema": {
      "type": "object",
      "properties": {
        "line_item": {
          "type": "string",
          "description": "Name of the financial line item"
        },
        "fiscal_year_2023": {
          "type": "number",
          "description": "Value for fiscal year 2023 in millions USD"
        },
        "fiscal_year_2022": {
          "type": "number",
          "description": "Value for fiscal year 2022 in millions USD"
        },
        "yoy_change_pct": {
          "type": "number",
          "description": "Year-over-year change as a percentage"
        }
      },
      "required": ["line_item", "fiscal_year_2023"]
    }
  }'
Response:
{
  "id": "ext_abc123",
  "status": "queued"
}
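If you call the API from Python rather than curl, the same request can be assembled with the standard library. The sketch below is a minimal, unofficial illustration: it mirrors the endpoint, headers, and body fields (`url`, `page_range`, `json_schema`) from the curl example, while `build_table_extraction_request` and `submit` are hypothetical helper names, not part of any Nia SDK. The builder does not send anything, so you can inspect the payload first.

```python
import json
import os
import urllib.request

# Base URL taken from the curl examples on this page.
BASE_URL = "https://apigcp.trynia.ai/v2"


def build_table_extraction_request(url, json_schema, page_range=None, api_key=None):
    """Build (but do not send) a POST /v2/extract request like the curl call above.

    Hypothetical helper: it only uses the body fields shown in the curl example.
    """
    if api_key is None:
        api_key = os.environ.get("NIA_API_KEY", "")
    body = {"url": url, "json_schema": json_schema}
    if page_range is not None:
        body["page_range"] = page_range  # e.g. "60-80", as in the example
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request):
    """Send the request and decode the JSON reply, e.g. {"id": ..., "status": "queued"}."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

For example, `submit(build_table_extraction_request(pdf_url, schema, page_range="60-80"))` should return a job object whose `id` you then pass to the status endpoint.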

Check Extraction Status

curl https://apigcp.trynia.ai/v2/extract/ext_abc123 \
  -H "Authorization: Bearer $NIA_API_KEY"
Response when completed:
{
  "id": "ext_abc123",
  "status": "completed",
  "progress": 100,
  "record_count": 24,
  "page_count": 20,
  "records": [
    {
      "line_item": "Total revenue",
      "fiscal_year_2023": 134902,
      "fiscal_year_2022": 116609,
      "yoy_change_pct": 15.69
    },
    {
      "line_item": "Cost of revenue",
      "fiscal_year_2023": 25959,
      "fiscal_year_2022": 25249,
      "yoy_change_pct": 2.81
    }
  ]
}
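Because values are read from the document by a model, it can be worth cross-checking derived fields before relying on the records. The following sketch assumes the field names from the example schema above (`fiscal_year_2023`, `fiscal_year_2022`, `yoy_change_pct`), recomputes the change as (current - previous) / previous × 100, and flags line items whose extracted percentage disagrees. The helper names are illustrative, and the check runs entirely on your side; it is not an API feature.

```python
def yoy_change_pct(current, previous):
    """Year-over-year change as a percentage, rounded to two decimals."""
    if previous == 0:
        raise ValueError("previous-year value is zero; percentage change is undefined")
    return round((current - previous) / previous * 100, 2)


def flag_inconsistent_records(records, tolerance=0.005):
    """Return line_item names whose yoy_change_pct disagrees with the recomputed value.

    Records missing either fiscal-year value or the percentage are skipped,
    since the example schema only requires line_item and fiscal_year_2023.
    """
    flagged = []
    for rec in records:
        current = rec.get("fiscal_year_2023")
        previous = rec.get("fiscal_year_2022")
        reported = rec.get("yoy_change_pct")
        if current is None or previous is None or reported is None or previous == 0:
            continue
        if abs(yoy_change_pct(current, previous) - reported) > tolerance:
            flagged.append(rec["line_item"])
    return flagged
```

For the "Total revenue" record above, `yoy_change_pct(134902, 116609)` returns `15.69`, matching the extracted value.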

JSON Schema Tips

Use descriptions: add a description to each field in your schema. Nia uses these to understand what data to look for, especially when column headers in the PDF are ambiguous.
Narrow the page range: if you know which pages contain the data, specify page_range to speed up extraction and improve accuracy.
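To act on the first tip before submitting a job, you could lint your schema locally. This sketch uses plain Python dictionaries (no JSON Schema library) and hypothetical helper names: it walks `properties`, including nested objects and arrays of objects, and reports fields without a description. It also reports `required` entries that are not defined under `properties`, which is an easy typo to make.

```python
def properties_missing_descriptions(schema):
    """Return dotted paths of properties that lack a non-empty description."""
    missing = []

    def walk(node, prefix):
        for name, prop in node.get("properties", {}).items():
            path = f"{prefix}.{name}" if prefix else name
            if not str(prop.get("description", "")).strip():
                missing.append(path)
            # Descend into nested objects and into arrays whose items are objects.
            if prop.get("type") == "object":
                walk(prop, path)
            elif prop.get("type") == "array" and isinstance(prop.get("items"), dict):
                walk(prop["items"], path + "[]")

    walk(schema, "")
    return missing


def undefined_required_fields(schema):
    """Return names listed in "required" that are not defined under "properties"."""
    return sorted(set(schema.get("required", [])) - set(schema.get("properties", {})))
```

Run against the financial schema above, both helpers return empty lists, since every field has a description and both required fields are defined.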

Engineering Extraction

Extract structured information from technical documents — engineering drawings, P&IDs, schematics, datasheets, and construction specifications. Engineering mode uses specialized models tuned for technical content.

Start an Engineering Extraction

curl -X POST https://apigcp.trynia.ai/v2/extract/engineering \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/piping-diagram-rev3.pdf",
    "page_range": "1-5",
    "accuracy_mode": "precise"
  }'
The accuracy_mode parameter controls the speed/accuracy tradeoff:
| Mode | Description |
|---|---|
| fast | Optimized for speed. Good for initial scans and high-volume processing. |
| precise | Maximum accuracy. Best for critical documents where every detail matters. |
Response:
{
  "id": "eng_xyz789",
  "status": "queued"
}
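A Python counterpart to the engineering request might validate accuracy_mode locally before sending, so a typo fails fast instead of producing a rejected job. This is a sketch with a hypothetical helper name; it checks only against the two documented modes, and because this page does not state a default, it leaves the field out entirely when no mode is given and lets the API apply its own default.

```python
import json
import os
import urllib.request

BASE_URL = "https://apigcp.trynia.ai/v2"

# The two modes documented in the table above.
ACCURACY_MODES = ("fast", "precise")


def build_engineering_request(url, page_range=None, accuracy_mode=None, api_key=None):
    """Build (but do not send) a POST /v2/extract/engineering request.

    accuracy_mode is checked against the documented values; when it is None
    the field is omitted and whatever default the API uses applies.
    """
    if accuracy_mode is not None and accuracy_mode not in ACCURACY_MODES:
        raise ValueError(f"accuracy_mode must be one of {ACCURACY_MODES}, got {accuracy_mode!r}")
    if api_key is None:
        api_key = os.environ.get("NIA_API_KEY", "")
    body = {"url": url}
    if page_range is not None:
        body["page_range"] = page_range
    if accuracy_mode is not None:
        body["accuracy_mode"] = accuracy_mode
    return urllib.request.Request(
        f"{BASE_URL}/extract/engineering",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request is sent the same way as any other `urllib.request.Request`, for example with `urllib.request.urlopen`.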

Check Engineering Extraction Status

curl https://apigcp.trynia.ai/v2/extract/engineering/eng_xyz789 \
  -H "Authorization: Bearer $NIA_API_KEY"
Response when completed:
{
  "id": "eng_xyz789",
  "status": "completed",
  "result": {
    "document_type": "P&ID",
    "title": "Process Flow - Unit 400 Cooling System",
    "revision": "Rev 3",
    "components": [
      {
        "tag": "P-401A",
        "type": "Centrifugal Pump",
        "specifications": "250 GPM, 150 PSI"
      },
      {
        "tag": "HX-402",
        "type": "Shell and Tube Heat Exchanger",
        "specifications": "500 sq ft, 150 PSI design"
      }
    ]
  }
}
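Once the result object is decoded, a couple of small helpers make it easier to work with the component list. The sketch assumes the keys shown in the example response (`components`, `tag`, `type`, `specifications`); other document types may return different fields, so the helpers use `.get` with defaults rather than failing on a missing key. The function names are hypothetical.

```python
def components_of_type(result, keyword):
    """Return tags of components whose type contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        comp["tag"]
        for comp in result.get("components", [])
        if needle in comp.get("type", "").lower()
    ]


def components_by_tag(result):
    """Index components by tag for direct lookup, e.g. index["HX-402"]."""
    return {comp["tag"]: comp for comp in result.get("components", [])}
```

For the example above, `components_of_type(result, "heat exchanger")` returns `["HX-402"]`.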

Follow-Up Queries

After an engineering extraction completes, you can ask follow-up questions about the results without re-processing the document:
curl -X POST https://apigcp.trynia.ai/v2/extract/engineering/eng_xyz789/query \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the design pressure rating for all heat exchangers in this diagram?"
  }'
Response:
{
  "id": "eng_xyz789",
  "chat_messages": [
    {
      "role": "user",
      "content": "What is the design pressure rating for all heat exchangers in this diagram?"
    },
    {
      "role": "assistant",
      "content": "Based on the extraction results, there is one heat exchanger in this diagram:\n\n- **HX-402** (Shell and Tube Heat Exchanger): Design pressure is **150 PSI**."
    }
  ]
}
Follow-up queries use the already-extracted context, so they are fast and do not consume additional extraction credits.
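The example response appears to carry the conversation so far in `chat_messages`, so a client that only wants the newest answer should probably read the last assistant message rather than assume a fixed index. The sketch below builds the query request and extracts that reply; both function names are hypothetical, and the history behavior across multiple queries is an assumption based on the single example shown.

```python
import json
import os
import urllib.request
from urllib.parse import quote

BASE_URL = "https://apigcp.trynia.ai/v2"


def build_query_request(extraction_id, message, api_key=None):
    """Build (but do not send) a follow-up query for a completed engineering extraction."""
    if api_key is None:
        api_key = os.environ.get("NIA_API_KEY", "")
    return urllib.request.Request(
        f"{BASE_URL}/extract/engineering/{quote(extraction_id, safe='')}/query",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def latest_assistant_reply(response):
    """Return the content of the last assistant message in chat_messages, or None."""
    for msg in reversed(response.get("chat_messages", [])):
        if msg.get("role") == "assistant":
            return msg.get("content")
    return None
```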

List All Extractions

Retrieve all your extraction jobs, optionally filtered by type:
# List all extractions
curl https://apigcp.trynia.ai/v2/extractions \
  -H "Authorization: Bearer $NIA_API_KEY"

# Filter by type
curl "https://apigcp.trynia.ai/v2/extractions?type=table" \
  -H "Authorization: Bearer $NIA_API_KEY"

curl "https://apigcp.trynia.ai/v2/extractions?type=engineering" \
  -H "Authorization: Bearer $NIA_API_KEY"
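Since the shape of the list response is not shown on this page, the sketch below only builds the listing URL; you would send it with the same `Authorization` header as the curl examples. It restricts the `type` filter to the two values used above (`table` and `engineering`) so that an unsupported value fails locally. The function name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://apigcp.trynia.ai/v2"

# Filter values shown in the curl examples above.
EXTRACTION_TYPES = ("table", "engineering")


def extractions_url(extraction_type=None):
    """Return the GET URL for listing extractions, optionally filtered by type."""
    url = f"{BASE_URL}/extractions"
    if extraction_type is None:
        return url
    if extraction_type not in EXTRACTION_TYPES:
        raise ValueError(f"type must be one of {EXTRACTION_TYPES}, got {extraction_type!r}")
    return f"{url}?{urlencode({'type': extraction_type})}"
```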

Extraction Statuses

Both table and engineering extractions follow the same status lifecycle:
| Status | Description |
|---|---|
| queued | Job received and waiting to be processed |
| processing | Extraction is actively running |
| completed | Extraction finished successfully; results are available |
| failed | Extraction encountered an error |
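Because completed and failed are the only end states in this lifecycle, a client can poll either status URL shown earlier (the table or the engineering one) until one of them appears. The sketch below is one way to do that with the standard library. The interval and timeout defaults are arbitrary placeholders, not documented rate limits, and `make_status_fetcher` and `wait_for_extraction` are hypothetical names. The fetch function is passed in so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

# End states from the status table; queued and processing mean "keep waiting".
TERMINAL_STATUSES = {"completed", "failed"}


def make_status_fetcher(status_url, api_key):
    """Return a zero-argument function that GETs status_url and decodes the JSON body."""
    def fetch():
        req = urllib.request.Request(status_url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def wait_for_extraction(fetch_status, interval=2.0, timeout=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job reaches a terminal status.

    Returns the final job object; check its "status", since it may be "failed".
    Raises TimeoutError if the job is still non-terminal after timeout seconds.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"extraction still {status!r} after {timeout} seconds")
        sleep(interval)
```

Against the live API, usage would look like `wait_for_extraction(make_status_fetcher("https://apigcp.trynia.ai/v2/extract/ext_abc123", api_key))`, after which you read `records` (table mode) or `result` (engineering mode) from a completed job.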

Use Cases

Financial Analysis

Extract line items, revenue figures, and balance sheet data from SEC filings (10-K, 10-Q) into structured records for analysis and comparison.

Engineering Review

Parse P&IDs, wiring diagrams, and spec sheets to catalog components, materials, and specifications. Ask follow-up questions about extracted details.

Invoice Processing

Pull vendor names, line items, quantities, and totals from invoices using a custom JSON schema tailored to your format.
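As a starting point, an invoice schema might describe one record per invoice line, following the same shape as the financial example earlier: an object with described properties and a required list. The field names below are hypothetical and should be adapted to your invoices. The helper then flags lines whose total differs from quantity times unit price, a cheap local check on the extracted values.

```python
# Hypothetical field names; adapt them to the layout of your invoices.
INVOICE_LINE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string", "description": "Name of the vendor issuing the invoice"},
        "invoice_number": {"type": "string", "description": "Invoice identifier as printed on the document"},
        "description": {"type": "string", "description": "Description of the billed item or service"},
        "quantity": {"type": "number", "description": "Quantity billed on this line"},
        "unit_price": {"type": "number", "description": "Price per unit in the invoice currency"},
        "line_total": {"type": "number", "description": "Total amount for this line in the invoice currency"},
    },
    "required": ["vendor_name", "description", "line_total"],
}


def mismatched_line_totals(records, tolerance=0.01):
    """Return records whose line_total differs from quantity * unit_price by more than tolerance."""
    mismatched = []
    for rec in records:
        qty, price, total = rec.get("quantity"), rec.get("unit_price"), rec.get("line_total")
        if qty is None or price is None or total is None:
            continue  # optional fields were not extracted for this line
        if abs(qty * price - total) > tolerance:
            mismatched.append(rec)
    return mismatched
```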

Technical Due Diligence

Extract equipment lists, compliance data, and specifications from engineering documents during M&A or audits.