Extract structured data from PDFs using JSON schemas, detect visual elements, and use engineering-specific extraction for technical documents
Extract structured records from any PDF — financial filings, invoices, spec sheets, engineering drawings — using custom JSON schemas, visual element detection, or purpose-built engineering extraction.
Table Extraction
Define a JSON schema describing the fields you need, and Nia extracts structured records from any PDF. Ideal for financial data, line items, tabular content, and repeating structures.
Detect Extraction
Detect and locate visual elements — tables, figures, charts, diagrams — in PDF pages. Returns bounding boxes, classifications, and annotated page images.
Engineering Extraction
Purpose-built for technical documents — engineering drawings, P&IDs, schematics, and spec sheets. Extracts structured metadata with optional follow-up queries for deeper analysis.
1
Submit a Request
Provide a PDF URL or an existing Nia source ID, along with an optional page range. For table extraction, also include a JSON schema defining the fields to extract.
2
Processing
Nia parses the document, identifies relevant structures, and extracts data according to your schema (table mode) or built-in engineering models (engineering mode).
3
Retrieve Results
Poll the extraction job until it completes. Table extraction returns an array of structured records; engineering extraction returns a result object you can query further.
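The three steps above reduce to a submit-then-poll loop. Below is a minimal Python sketch of the polling half, not Nia's client library: `poll_until_done` and the caller-supplied `get_job(job_id)` function are hypothetical, and the `"completed"` and `"failed"` status values are assumptions about the job object rather than documented states.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    get_job: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_job(job_id) with exponential backoff until the job finishes.

    get_job is a caller-supplied (hypothetical) function, e.g. a thin wrapper
    around an authenticated HTTP GET, returning the job as a dict with a
    "status" key. The status strings checked below are assumptions.
    """
    waited = 0.0
    while True:
        job = get_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"extraction job {job_id} failed: {job.get('error')}")
        if waited >= timeout:
            raise TimeoutError(f"extraction job {job_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Double the wait each round, but never beyond max_interval.
        interval = min(interval * 2, max_interval)
```

Injecting `get_job` and `sleep` keeps the loop independent of any particular HTTP client and lets you exercise it without network calls; the capped doubling interval avoids hammering the API while a long document is still processing.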
Table Extraction
Define a JSON schema and Nia returns structured records matching your specification. This is ideal for pulling repeating data out of dense documents like SEC filings, invoices, or product catalogs.
Use descriptions — Add a description to each field in your schema. Nia uses these to understand what data to look for, especially when column headers in the PDF are ambiguous.
Narrow the page range — If you know which pages contain the data, specify page_range to speed up extraction and improve accuracy.
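As a sketch of both tips, the schema below describes one invoice line item with a `description` on every field, plus a helper that refuses to build a request while any field lacks one. This assumes Nia accepts standard JSON Schema describing a single record; the request keys `url` and `schema`, and the `"3-5"` string format for `page_range`, are illustrative guesses, and `missing_descriptions` and `build_table_request` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List

# Example schema for one invoice line item. Every property carries a
# "description" so ambiguous headers ("Qty", "Part No.") map to the right field.
LINE_ITEM_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "sku": {"type": "string", "description": "Product code, often labelled 'Item #' or 'Part No.'"},
        "item_name": {"type": "string", "description": "Name of the product or service billed"},
        "quantity": {"type": "number", "description": "Units billed on this line; the column may read 'Qty'"},
        "unit_price": {"type": "number", "description": "Price per unit before tax, as a plain number"},
        "line_total": {"type": "number", "description": "Quantity times unit price, as printed on the invoice"},
    },
    "required": ["item_name", "quantity", "line_total"],
}


def missing_descriptions(schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths of properties that have no "description"."""
    missing: List[str] = []
    for name, prop in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not prop.get("description"):
            missing.append(field_path)
        # Recurse into nested objects and into arrays of objects.
        if prop.get("type") == "object":
            missing.extend(missing_descriptions(prop, field_path))
        elif prop.get("type") == "array" and isinstance(prop.get("items"), dict):
            missing.extend(missing_descriptions(prop["items"], field_path + "[]"))
    return missing


def build_table_request(pdf_url: str, schema: Dict[str, Any], page_range: str) -> str:
    """Serialize a table-extraction request body (hypothetical key names)."""
    problems = missing_descriptions(schema)
    if problems:
        raise ValueError("add a description to: " + ", ".join(problems))
    # Only the page_range parameter name comes from this page; "url",
    # "schema", and the "3-5" range format are assumptions.
    return json.dumps({"url": pdf_url, "schema": schema, "page_range": page_range})
```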
Detect Extraction
Detect and locate visual elements within PDF pages: tables, figures, charts, and diagrams. Detect mode returns bounding boxes and classifications for each element found, and can render annotated page images with the detections overlaid.
When you request an annotated page image, the response is a PNG with the detection bounding boxes overlaid on the original page.
Use detect before table extraction — Run detect first to identify which pages contain tables, then target those specific pages with table extraction for faster, more accurate results.
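The detect-then-extract workflow can be sketched as two small functions: one collects the pages where a table was detected, the other collapses them into contiguous ranges you could pass as `page_range`. The shape of each detection (`page`, `label`, `bbox`, `confidence` keys) and the sample values are invented for illustration, not taken from Nia's actual response format; adjust the key names to match what detect mode returns.

```python
from typing import Any, Dict, Iterable, List, Sequence


def table_pages(detections: Iterable[Dict[str, Any]], min_confidence: float = 0.5) -> List[int]:
    """Sorted, de-duplicated page numbers on which a table was detected."""
    return sorted({
        d["page"]
        for d in detections
        if d.get("label") == "table" and d.get("confidence", 1.0) >= min_confidence
    })


def to_page_ranges(pages: Sequence[int]) -> List[str]:
    """Collapse sorted page numbers into ranges, e.g. [3, 4, 5, 9] -> ["3-5", "9"]."""
    ranges: List[str] = []
    if not pages:
        return ranges
    start = prev = pages[0]
    for page in pages[1:]:
        if page == prev + 1:  # still inside a consecutive run
            prev = page
            continue
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = page
    ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ranges


# Made-up sample in a hypothetical detect output shape: one dict per element.
detections = [
    {"page": 4, "label": "table", "bbox": [72, 90, 540, 410], "confidence": 0.97},
    {"page": 3, "label": "table", "bbox": [60, 120, 550, 700], "confidence": 0.91},
    {"page": 3, "label": "figure", "bbox": [80, 80, 300, 200], "confidence": 0.88},
    {"page": 5, "label": "table", "bbox": [72, 60, 540, 300], "confidence": 0.93},
    {"page": 9, "label": "table", "bbox": [50, 400, 560, 720], "confidence": 0.76},
    {"page": 12, "label": "chart", "bbox": [90, 100, 500, 380], "confidence": 0.85},
    {"page": 14, "label": "table", "bbox": [70, 500, 540, 610], "confidence": 0.31},  # below threshold
]

print(to_page_ranges(table_pages(detections)))  # ['3-5', '9']
```

If `page_range` only accepts a single contiguous span, submit one table-extraction job per range; if it accepts several, join them instead.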
Engineering Extraction
Extract structured information from technical documents: engineering drawings, P&IDs, schematics, datasheets, and construction specifications. Engineering mode uses specialized models tuned for technical content.
After an engineering extraction completes, you can ask follow-up questions about the results without re-processing the document:
```bash
curl -X POST https://apigcp.trynia.ai/v2/extract/engineering/eng_xyz789/query \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the design pressure rating for all heat exchangers in this diagram?"
  }'
```
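For clients that are not shell scripts, the same request can be built with Python's standard library. This sketch reuses only what the curl command shows (the endpoint path, the bearer token from `NIA_API_KEY`, and the `message` field); `build_followup_request` and `ask_followup` are hypothetical names, and treating the reply body as JSON is an assumption based on the sample response on this page.

```python
import json
import os
import urllib.request

API_BASE = "https://apigcp.trynia.ai/v2"


def build_followup_request(extraction_id: str, message: str, api_key: str) -> urllib.request.Request:
    """Build the same POST as the curl command, without sending it."""
    url = f"{API_BASE}/extract/engineering/{extraction_id}/query"
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def ask_followup(extraction_id: str, message: str) -> dict:
    """Send the follow-up query and decode the (assumed JSON) response body."""
    req = build_followup_request(extraction_id, message, os.environ["NIA_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes the request easy to inspect or log before it goes over the network.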
Response:
```json
{
  "id": "eng_xyz789",
  "chat_messages": [
    {
      "role": "user",
      "content": "What is the design pressure rating for all heat exchangers in this diagram?"
    },
    {
      "role": "assistant",
      "content": "Based on the extraction results, there is one heat exchanger in this diagram:\n\n- **HX-402** (Shell and Tube Heat Exchanger): Design pressure is **150 PSI**."
    }
  ]
}
```
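Because the reply is a list of chat turns rather than a single answer string, client code has to pick out the assistant's reply. A minimal sketch, assuming (this page does not confirm it) that later follow-ups append to `chat_messages`, so the newest assistant turn is the one to show; `latest_answer` is a hypothetical helper and `sample` is the response above written as a Python dict.

```python
from typing import Any, Dict, Optional


def latest_answer(response: Dict[str, Any]) -> Optional[str]:
    """Return the content of the most recent assistant message, or None."""
    for msg in reversed(response.get("chat_messages", [])):
        if msg.get("role") == "assistant":
            return msg.get("content")
    return None


sample = {
    "id": "eng_xyz789",
    "chat_messages": [
        {"role": "user", "content": "What is the design pressure rating for all heat exchangers in this diagram?"},
        {
            "role": "assistant",
            "content": "Based on the extraction results, there is one heat exchanger in this diagram:\n\n"
            "- **HX-402** (Shell and Tube Heat Exchanger): Design pressure is **150 PSI**.",
        },
    ],
}

# Prints the last line of the answer:
# - **HX-402** (Shell and Tube Heat Exchanger): Design pressure is **150 PSI**.
print(latest_answer(sample).splitlines()[-1])
```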
Follow-up queries use the already-extracted context, so they are fast and do not consume additional extraction credits.