Extract structured data from PDFs using JSON schemas and engineering-specific extraction for technical documents
Extract structured records from any PDF — financial filings, invoices, spec sheets, engineering drawings — using custom JSON schemas or purpose-built engineering extraction.
Table Extraction
Define a JSON schema describing the fields you need, and Nia extracts structured records from any PDF. Ideal for financial data, line items, tabular content, and repeating structures.
Engineering Extraction
Purpose-built for technical documents — engineering drawings, P&IDs, schematics, and spec sheets. Extracts structured metadata with optional follow-up queries for deeper analysis.
Provide a PDF URL or an existing Nia source ID along with an optional page range. For table extraction, include a JSON schema defining the fields to extract.
2. Processing
Nia parses the document, identifies relevant structures, and extracts data according to your schema (table mode) or built-in engineering models (engineering mode).
3. Retrieve Results
Poll the extraction job until it completes. Table extraction returns an array of structured records; engineering extraction returns a result object you can query further.
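The polling step can be sketched as a small loop. This is a minimal sketch under stated assumptions: the source does not document the job status endpoint or its response shape, so the `status` field, the `completed` and `failed` values, and the `fetch_job` callable are all hypothetical placeholders. Pass in whatever function retrieves the job from the API.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal status values; the source does not list them.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch_job: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_job repeatedly until the job reports a terminal status.

    fetch_job is a hypothetical callable that returns the job as a dict
    with an assumed "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("extraction job did not finish in time")
        # Wait before asking again so the API is not hammered.
        sleep(interval_s)
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any HTTP client and makes it easy to exercise with canned responses.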
Define a JSON schema and Nia returns structured records matching your specification. This is ideal for pulling repeating data out of dense documents like SEC filings, invoices, or product catalogs.
Use descriptions — Add a description to each field in your schema. Nia uses these to understand what data to look for, especially when column headers in the PDF are ambiguous.
Narrow the page range — If you know which pages contain the data, specify page_range to speed up extraction and improve accuracy.
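To make the two tips above concrete, here is an illustrative schema in which every field carries a description, plus a small check that flags any field without one. The invoice fields are invented for the example. The request body keys (`url`, `json_schema`) and the shape of `page_range` are assumptions, since the source names the `page_range` parameter but does not show its format.

```python
import json
from typing import Any, Dict, List

# Illustrative schema for invoice line items (field names are made up).
invoice_line_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "description": {
            "type": "string",
            "description": "Text describing the item or service billed",
        },
        "quantity": {
            "type": "number",
            "description": "Number of units billed on this line",
        },
        "unit_price": {
            "type": "number",
            "description": "Price per unit, excluding tax",
        },
        "total": {
            "type": "number",
            "description": "Line total: quantity multiplied by unit price",
        },
    },
    "required": ["description", "total"],
}


def fields_missing_descriptions(schema: Dict[str, Any]) -> List[str]:
    """Return the names of top-level properties that lack a description."""
    return [
        name
        for name, spec in schema.get("properties", {}).items()
        if not spec.get("description")
    ]


# Hypothetical request body: key names and the page_range shape are
# assumptions, not the documented API.
payload = {
    "url": "https://example.com/invoice.pdf",
    "json_schema": invoice_line_schema,
    "page_range": {"start": 2, "end": 4},
}

print(fields_missing_descriptions(invoice_line_schema))
print(json.dumps(payload, indent=2))
```

Running the check before submitting a job is a cheap way to catch fields that would otherwise rely on ambiguous column headers alone.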
Extract structured information from technical documents — engineering drawings, P&IDs, schematics, datasheets, and construction specifications. Engineering mode uses specialized models tuned for technical content.
After an engineering extraction completes, you can ask follow-up questions about the results without re-processing the document:
curl -X POST https://apigcp.trynia.ai/v2/extract/engineering/eng_xyz789/query \
  -H "Authorization: Bearer $NIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is the design pressure rating for all heat exchangers in this diagram?"
  }'
Response:
{
  "id": "eng_xyz789",
  "chat_messages": [
    {
      "role": "user",
      "content": "What is the design pressure rating for all heat exchangers in this diagram?"
    },
    {
      "role": "assistant",
      "content": "Based on the extraction results, there is one heat exchanger in this diagram:\n\n- **HX-402** (Shell and Tube Heat Exchanger): Design pressure is **150 PSI**."
    }
  ]
}
Follow-up queries use the already-extracted context, so they are fast and do not consume additional extraction credits.
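For Python callers, the same follow-up query might look like the sketch below, using only the standard library. The endpoint path, headers, `message` field, and `chat_messages` response shape are taken from the curl example and response above. The helper names (`build_query_request`, `last_assistant_reply`, `ask`) are hypothetical, and error handling is omitted.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://apigcp.trynia.ai/v2"


def build_query_request(
    extraction_id: str, message: str, api_key: str
) -> urllib.request.Request:
    """Build the POST request for a follow-up query (does not send it)."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extract/engineering/{extraction_id}/query",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def last_assistant_reply(response: Dict[str, Any]) -> Optional[str]:
    """Return the most recent assistant message, or None if there is none."""
    for msg in reversed(response.get("chat_messages", [])):
        if msg.get("role") == "assistant":
            return msg.get("content")
    return None


def ask(extraction_id: str, message: str) -> Optional[str]:
    """Send a follow-up query and return the assistant's answer."""
    request = build_query_request(
        extraction_id, message, os.environ["NIA_API_KEY"]
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return last_assistant_reply(json.load(resp))
```

Calling `ask("eng_xyz789", "What is the design pressure rating for all heat exchangers in this diagram?")` sends the same request as the curl command and returns only the assistant's text.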