Source Types - Nia AI Documentation

Use this page when your first question is not “which tool do I call?” but “what kind of source do I have?” If you prefer to browse by job to be done, start at Capabilities.

Capabilities

Browse Nia by what it does: index, search, read, research, sync, and share context.

API Reference

Need endpoint details and schemas? Jump straight to the API reference.

At a Glance

Code & Repositories

GitHub repositories, package source code, file trees, grep, and code search workflows.

Documentation Sites

Crawl public docs, honor llms.txt when available, and search pages semantically or with regex.

PDFs & Research Papers

Structured PDF parsing, section-aware retrieval, and paper search for technical documents.

HuggingFace Datasets

Index dataset rows and schema-like structure for semantic retrieval.

Google Drive

Connect Drive, browse files and shared drives, select content, and keep it synced.

Spreadsheets & Tables

CSV, TSV, XLSX, and XLS ingestion with row-aware indexing.

Slack & Conversations

Index Slack history and search conversations with org-scoped isolation.

Local Knowledge

Local folders, databases, and chat history via direct folder indexing or continuous sync.

X (Twitter)

Index posts from X/Twitter accounts and search them semantically alongside your other sources.

Connectors

Generic connector framework for integrating external data sources with OAuth and API key flows.

E2E Encrypted Sources

iMessage, WhatsApp, Apple Notes, Contacts, Reminders, Stickies, and Screenshots — synced with zero-knowledge encryption.
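If you are unsure which category a source falls into, the heuristic below sketches one way to answer that from a URL or a POSIX-style path. `classify_source` and its category labels are hypothetical and not part of Nia. The host and extension rules are assumptions based on the examples on this page, and they will misclassify edge cases; for example, a PDF served without a `.pdf` extension falls through to `docs`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical helper (not a Nia API): guess which source type on this
# page a URL or local path belongs to, using simple host/extension rules.
SPREADSHEET_EXTS = {".csv", ".tsv", ".xlsx", ".xls"}


def classify_source(location: str) -> str:
    parsed = urlparse(location)
    suffix = PurePosixPath(parsed.path).suffix.lower()

    if not parsed.scheme:
        # No URL scheme: treat it as something on local disk.
        return "spreadsheets" if suffix in SPREADSHEET_EXTS else "local"

    host = (parsed.hostname or "").removeprefix("www.")
    path = parsed.path.lower()

    if host == "github.com":
        return "code"
    if host == "huggingface.co" and path.startswith("/datasets/"):
        return "huggingface"
    if host == "arxiv.org" or suffix == ".pdf":
        return "pdfs"
    if suffix in SPREADSHEET_EXTS:
        return "spreadsheets"
    if host in {"drive.google.com", "docs.google.com"}:
        return "google_drive"
    if host in {"x.com", "twitter.com"}:
        return "x"
    if host == "slack.com" or host.endswith(".slack.com"):
        return "slack"
    # Anything else reachable over HTTP is assumed to be a documentation site.
    return "docs"
```

For instance, `classify_source("https://github.com/vercel/ai")` returns `"code"` and `classify_source("https://nextjs.org/docs")` returns `"docs"`. Connectors and E2E encrypted sources cannot be inferred from a location string, so the sketch never returns them.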

Source-Type Matrix

Source type	Bring it in with	Best tools after that	Notes
Code & repositories	`index`, Tracer, `get_github_file_tree`	`search`, `nia_read`, `nia_grep`, `nia_explore`	Package source code also works without indexing via `nia_package_search_hybrid`
Documentation sites	`index`	`search`, `nia_read`, `nia_grep`, `nia_explore`	`llms.txt` aware, supports crawl filters
PDFs & research papers	`index`, PDF Indexing	`search`, `nia_read`	Tree-guided retrieval for long documents
HuggingFace datasets	`index`, HuggingFace Datasets	`search`, `nia_read`, `nia_explore`	Large datasets are sampled intelligently
Google Drive	Google Drive Integration	`search`, `nia_read`, `nia_grep`, `nia_explore`	Supports selected files, folders, shared drives, and incremental sync
Spreadsheets & tables	`index` with CSV, TSV, XLSX, or XLS	`search`, `nia_read`, `nia_explore`	Row and header aware
Slack	Slack Search	`search`, `nia_grep`	Workspace data stays org-scoped
Local folders, databases, chat history	`index(folder_path=...)`, Local Sync	`search`, `nia_read`, `nia_grep`, `nia_explore`	Best fit for continuously changing personal or team knowledge
X (Twitter)	X Integration	`search`	Index posts from any public account with your own bearer token
Connectors	Connectors	`search`, `nia_read`, `nia_grep`	Generic framework for external data sources with OAuth and scheduling
E2E encrypted sources (iMessage, WhatsApp, Notes, Contacts, Reminders, Stickies, Screenshots)	E2E Encryption, TypeScript SDK adapters	`search` with `e2e_session_id`	Zero-knowledge sync — plaintext never leaves your device
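For agents or scripts that route sources programmatically, the matrix can be transcribed into a lookup table. This is a sketch: the dictionary keys are hypothetical labels, and the tool lists are copied by hand from the table above rather than queried from Nia, so they will drift if the matrix changes.

```python
from typing import NamedTuple


class Route(NamedTuple):
    bring_in: tuple[str, ...]  # "Bring it in with" column
    tools: tuple[str, ...]     # "Best tools after that" column


FULL = ("search", "nia_read", "nia_grep", "nia_explore")

# Transcribed from the Source-Type Matrix; keys are hypothetical labels.
SOURCE_MATRIX: dict[str, Route] = {
    "code": Route(("index", "Tracer", "get_github_file_tree"), FULL),
    "docs": Route(("index",), FULL),
    "pdfs": Route(("index", "PDF Indexing"), ("search", "nia_read")),
    "huggingface": Route(("index", "HuggingFace Datasets"),
                         ("search", "nia_read", "nia_explore")),
    "google_drive": Route(("Google Drive Integration",), FULL),
    "spreadsheets": Route(("index",), ("search", "nia_read", "nia_explore")),
    "slack": Route(("Slack Search",), ("search", "nia_grep")),
    "local": Route(("index(folder_path=...)", "Local Sync"), FULL),
    "x": Route(("X Integration",), ("search",)),
    "connectors": Route(("Connectors",), ("search", "nia_read", "nia_grep")),
    # Per the matrix, search over E2E sources requires an e2e_session_id.
    "e2e": Route(("E2E Encryption", "TypeScript SDK adapters"), ("search",)),
}


def recommended_tools(source_type: str) -> tuple[str, ...]:
    """Return the best-fit tools for a source type; raises KeyError if unknown."""
    return SOURCE_MATRIX[source_type].tools


def is_recommended(source_type: str, tool: str) -> bool:
    """True if the matrix lists `tool` as a best-fit tool for this type."""
    return tool in recommended_tools(source_type)
```

For example, `is_recommended("pdfs", "nia_grep")` is `False` because the matrix lists only `search` and `nia_read` for PDFs.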

Code & Repositories

Best for source code, implementation patterns, architecture exploration, and exact file reading. Use when:

you want to index a GitHub repository and search it semantically
you need grep-style matching across a repo
you want public package source code without indexing first
you want GitHub code search without maintaining an index

Start with:

`index` for repositories you want in your own workspace
`nia_package_search_hybrid` for package source code
Tracer for public GitHub repo search without indexing
`get_github_file_tree` for quick structure inspection

Typical prompts:

"Index https://github.com/vercel/ai"
"Search the indexed repo for how streaming responses are implemented"
"Use Tracer to find how auth middleware works in the Next.js repo"
"Search the fastapi package for authentication examples"

Documentation Sites

Best for framework docs, product docs, API docs, and structured technical websites. Use when:

you want grounded answers from official docs
you need a source your agents can cite and revisit
you want to crawl a docs site with include or exclude patterns

Start with:

`index` on the docs URL
`search` to find relevant pages
`nia_read` and `nia_grep` for deeper inspection

Typical prompts:

"Index https://nextjs.org/docs"
"Search the docs for cache invalidation"
"Read the page about route handlers"
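Include and exclude patterns decide which crawled URLs end up in the index. This page does not document Nia's pattern syntax or parameter names, so the sketch below assumes glob-style patterns matched against the URL path, with excludes taking precedence over includes. `should_crawl` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_crawl(url: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a crawler should keep `url`.

    Assumed semantics: with no include patterns every path is a candidate;
    otherwise the path must match at least one include pattern. Any match
    against an exclude pattern rejects the URL. In fnmatch, `*` also
    matches `/`, so "/docs/*" covers nested pages.
    """
    path = urlparse(url).path or "/"
    if include and not any(fnmatchcase(path, pat) for pat in include):
        return False
    return not any(fnmatchcase(path, pat) for pat in exclude)
```

With `include=["/docs/*"]` and `exclude=["/docs/*/legacy/*"]`, a page such as `/docs/app/routing` is kept, while a hypothetical `/docs/pages/legacy/old` page and anything outside `/docs/` are skipped.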

PDFs & Research Papers

Best for long technical documents where section structure matters. Use when:

you have PDFs, papers, filings, manuals, or legal docs
you need section-aware retrieval instead of flat chunk search
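Section-aware retrieval treats a document as a tree of headings rather than a flat list of chunks. The sketch below is a generic illustration of that idea, not Nia's PDF pipeline: it greedily descends the section tree toward the child whose subtree best overlaps the query, with plain keyword overlap standing in for semantic scoring. `Section` and `tree_search` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)


def _words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def _subtree_text(sec: Section) -> str:
    # A section's relevance includes everything nested under it.
    return " ".join([sec.title, sec.text, *(_subtree_text(c) for c in sec.children)])


def tree_search(root: Section, query: str) -> list[str]:
    """Return the title path from `root` to the most relevant section."""
    terms = _words(query)
    path, node = [root.title], root
    while node.children:
        scored = [(len(terms & _words(_subtree_text(c))), c) for c in node.children]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score == 0:
            break  # no child mentions the query; stop at the current section
        path.append(best.title)
        node = best
    return path
```

Returning a path such as `["Paper", "Method", "Training"]` rather than an isolated chunk keeps the surrounding section context attached to each hit, which is the advantage over flat chunk search.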

Start with:

PDF Indexing
`index` on a PDF URL or arXiv URL

HuggingFace Datasets

Best for row-level search, schema discovery, and agentic retrieval over dataset contents. Use when:

you need to search examples, records, or splits
you want natural-language access to a dataset instead of manual browsing

Start with:

HuggingFace Datasets
`index` on a dataset URL

Google Drive

Best for cloud-hosted files and folders you want to browse selectively, index deeply, and keep in sync over time. Use when:

your working knowledge already lives in Drive instead of a repo or docs site
you need selected files or folders rather than an import of your entire Drive
you want shared drives and incremental sync support

Start with:

Google Drive Integration

Spreadsheets & Tables

Best for CSVs, TSVs, Excel files, and structured business data. Use when:

your source is tabular
you want row-aware retrieval and header-aware indexing
you need a lighter-weight alternative to database sync
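Row-aware, header-aware indexing means each record stays intact and is labeled with its column names, so a match on a value still carries its meaning. A minimal sketch of that idea using Python's standard `csv` module follows. It is an illustration, not Nia's indexer, and `rows_to_chunks` is hypothetical; XLSX and XLS files would need a third-party reader and are not handled here.

```python
import csv
import io


def rows_to_chunks(table_text: str, delimiter: str = ",") -> list[str]:
    """Turn each data row into one text chunk prefixed with its column names."""
    # DictReader takes the first row as the header and maps each later row to it.
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    chunks = []
    for i, row in enumerate(reader, start=1):
        fields = "; ".join(f"{col}: {val}" for col, val in row.items())
        chunks.append(f"row {i} | {fields}")
    return chunks
```

For a CSV with header `name,plan` and a row `Acme,pro`, the chunk is `row 1 | name: Acme; plan: pro`. For TSV input, pass `delimiter="\t"`.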

Start with:

`index` on CSV, TSV, XLSX, or XLS files
`search` for semantic lookup
`nia_read` and `nia_explore` to inspect rows and structure

Slack & Conversations

Best for operational knowledge, team decisions, support threads, and internal discussion history. Use when:

important context lives in Slack instead of docs
you want semantic and keyword retrieval over conversations

Start with:

Slack Search

Local Knowledge

Best for internal notes, local folders, private documents, databases, and saved chat history. Use when:

the knowledge lives on disk or in a local database
you want a continuously fresh private index
you want to sync chat history and local project context into Nia

Start with:

`index(folder_path=...)` for one-off folder indexing
Local Sync for continuous synchronization
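Continuous sync stays cheap by re-indexing only what changed. The sketch below shows one common way to detect changes: comparing content hashes between two snapshots of a folder. It is a generic illustration under that assumption; this page does not describe how Local Sync detects changes, and `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to a SHA-256 of its bytes."""
    return {
        p.relative_to(folder).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }
```

Under this scheme, only added and changed paths need re-indexing, and removed paths are dropped from the index. Hashing file contents rather than comparing modification times avoids re-indexing files that were touched but not edited.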

X (Twitter)

Best for indexing public posts, threads, and discussions from X/Twitter accounts. Use when:

you want to search someone’s posts semantically
you need to track public technical discussions or announcements
you want X content alongside your other indexed sources

Start with:

X Integration

Connectors

Best for integrating external data sources through a unified framework with OAuth and API key authentication. Use when:

you have external services to connect
you need scheduled syncing of external data
you want a unified way to manage third-party integrations

Start with:

Connectors

E2E Encrypted Sources

Best for sensitive personal data where plaintext must never leave your device — messages, contacts, notes, and more. Use when:

you need zero-knowledge sync for privacy-sensitive data
you want semantic search over encrypted content
you’re building with iMessage, WhatsApp, Apple Notes, Contacts, Reminders, Stickies, or Screenshots

Supported sources: iMessage, WhatsApp, Apple Notes, Apple Contacts, macOS Stickies, Apple Reminders, Screenshots

Start with:

E2E Encryption for the full architecture guide and cookbook
TypeScript SDK adapters in `sdk/typescript/src/local-first/`

Related Pages

Capabilities

Browse the platform by what it can do.

Explore & Chat

Search and chat across pre-indexed knowledge without setting up everything yourself.

Pre-indexed Sources

Subscribe to sources the community has already indexed.

Local Sync

Keep private local knowledge fresh over time.
