TroveFiles vs. vector databases for AI agent retrieval.
Vector databases are the default answer for "how does my agent retrieve information?" — but they're overkill for most agent workflows. Here's an honest comparison: when grep beats embeddings, when embeddings win, and when the right answer is both.
Two different retrieval models.
Vector databases (Pinecone, pgvector, Weaviate, Qdrant) retrieve by semantic similarity: chunk the documents, embed the chunks, store the vectors, embed the question, return the nearest matches. Filesystem retrieval (TroveFiles) retrieves by structural pattern: the agent issues a grep, awk, or pdftotext command and gets back the literal match.
For most agent tasks — pulling a clause out of a contract, finding a specific number in a filing, listing files matching a pattern — exact pattern retrieval is faster, cheaper, and easier to reason about. Vector search wins where exact match genuinely fails.
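The difference between the two models fits in a few lines. Here is a minimal sketch of exact-pattern retrieval over a hypothetical two-file corpus (the filenames and text are invented for illustration; in practice the text would come from `pdftotext` or similar):

```python
import re

# Hypothetical corpus: filename -> extracted text.
corpus = {
    "q3-2024.txt": "Revenue grew 12%. Adjusted EBITDA was $4.2M for the quarter.",
    "q2-2024.txt": "Operating income fell. No EBITDA figure was disclosed.",
}

def grep(pattern: str, files: dict) -> list[tuple[str, str]]:
    """Exact-pattern retrieval: return (file, line) pairs that literally match."""
    hits = []
    for name, text in files.items():
        for line in text.splitlines():
            if re.search(pattern, line):
                hits.append((name, line))
    return hits

hits = grep(r"EBITDA", corpus)
# Deterministic: same pattern, same corpus, same hits. No chunking,
# no embedding call, no index — the whole pipeline is the regex.
```

The semantic-similarity path replaces that single loop with chunking, an embedding call per chunk, an index upsert, an embedding call per query, and a nearest-neighbor lookup, which is the cost gap the next section walks through.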
What it costs to retrieve.
The same retrieval task — "find EBITDA in a quarterly filing" — looks dramatically different across the two models. One is a single shell command. The other is a multi-step pipeline.
# Upload once, search forever
trove.upload("workspace/filings/q3-2024.pdf", open("q3.pdf", "rb"))
# Agent retrieves with one shell command per question
bash("grep -r 'EBITDA' workspace/filings/")
bash("pdftotext workspace/filings/q3-2024.pdf - | sed -n '/Risk/,/^$/p'")
# Cost: storage. Latency: ~10ms per grep.
# No embedding pipeline. Deterministic. Files live forever.
The LLM is a better retrieval architect than your pipeline.
The deeper case for filesystem retrieval isn't that grep is fast. It's that the LLM, given a filesystem and a shell, will architect its own retrieval better than any pre-built RAG pipeline does.
A 2026-class model handed a workspace doesn't one-shot a top-k query. It does what an analyst does: ls -la workspace/contracts/ to scan by date, find … | xargs grep for hierarchical search, head -200 to skim before committing to a full read, pdftotext file.pdf - | sed -n '/Risk/,/^$/p' to grab a section, then iterates — looks at intermediate results and decides what to do next.
Vector DBs are stuck in 2023 RAG ergonomics: one query, one ranked list, hope it's right. Filesystem retrieval lets the agent do what it's already good at — multi-step reasoning over partial information.
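The iterate-and-narrow loop described above can be sketched in a few lines. This is a hypothetical illustration, not agent code: the filenames, heuristic, and section-grabbing regex are stand-ins for the `ls`, `grep`, and `sed -n '/Risk/,/^$/p'` steps an agent would actually issue:

```python
import re

# Hypothetical `ls` output and extracted text for two filings.
listing = ["q1-2023.txt", "q3-2024.txt", "notes-old.txt"]
texts = {
    "q1-2023.txt": "Risk Factors\nNone material.\n\nOutlook\nStable.",
    "q3-2024.txt": "Risk Factors\nSupply chain exposure remains.\n\nOutlook\nCautious.",
}

# Step 1: skim the listing and keep only recent filings — the agent's own
# heuristic, decided after seeing the intermediate result, not pre-indexed.
recent = [f for f in listing if "2024" in f]

# Step 2: grab just the 'Risk' section of each candidate, the way
# `pdftotext file.pdf - | sed -n '/Risk/,/^$/p'` grabs it.
def risk_section(text: str) -> str:
    match = re.search(r"Risk.*?(?=\n\n|\Z)", text, re.S)
    return match.group(0) if match else ""

sections = {f: risk_section(texts[f]) for f in recent}
```

Each step's output shapes the next command, which is exactly what a one-shot top-k query cannot do.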
The honest tradeoffs.
| Dimension | TroveFiles (filesystem) | Vector database |
|---|---|---|
| Retrieval model | Exact pattern (grep, awk, jq) | Semantic similarity (cosine, dot product) |
| Best for | Keyword search, structured data, exact-match recall | Fuzzy semantic search, ranking, similarity |
| Write cost | One file write | Chunk + embed + upsert (per doc, per change) |
| Read cost | One shell command | Embed query + index query + rerank |
| Determinism | Same command, same answer | Depends on embedding model and chunking strategy |
| Inspectability | cat the file, eyeball the result | Opaque vectors, no human-readable representation |
| Multi-tenant isolation | Per-namespace directory roots | Per-namespace collections (tooling varies) |
| Multimodal preprocessing | pdftotext, ffmpeg, convert in the workspace | Separate ETL pipeline before embedding |
| Deletion / GDPR | rm -rf workspace/users/alice/ | Find every chunk, drop from index, hope metadata is clean |
| Portability | A directory of files — copy anywhere | Vendor-locked index format; migrating is a project |
| Cost at small scale (< 100k docs) | Storage only | Embeddings + index hosting |
| Cost at large scale (10M+ docs) | Grep latency grows linearly | Sublinear with proper index |
Pick TroveFiles when…
- The agent retrieves by keyword, name, or path.
- You want the LLM to architect retrieval (grep + awk + sed) rather than pre-index.
- Source documents change often — re-embedding is a tax you want to skip.
- Determinism matters more than fuzzy similarity.
- You want one tool that handles memory, files, and retrieval.
Pick a vector database when…
- The question is genuinely semantic ("like X but not exactly X").
- Corpus is huge and stable — pre-indexing pays for itself.
- You need ranking by similarity, not just match/no-match.
- The user query language is far from the document language (translation, paraphrase).
The strongest production setups use both: TroveFiles for the agent's own memory and known-keyword retrieval, a vector database for semantic search over a large stable corpus. See the memory use case for how teams split the work.
Filesystem vs. vector DB, answered.
When should I pick a filesystem over a vector database?
When the agent is retrieving things it knows the keywords for: contract clauses, code identifiers, named entities, exact phrases, structured data. Filesystem retrieval (grep, awk, jq, pdftotext) is faster, cheaper, and deterministic. The agent issues a shell command and gets the exact match.
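For structured data the same point holds: the lookup is an exact predicate, not a similarity score. A minimal sketch, with a hypothetical CRM file standing in for something the agent would query via `jq`:

```python
import json

# Hypothetical structured file the agent might query with something like
# `jq '.customers[] | select(.name == "Acme") | .arr' workspace/crm.json`.
crm_json = '''
{"customers": [
  {"name": "Acme",   "arr": 120000},
  {"name": "Globex", "arr": 95000}
]}
'''

data = json.loads(crm_json)
# Exact-match lookup: no ranking, no similarity threshold —
# either the record exists or it doesn't.
arr = next(c["arr"] for c in data["customers"] if c["name"] == "Acme")
```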
When does a vector database actually pay off?
When the question doesn't map cleanly to keywords — "find conversations that felt similar to this one," "retrieve documents semantically related to a topic," "rank these passages by relevance." Embeddings shine where exact-match search fails, which is a real but narrow set of agent tasks.
Can I use both?
Yes — most production agents do. TroveFiles for the agent's own memory, scratchpad, and known-keyword corpus retrieval. A vector database for semantic similarity over a large external corpus. They are complements, not competitors.
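The split between the two usually reduces to a routing decision. Here is one hedged sketch of such a router — the heuristic (quoted phrases, acronyms, and dotted paths suggest a keyword query) is an invented illustration, not a prescribed rule:

```python
import re

# Hypothetical router: keyword-shaped queries go to filesystem search,
# open-ended semantic queries go to the vector index.
# Heuristic: a quoted phrase, an ALL-CAPS acronym, or a dotted name
# (file path, identifier) signals the user knows the exact token.
KEYWORD_HINTS = re.compile(r'"[^"]+"|\b[A-Z]{2,}\b|\w+\.\w+')

def route(query: str) -> str:
    return "filesystem" if KEYWORD_HINTS.search(query) else "vector_db"

# Example routing decisions:
# route('find the "change of control" clause')     -> "filesystem"
# route('anything thematically close to churn')    -> "vector_db"
```

In production the router can itself be the LLM — the point is only that the two backends answer different query shapes.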
Doesn't a vector database scale better than grep?
For very large corpora (tens of millions of documents), vector indices win on latency. For typical agent corpora — a customer's files, a knowledge base, last year's contracts — TroveFiles stays sub-second by keeping retrieval close to the data, so the agent isn't paying for round trips between an embed call, an index query, and a rerank.
Does grep hold up under concurrent agents?
Yes. TroveFiles is built so each command runs independently — concurrent agents fan out instead of queuing through a shared index. Throughput scales with parallel readers rather than the index tier you pay for. Vector DBs, by contrast, route every query through a single index, so concurrency is bounded by replicas and pricing tier.
What about chunking and re-embedding when documents change?
Vector pipelines have to re-chunk and re-embed every time a source document changes. With TroveFiles, you just write the new file. The next grep picks it up. No re-indexing job, no embedding cost on writes.
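The write-path difference is easy to demonstrate: update the file, and the very next search sees the new content. A minimal sketch using a temporary directory as a stand-in workspace (no TroveFiles API involved):

```python
import pathlib
import re
import tempfile

# Hypothetical workspace: updating a document is just writing the file —
# the next search sees the new content with no re-chunk/re-embed step.
ws = pathlib.Path(tempfile.mkdtemp())
doc = ws / "contract.txt"

doc.write_text("Term: 12 months. Renewal: automatic.")
before = bool(re.search(r"24 months", doc.read_text()))   # not there yet

doc.write_text("Term: 24 months. Renewal: manual.")       # source changed
after = bool(re.search(r"24 months", doc.read_text()))    # next grep picks it up
```

Contrast with a vector pipeline, where that one-line write triggers re-chunking, an embedding call per chunk, and an index update before the change is retrievable.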
How do I migrate from a vector database to TroveFiles?
Most migrations are partial: keep the vector DB for true semantic queries, move the keyword and structured-data retrieval onto TroveFiles. Upload the source docs, point the agent's bash tool at TroveFiles, and start removing custom retrieval code. Most teams find 60-80% of their queries collapse into grep/awk/jq.
Who's running TroveFiles in production?
TroveFiles is the storage layer behind Silvia, our AI CFO with over $30 billion in connected assets. Every Silvia user has a TroveFiles namespace where the agent stores memories, skills, and preferences and retrieves them via shell commands across sessions.
Try retrieval that doesn't need embeddings.
Upload a file, run grep, see the answer. API key in 30 seconds.