Trusted Sources Library
Where the answers come from.
Pilot5 doesn't generate from training data alone. Every deliberation can pull from 400+ data sources and integrations across 21 knowledge domains — 250+ verified institutional sources (government registers, central banks, peer-reviewed databases, official statistics, regulatory codes) plus 140+ specialist API adapters for real-time queries against the same institutions. Allowlist model: no unverified blogs, no scraped forums, no Wikipedia. If a source isn't on this list, the deliberation can't cite it. Every claim it produces is tagged [SOURCED] or [INFERRED] so you always know what's grounded.
400+ data sources and integrations across the platform
250+ verified institutional sources (curated, allowlist)
140+ specialist API adapters for real-time retrieval
5 frontier models in deliberation
21 knowledge domains routed by classifier
0 blocklisted sources (allowlist only)
Hybrid retrieval: BM25 + pgvector + Reciprocal Rank Fusion + FlashRank cross-encoder rerank. Top-k = 5, similarity threshold = 0.45 cosine, max context = 4000 chars per source.
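Here is a minimal Python sketch of how the three stated parameters could combine: a 0.45 cosine threshold, a top-k of 5, and a 4000-character cap. It is not Pilot5's code. It collapses the pipeline to the dense stage alone, computes cosine similarity in plain Python instead of pgvector, and ranks survivors by cosine where the real pipeline would use the fused and reranked order. The `Source` type and `select_context` function are hypothetical.

```python
import math
from dataclasses import dataclass

# Parameters as stated on this page.
SIMILARITY_THRESHOLD = 0.45  # minimum cosine similarity for the dense stage
TOP_K = 5                    # sources passed to the deliberation
MAX_CONTEXT_CHARS = 4000     # context cap per source


@dataclass
class Source:
    source_id: str           # hypothetical identifier
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(query_vec: list[float], sources: list[Source]) -> list[tuple[str, str]]:
    """Drop sources below the threshold, keep the best TOP_K, truncate each one."""
    scored = [(cosine(query_vec, s.embedding), s) for s in sources]
    kept = [(sim, s) for sim, s in scored if sim >= SIMILARITY_THRESHOLD]
    # Illustration only: rank by cosine; the real pipeline ranks by the reranker.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [(s.source_id, s.text[:MAX_CONTEXT_CHARS]) for _, s in kept[:TOP_K]]
```

In this sketch the threshold is applied before ranking, so a question with few strong matches returns fewer than five sources instead of padding the list with weak ones.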
How sources are used
Tier 1: Government registers, central banks, peer-reviewed databases, official statistics, regulatory codes. Cited directly. Tagged [SOURCED].
Tier 2: Specialist domain adapters (Cochrane, EUR-Lex case law, NICE guidelines, OECD benchmarks, EPO patents). Same provenance discipline as Tier 1.
Tier 3: Real-time queries against the open web for fast-moving facts. Treated with stricter scrutiny: claims that don't trace to a verified source are downgraded to [INFERRED].
Retrieval anatomy
When a deliberation needs grounding, the orchestrator runs a four-stage retrieval pipeline against the trusted-sources corpus. Each stage has a measurable parameter; none of them are magic.
01 BM25 lexical search. Term-frequency match against full-text indexes; catches exact keyword and phrase matches a vector model would miss. Returns a ranked candidate list.
02 pgvector dense search. Cosine similarity between the question embedding and source embeddings stored in Postgres via the vector extension. Catches semantic overlap a keyword search would miss. Threshold: 0.45.
03 Reciprocal Rank Fusion. Combines the two ranked lists into a single ordering weighted by reciprocal of rank, parameter k = 50. Avoids the “BM25 wins because the question used the source’s exact terminology” bias.
04 FlashRank cross-encoder rerank. A local ONNX cross-encoder model re-scores the fused list against the question semantically. The final top_k = 5 sources are passed to the deliberation, capped at 4000 characters of context per source.
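Step 03 is the least intuitive stage. The sketch below implements standard Reciprocal Rank Fusion over lists of document IDs with this page's k = 50: each document earns 1 / (k + rank) from every list it appears in, with rank counted from 1. The function name and the ID-based tie-break are assumptions for illustration, not Pilot5's implementation.

```python
from collections import defaultdict

RRF_K = 50  # fusion constant stated on this page


def reciprocal_rank_fusion(*ranked_lists: list[str], k: int = RRF_K) -> list[str]:
    """Fuse ranked lists of document IDs into a single ordering.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    rank counted from 1. A list that omits a document contributes nothing.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With k = 50, a document ranked second by both retrievers (2/52 ≈ 0.0385) outranks one ranked first by BM25 alone (1/51 ≈ 0.0196), which is how fusion damps a single retriever's exact-terminology advantage.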
How a claim becomes [SOURCED]
Provenance isn’t a self-reported tag. The orchestrator checks each claim against the retrieved evidence and downgrades anything that doesn’t actually trace.
- Persona writes a claim. In Round 1, each persona’s analysis annotates every claim with [SOURCED] when it cites a retrieved source, or [INFERRED] when it’s analytical reasoning.
- Audit verifier runs. For every [SOURCED] claim, the verifier looks for the cited reference in the actual retrieved corpus for that round. If the cited source is in the corpus and supports the claim, the tag stays.
- Fabricated citations are downgraded. Anything tagged [SOURCED] that can’t be traced to retrieved evidence is rewritten to [INFERRED] and logged as a SOURCED_TAGS_DOWNGRADED event. Synthesis only ever sees the post-audit version of the round.
- You see the count. Every deliberation’s audit summary reports verified [SOURCED] tags vs. [INFERRED] tags. A high downgrade count is a signal that the panel tried to over-claim grounding. It’s visible in the audit trail, not buried.
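The audit steps above can be sketched as follows. This is a hypothetical Python illustration, not the orchestrator's implementation: the page doesn't describe how "supports the claim" is judged, so that check is passed in as a `supports` predicate, and the `Claim` fields and the shape of the logged event are assumptions. Only the tag strings and the SOURCED_TAGS_DOWNGRADED event name come from this page.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

SOURCED = "[SOURCED]"
INFERRED = "[INFERRED]"
DOWNGRADE_EVENT = "SOURCED_TAGS_DOWNGRADED"  # event name from this page


@dataclass(frozen=True)
class Claim:
    text: str
    tag: str                        # SOURCED or INFERRED
    cited_id: Optional[str] = None  # hypothetical: ID of the cited source


def audit_round(
    claims: list[Claim],
    corpus: dict[str, str],
    supports: Callable[[str, str], bool],
    log: list[dict],
) -> tuple[list[Claim], int, int]:
    """Downgrade [SOURCED] claims that don't trace to this round's corpus.

    corpus maps source ID -> retrieved text for the round.
    supports(claim_text, source_text) is a placeholder for the support check.
    Returns (post-audit claims, verified SOURCED count, INFERRED count).
    """
    audited: list[Claim] = []
    downgraded = 0
    for claim in claims:
        if claim.tag == SOURCED:
            source_text = corpus.get(claim.cited_id) if claim.cited_id else None
            # Keep the tag only if the cited source was retrieved AND supports the claim.
            if source_text is None or not supports(claim.text, source_text):
                claim = replace(claim, tag=INFERRED)  # new object; input is untouched
                downgraded += 1
        audited.append(claim)
    if downgraded:
        # Assumed event shape: one entry per round carrying the downgrade count.
        log.append({"event": DOWNGRADE_EVENT, "count": downgraded})
    verified = sum(1 for c in audited if c.tag == SOURCED)
    return audited, verified, len(audited) - verified
```

Because the sketch returns new `Claim` objects rather than mutating the originals, a synthesis step can be handed `audited` alone, in line with the rule that synthesis only sees the post-audit round.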
Coverage by domain
327 unique institutions
Law & Regulation: 45
Medical & Life Sciences: 26
Tax: 23
Finance: 42
Privacy & AI Regulation: 9
Sanctions & Trade: 9
Accounting Standards: 8
Science & Engineering: 28
Economics & Climate: 8
Statistics: 6
Patents & IP: 6
Logistics: 9
Startup & Private Markets: 18
Industry Analysts (citation only): 14
Country-specific Business Law: 67
Cultural & Archival: 9
How we add a source
- Manual review — licensing, jurisdiction, freshness, cite-ability.
- Allowlist only — sources are added, never blocked.
- Request a source: legal@pilot5.ai
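The allowlist model described above is default-deny: anything not explicitly added is uncitable, so there is no blocklist to maintain. Below is a minimal Python sketch, assuming sources are matched by exact URL hostname (the page doesn't say how matching works). The `Allowlist` class and the example hosts are hypothetical.

```python
from collections.abc import Iterable
from urllib.parse import urlsplit


class Allowlist:
    """Default-deny source registry: a host is citable only once it has been added."""

    def __init__(self, hosts: Iterable[str] = ()) -> None:
        self._hosts = {h.lower() for h in hosts}

    def add(self, host: str) -> None:
        # Sources are only ever added after review; there is deliberately no block().
        self._hosts.add(host.lower())

    def is_citable(self, url: str) -> bool:
        # urlsplit().hostname is already lowercased; it is None for non-URLs.
        host = urlsplit(url).hostname or ""
        return host in self._hosts
```

Matching exact hostnames means that, in this sketch, a subdomain doesn't inherit its parent's approval; each host has to pass review on its own.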
Want to see this in action? Run a deliberation and read the citations.