RAG Portfolio Segmentation & Synthesis
A research intelligence tool that lets funders query their own grant portfolios using natural language — turning thousands of grants and their linked publications into actionable answers with full source citations, for questions that previously took days or weeks to resolve.
The problem
A minister asks how much your organisation has funded on a particular disease area. A stakeholder wants to know which grants are working on early-stage diagnostics. A strategy lead needs a landscape view of a new priority theme. In each case, the same thing happens: someone runs keyword searches across internal systems, manually checks hundreds of results for relevance, worries about what the keywords missed, and tries to eliminate false positives — a process that can take days or, for complex queries, several weeks.
And even when reporting platforms provide structured data, its quality often falls short: incomplete fields, inconsistent categorisation, and missing records can undermine any analysis built on top of them.
You can't solve this by pasting records into a general-purpose chatbot. No context window can handle thousands of grants at once and return reliable, comprehensive answers. What's needed is a system that can search intelligently across your full grant portfolio — and where available, the publications linked to those grants — and retrieve precisely what's relevant to each question.
Part of the challenge is linguistic: strategic questions are often framed in policy terms that don't match the technical vocabulary in grant records. Our work on bridging policy and research language with LLMs directly informed the retrieval approach used here.
What the tool does
Ask a question in plain English — "what have we funded on neuroinflammation in the last five years?" — and the system identifies the relevant grants, draws on linked publications for additional evidence where available, and generates a structured answer with inline citations back to the original sources.
The tool operates in two distinct modes, and each deployment can include one or both depending on the use case.
Narrative Synthesis
Produces written answers backed by source-level references — ideal for impact case studies, fundraising reports, and briefing non-expert audiences. Uses semantic retrieval over grant records and, where available, the full text of linked publications, with reranking for relevance and LLM synthesis to generate a cited narrative.
Evidence Gathering
Runs high-recall systematic searches across grants and linked publications, combining semantic search, keyword matching, and LLM-powered verification to remove false positives. Returns portfolio-level statistics and a CSV export — designed for portfolio analysis, research intelligence workflows, and answering the "how much have we funded on X?" question with confidence.
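For readers who want to see the mechanics, here is a minimal Python sketch of that verification step, in which an LLM checks each candidate grant against the question before it is counted. The OpenAI client, model name, and prompt wording are illustrative assumptions, not the production implementation.

```python
# Illustrative false-positive filter: an LLM confirms each candidate grant
# actually addresses the question. Model and prompt are assumptions.
from openai import OpenAI

client = OpenAI()

def is_relevant(question: str, grant: dict) -> bool:
    prompt = (
        f"Question: {question}\n\n"
        f"Grant title: {grant['title']}\n"
        f"Grant abstract: {grant['abstract']}\n\n"
        "Does this grant genuinely address the question? Answer YES or NO."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

# verified = [g for g in candidates if is_relevant(question, g)]
```

Filtering candidates this way costs extra LLM calls, but it yields a result set you can stand behind when reporting funding totals.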
How it works
Ask
Submit a research question in natural language. The system analyses it for ambiguity and may ask clarifying questions — time period, population, scope — to sharpen retrieval before searching.
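As a rough illustration of the ambiguity check, the sketch below asks an LLM to propose clarifying questions before any retrieval runs; the prompt and model name are assumptions for illustration only.

```python
# Illustrative ambiguity check: propose clarifying questions, or none if the
# query is already specific. Prompt and model are assumptions.
from openai import OpenAI

client = OpenAI()

CLARIFY_PROMPT = """You are refining a portfolio research question.
Question: "{question}"
If the question is ambiguous about time period, population, or scope,
list up to three short clarifying questions. If it is already specific,
reply with exactly: NONE."""

def clarifying_questions(question: str) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": CLARIFY_PROMPT.format(question=question)}],
    )
    text = resp.choices[0].message.content.strip()
    return [] if text == "NONE" else [q.strip("- ") for q in text.splitlines() if q.strip()]
```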
Retrieve
The engine expands your query using HyDE (hypothetical document embeddings), then searches semantic vector and keyword indexes over grant records and linked publications simultaneously, combining multiple retrieval methods for high recall and precision.
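A minimal sketch of HyDE-style expansion, assuming an OpenAI embedding model and a FAISS index over grant and publication text; model names and parameters are illustrative.

```python
# Illustrative HyDE retrieval: embed a hypothetical answer passage rather than
# the raw question, then search a FAISS index of grant/publication embeddings.
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def hyde_search(question: str, index: faiss.Index, top_k: int = 50) -> list[int]:
    # 1. Have an LLM draft a short passage that would plausibly answer the question.
    hypo = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user",
                   "content": f"Write a short grant abstract that would answer: {question}"}],
    ).choices[0].message.content

    # 2. Embed the hypothetical passage instead of the raw question.
    emb = client.embeddings.create(model="text-embedding-3-small", input=hypo).data[0].embedding
    query_vec = np.asarray([emb], dtype="float32")

    # 3. Retrieve the nearest grant/publication chunks from the index.
    _, ids = index.search(query_vec, top_k)
    return ids[0].tolist()
```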
Synthesise
Retrieved evidence is reranked for relevance, grouped by source, and passed to a large language model that generates a structured answer with numbered inline citations — each traceable to a specific grant record or publication.
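The sketch below shows the general shape of this step: a cross-encoder reranks the retrieved passages, the top ones are numbered, and the LLM is asked to cite them inline as [n]. The reranker model and prompt are assumptions, not the exact production pipeline.

```python
# Illustrative rerank-and-synthesise step with numbered inline citations.
from openai import OpenAI
from sentence_transformers import CrossEncoder

client = OpenAI()
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed reranker

def synthesise(question: str, chunks: list[dict], top_n: int = 10) -> str:
    # Rerank retrieved chunks by relevance to the question.
    scores = reranker.predict([(question, c["text"]) for c in chunks])
    ranked = [c for _, c in sorted(zip(scores, chunks), key=lambda p: -p[0])][:top_n]

    # Number the sources so the model can cite them as [1], [2], ...
    context = "\n\n".join(f"[{i + 1}] ({c['source_id']}) {c['text']}"
                          for i, c in enumerate(ranked))
    prompt = ("Answer the question using only the numbered sources below, "
              "citing them inline as [n].\n\n"
              f"Question: {question}\n\nSources:\n{context}")

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```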
Key capabilities
Narrative synthesis with citations
Written answers with [1][2]-style inline citations linked to grant records and, where available, DOIs. Click any citation to see the source passage. Export the full reference list.
Systematic evidence search
High-recall search across grants and linked publications using FAISS semantic search, BM25 keyword search, or hybrid fusion. Each result is verified for relevance by an LLM to eliminate false positives.
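As an illustration of the hybrid option, the sketch below merges FAISS and BM25 rankings with reciprocal rank fusion; the rank_bm25 library and the fusion constant are assumptions rather than the tool's exact configuration.

```python
# Illustrative hybrid fusion: combine semantic (FAISS) and keyword (BM25)
# rankings with reciprocal rank fusion so documents favoured by either rise.
import faiss
import numpy as np
from rank_bm25 import BM25Okapi

def hybrid_search(query_vec: np.ndarray, query_tokens: list[str],
                  index: faiss.Index, bm25: BM25Okapi,
                  top_k: int = 100, k_rrf: int = 60) -> list[int]:
    # Semantic ranking from the FAISS index.
    _, sem_ids = index.search(query_vec.reshape(1, -1).astype("float32"), top_k)
    # Keyword ranking from BM25 scores over the same corpus.
    bm25_ids = np.argsort(bm25.get_scores(query_tokens))[::-1][:top_k]

    # Reciprocal rank fusion over the two ranked lists.
    scores: dict[int, float] = {}
    for ranking in (sem_ids[0].tolist(), bm25_ids.tolist()):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k_rrf + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```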
Conversation threading
Ask follow-up questions that carry context forward. Star important queries, add custom tags, and search your full query history.
Portfolio-level analytics
Evidence searches surface year distributions, top authors, leading institutions, and total funding — giving a landscape view of any research area in your portfolio.
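Conceptually, these figures are straightforward aggregations over the verified result set. A minimal sketch, assuming a pandas DataFrame with illustrative column names:

```python
# Illustrative portfolio summary over an evidence-search result set.
# Column names (award_amount, start_year, institution) are assumptions.
import pandas as pd

def portfolio_summary(grants: pd.DataFrame) -> dict:
    return {
        "grant_count": len(grants),
        "total_funding": float(grants["award_amount"].sum()),
        "grants_by_year": grants["start_year"].value_counts().sort_index().to_dict(),
        "top_institutions": grants["institution"].value_counts().head(10).to_dict(),
    }

# The CSV export is the same table written out directly:
# grants.to_csv("evidence_search_results.csv", index=False)
```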
Built for research organisations
The tool is currently being built and deployed for a large health research charity and a major public funding body. It indexes grant records alongside linked publications sourced from Europe PMC and enriched via OpenAlex — capable of handling thousands of grants and tens of thousands of linked papers in a single deployment.
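As a simplified sketch of how publication linkage and enrichment can work against the public Europe PMC and OpenAlex REST APIs (the GRANT_ID query field, response handling, and field selection are stripped-down illustrations, not the production pipeline):

```python
# Illustrative publication linkage: find papers acknowledging a grant via
# Europe PMC, then pull citation counts and concepts from OpenAlex.
import requests

EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
OPENALEX_WORK = "https://api.openalex.org/works/doi:{doi}"

def publications_for_grant(grant_id: str) -> list[dict]:
    params = {"query": f'GRANT_ID:"{grant_id}"', "format": "json", "pageSize": 100}
    data = requests.get(EPMC_SEARCH, params=params, timeout=30).json()
    hits = data.get("resultList", {}).get("result", [])
    return [{"title": h.get("title"), "doi": h.get("doi")} for h in hits]

def enrich(doi: str) -> dict:
    work = requests.get(OPENALEX_WORK.format(doi=doi), timeout=30).json()
    return {
        "cited_by": work.get("cited_by_count"),
        "concepts": [c["display_name"] for c in work.get("concepts", [])],
    }
```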
Deployments are configured to the client's own portfolio and use case. For one client, the tool is being built for an internal team — answering strategic questions, supporting portfolio reviews, and accelerating the reporting work that currently absorbs significant analyst time. For the other, it will serve as an external-facing tool, giving stakeholders who regularly work with a particular portfolio self-service access to query it directly, reducing the volume of ad hoc requests that fall to internal teams.
Each deployment can include the narrative mode, the evidence-gathering mode, or both.
Data governance & hosting
The tool can be hosted locally by the client or on a cloud instance configured to meet GDPR requirements, including the geographic location of servers. For AI models, commercial APIs can be used for non-sensitive data, while local open-source models are available for private or restricted datasets. I can either run the tool on behalf of clients or help them set up their own infrastructure for a fully self-hosted deployment.
Interested in the RAG Portfolio Tool?
I'd be happy to walk you through a demo and discuss how it could be configured for your organisation's portfolio.