
Portfolio Evaluation

I deliver comprehensive evaluations of research funding portfolios, from a single fellowship cohort to an organisation's entire 60-year funding history. My analyses are deep and multi-layered: they connect grants to publications, analyse citation and collaboration patterns, automatically extract information from progress reports, and identify policy and clinical outcomes from data and narrative accounts.

Linking portfolios to open data

I identify unique researcher identifiers and match awardees to their full publication output and to the grants they have received across their careers from major public and charitable funders. I can then use semantic similarity to link papers back to specific grants and calculate career-level statistics, comparing researchers' trajectories before and after funding. I run advanced citation analysis, co-authorship network mapping, and patent citation tracking to capture downstream commercial impact. I trace citations in policy documents and clinical guidelines through the NICE API and, when available, the Overton database.
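To illustrate the semantic-similarity step, here is a minimal sketch (a toy example, not the production pipeline) that matches papers to the most similar grant by cosine similarity over precomputed embedding vectors; the threshold value is an arbitrary assumption for demonstration:

```python
import numpy as np

def match_grants_to_papers(grant_vecs, paper_vecs, threshold=0.7):
    """Return (paper_idx, grant_idx) pairs where a paper's embedding is
    closest to a grant's embedding and their cosine similarity clears
    the threshold. Inputs are 2-D arrays of embedding vectors."""
    g = grant_vecs / np.linalg.norm(grant_vecs, axis=1, keepdims=True)
    p = paper_vecs / np.linalg.norm(paper_vecs, axis=1, keepdims=True)
    sims = p @ g.T  # rows: papers, columns: grants
    matches = []
    for i, row in enumerate(sims):
        j = int(row.argmax())  # best-matching grant for this paper
        if row[j] >= threshold:
            matches.append((i, j))
    return matches
```

In practice the embeddings would come from a sentence-embedding model run over grant and paper abstracts, and unmatched papers would be reviewed rather than discarded.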

Using AI to speed up research processes

I build automated classification processes to categorise historical grants against a funder's current strategic framework, even when the original grants predate that framework by decades. I use LLM pipelines for systematic information extraction from progress reports and publications, synthesising qualitative data at a scale that would be impossible manually. I've built RAG tools that allow clients to query their own portfolios in natural language, asking strategic questions across hundreds or thousands of grants simultaneously.
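The classification step can be sketched as follows. This is a schematic, not my production code: the theme names are hypothetical, and `call_llm` stands in for whatever chat-completion API a given project uses:

```python
import json

PROMPT = """Classify the grant abstract below against these strategic themes: {themes}.
Return JSON: {{"theme": "<one of the themes>", "rationale": "<one sentence>"}}

Abstract: {abstract}"""

def classify_grant(abstract, themes, call_llm):
    """Classify one historical grant against a current strategic framework.
    call_llm: callable(str) -> str, a thin wrapper around any LLM API."""
    raw = call_llm(PROMPT.format(themes=", ".join(themes), abstract=abstract))
    result = json.loads(raw)
    if result.get("theme") not in themes:
        raise ValueError(f"Model returned an unknown theme: {result!r}")
    return result
```

Constraining the model to a fixed theme list and validating the output keeps the pipeline auditable when it runs over thousands of historical grants.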

I also work extensively with embeddings to map portfolio overlap with other funders, identify areas of comparative strength, and surface clusters of funded research that have been more or less impactful. This produces a level of strategic insight that goes well beyond standard bibliometric indicators.
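A minimal sketch of one way to score portfolio overlap from embeddings (a toy illustration; real analyses would work at the level of thematic clusters, not raw vectors):

```python
import numpy as np

def portfolio_overlap(vecs_a, vecs_b):
    """Rough symmetric overlap score between two funders' portfolios:
    for each grant, find its nearest neighbour in the other portfolio
    by cosine similarity, then average both directions."""
    a = vecs_a / np.linalg.norm(vecs_a, axis=1, keepdims=True)
    b = vecs_b / np.linalg.norm(vecs_b, axis=1, keepdims=True)
    sims = a @ b.T
    return float((sims.max(axis=1).mean() + sims.max(axis=0).mean()) / 2)
```

A score near 1.0 indicates two portfolios covering near-identical ground; low scores flag areas of comparative strength or potential gaps.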

Bibliometric network calculated based on co-authorship patterns from a funder's publication portfolio. Scroll to zoom in and out. Click and drag to pan. Click on a node to see details about each researcher.

Outputs

The deliverable is shaped by what the client needs. It might be a board-ready executive summary, a detailed analytical report, interactive dashboards for tracking fellowship cohorts, or dynamic HTML visualisations of bibliometric networks. Often it's a combination — the strategic narrative for leadership, the granular data for the research team, and the striking visualisations and research narratives for fundraising or communications colleagues.

Bibliometrics · OpenAlex · Grant-to-paper matching · Patent citations · Overton · ClinicalTrials.gov · Europe PMC · iCite · LLM pipelines · RAG · Embeddings · Network analysis

Landscape & Gap Analysis

Where portfolio evaluation looks inward at what a funder has supported, landscape analysis looks outward at an entire field. I map the full research landscape across grants, publications, patents, and clinical trials, and identify where the strategic opportunities lie: gaps that a funder could fill, emerging areas gaining momentum, and established clusters where additional investment would have diminishing returns.

Bibliometrics × Topic modelling

My landscapes are built at the intersection of topic modelling and bibliometrics. I use techniques like BERTopic and Latent Dirichlet Allocation to decompose a field into its constituent sub-areas, then layer bibliometric metadata on top (collaboration patterns, geographic concentrations, funding flows, organisational strengths) to show that fields of research are not monoliths. Sub-clusters within a field often have radically different behaviours: different key players, different levels of international collaboration or industry interest, different trajectories. Surfacing that structure is where strategic value lives.

Semantic map of grants funded by different funding bodies in a research field. Dots are clustered together based on their thematic similarity. Double-click each funder on the legend to only see the dots for that funder, and see the semantic space each occupies.

Processing scientific information at scale

AI and LLM techniques are core to this work. I use embeddings extensively to create two-dimensional visualisations of entire research fields (via t-SNE and UMAP), making complex landscapes navigable and intuitive. I use LLMs to eliminate false positives from keyword-based searches, to produce automated summaries of thematic clusters, and to conduct structured information extraction for evidence synthesis. I also build detailed researcher profiles by synthesising an individual's full publication record, identifying their core expertise and how it has evolved. Finally, I run machine-learning models for entity extraction (e.g., genes, proteins) and thematic classification (e.g., MeSH, Fields of Science).
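The projection step can be sketched in a few lines with scikit-learn's t-SNE (UMAP is a drop-in alternative via the `umap-learn` package; the data here is random, purely for shape):

```python
import numpy as np
from sklearn.manifold import TSNE

def project_2d(embeddings, seed=0):
    """Project high-dimensional document embeddings to 2-D coordinates
    suitable for an interactive scatter plot of a research field."""
    embeddings = np.asarray(embeddings)
    # perplexity must be smaller than the number of samples
    perplexity = min(30, len(embeddings) - 1)
    return TSNE(n_components=2, perplexity=perplexity,
                random_state=seed).fit_transform(embeddings)
```

Each point in the resulting plot is one paper or grant; colouring by funder, topic, or date is what turns the scatter into a navigable landscape.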

Topic modelling · BERTopic · LDA · Embeddings · t-SNE / UMAP · Network analysis · LLM summarisation · Gap analysis · Researcher profiling · Bibliometrics

AI-Powered Research Tools

I build bespoke analytical tools that give research funders capabilities they would otherwise only get through expensive proprietary platforms, or not at all. These aren't generic dashboards. They're purpose-built applications designed around a specific organisation's data, workflows, and strategic questions.

The tools I've built include a bibliometric research tool that retrieves grants and publications from multiple open databases, calculates metrics, generates interactive bibliometric networks in HTML, and profiles individual academics based on their publication record and online presence — all through a single interface that a research team can use directly. I've also developed an IP opportunity identifier that automatically extracts information from progress reports, identifies industry collaborators within a funder's researcher network, conducts automated online searches for commercial activity, and synthesises everything into a scored assessment of translation potential. This turns what would be weeks of manual desk research into a systematic, reproducible process.

I'm currently building a portfolio segmentation tool that uses RAG (retrieval-augmented generation) and HyDE (hypothetical document embeddings) to let funders query their own grant portfolios and linked publications using natural language. A user can ask a strategic question — "which grants in our portfolio are working on early-stage diagnostics?" — and the tool identifies the relevant subset, retrieves supporting evidence from grant records and papers, and generates a narrative answer. It's the kind of capability that transforms how a research strategy team interacts with their own data.
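The HyDE retrieval step works roughly as follows. This is a schematic with injected `generate` and `embed` callables standing in for whatever LLM and embedding model a deployment uses, not the tool itself:

```python
import numpy as np

def hyde_search(question, corpus_vecs, generate, embed, k=5):
    """HyDE retrieval: ask an LLM to write a hypothetical grant abstract
    that would answer the question, embed that abstract, then retrieve
    the real grants whose embeddings are nearest to it."""
    hypothetical = generate(
        f"Write a short grant abstract that would answer: {question}")
    q = embed(hypothetical)
    q = q / np.linalg.norm(q)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q
    top = np.argsort(sims)[::-1][:k]
    return [(int(i), float(sims[i])) for i in top]
```

The retrieved grants then feed the generation stage, which drafts the narrative answer with citations back to the underlying records.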

Delivery models vary with the engagement. Some tools I operate on the client's behalf as a managed service, running analyses on demand for a fee. Others I hand over to the client's own technical team, with full documentation and a structured handover process. In both cases, the goal is a tool that keeps delivering value long after my engagement ends.

The technical stack is Python-based, using different embedding models depending on the project's requirements. I work with vector databases for production systems and FAISS for local prototyping and privacy-sensitive work. LLMs are accessed via APIs for general use or run locally on dedicated hardware when client data requires it, ensuring sensitive grant and portfolio data never leaves a controlled environment.

Python · RAG · HyDE · FAISS · Vector databases · Embedding models · LLM pipelines · React · Interactive dashboards · API integration · OpenAlex · NLP

Open Data Infrastructure

I build the data infrastructure that makes everything else possible, and I help clients build it for themselves. My work is grounded in a conviction I advocate for publicly: research funders should not depend on expensive proprietary platforms for basic intelligence about their own fields. Open data sources are now comprehensive enough to replace tools like Dimensions, Scopus, and Clarivate's InCites for most research evaluation purposes. I help organisations make that transition.

At the core of my work is OpenAlex — both through the API for targeted queries and through locally hosted snapshots for large-scale analysis. I maintain local copies of global patent databases linked to the OpenAlex snapshot via citation records, enabling patent-to-publication analysis without any proprietary dependencies. I also maintain a local copy of ClinicalTrials.gov, cross-linked to the same publication data. This interconnected local infrastructure means I can trace a research investment from grant funding through to publications, citations, patent citations, clinical trials, and policy impact — entirely from open sources.
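As a small example of the API side, a minimal stdlib-only sketch of fetching an author's works from OpenAlex (no API key required; the filter syntax follows the OpenAlex documentation, and error handling is kept to the bare minimum):

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"

def parse_works(payload):
    """Reduce an OpenAlex /works response to (id, title, citations) tuples."""
    return [(w["id"], w.get("title"), w.get("cited_by_count", 0))
            for w in payload.get("results", [])]

def works_for_author(author_id, per_page=25):
    """Fetch one page of an author's works from the OpenAlex API."""
    query = urlencode({"filter": f"authorships.author.id:{author_id}",
                       "per-page": per_page})
    with urlopen(f"{OPENALEX_WORKS}?{query}", timeout=30) as resp:
        return parse_works(json.load(resp))
```

For full-portfolio analyses, the same records come from the locally hosted snapshot instead, avoiding per-query rate limits entirely.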

A concrete example: before OpenAlex introduced funding information in its database, I built a harmonised grants dataset that grew to four million records, compiled from UKRI Gateway to Research, NIHR, Europe PMC, European Commission programmes and other international funding repositories. For over a year, this was one of the most comprehensive openly accessible compilations of grant-to-publication linkages available anywhere — a capability that had previously only existed behind the paywalls of proprietary platforms. That dataset gave my clients a strategic advantage.

When clients come to me for open data work, the engagement can take several forms. Some want me to set up their own open data pipelines: connecting them to OpenAlex, configuring automated retrieval, building the matching and deduplication logic that turns raw data into usable intelligence. Others want tools built on top of that infrastructure: dashboards, monitoring systems, or analytical applications that their teams can operate independently. In every case, I'm building something the client owns and controls, with no ongoing licence fees or vendor lock-in.
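The deduplication logic can be illustrated with a stdlib-only sketch (a toy version: production pipelines would also compare funder, amount, and dates, not titles alone, and the threshold here is an assumption):

```python
from difflib import SequenceMatcher

def dedupe_grants(records, threshold=0.9):
    """Collapse near-duplicate grant records by fuzzy title similarity.
    Each record is a dict with at least a 'title' key; the first copy
    of each near-duplicate group is kept."""
    kept = []
    for rec in records:
        title = rec["title"].lower().strip()
        if any(SequenceMatcher(None, title,
                               k["title"].lower().strip()).ratio() >= threshold
               for k in kept):
            continue  # near-duplicate of a record we already kept
        kept.append(rec)
    return kept
```

Harmonising records this way across multiple funder repositories is what turns raw exports into a single queryable dataset.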

OpenAlex API & snapshots · Patent databases · ClinicalTrials.gov · Europe PMC · ETL pipelines · Data harmonisation · Python · Local-first processing · Open data advocacy

Have a project in mind?

I'd welcome a conversation about how these capabilities could support your organisation's strategic goals.