The problem
Patent citations to scholarly literature are one of the most valuable signals of research translation — a direct link between academic discovery and commercial application. But accessing this data has traditionally required expensive proprietary databases like Lens.org or PatSnap, putting it out of reach for many research funders and universities.
What I built
A local pipeline that processes the entire EPO global patent snapshot (142 million patent documents), resolves the non-patent literature (NPL) citations those documents contain to DOIs, and matches the resolved DOIs to OpenAlex records. The result: a comprehensive dataset linking patents to scholarly papers, built entirely from open sources.
Key figures
- 142 million patent documents processed from the EPO global snapshot
- 29.6 million non-patent literature citations resolved to DOIs
- 4.35 million unique scholarly papers matched to OpenAlex records
- Zero proprietary dependencies — fully reproducible from open data
How it works
The pipeline operates in several stages:
- Ingest: Parse the EPO’s bulk patent data files, extracting bibliographic records and non-patent literature citations
- Clean: Normalise citation strings, handling the enormous variety of formats used across patent offices worldwide
- Resolve: Match citation strings to DOIs using a combination of structured parsing and fuzzy matching
- Enrich: Link resolved DOIs to OpenAlex records, adding full metadata including authors, institutions, funding information, and citation networks
- Validate: Cross-check results against known benchmarks to ensure accuracy
Why it matters
For research funders, this opens up an entirely new dimension of impact assessment. You can now ask questions like:
- Which of our funded publications have been cited in patents?
- What industries are building on our research?
- How does our portfolio’s patent citation rate compare to field expectations?
- Where are the strongest translation pathways from our funded research to commercial application?
All without paying for proprietary databases — and with full transparency over the methodology.
Technical details
The pipeline is built in Python, using DuckDB for efficient local processing of large datasets. It runs on a single machine (no cloud infrastructure required) and can process the full EPO snapshot in under 48 hours. The code is designed to be modular, so individual components can be updated independently as data sources change.
Want to discuss this work?
I'm always happy to talk about methodology, data infrastructure, or how these approaches could apply to your organisation.