Pipeline Architecture
Four-stage workflow: matching global patents to the publication database OpenAlex
142.0M Patents Processed → 16.2M Patents with NPL Citations → 29.6M NPL Citations Resolved to DOI → 4.35M Unique Papers Cited in Patents
A note on recency and citation lags: Filing-to-publication delays and retroactively added citation data mean the most recent patent window is structurally incomplete. In addition, research takes time to influence patents and downstream innovation. Coverage is strongest for publications from 2010–2019, and the counts for more recent years are expected to grow.
Year-by-Year Breakdown
Patent citation rates by publication year — filing-to-publication delays and retroactively added citations make recent years less complete
| Year | Total Papers | Cited in Patents | % Cited |
| --- | ---: | ---: | ---: |
| 2010 | 2,471,891 | 147,201 | 6.0% |
| 2011 | 2,624,743 | 151,785 | 5.8% |
| 2012 | 2,821,408 | 151,610 | 5.4% |
| 2013 | 3,084,418 | 153,285 | 5.0% |
| 2014 | 3,492,805 | 155,837 | 4.5% |
| 2015 | 3,525,229 | 153,907 | 4.4% |
| 2016 | 4,060,079 | 158,510 | 3.9% |
| 2017 | 4,526,794 | 155,779 | 3.4% |
| 2018 | 4,682,274 | 154,309 | 3.3% |
| 2019 | 5,286,580 | 149,705 | 2.8% |
| 2020 | 5,717,037 | 138,078 | 2.4% |
| 2021 | 6,140,875 | 111,493 | 1.8% |
| 2022 | 8,145,851 | 110,177 | 1.4% |
| 2023 | 7,483,404 | 55,617 | 0.7% |
| 2024 | 8,321,909 | 32,687 | 0.4% |
| 2025 | 6,842,721 | 8,221 | 0.1% |
| Total | 79.2M | 1,988,201 | 2.5% |
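The "% Cited" column appears to be "Cited in Patents" divided by "Total Papers", rounded to one decimal place. The following is a minimal sketch for recomputing it from a few of the rows above. The function name `pct_cited` and the `rows` dictionary are illustrative and not part of the pipeline; the total of 79,228,018 papers is the sum of the yearly counts, which the table rounds to 79.2M.

```python
# Hypothetical helper: recompute the share of papers cited in patents.
def pct_cited(total_papers: int, cited_in_patents: int) -> float:
    """Return the percentage of papers cited in patents, to one decimal."""
    return round(100 * cited_in_patents / total_papers, 1)

# (total papers, cited in patents) copied from selected table rows.
rows = {
    2010: (2_471_891, 147_201),
    2019: (5_286_580, 149_705),
    2025: (6_842_721, 8_221),
}

for year, (total, cited) in rows.items():
    print(year, f"{pct_cited(total, cited)}%")

# Overall rate, using the sum of the yearly paper counts (about 79.2M).
overall = pct_cited(79_228_018, 1_988_201)
print("Total", f"{overall}%")
```

The steep drop for recent years in this calculation reflects the citation lag described earlier, not necessarily a lower eventual citation rate.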
How It Works
142.0M patents were processed from an open global patent snapshot (through February 2026). Of these, 16.2M cite non-patent literature (NPL). DOIs were resolved in two ways: by extracting them from the citation text, and by matching title and year against a snapshot of articles, reviews, and preprints indexed in OpenAlex. Together these steps yielded 29.6M NPL citations with DOIs. In total, 4.35M unique scholarly papers were found to be cited in patents and conclusively linked to OpenAlex data (through October 2025). The entire pipeline runs locally with no proprietary data dependencies.
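The two resolution routes described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline's actual code: the DOI regular expression is a commonly used pattern, the title normalization is a guess at what "title + year matching" might involve, and the names `extract_doi`, `normalize_title`, `resolve`, and the in-memory `index` dictionary are all hypothetical.

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple

# Common DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
# Assumption: the real pipeline may use a different or stricter pattern.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)", re.IGNORECASE)


def extract_doi(citation: str) -> Optional[str]:
    """Return the first DOI-like string in the citation text, lowercased."""
    match = DOI_RE.search(citation)
    if match is None:
        return None
    # Drop trailing punctuation that usually belongs to the sentence.
    return match.group(1).rstrip(".,;)]").lower()


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode()
    )
    return " ".join(re.sub(r"[^a-z0-9]+", " ", ascii_title.lower()).split())


def resolve(
    citation: str,
    title: Optional[str],
    year: Optional[int],
    index: Dict[Tuple[str, int], str],
) -> Optional[str]:
    """Try DOI extraction first, then fall back to a title + year lookup."""
    doi = extract_doi(citation)
    if doi is not None:
        return doi
    if title is None or year is None:
        return None
    return index.get((normalize_title(title), year))


# Hypothetical index built from a publication snapshot: (title, year) -> DOI.
index = {("deep learning", 2015): "10.1038/nature14539"}

print(resolve("Smith J. et al., Nature 2015, doi:10.1038/nature12345.", None, None, index))
print(resolve("LeCun Y. et al., Deep Learning, Nature (2015)", "Deep Learning!", 2015, index))
```

Exact matching on a normalized title and year favors precision over recall, which seems consistent with the "conclusively linked" wording; a fuzzy matcher would recover more citations at the cost of more false links.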