SOURCE - The Future of American University

SOURCE (Searchable Open University Records of Charitable Expenditures) provides visibility into over 1.6 million grants from over 77,000 U.S. private foundations and public charities to over 6,100 colleges and universities. SOURCE is developed by Tao Tan.

Introduction to SOURCE

SOURCE (Searchable Open University Records of Charitable Expenditures) is an interactive tool for exploring a dataset of over 1.6 million grants from over 77,000 U.S. private foundations and public charities to colleges and universities, predominantly American. Each row is a single grant: the funder, the recipient, the year, the dollar amount, the foundation’s own free-text description of why the grant was made, and a thematic tag added by AEI to make the corpus searchable by topic. Filters cover funder, recipient, year, tag, and geography.

Every row also carries a link to the underlying Form 990 or 990-PF filing on ProPublica, as well as to the foundation’s overall profile. A reader who wants to see the original disclosure for any grant can click through to ProPublica’s XML viewer and read the raw filing in the browser, as the IRS received it.

SOURCE was built so that journalists, researchers, donors, university leaders, and the general public can pull the same data and reach their own conclusions. A technical appendix is attached for any reader who wishes to reproduce or extend the dataset. Limitations of this tool are discussed below under “Caveats and limitations.”

Editorial stance

We make no normative claims about any of the grants in this dataset. The data is public and always has been. The classification scheme may appear opinionated in places, and we fully disclose how we arrived at that position.

We do not rate, score, or rank foundations (except by size). The only subjective position we take is classifying the free-text purpose of each grant by topic. We do not assert what a foundation’s intent was, what an institution’s response should be, or whether any grant is good or bad. Every criterion that contributed to the classification is disclosed here and in the technical appendix.

Data Sources

The raw material is Form 990 and Form 990-PF, the annual returns that public charities and private foundations file with the IRS. The IRS releases the 990 corpus in bulk as XML files, and GivingTuesday mirrors and archives the corpus in a public S3 bucket.

We extracted grant-level detail using a Python script. The code reads the XML, isolates the grants-paid schedule, and writes the result to a single flat file. We then ran two parallel iterative cleaning passes on that file: one to normalize the recipient names so that the same university does not appear under variant spellings, and one to add a thematic tag to each grant using AI (Claude Haiku 4.5). Code, prompts, and replication instructions for both passes are in the technical appendix.
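The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script (which is in the technical appendix). The element names (such as GrantOrContributionPdDurYrGrp) reflect our understanding of recent 990-PF e-file schemas and should be treated as assumptions, since they vary by form type and schema version. The sample XML, the output column names, and the helper functions are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

EFILE = "http://www.irs.gov/efile"  # XML namespace of IRS e-file returns
NS = {"irs": EFILE}
FIELDS = ["funder", "recipient_raw", "year", "amount", "purpose"]  # hypothetical columns

# Hypothetical, heavily trimmed 990-PF fragment for illustration only.
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <TaxYr>2021</TaxYr>
    <Filer>
      <BusinessName>
        <BusinessNameLine1Txt>Example Family Foundation</BusinessNameLine1Txt>
      </BusinessName>
    </Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990PF>
      <SupplementaryInformationGrp>
        <GrantOrContributionPdDurYrGrp>
          <RecipientBusinessName>
            <BusinessNameLine1Txt>Example State Univ.</BusinessNameLine1Txt>
          </RecipientBusinessName>
          <GrantOrContributionPurposeTxt>Nursing scholarship</GrantOrContributionPurposeTxt>
          <Amt>25000</Amt>
        </GrantOrContributionPdDurYrGrp>
        <GrantOrContributionPdDurYrGrp>
          <RecipientBusinessName>
            <BusinessNameLine1Txt>Sample College</BusinessNameLine1Txt>
          </RecipientBusinessName>
          <GrantOrContributionPurposeTxt>General operating support</GrantOrContributionPurposeTxt>
          <Amt>10000</Amt>
        </GrantOrContributionPdDurYrGrp>
      </SupplementaryInformationGrp>
    </IRS990PF>
  </ReturnData>
</Return>"""


def text(node, path):
    """Return the stripped text at `path` under `node`, or '' if absent."""
    found = node.find(path, NS)
    return found.text.strip() if found is not None and found.text else ""


def extract_grants(xml_text):
    """Isolate the grants-paid entries of one return as a list of flat rows."""
    root = ET.fromstring(xml_text)
    funder = text(root, "irs:ReturnHeader/irs:Filer/irs:BusinessName/irs:BusinessNameLine1Txt")
    year = text(root, "irs:ReturnHeader/irs:TaxYr")
    rows = []
    # Element name is an assumption; other schema versions name it differently.
    for grant in root.iter(f"{{{EFILE}}}GrantOrContributionPdDurYrGrp"):
        rows.append({
            "funder": funder,
            "recipient_raw": text(grant, "irs:RecipientBusinessName/irs:BusinessNameLine1Txt"),
            "year": year,
            "amount": text(grant, "irs:Amt"),
            "purpose": text(grant, "irs:GrantOrContributionPurposeTxt"),
        })
    return rows


def write_flat_file(rows, handle):
    """Write the extracted rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = extract_grants(SAMPLE_XML)
buffer = io.StringIO()  # stands in for the single flat file
write_flat_file(rows, buffer)
csv_text = buffer.getvalue()
```

A real pipeline would loop over every filing in the bulk corpus and handle both form types and older schema versions. The sketch keeps the recipient string exactly as filed, since normalization is a separate pass.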

Topical tags

The topical tag column applies one or more labels to each grant. Eight of the ten labels are specific topics and can combine. For example, a nursing school scholarship correctly tags as stem, finaid, and professional at the same time. The remaining two labels (general and other) are exclusive and fire only when no specific topic can be identified. Most of the corpus tags as either general or finaid, while the other eight tags split the rest. Because a single grant can carry several tags, the tag shares sum to more than 100%.
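The tagging rules above can be illustrated with a short sketch. It uses only the five tag names mentioned in this document, the four sample grants are invented, and the function names are hypothetical. It is not the classifier itself, only a check of the rule that exclusive labels stand alone and a demonstration of why tag shares exceed 100%.

```python
from collections import Counter

EXCLUSIVE = {"general", "other"}  # labels that may not combine with any other tag


def is_valid(tags):
    """An exclusive label must stand alone; specific topics may combine."""
    tags = set(tags)
    if not tags:
        return False
    if tags & EXCLUSIVE:
        return len(tags) == 1
    return True


def tag_shares(grants):
    """Share of grants carrying each tag (a grant counts once per tag)."""
    counts = Counter(tag for tags in grants for tag in set(tags))
    return {tag: n / len(grants) for tag, n in counts.items()}


# Invented sample: four grants, one of them multi-tagged.
grants = [
    ["stem", "finaid", "professional"],  # e.g. a nursing school scholarship
    ["general"],
    ["finaid"],
    ["other"],
]

shares = tag_shares(grants)
total = sum(shares.values())  # exceeds 1.0 because of the multi-tagged grant
```

In this sample, finaid appears on two of four grants, and the shares add up to 150% because the first grant is counted under three tags.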

Grantees

The grantee column in the raw 990 and 990-PF data is noisy. The same university appears under dozens of variant names, e.g. “Stanford University,” “Leland Stanford Junior University,” “Trustees of Stanford,” “Stanford Univ.,” “FBO Stanford University.”

We normalized every raw recipient string in the corpus against a canonical list of institution names drawn from IPEDS, the federal Integrated Postsecondary Education Data System, supplemented by a manually curated list of canonicals for foreign institutions and edge cases. Each row in SOURCE carries both the foundation’s original recipient string (preserved for traceability) and the normalized canonical name (used for filtering and aggregation). Across the full corpus, the overwhelming majority of recipient strings resolve to a specific U.S. or foreign institution.
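One way such a normalization pass might work is sketched below, using the variant spellings listed earlier. This is an assumption-laden illustration, not the method used for SOURCE (see the technical appendix for that). The two-entry canonical list stands in for the IPEDS-derived list, the alias table stands in for the manually curated edge cases, and the cleaning rules and similarity cutoff are arbitrary choices.

```python
import difflib
import re

# Hypothetical stand-in for the IPEDS-derived canonical list.
CANONICAL = ["Stanford University", "Ohio State University"]

# Hypothetical stand-in for manually curated edge cases (keys are cleaned strings).
ALIASES = {
    "leland stanford junior university": "Stanford University",
    "trustees of stanford": "Stanford University",
}


def clean(raw):
    """Lowercase, drop punctuation and 'FBO', and expand 'univ'."""
    s = raw.lower()
    s = re.sub(r"[^a-z0-9 ]+", " ", s)        # punctuation to spaces
    s = re.sub(r"\bfbo\b", " ", s)            # "for the benefit of" prefix
    s = re.sub(r"\buniv\b", "university", s)  # common abbreviation
    return " ".join(s.split())                # collapse whitespace


def normalize(raw, cutoff=0.85):
    """Return the canonical name for a raw recipient string, or None."""
    key = clean(raw)
    if key in ALIASES:
        return ALIASES[key]
    lookup = {clean(name): name for name in CANONICAL}
    match = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None


variants = [
    "Stanford University",
    "Leland Stanford Junior University",
    "Trustees of Stanford",
    "Stanford Univ.",
    "FBO Stanford University",
]
resolved = {raw: normalize(raw) for raw in variants}
unresolved = normalize("Unknown Community Fund")
```

Keeping the raw string alongside the resolved name, as SOURCE does, means any questionable match can be traced back to what the foundation actually wrote. A strict cutoff leaves ambiguous strings unresolved rather than guessing.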

Caveats and limitations

Users should note a few constraints before drawing conclusions from this tool.

The classification is based on self-reported grant purpose only. The model assesses the foundation’s free-text purpose string and nothing else. A foundation that funds a specific program with a vague purpose description (“general operating support”) will produce a row that tags as general, even when the recipient or context would have made the topic obvious. We do not infer the foundation’s intent or recipient’s understanding, but rely solely on self-reported grant purpose.

Multi-tags are common and intentional. A grant to a nursing school scholarship correctly tags as stem, finaid, and professional. Users filtering on a single tag will see every grant that touches that topic, including grants that also touch other topics. The export reflects the multi-tag structure with semicolon-joined values.
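A reader working with an export might filter on one tag along these lines. The sketch assumes a CSV export with a column named tags holding semicolon-joined values. The column names and the three sample rows are hypothetical, so check them against the actual export header.

```python
import csv
import io

# Hypothetical export excerpt; column names are assumptions.
EXPORT = """funder,recipient,year,amount,tags
Example Foundation,Example State University,2021,25000,stem;finaid;professional
Example Foundation,Sample College,2022,10000,general
Another Fund,Example State University,2022,5000,finaid
"""


def rows_with_tag(handle, tag):
    """Yield every row whose semicolon-joined tag list contains `tag`."""
    for row in csv.DictReader(handle):
        tags = {t.strip() for t in row["tags"].split(";") if t.strip()}
        if tag in tags:
            yield row


finaid_rows = list(rows_with_tag(io.StringIO(EXPORT), "finaid"))
finaid_total = sum(int(row["amount"]) for row in finaid_rows)
```

Splitting on the semicolon and matching whole tags avoids accidental substring matches. Note that adding up per-tag dollar totals across several tags counts each multi-tag grant more than once.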

The classifier is a language model. We audited it against human-prepared training data and against an earlier keyword-based version of the same dataset. The model agreed with a human auditor in over 90 percent of cases in our pilot samples. A reader who finds a clear misclassification should treat it as a data-quality issue and inform us.

The data covers only U.S.-based private foundations and public charities that file Form 990 or 990-PF. Gifts and grants from individuals, corporations (including philanthropic LLCs), and foreign sources are not shown. More than half of the funds shown come from “support organizations” (such as foundations affiliated with universities) and from transfer payments (universities to each other), which we filter out by default. The remaining grants show third-party external funding into the higher education ecosystem.

The IRS began accepting e-filed Forms 990 and 990-PF in 2008 but began requiring e-filing only in 2020. Thus, while individual foundations may have data going back to 2008, the dataset as a whole is considered robust only from 2020 onward. The “jump” in dollar totals starting in 2020 reflects more complete filing, not actual growth in philanthropic activity.

Finally, as discussed, we do not score foundations, universities, or grants. The recipient column lists the institution that received each grant. A user who filters the dataset to a single university or foundation will see what the funding looks like, but the tool offers no indication of what the pattern means. The pattern is one input to a longer institutional analysis that lies outside the tool’s scope, and we encourage users to develop that analysis on their own.

Raw Data

We make raw data available to researchers and academics via an extractor portal here. Free registration is required.

Technical appendix

A full technical appendix, including code and Markdown files for any user who wishes to reproduce or extend the analysis, can be found at this GitHub repository. All code and Markdown files are released under the GNU General Public License v3.0.

How to cite

American Enterprise Institute for Public Policy Research, for the AEI SOURCE application, code, and Markdown files.
U.S. Internal Revenue Service, for the underlying grant data (IRS Form 990 and 990-PF).
U.S. Department of Education, for IPEDS data.
GivingTuesday, for the consolidated and mirrored XML corpus.
ProPublica Nonprofit Explorer, for the per-filing links to the XML viewer.

Questions and comments

We welcome your feedback on methodology and replication. Please contact AEI CFAU here.