Introduction to SOURCE

SOURCE (Searchable Open University Records of Charitable Expenditures) is an interactive tool for searching a dataset of nearly 1 million grants from more than 50,000 U.S. private foundations to colleges and universities, predominantly American ones. Each row is a single grant: the funder, the recipient, the year, the dollar amount, the foundation’s own free-text description of why the grant was made, and a thematic tag added by AEI to make the corpus searchable by topic. Filters cover funder, recipient, year, tag, and geography.

Every row also links to the underlying Form 990-PF filing on ProPublica and to the foundation’s overall profile. A reader who wants to see the original disclosure for any grant can click through to ProPublica’s XML viewer and read the raw filing as the IRS received it.

SOURCE was built so that journalists, researchers, donors, university administrators, and the general public can pull the same data and reach their own conclusions. A technical appendix is provided for any reader who wishes to reproduce or extend the dataset. Limitations of this tool are discussed below under “Caveats and limitations.”

Editorial stance

We make no normative claims about any of the grants in this dataset. The data is public and has been for years. The classification scheme may appear opinionated in places, so we fully disclose how we arrived at each of its rules.

We do not rate, score, or rank foundations (except by size). The only subjective position we take is classifying the free-text purpose of each grant by topic. We do not assert what a foundation’s intent was, what an institution’s response should be, or whether any grant is good or bad. Every criterion that contributed to the classification is disclosed here and in the technical appendix.

Data sources

The raw material is Form 990-PF, the annual return that private foundations file with the IRS. The IRS releases the 990 corpus in bulk as XML files, and GivingTuesday mirrors and archives the corpus in a public S3 bucket.

We extracted grant-level detail using a Python script. The code reads the XML, isolates the grants-paid schedule, filters to Form 990-PF only (so the corpus is private foundations, not operating charities), and writes the result to a single flat file. We then ran two parallel cleaning passes on that file: one to normalize the recipient names so that the same university does not appear under variant spellings, and one to add a thematic tag to each grant using a large language model (Claude Haiku 4.5). Code, prompts, and replication instructions for both passes are in the technical appendix.
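The extraction step can be illustrated with a minimal sketch using Python’s standard-library xml.etree.ElementTree. It is not the project’s script. The element names (ReturnTypeCd, TaxYr, GrantOrContributionPdDurYrGrp, RecipientBusinessName, GrantOrContributionPurposeTxt, Amt) reflect our reading of the IRS e-file schema for Form 990-PF and are assumptions to check against each filing’s schema version; the function names, output columns, and sample return are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Default namespace of IRS e-file XML (assumed; verify against the filing).
NS = {"irs": "http://www.irs.gov/efile"}
FIELDS = ["funder", "year", "recipient", "amount", "purpose"]


def extract_grants(xml_text: str) -> list[dict]:
    """Return one row per grant paid, or [] if the return is not a 990-PF."""
    root = ET.fromstring(xml_text)
    # Keep private-foundation returns only (assumed return-type code "990PF").
    if root.findtext(".//irs:ReturnHeader/irs:ReturnTypeCd", namespaces=NS) != "990PF":
        return []
    funder = root.findtext(
        ".//irs:ReturnHeader/irs:Filer/irs:BusinessName/irs:BusinessNameLine1Txt",
        namespaces=NS,
    )
    year = root.findtext(".//irs:ReturnHeader/irs:TaxYr", namespaces=NS)
    rows = []
    # Each grant paid during the year is one repeating group (assumed element name).
    for grp in root.iterfind(".//irs:GrantOrContributionPdDurYrGrp", NS):
        rows.append({
            "funder": funder,
            "year": year,
            "recipient": grp.findtext(
                "irs:RecipientBusinessName/irs:BusinessNameLine1Txt", namespaces=NS
            ),
            "amount": int(grp.findtext("irs:Amt", default="0", namespaces=NS)),
            "purpose": grp.findtext(
                "irs:GrantOrContributionPurposeTxt", default="", namespaces=NS
            ),
        })
    return rows


def write_flat(rows: list[dict], handle) -> None:
    """Write grant rows as CSV to an open text handle (the 'flat file')."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Invented sample return, trimmed to the elements the sketch reads.
SAMPLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <TaxYr>2022</TaxYr>
    <ReturnTypeCd>990PF</ReturnTypeCd>
    <Filer><BusinessName><BusinessNameLine1Txt>EXAMPLE FOUNDATION</BusinessNameLine1Txt></BusinessName></Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990PF>
      <SupplementaryInformationGrp>
        <GrantOrContributionPdDurYrGrp>
          <RecipientBusinessName><BusinessNameLine1Txt>STANFORD UNIV</BusinessNameLine1Txt></RecipientBusinessName>
          <GrantOrContributionPurposeTxt>SCHOLARSHIPS</GrantOrContributionPurposeTxt>
          <Amt>25000</Amt>
        </GrantOrContributionPdDurYrGrp>
      </SupplementaryInformationGrp>
    </IRS990PF>
  </ReturnData>
</Return>"""

buf = io.StringIO()
write_flat(extract_grants(SAMPLE), buf)
print(buf.getvalue())
```

Returns of other form types yield no rows, which mirrors the 990-PF-only filter described above; grants to individuals (which use a person-name element instead of a business name) would come through with an empty recipient in this sketch.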

Topical tags

The topical tag column applies one or more labels to each grant. Eight of the ten labels are specific topics and can combine, e.g. a nursing school scholarship correctly tags as stem, finaid, and professional at the same time. The remaining two labels (general and other) are exclusive and fire only when no specific topic can be identified. The tags are as follows:

  1. stem covers science, technology, engineering, mathematics, and medicine.
  2. hass covers humanities, arts, and social sciences.
  3. athletics covers sports, teams, intramural and intercollegiate competition, and scholarships explicitly named for a sport.
  4. finaid covers scholarships and student aid such as undergraduate and graduate scholarships, named scholarships, tuition assistance, and student-level fellowships. Here we made a specific call: a fellowship counts as finaid only when the recipient class is explicitly a student (graduate, doctoral, predoctoral, PhD, master’s, or undergraduate). Faculty fellowships and postdoctoral fellowships tag research instead.
  5. research covers direct scholarly investigation, such as faculty research awards, postdoctoral fellowships, investigator grants, and research centers.
  6. professional covers the professional schools, such as law, business, journalism, and public policy. Health professional schools (medicine, etc.) double-tag as both stem and professional because they are simultaneously scientific and professional.
  7. studentlife covers the non-academic side of campus, including student clubs, fraternities and sororities, religious life, and community engagement programs.
  8. capital covers physical construction and equipment such as buildings, facilities, laboratories, and renovation. A “capital campaign” counts only when the purpose text ties the gift to construction or equipment rather than to programs: a grant for a new biology building tags both stem and capital, while a grant to “the capital campaign” with no further specification tags general.
  9. general covers generic operating language: unrestricted gifts, annual fund contributions, corporate matching, IRS boilerplate, “for the donee’s charitable purposes.” It fires only when no specific topic can be extracted from the purpose text.
  10. other covers the residual: codes, internal references, truncated strings, or rows with no purpose text at all.
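The combination rules in the list above (eight specific tags that can stack, two residual tags that stand alone) can be expressed as a consistency check on each grant’s tag set. This is a hypothetical post-classification check written for illustration; validate_tags is not part of the published pipeline, and the classification itself is done by the language model.

```python
SPECIFIC = {"stem", "hass", "athletics", "finaid", "research",
            "professional", "studentlife", "capital"}
EXCLUSIVE = {"general", "other"}


def validate_tags(tags: set[str]) -> set[str]:
    """Check one grant's tag set against the combination rules; return it if valid."""
    unknown = tags - SPECIFIC - EXCLUSIVE
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    if not tags:
        raise ValueError("every grant needs at least one tag")
    # general and other stand alone: never with each other or with a specific topic.
    if tags & EXCLUSIVE and len(tags) > 1:
        raise ValueError(f"exclusive tag combined with others: {sorted(tags)}")
    return tags


# Nursing-school scholarship from the text: three specific tags, valid.
validate_tags({"stem", "finaid", "professional"})

# "general" alongside a specific topic breaks the exclusivity rule.
try:
    validate_tags({"general", "capital"})
except ValueError as err:
    print(err)
```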

About 80% of the corpus tags as either general or finaid, while the other eight tags split the rest. Because a single grant can carry several tags, the shares of all ten tags sum to more than 100%.
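A short worked example shows why the shares exceed 100%: each grant counts once toward every tag it carries. The four-row sample is invented, and the sketch measures shares by grant count rather than dollars, which is an assumption since the text does not say which basis the 80% figure uses.

```python
from collections import Counter


def tag_shares(tag_strings: list[str]) -> dict[str, float]:
    """Share of grants carrying each tag, from semicolon-joined tag strings."""
    counts: Counter = Counter()
    for s in tag_strings:
        # set() so a tag repeated within one row counts once for that grant.
        counts.update({t.strip() for t in s.split(";") if t.strip()})
    n = len(tag_strings)
    return {tag: c / n for tag, c in counts.items()}


# Invented sample: four grants, seven tag assignments in total.
sample_tags = ["general", "finaid", "stem;finaid;professional", "stem;capital"]
shares = tag_shares(sample_tags)
print(shares["finaid"])        # → 0.5
print(sum(shares.values()))    # → 1.75
```

Seven tag assignments over four grants give shares that sum to 175%, even though each grant is counted only once per tag.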

Grantees

The grantee column in the raw 990-PF data is noisy. The same university appears under dozens of variant names, e.g. “Stanford University,” “Leland Stanford Junior University,” “Trustees of Stanford,” “Stanford Univ.,” “FBO Stanford University.”

We normalized every raw recipient string in the corpus against a canonical list of institution names drawn from IPEDS, the federal Integrated Postsecondary Education Data System, supplemented by a manually curated list of canonicals for foreign institutions and edge cases. Each row in SOURCE carries both the foundation’s original recipient string (preserved for traceability) and the normalized canonical name (used for filtering and aggregation).
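The idea of normalizing against a canonical list while keeping the raw string can be illustrated with a simple rule-based stand-in: clean the string, then look it up in an alias table. This is a sketch under stated assumptions, not necessarily the matching procedure SOURCE used; the alias table, prefix list, and function names are hypothetical, and a real table would map many more variants to IPEDS-derived canonicals.

```python
import re

# Hypothetical alias table: cleaned variant -> canonical name. In SOURCE the
# canonical names come from IPEDS plus a curated list; these entries are examples.
ALIASES = {
    "STANFORD UNIVERSITY": "Stanford University",
    "LELAND STANFORD JUNIOR UNIVERSITY": "Stanford University",
    "STANFORD": "Stanford University",
}

# Leading words that describe a trust or beneficiary arrangement, not the school.
PREFIXES = ("FBO ", "TRUSTEES OF ", "THE ")


def clean(raw: str) -> str:
    """Uppercase, strip punctuation, expand 'UNIV', and drop leading qualifiers."""
    s = raw.upper()
    s = re.sub(r"[^\w\s]", " ", s)            # "Univ." -> "UNIV "
    s = re.sub(r"\bUNIV\b", "UNIVERSITY", s)  # common abbreviation
    s = re.sub(r"\s+", " ", s).strip()
    for prefix in PREFIXES:                   # applied in order, once each
        if s.startswith(prefix):
            s = s[len(prefix):]
    return s


def normalize(raw: str) -> dict:
    """Keep the raw string for traceability alongside the canonical match (or None)."""
    return {"raw": raw, "canonical": ALIASES.get(clean(raw))}


for name in ["Stanford Univ.", "FBO Stanford University", "THE UNIVERSITY OF"]:
    print(normalize(name))
```

The truncated “THE UNIVERSITY OF” resolves to no canonical name, which is the kind of string that ends up in the “not clearly identified” bucket.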

Every recipient string sorts into one of four categories:

  1. Specific institution: resolves to a U.S. college or university with an IPEDS identifier.
  2. Foreign: a non-U.S. institution (universities in Canada, UK, Israel, China, and elsewhere).
  3. Not applicable: the grantee is not a degree-granting institution. Examples include college-access nonprofits, K-12 schools with “college” in the name, and professional associations.
  4. Not clearly identified: the string is too truncated, garbled, or ambiguous to pin down, e.g. “THE UNIVERSITY OF” without specifying which university.
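Assigning the four categories above can be sketched as a lookup on the normalized name. The lookup sets and the function categorize are hypothetical illustrations; in particular, SOURCE’s curated lists of foreign and not-applicable grantees are not reproduced here, and the entries below are examples only.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    SPECIFIC = "Specific institution"
    FOREIGN = "Foreign"
    NOT_APPLICABLE = "Not applicable"
    NOT_IDENTIFIED = "Not clearly identified"


def categorize(canonical: Optional[str], ipeds_names: set[str],
               foreign: set[str], not_applicable: set[str]) -> Category:
    """Assign one of the four categories to a normalized recipient.

    canonical: canonical name, or None if the raw string could not be resolved.
    ipeds_names: canonical names of U.S. institutions that carry an IPEDS ID.
    foreign / not_applicable: curated canonical names (hypothetical here).
    """
    if canonical is None:
        return Category.NOT_IDENTIFIED
    if canonical in ipeds_names:
        return Category.SPECIFIC
    if canonical in foreign:
        return Category.FOREIGN
    if canonical in not_applicable:
        return Category.NOT_APPLICABLE
    # A name that matched no list is treated as unresolved in this sketch.
    return Category.NOT_IDENTIFIED


ipeds_names = {"Stanford University"}
foreign = {"University of Toronto"}
not_applicable = {"Example College Access Fund"}   # hypothetical nonprofit

for name in ["Stanford University", "University of Toronto", None]:
    print(name, categorize(name, ipeds_names, foreign, not_applicable).value)
```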

Across the full corpus, the overwhelming majority of recipient strings resolve to a specific U.S. or foreign institution.

Caveats and limitations

Users should note a few constraints before drawing conclusions from this tool.

The classification is based on self-reported grant purpose only. The model assesses the foundation’s free-text purpose string and nothing else. A foundation that funds a specific program with a vague purpose description (“general operating support”) will produce a row that tags as general, even when the recipient or context would have made the topic obvious. We do not infer the foundation’s intent or the recipient’s understanding.

Multi-tags are common and intentional. A nursing school scholarship correctly tags as stem, finaid, and professional. Users filtering on a single tag will see every grant that touches that topic, including grants that also touch other topics. The export reflects the multi-tag structure with semicolon-joined values.
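Working with the semicolon-joined tag column in an export can be sketched with Python’s csv module: split each row’s tags and compare whole labels rather than searching the raw string. The column names and rows below are hypothetical and may not match the actual export.

```python
import csv
import io

# Hypothetical export excerpt; real column names may differ.
EXPORT = """recipient,amount,tags
Example University,25000,stem;finaid;professional
Example College,10000,general
Example Institute,5000,stem;capital
"""


def rows_with_tag(handle, tag: str, column: str = "tags"):
    """Yield rows whose semicolon-joined tag column contains `tag` as a whole label."""
    for row in csv.DictReader(handle):
        if tag in {t.strip() for t in row[column].split(";")}:
            yield row


stem_rows = list(rows_with_tag(io.StringIO(EXPORT), "stem"))
print(len(stem_rows))                            # → 2
print(sum(int(r["amount"]) for r in stem_rows))  # → 30000
```

Because a grant can carry several tags, adding up dollar totals from separate tag filters counts a multi-tag grant once per tag; totals across tags should not be summed.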

The classifier is a language model. We audited it against human-prepared training data and against an earlier keyword-based version of the same dataset, and we report the audit results in the technical appendix. The model agreed with a human auditor on roughly 90 percent of cases in our pilot samples. A reader who finds a clear misclassification should treat it as a data-quality issue and inform us.
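An agreement rate like the one reported above can be computed in more than one way, and the text does not state which criterion the audit used. This sketch computes two plausible ones on invented labels: exact match on the full tag set, and a looser “at least one tag in common.” Both the function agreement and the labels are hypothetical.

```python
def agreement(model: list[set[str]], human: list[set[str]]) -> dict[str, float]:
    """Share of grants where model and human tag sets agree, two ways."""
    if len(model) != len(human) or not model:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(model)
    exact = sum(m == h for m, h in zip(model, human)) / n
    overlap = sum(bool(m & h) for m, h in zip(model, human)) / n
    return {"exact": exact, "overlap": overlap}


# Invented labels: the model misses "professional" on the fourth grant.
model = [{"finaid"}, {"stem", "capital"}, {"general"}, {"stem", "finaid"}]
human = [{"finaid"}, {"stem", "capital"}, {"general"}, {"stem", "finaid", "professional"}]
print(agreement(model, human))  # → {'exact': 0.75, 'overlap': 1.0}
```

Exact-set match is the stricter measure: a single missing tag on a multi-tag grant counts as a disagreement.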

The data covers only U.S.-based private foundations that file Form 990-PF. These account for approximately $12 billion, or about 52 percent, of the $23 billion in total private grants, gifts, and contracts received by colleges and universities in 2023. The remaining $11 billion comes from individuals, public (operating) charities, corporations (including LLCs), and foreign sources, for which public visibility is severely limited.

The IRS began permitting e-filing of Form 990s in 2008 but required it only starting in 2020. Thus, while individual foundations may have data going back to 2008, the dataset as a whole should be considered robust only from 2020 onward. The “jump” in dollar totals starting in 2020 reflects more complete filing coverage, not actual growth in philanthropic activity.
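For trend analysis, one way to respect this caveat is to drop pre-2020 rows before summing by year. A minimal sketch, assuming hypothetical “year” and “amount” columns and invented rows:

```python
from collections import defaultdict

# First year the text treats the dataset as robust (e-filing mandatory).
ROBUST_FROM = 2020


def robust_only(rows: list[dict], year_key: str = "year") -> list[dict]:
    """Drop rows from years when e-filing was optional and coverage is partial."""
    return [r for r in rows if int(r[year_key]) >= ROBUST_FROM]


def totals_by_year(rows: list[dict], year_key: str = "year",
                   amount_key: str = "amount") -> dict[int, int]:
    """Sum grant dollars per year, sorted by year."""
    totals: dict[int, int] = defaultdict(int)
    for r in rows:
        totals[int(r[year_key])] += int(r[amount_key])
    return dict(sorted(totals.items()))


# Invented rows for illustration only.
grants = [
    {"year": "2018", "amount": "1000"},
    {"year": "2020", "amount": "5000"},
    {"year": "2021", "amount": "7000"},
    {"year": "2021", "amount": "3000"},
]
print(totals_by_year(robust_only(grants)))  # → {2020: 5000, 2021: 10000}
```

Pre-2020 rows remain useful for looking up an individual foundation’s history; the filter matters when comparing totals across years.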

Finally, as discussed, we do not score foundations, universities, or grants. The recipient column lists the institution that received each grant. A user who filters the dataset to a single university or foundation will see what the funding looks like, but the tool offers no indication of what the pattern means. That pattern is one input to a longer institutional analysis that lies outside the tool’s scope, and we encourage users to pursue that analysis on their own.

How to cite

Cite the elements of this tool as follows:

  • American Enterprise Institute for Public Policy Research for the SOURCE tool, code, and instructions.
  • U.S. Internal Revenue Service for underlying grant data (IRS Form 990-PF).
  • GivingTuesday for the consolidated XML files.
  • ProPublica for links to the XML viewer.

Technical appendix

[TBD]