Grants data fact sheet
This page provides answers to frequently asked questions about Candid’s grants data specifically, as well as special notes for researchers interested in using our data. These questions include:
- What are the sources of Candid’s grants data?
- What data does Candid collect about grants?
- How recent is the data?
- How complete is the data?
- How are grants coded?
While Candid also collects data on U.S. federal government grants and official development assistance, this page focuses on our data on philanthropic grantmaking. If you are a researcher and have additional questions about Candid’s data, please contact email@example.com. Other inquiries should be directed to firstname.lastname@example.org.
What are the sources of Candid's grants data?
Every year, Candid processes data on approximately three million grants representing more than $180 billion in funding, not including grants made by the U.S. federal government. This data comes from a wide variety of sources, detailed on our Data Sources page. In brief, we collect data from government agencies (such as the IRS or Canada Revenue Agency), from funders themselves, from other data sharing partners, from organizations’ websites, and, increasingly, from news sources. Data from these disparate sources are cleaned, harmonized, and then coded according to our Philanthropy Classification System (PCS). For more information on our coding process, please see the section on “How are grants coded?” below.
Avoiding duplication of grants data
In some cases, we may get data about the same grant from different sources at different times. For example, we may first learn about a recently awarded grant from the news or a funder’s press release and then see that same grant in the funder’s IRS Form 990-PF later. To prevent grants from being duplicated in our products, Candid uses a system of “survivorship” to determine which source of grants data gets precedence. This system works as follows:
- Grants reported by funders themselves always override other sources. If we receive a complete grants list from a funder, it would block or replace data from annual returns, such as IRS Form 990 or 990-PF.
- Grants obtained from news sources, which usually focus on a particular topic (such as a disaster), are replaced by a complete source for that funder’s grantmaking once one is available for the year in question. Complete sources of grants data (i.e., ones that reflect the funder’s total giving) include data from the funder itself or the 990-PF.
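The survivorship rules above can be sketched in Python as a precedence lookup. This is a minimal illustration under stated assumptions, not Candid’s implementation: the source labels ("funder", "990-PF", "news"), the numeric ranks, and the function name apply_survivorship are hypothetical, and the sketch ignores other sources such as data partners and organization websites.

```python
# Hypothetical precedence ranks (an assumption of this sketch): lower rank wins.
# Funder-reported data outranks the 990-PF, and both complete sources outrank news.
SOURCE_RANK = {"funder": 0, "990-PF": 1, "news": 2}


def apply_survivorship(grants):
    """Keep, for each funder and fiscal year, only grants from the best source seen."""
    best = {}
    for g in grants:
        key = (g["funder"], g["fiscal_year"])
        rank = SOURCE_RANK[g["source"]]
        best[key] = min(rank, best.get(key, rank))
    return [
        g for g in grants
        if SOURCE_RANK[g["source"]] == best[(g["funder"], g["fiscal_year"])]
    ]


# Hypothetical records for illustration only.
grants = [
    {"funder": "Foundation X", "fiscal_year": 2023, "source": "news", "recipient": "Org A"},
    {"funder": "Foundation X", "fiscal_year": 2023, "source": "990-PF", "recipient": "Org A"},
    {"funder": "Foundation X", "fiscal_year": 2023, "source": "990-PF", "recipient": "Org B"},
    {"funder": "Foundation Y", "fiscal_year": 2023, "source": "news", "recipient": "Org C"},
]
for g in apply_survivorship(grants):
    print(g["funder"], g["source"], g["recipient"])
```

Here Foundation X’s news-sourced grant is replaced by its 990-PF grants list, while Foundation Y keeps its news-sourced grant because no complete source has arrived yet for that year; rerunning the selection once one arrives would replace it.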
What data does Candid collect about grants?
In order to load data about a grant into our database, Candid must have at least the following information: funder name, recipient name and location[1], fiscal year, and grant amount. Whether further data, such as a grant description, is available tends to depend on the source. Grants data from government sources, such as the 990-PF and 990 annual returns filed by funders, usually contains very limited information. Candid’s eReporting program offers funders the option of sharing additional details about their grants, including grant title, program area, grant description, and key characteristics, such as subject area, population served, and geographic area served. A complete list of eReporting fields and their definitions can be found on the eReporting template.
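A minimal sketch of the loading rule, assuming a simple Python record type. The class and field names (GrantRecord, is_loadable, and the optional fields) are hypothetical and are not the field names used in Candid’s eReporting template; the sketch also does not model the options Candid offers for sharing sensitive recipient details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GrantRecord:
    # Minimum information needed to load a grant (hypothetical field names).
    funder_name: str
    recipient_name: str
    recipient_location: str
    fiscal_year: int
    amount: float
    # Optional details whose availability depends on the source.
    grant_title: Optional[str] = None
    program_area: Optional[str] = None
    description: Optional[str] = None


def is_loadable(record: GrantRecord) -> bool:
    """Return True only if every required field is present and non-empty."""
    required = (
        record.funder_name,
        record.recipient_name,
        record.recipient_location,
        record.fiscal_year,
        record.amount,
    )
    return all(value is not None and value != "" for value in required)


# A sparse 990-PF-style record: required fields only, no description.
print(is_loadable(GrantRecord("Foundation X", "Org A", "Chicago, IL", 2022, 50_000)))
```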
How recent is the data?
The timeliness of grants data also varies by source. Annual return data from government sources, such as Form 990 or 990-PF, can reach Candid anywhere from six months to two years after it is filed, whereas data from the news can be just one to two days old. The timeliness of data provided by funders directly to Candid can also vary significantly, because funders decide how often they share data with us. Most funders submit data once a year; others report monthly or, in a few cases, as grants are approved. Candid’s products are updated daily and reflect the latest data available.
How complete is the data?
Completeness of data can be thought of in two ways: the completeness of data for the sector (i.e., data that covers all funders’ giving) and the completeness of data for a specific funder.
United States: Candid’s data for a given year tends to be complete about two years after that year’s end (for example, data for 2019 will be complete by the end of 2021). By this time, complete grants data for all foundations and grantmaking public charities is usually available from the U.S. government via the 990 and 990-PF. Data for more recent years is less complete, because fewer 990s and 990-PFs are available. This more recent data has largely been reported by funders themselves, and, to a lesser extent, obtained by Candid from the news, funder websites, etc.
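As a rough heuristic for researchers, the two-year rule of thumb above can be written as a date check. This is an illustrative sketch only: the function name is hypothetical, and a True result means "likely complete," since the text describes a tendency rather than a guarantee.

```python
from datetime import date


def us_year_likely_complete(data_year: int, as_of: date) -> bool:
    """Heuristic: U.S. data for a year tends to be complete by the end of the
    second year after it (e.g., 2019 data by the end of 2021)."""
    return as_of >= date(data_year + 2, 12, 31)


print(us_year_likely_complete(2019, date(2021, 12, 31)))  # True
print(us_year_likely_complete(2019, date(2021, 6, 30)))   # False
```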
Other countries: Apart from the U.S., Candid has comprehensive data on Canadian grantmaking and mandated corporate social responsibility spending by Indian companies. Our grants data from other countries is primarily self-reported by funders via eReporting or contributed by Candid’s data partners and therefore represents an unknown portion of grantmaking in those countries.
Specific funder data
Just as with sector-wide data, current- or previous-year data for any single institutional funder is less likely to be complete than data for earlier years. If a funder reports on its grantmaking monthly or quarterly, for example, then at any point during the year Candid will have only partial data for that year. Grants obtained from the news are also highly likely to represent only a small portion of a funder’s grantmaking.
In short, the more timely the data, the less likely it is to be complete at both the sector and individual funder level.
Other factors impacting completeness of individual funder data
Candid is not always able to obtain a complete grants list for a funder, no matter what the timeframe. In some cases, we receive incomplete annual returns from government sources (for example, a 990-PF may have a missing or partial grants list). In other cases, a funder participating in our eReporting program may choose to report on only a subset of its grants. For example, human rights and other funders may choose not to share information on specific grants in order to protect the safety of their grantees (the FAQs on our eReporting page include guidance for funders on how to share sensitive information).
Candid does its best to fill known gaps in our data—by contacting the government agency supplying the data, by requesting the information from funders directly, etc.—but we are not always successful.
How are grants coded?
Candid’s Philanthropy Classification System consists of five facets: subject, population served, support strategy, transaction type, and organization type. Candid also collects and/or applies coding for geographic area served. Unless supplied directly to Candid by funders[2], taxonomic codes are applied to grants in four ways:
1. “Slam dunk” coding
First, Candid’s system searches grant descriptions for phrases that exactly or very closely match our taxonomic codes and applies these codes to those grants. For example, a grant with a description of "For general support" would be assigned the general support code. All incoming grants go through this process.
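A minimal sketch of phrase-based "slam dunk" matching, using Python’s standard difflib module to approximate "very closely match." The phrase table, the 0.9 similarity cutoff, the choice to compare the whole normalized description rather than search within it, and the function names are all assumptions for illustration; Candid’s actual phrase list and matching rules are not described here.

```python
import difflib
import re

# Hypothetical phrase-to-code table; not Candid's actual PCS phrase list.
SLAM_DUNK_PHRASES = {
    "general support": "General support",
    "general operating support": "General support",
    "scholarships": "Scholarships",
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    text = " ".join(text.split())
    # Drop a leading "for" so "For general support" compares as "general support".
    return re.sub(r"^for ", "", text)


def slam_dunk_codes(description: str, cutoff: float = 0.9) -> set:
    """Return codes whose phrase exactly or very closely matches the description."""
    matches = difflib.get_close_matches(
        normalize(description), list(SLAM_DUNK_PHRASES), n=3, cutoff=cutoff
    )
    return {SLAM_DUNK_PHRASES[m] for m in matches}


print(slam_dunk_codes("For general support"))  # {'General support'}
print(slam_dunk_codes("General suport"))       # {'General support'} (typo still matches)
print(slam_dunk_codes("Health"))               # set()
```

A vague description such as "Health" produces no slam dunk code and is left for the later coding steps.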
2. Autoclassification
After the first round of “slam dunk” coding, all grants are sent through Candid’s autoclassifier, which is intended to replicate how our expert staff would code grants. The autoclassifier is a machine learning model that bases its coding on data in the following fields: grant title, grant description, and program area. Grant title and program area are only available if a funder has shared its data with Candid directly and included those fields.
Whether the autoclassifier assigns a code for a given facet depends on the level of detail available in these fields and on whether the predicted code meets the confidence threshold Candid has established for that facet. How the autoclassifier was developed and how it works are described in more detail below.
It is important to keep in mind two things about coding by the autoclassifier:
- The autoclassifier will not apply coding for all facets unless it has sufficient information to do so.
- Unlike with slam dunk coding, autoclassified codes are not predicted strictly based on the presence or absence of specific words or phrases. Instead, codes are predicted based on statistical algorithms that estimate the codes that are highly likely to be the best fit, based on how Candid’s staff have coded those words and similar word combinations in the past.
Candid’s autoclassifier is a supervised machine learning model. It was first developed in 2015 using millions of grants that Candid’s taxonomy experts had manually coded to the PCS in previous years. Using this data, the model was trained to associate patterns in the words and phrases of grant text with the PCS codes that had been manually assigned. Candid periodically improves the model for greater accuracy, always using manually coded data as the basis for training.
The model is trained to balance both high precision, meaning the model does not predict codes a person did not assign, and high recall, meaning the model can predict many, if not all, of the PCS codes a person assigned to the grant. For example, say a grant was coded by a person with the following subject codes: senior services, nutrition, and basic and emergency aid. If the model predicted nutrition, human services, and disaster relief, these results would have poor precision, because two of the codes predicted do not match those assigned by the human coder. If the model predicted only senior services, it would have low recall, as it missed the nutrition and basic and emergency aid codes assigned by the coder.
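The precision and recall example above can be checked with a few lines of Python. The function below uses the standard set-based definitions (precision is correct predictions divided by all predictions; recall is correct predictions divided by all human-assigned codes). The function name is hypothetical, and this is not a description of how Candid evaluates its model internally.

```python
def precision_recall(predicted: set, assigned: set) -> tuple:
    """Precision: share of predicted codes the human coder also assigned.
    Recall: share of human-assigned codes the model predicted."""
    correct = len(predicted & assigned)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(assigned) if assigned else 0.0
    return precision, recall


assigned = {"senior services", "nutrition", "basic and emergency aid"}

# First prediction: only "nutrition" is correct, so precision is 1/3.
print(precision_recall({"nutrition", "human services", "disaster relief"}, assigned))
# Second prediction: its one code is correct (precision 1.0), but it finds only 1 of 3 codes.
print(precision_recall({"senior services"}, assigned))
```

The first prediction also has low recall (1 of 3); the text highlights its precision because two of its three codes are wrong.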
Each code predicted by the model is accompanied by a confidence score that indicates how sure the model is that a predicted code is correct. The scores range from 0 to 1, with scores closer to 1 more likely to be accurate.
For each facet, Candid sets a minimum confidence threshold that a predicted code must meet or exceed in order to be applied. These thresholds reflect the balance Candid must strike between the desire to code grants as precisely as possible (the lower in the taxonomy hierarchy a code is, the harder it can be to predict accurately) and the need to minimize the number of incorrect codes. Because incorrect population codes have larger implications, that facet has more stringent thresholds.
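A sketch of per-facet threshold filtering, assuming predictions arrive as (facet, code, confidence) tuples. The threshold values are invented for illustration (Candid’s actual thresholds are not given here); only the relative strictness of the population served facet follows the text.

```python
# Hypothetical thresholds; only "population served" being stricter follows the text.
THRESHOLDS = {
    "subject": 0.5,
    "population served": 0.8,
    "support strategy": 0.5,
    "transaction type": 0.5,
    "organization type": 0.5,
}


def apply_thresholds(predictions):
    """Keep predicted codes whose confidence meets or exceeds their facet's threshold."""
    kept = {}
    for facet, code, score in predictions:
        if score >= THRESHOLDS[facet]:
            kept.setdefault(facet, []).append(code)
    return kept


predictions = [
    ("subject", "Nutrition", 0.91),
    ("subject", "Disaster relief", 0.32),
    ("population served", "Seniors", 0.74),  # below the stricter cutoff
]
print(apply_thresholds(predictions))  # {'subject': ['Nutrition']}
```

Facets with no surviving prediction stay uncoded, which is why the autoclassifier does not always code every facet.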
3. Manual coding
A portion of grants is reviewed and coded manually by Candid’s indexing team, which adjusts or augments the coding assigned by the autoclassifier as needed. The team reviews grants of $250,000 or more awarded by 1,000 of the largest U.S. funders, as well as grants for special projects such as Advancing Human Rights. In all, indexers review approximately 25,000 grants representing $27 billion in giving per year. Candid’s autocoding process never overrides coding applied by Candid staff or supplied by funders who share their data.
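Two rules in this paragraph, which grants get manual review and which codes take precedence, can be sketched as follows. Both functions are hypothetical: the review predicate reads "or for special projects" as a separate criterion, and the ordering between funder-supplied and staff codes is an assumption, since the text says only that autocoding overrides neither.

```python
def needs_manual_review(amount: float, funder_in_top_1000: bool,
                        special_project: bool = False) -> bool:
    """Grants of $250,000+ from the largest U.S. funders, or grants tied to a special project."""
    return (amount >= 250_000 and funder_in_top_1000) or special_project


def merge_facet_codes(funder=None, staff=None, auto=None) -> list:
    """Choose one facet's codes so autoclassifier output never overrides
    funder-supplied or staff-applied codes (funder-before-staff order is assumed)."""
    for codes in (funder, staff, auto):
        if codes:
            return list(codes)
    return []


print(needs_manual_review(300_000, funder_in_top_1000=True))            # True
print(merge_facet_codes(staff=["Nutrition"], auto=["Human services"]))  # ['Nutrition']
```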
4. Application of recipient organization coding
If the available grant text is so vague that the grant cannot be coded automatically or manually, the grant receives the coding of the recipient organization, if that coding is available in Candid’s database. If the recipient has no coding for a given facet, the grant coding for that facet also remains blank.
Coding of recipient organizations happens in one of three ways:
- Review of the organization’s website by Candid staff.
- Autoclassification based on the organization’s mission statement, if available.
- The provision of codes by the organization itself through Nonprofit Profiles on guidestar.org.
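The recipient fallback can be sketched as a facet-by-facet copy. The sketch assumes codes are held in a dict keyed by the five PCS facet names, that the fallback applies only when the grant text yielded no codes at all, and that recipient_codes is None when the recipient is not in Candid’s database; the function name is hypothetical.

```python
FACETS = ("subject", "population served", "support strategy",
          "transaction type", "organization type")


def apply_recipient_fallback(grant_codes: dict, recipient_codes=None) -> dict:
    """If the grant text produced no codes, inherit the recipient's codes facet by
    facet; any facet the recipient lacks stays blank (an empty list)."""
    if any(grant_codes.get(f) for f in FACETS):
        source = grant_codes            # grant was coded from its own text
    else:
        source = recipient_codes or {}  # vague text: fall back to the recipient
    return {f: list(source.get(f, [])) for f in FACETS}


coded = apply_recipient_fallback({}, {"subject": ["Arts and Culture"]})
print(coded["subject"], coded["population served"])  # ['Arts and Culture'] []
```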
General notes about coding
The quality of the codes applied by either Candid’s autoclassifier or staff is directly related to the quality of the grants data we receive from funders or obtain from other sources. Staff are instructed not to make assumptions when coding grants; there must be evidence for a code in the text available for that code to be applied. Grant descriptions in the annual returns filed by funders tend to be minimal, such as “Health,” or even nonexistent. For this reason, we encourage funders to include detailed grant descriptions providing answers to the questions of who, what, where, and how in the data they share with Candid. An exception is grants for general support. Funders providing this type of funding can simply include “For general support” or similar language as their grant description.
The more detailed the available text, the more precise the coding of grants is likely to be. This could mean the difference between a grant receiving only a high-level subject code, such as “Arts and Culture,” versus a more specific “Ceramic Arts” or “Sacred Sites” code, or receiving a high-level population code, such as "Indigenous Peoples," rather than a more precise "Alaskan Natives" or "Pacific Islanders" code.
To give a real-world example, here is a grant description as provided in the 990:
“Human trafficking prevention.”
Here is the description for the same grant, but as provided directly to Candid by the funder:
“To support Better Brick—Nepal in its effort to improve labor conditions on brick kilns and to address the underlying economic factors that drive worker exploitation.”
The latter description allows for much more specific coding, including for geographic area served.
Special notes for researchers
Those intending to use Candid’s grants data to produce analyses about funding trends should keep the following in mind:
- Year-over-year comparability: Grants data availability in Candid’s database varies year to year. We may have grants for a specific funder in one year but not the next because that funder did not submit data (for non-U.S. funders) or we were not otherwise able to source a grants list. Additionally, available grants for a particular funder for a given year do not necessarily reflect that funder’s full grantmaking for that year. As a result, tools in which grants data appears, such as Foundation Directory Online and Foundation Maps, are not optimized to show year-over-year changes in giving. Nor can it be assumed that available grants data reflects the full scope of an organization’s grantmaking.
To account and correct for these challenges, Candid created an annual research set, the Foundation 1000, which captures grants of $10,000 or more awarded by a set of the 1,000 largest U.S. funders for a given year. Grants in this set undergo additional cleaning and review to ensure an acceptable level of completeness and coding accuracy. Grants data based on the Foundation 1000 set is available for purchase by contacting Bunkie Righter at Bunkie.Righter@candid.org.
- Allocation of grant dollars where multiple codes exist: Grants may benefit multiple subjects, population groups, support strategies, and geographic areas served. In these cases, Candid allocates the full dollar amount to each category, because we do not have sufficient information to specify the share of support that is intended for each. As a result, totals across categories can add up to more than the total dollars granted.
- Population coding: In assigning population codes, neither Candid staff nor Candid’s autoclassifier takes into account the demographic makeup of the geographic area served. In other words, even if a grant is intended to benefit a community with a largely Latinx population, unless that population is explicitly referenced in the grant text available, it is unlikely to receive that population code. As such, when using our data to comment on populations served, please keep in mind that Candid’s data is more likely to reflect grantmaking explicitly designated for a given population group rather than all grantmaking actually reaching that population group.
Candid’s population served taxonomy also cannot capture intersectional identities. For example, a grant meant to benefit low-income seniors would be coded in the same way as a grant meant to benefit low-income people and seniors. Both grants would receive the following population served codes: “Seniors” and “Low-income people.” Our taxonomy does not have explicitly intersectional codes that apply, for example, only to “low-income seniors.”
- Authorized or paid grant amounts: Grant amounts may be either authorized, reflecting the full value of the grant in the year it was made, or paid, representing the amount of funding actually disbursed to a recipient organization in the given fiscal year. Most grants in Candid’s database reflect amounts paid. Authorized amounts are generally available only if provided to Candid by the funder. If a grant is authorized for multiple years, the full value of the grant is attributed to the year in which it was authorized—i.e., a grant authorized in 2020 for $5 million to be distributed over the next five years will appear in Candid’s database as a single grant for $5 million for 2020. We record information about the grant duration, if available, separately.
- Double-counting of grant dollars: Grant dollars may sometimes be counted more than once in Candid’s database, such as when a funder awards a grant to an organization that then regrants those funds. For example:
- The Democracy Fund awards a grant to NEO Philanthropy for its State Infrastructure Fund.
- NEO Philanthropy awards a grant to ONE Arizona to support its work building the civic infrastructure needed to engage communities of color, in particular Latinx communities, in the civic life of Arizona.
To adjust for this double-counting in our own research efforts, Candid removes grants awarded to organizations that also appear in the set as grantmakers when calculating overall totals (this approach also means that where a grant is awarded to support the capacity of an organization, rather than for regranting, it would be removed as well). In the above example, we would remove the grant awarded to NEO Philanthropy, but keep the grant awarded by NEO Philanthropy to ONE Arizona.
When reporting on grantmaking by funder, however, we take all grants into account. In this example, we would keep the grant awarded to NEO Philanthropy so as not to under-report Democracy Fund’s total grantmaking.
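For researchers reproducing these adjustments, the full-amount allocation rule and the regranting adjustment described above can be sketched in Python. The grant records, the placeholder names (which mirror the Democracy Fund, NEO Philanthropy, and ONE Arizona chain), the code labels, and the dollar amounts are all made up for illustration, and matching recipients to grantmakers by name is a simplification of whatever identifiers a real analysis would use.

```python
from collections import defaultdict

# Hypothetical grants; names, codes, and amounts are placeholders.
grants = [
    {"funder": "Funder A", "recipient": "Intermediary B", "amount": 100_000,
     "codes": ["Civic participation"]},
    {"funder": "Intermediary B", "recipient": "Grantee C", "amount": 100_000,
     "codes": ["Civic participation", "Community organizing"]},
]


def totals_by_code(grants):
    """Allocate each grant's full amount to every code it carries, so category
    totals can sum to more than the dollars actually granted."""
    totals = defaultdict(int)
    for g in grants:
        for code in g["codes"]:
            totals[code] += g["amount"]
    return dict(totals)


def overall_total(grants):
    """Sector-wide total: drop grants to recipients that also appear as grantmakers."""
    grantmakers = {g["funder"] for g in grants}
    return sum(g["amount"] for g in grants if g["recipient"] not in grantmakers)


def funder_total(grants, funder):
    """Per-funder total: keep every grant, including those to regranting intermediaries."""
    return sum(g["amount"] for g in grants if g["funder"] == funder)


print(totals_by_code(grants[1:]))        # {'Civic participation': 100000, 'Community organizing': 100000}
print(overall_total(grants))             # 100000 (the grant to Intermediary B is dropped)
print(funder_total(grants, "Funder A"))  # 100000
```

The single $100,000 grant to Grantee C contributes $100,000 to each of its two codes, and the sector total counts the regranted dollars once while Funder A’s own total still includes its grant to the intermediary.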
[1] Candid recognizes that it is not always possible or advisable for funders to share recipient name and location due to privacy or security concerns. We offer funders several ways in which they can responsibly share their data with us in these contexts, detailed in the FAQs on our eReporting page.
[2] Organizations must provide the exact codes themselves, either the term (e.g., elementary education) or the code (e.g., SB030200), for Candid’s system to assign these codes to grants correctly. In some cases, organizations have sent Candid their taxonomy to be crosswalked to the PCS. These organizations then apply this crosswalk in their grants management system so the grants data they send Candid is already coded for the applicable facets.