The Wide World of Drinking Water Data

Note: This article was revised on March 26th, 2024 for improved accuracy about the Clean Water State Revolving Fund process.

Frozen in glaciers, pooling in lakes, running in rivers and flowing beneath our city streets, water takes many forms. Its quantity and quality vary across time and space, despite society’s consistent need for water to sustain our agriculture, industry, and our basic survival. For experts working to improve drinking water quality and access, the data is almost as important as water itself. Just like water, its data varies in quantity and quality and wells up from many sources—from utilities, government agencies, and private companies, to the general public. How can we better access this data? What data should be used and for what purpose? Finally, who has access to safe, affordable, and quality drinking water—and who doesn’t? This article begins to answer some of those timely questions.

What data—and what water—can we trust?

The task at hand for water data is how to make sense of it, given its volume and variety of sources, in order to ensure equitable investment, build trust in water sources, and raise alarms when needed. Here, we address drinking water data from the point it starts flowing from the utility, but it's important to note that surface and source water (our rivers, streams, lakes, and aquifers) have their own data nuances, systems, and caveats (more on that later!).

To start, let’s consider three categories based on data topics: 

1) Testing and compliance data

2) Community data

3) Financial and demographic data

Looking at the major sources of data within each of these categories, several points about when and how to use these sources of information deserve attention. The summary views below detail processes and limitations we think are important:

  • The Safe Drinking Water Information System (SDWIS) is maintained by EPA, and information on all community water systems is submitted to it quarterly. This data is essentially a table of all water system names and any violations they may have had; it is not geospatial data. Because information is submitted regularly, it is the most up-to-date national picture of our drinking water landscape (but is not without flaws). See also the SDWIS Database here, and SDWIS rules and regulations like the Lead and Copper Rule and Total Coliform Rule.

  • The EPA publishes a dataset known as the 6-year Review—a robust and vetted data product containing contaminant testing results across the nation, further elucidating issues in drinking water quality. While incredibly rich, these data are only available in segmented Microsoft Access database files. Recent academic efforts have made these datasets standardized and accessible. Also check out EPA’s 6 Year Review, and Massive Data Institute’s accessible 6-Year Review dataset.

  • Some states maintain discrete drinking water databases. One example is the Texas Drinking Water Watch, which shows testing data generated from mandated compliance regimes. Depending on the state, these databases may contain more information about a water system than the federal SDWIS system. Also check out Oregon’s Drinking Water Database and Maryland’s Drinking Water Watch.

  • Consumer Confidence Reports (CCRs), sometimes known as Water Quality Reports, are a yearly requirement for utilities to summarize their water system’s status. They include summary testing data, supply and demand figures, and general information about the utility. They are written for public consumption and are generally 5-20 pages in length. Lastly, check out EPA’s Find your Local CCR page and the City of Baltimore’s CCR.

Key datasets within this cohort include the federal Safe Drinking Water Information System (SDWIS) and complementary state systems. SDWIS primarily tracks compliance with EPA’s drinking water rules, which regulate the allowable values of certain contaminants. Violations of these rules are known as “health-based violations,” and can result in penalties like fines, increased oversight, and consent decrees if an entity is found to be in consistent violation. In addition, regulatory agencies like the EPA inspect facilities to ensure their processes, staffing, and protocols are sufficient to meet standards. Data collected during compliance exercises are revealed at the summary level through SDWIS, and yearly at the water system level through a Consumer Confidence Report (CCR). Private companies and academics have also pulled together similar contaminant databases through web-scraping, Freedom of Information Act requests, and direct outreach to utilities. Finally, detailed contaminant results can be found in EPA’s 6-year Review datasets. While these are vetted for quality, years of delay can ensue between data submission and final data publication.

One more important piece of the puzzle in terms of water data sources is utility and government data contained in lead service line inventories. By October 16, 2024, utilities are required to submit inventories as per the revised Lead and Copper Rule. Larger, well-funded utilities will generally be able to complete these inventories in accordance with newer requirements. However, water experts polled at the 2023 Association of State Drinking Water Administrators conference estimated that 27% of utilities will be able to inventory less than half of their service lines. Despite those shortcomings, better data should improve the estimates submitted during the Drinking Water Infrastructure Needs Survey and Assessment (DWINSA), painting a fuller picture of where to prioritize funding from programs like the State Revolving Fund (SRF), the largest federal pot of funds dedicated to water infrastructure improvements.

Community-generated data is information collected by community-led groups or individual households about the quality of their drinking water. 

Despite the ocean of data collected by utilities and government agencies, individual and community water testing can often be the most impactful. 

In recent water crises like those in Flint, Michigan, and Jackson, Mississippi, community members expressed great distrust of their water but needed data to prove it. While data suggests an estimated 10% of the US community water population (about 30 million people) have unsafe drinking water, studies indicate that 60 million people drink bottled water rather than water from a tap, a two-fold difference. Given known shortcomings in compliance information from databases like SDWIS (the federal Government Accountability Office (GAO) found that 26-38% of violations are either not reported or inaccurately reported), the percentage of people with unsafe drinking water is likely higher than 10%, placing the onus on water users to advocate around their own experiences. Qualitative data like taste, color, and odor are valid data points and can help sound alarms around water safety, but these often fall short of the high standards needed to engage remediation efforts by government entities. In dire cases, drinking water contaminants can cause chronic headaches, elevated blood lead levels, and higher rates of cancer, depending on the contaminant.

While utilities do test at taps, the majority of utility testing is conducted at the treatment facility or on utility-owned service lines, leaving the consumer perspective relatively underrepresented. Companies like SimpleLab can fill this gap and lend credence to sensory knowledge by providing homes with testing kits. Water users can request kits for a variety of drinking water contaminants and quickly receive test results, referred to as “tap data,” either validating their distrust of water quality or quelling fears. Data from individual households or semi-structured community sampling is also incredibly valuable, particularly when utilities claim an issue has been resolved but community experience says otherwise. SimpleLab has combined tap user and utility-collected data and published it through the City Water Project. The application allows users to search a city and see test results from both sources, contextualized with federal limits, potential health impacts, and other details.

Lastly, community science efforts like those undertaken by the Navajo Nation and AGU Thriving Earth Project represent how testing results are not the only outcome from local sampling efforts. Such projects can increase our understanding of drinking water infrastructure, data literacy, and ultimately self-determination for a resource often left to the hands of corporations and public agencies. Cities can support this work, too—see New York City’s example, where lead testing kits are provided free of charge to residents.

At-home water quality testing, if sampled properly, can provide great insight into your community’s water challenges. But because tests are often conducted at will by each household, and not sampled methodically across a community, the data doesn't have the quantity of information we need to inform national-level analysis. Still, it can be used to get early warning signs of emerging water quality issues and help regions prioritize where to test and invest.

Financial and demographic data help paint a picture of where investments are occurring. They also characterize the communities served by drinking water utilities. 

While there are various funding streams available to utilities, ranging from local grants to state and federal programs, the most significant one is the State Revolving Fund, or SRF program, which is funded primarily by the federal government and split into clean water (e.g., sewage, stormwater, source water) and drinking water. In 2022, Congress appropriated $43 billion in supplemental funding for the Clean Water and Drinking Water State Revolving Funds. Clean water covers infrastructure investments for sewer and stormwater, while drinking water covers water treatment and delivery; for the purposes of this blog we’ll only be discussing the process for the clean water program. The EPA distributes federal SRF funds to states, and each state operates its own SRF program to distribute funds to communities across the state. The EPA allocates federal Drinking Water SRF funds to states according to formulas based on the Drinking Water Infrastructure Needs Survey and Assessment, or DWINSA, with each state receiving a minimum of 1% of federal funds appropriated. The DWINSA is a formulaic process designed to ascertain needs across states and divide resources accordingly. The most recent survey estimates a need of $625 billion over the next twenty years to sufficiently maintain and upgrade our nation's infrastructure, the bulk of which is for replacing and rehabilitating aging infrastructure. The EPA distributes Clean Water SRF funds in accordance with an allotment formula mandated in the Clean Water Act.
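
To make the "minimum 1%" idea concrete, here is a minimal sketch of a needs-proportional split with a floor. It is not EPA's actual allotment procedure, which has further details not covered here; the function name, the state names, and the dollar figures are all hypothetical.

```python
def allocate_with_floor(needs, total_funds, floor_share=0.01):
    """Split total_funds across states in proportion to need,
    guaranteeing every state at least floor_share of the total."""
    if any(n < 0 for n in needs.values()) or sum(needs.values()) <= 0:
        raise ValueError("needs must be non-negative with a positive total")
    if floor_share * len(needs) > 1:
        raise ValueError("floor_share is too large for this many states")

    floored = set()  # states pinned at the minimum share
    while True:
        # Share and need left over for states not pinned at the floor.
        free_share = 1.0 - floor_share * len(floored)
        free_need = sum(n for s, n in needs.items() if s not in floored)
        shares = {
            s: floor_share if s in floored else free_share * n / free_need
            for s, n in needs.items()
        }
        # Any state still below the floor gets pinned, then we re-split.
        newly_floored = {s for s, share in shares.items() if share < floor_share}
        if not newly_floored:
            break
        floored |= newly_floored

    return {s: share * total_funds for s, share in shares.items()}


# Hypothetical 20-year needs for three made-up states (arbitrary units).
example_needs = {"State A": 600.0, "State B": 395.0, "State C": 5.0}
example_allotments = allocate_with_floor(example_needs, total_funds=1000.0)
```

In this made-up case, State C's proportional share would be 0.5%, so it is raised to the 1% floor and the remaining 99% is split between the other two states in proportion to their needs.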

Infographic depicting the “steps” involved in State Revolving Funds (SRFs) allocation.

EPIC tracks SRF funding through our SRF Dashboard, which draws on the documents states publish about their SRF programs, such as Intended Use Plans (IUPs) and Project Priority Lists (PPLs). Due to varying state criteria, scoring methodologies, style, and release timing of these state documents, it remains difficult to make cross-state comparisons. However, the data is useful for understanding how resources are spread across space and issue areas within particular states. Importantly, many states only reveal who they intend to fund. It's not until states submit data on their finalized funding agreements with project applicants to the EPA, sometimes as much as two years after the draft IUP, PPL, and funding lists, that we get the final picture of how these funds are actually distributed. Recently, the EPA made a significant contribution to tracking SRF dollars and released a dashboard providing final project funding data. (If you're curious about the differences between EPA and EPIC’s dashboard, see our recently published explainer here). Despite these limitations, we can start to answer questions like:

- Which communities are applying for SRF assistance, and which communities does a state intend to fund? 
- How much funding does a state intend to allocate for lead service line replacement?
- What funds are going to projects to address emerging contaminants, such as PFAS?
- What proportion of the intended awards are being allocated to state-defined disadvantaged communities (DACs)?
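
As a rough illustration of how questions like these could be answered once a state's funding list is in tabular form, the sketch below totals intended awards by purpose and by disadvantaged-community status. The rows, field names, and the state code "XX" are invented for the example; real state documents vary widely in layout and terminology.

```python
from collections import defaultdict

# Hypothetical rows standing in for one state's project priority list.
# Field names are illustrative only.
ppl_rows = [
    {"state": "XX", "applicant": "System A", "requested": 4_000_000,
     "purpose": "lead service lines", "dac": True, "intended": True},
    {"state": "XX", "applicant": "System B", "requested": 6_000_000,
     "purpose": "general", "dac": False, "intended": True},
    {"state": "XX", "applicant": "System C", "requested": 2_000_000,
     "purpose": "emerging contaminants", "dac": True, "intended": False},
]


def summarize_intended_funding(rows):
    """Total requested and intended dollars per state, with a DAC share."""
    summary = defaultdict(lambda: {
        "requested": 0, "intended": 0, "intended_dac": 0,
        "by_purpose": defaultdict(int),
    })
    for row in rows:
        s = summary[row["state"]]
        s["requested"] += row["requested"]
        if row["intended"]:  # the state says it intends to fund this project
            s["intended"] += row["requested"]
            s["by_purpose"][row["purpose"]] += row["requested"]
            if row["dac"]:
                s["intended_dac"] += row["requested"]
    for s in summary.values():
        # Share of intended dollars going to state-defined DACs.
        s["dac_share"] = s["intended_dac"] / s["intended"] if s["intended"] else None
    return summary


state_summary = summarize_intended_funding(ppl_rows)["XX"]
```

Because each state defines "disadvantaged" differently, a share computed this way is comparable within a state but not across states.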

All the data sources mentioned above have one thing in common: they are inherently spatial. Without a robust understanding of where water quality violations occur, or where dollars are being spent, we won’t understand who is impacted and who benefits. For drinking water, that spatial layer is called the Water Utility Service Area Boundary: basically, who gets water from whom. In short, service area boundaries are a map of where one utility ends and another begins. Just like PPLs, service area boundary data is collected by utilities and suffers from similar issues of inconsistencies across states. However, without high-quality service area boundary information, we’re unable to answer key questions about drinking water, such as:

Are there disparities in access to drinking water across demographic groups like race and income?

Just like our efforts to track SRF funding through our SRF Dashboard, EPIC took the initiative to develop a national map of service area boundaries along with our partners, SimpleLab and Internet of Water. Collecting data from direct outreach, publicly available data, and Freedom of Information Act Requests, EPIC assembled a dataset allowing for the various types of data from utilities, consumers, and government agencies, to be combined at the service level or PWSID (a unique service area identifier). By addressing issues at the source—the utility level—we dramatically improve our analysis capacity compared to previous researchers who were restricted to coarser geographies like cities and counties. With a common geospatial unit of analysis that accurately reflects who gets water from where, we can advance our understanding of disparities and prioritize funding to meet goals laid out in initiatives like Justice40—in turn, supporting equitable distribution of federal benefits.

The last step to answer the “who benefits?” question is demographic data. The primary source for this is the US Census Bureau, which conducts basic counts via the decennial cycle—but also produces more detailed data from samples through the American Community Survey (or ACS) at more frequent intervals. These data contain rich information on income, racial breakdowns, employment status, and many other variables. By linking Census data with water utility data through service area boundaries with a process known as crosswalking, we can combine data from sources like SDWIS with detailed information about the people and communities impacted.
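
One common way to crosswalk is areal weighting: each census tract's counts are assigned to a service area in proportion to the fraction of the tract's area that falls inside it. The sketch below assumes those fractions have already been computed with a GIS overlay and that people are spread evenly within each tract, which is a simplification. The tract IDs, system IDs, and counts are made up, and this is not necessarily the method EPIC uses.

```python
def crosswalk_counts(tract_counts, overlaps):
    """Apportion tract-level counts to water systems by area fraction.

    tract_counts: {tract_id: {variable: count}}
    overlaps: iterable of (tract_id, pwsid, fraction_of_tract_area_in_system)
    """
    totals = {}
    for tract_id, pwsid, fraction in overlaps:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction out of range for {tract_id}/{pwsid}")
        system = totals.setdefault(pwsid, {})
        for variable, count in tract_counts[tract_id].items():
            # Assumes residents are uniformly distributed across the tract.
            system[variable] = system.get(variable, 0.0) + count * fraction
    return totals


# Hypothetical tract counts and overlap fractions.
tract_counts = {
    "tract_1": {"population": 1000, "low_income": 200},
    "tract_2": {"population": 2000, "low_income": 1000},
}
overlaps = [
    ("tract_1", "PWS_A", 1.0),   # tract_1 lies entirely inside PWS_A
    ("tract_2", "PWS_A", 0.25),  # a quarter of tract_2 is served by PWS_A
    ("tract_2", "PWS_B", 0.5),   # half is served by PWS_B; the rest is unserved
]

by_system = crosswalk_counts(tract_counts, overlaps)
low_income_share = {
    pwsid: counts["low_income"] / counts["population"]
    for pwsid, counts in by_system.items()
}
```

Only counts (people, households) should be apportioned this way. Rates and medians need to be rebuilt from the apportioned counts, as the low-income share is here.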

Better equipped with these facts on what water data is available and what purposes it serves, we can start applying datasets to questions at different scales. Several examples of major interest to EPIC and our partners include:

  • Select Datasets: Safe Drinking Water Information System, Service Area Boundaries, U.S. Census

    Scanlon et al. have shown that linkages between drinking water quality and social vulnerability exist at the system level in the US. This work relies heavily on SDWIS data to reveal inequalities across demographic groups. For example, researchers found that water utilities serving tribal areas were 3 times more likely to experience a health-based violation from 2018 to 2020, compared to the national average of 1 in 10 people served by a utility found to have a health-based violation. The disparities found in tribal areas continued, albeit with additional nuance, for Black and Latino/a populations.

  • Select Datasets: Safe Drinking Water Information System, EPIC Scraped State Revolving Fund Data, Service Area Boundaries, U.S. Census, Climate and Economic Justice Screening Tool

    Focusing on Texas, EPIC used data from SRF project priority lists and our Service Area Boundary data in conjunction with the Climate and Economic Justice Screening Tool (CEJST) and the Safe Drinking Water Information System (SDWIS) to address which communities have the greatest need and who is, and isn’t, receiving funding. To enable the goal of Justice40 (that 40% of the benefits of many federal programs go to disadvantaged communities), CEJST identifies census tracts as ‘disadvantaged’ through a methodology and 34 input datasets. Understanding who should get funding from the SRF program requires combining tools like CEJST with many other factors, including the frequency of health-based violations as indicated in SDWIS. Results indicate that many water utilities which serve large proportions of disadvantaged communities and have persistent health-based violations did not receive funding from the SRF program in FY2022. Furthermore, analysis shows that water utilities which did receive funding typically served less disadvantaged communities.

  • Largely rural, the Eastern Shore of Maryland has a high proportion of residents on well water, which is often subject to lower scrutiny than community water systems. The Lower Shore Safe Well Water Initiative is a community testing project which found 1 in 5 households had nitrate levels which pose a risk to human health. The cause? Over 300 million chickens are produced on the Eastern Shore, largely in Concentrated Animal Feeding Operations (CAFOs). Chicken manure contains high amounts of phosphates and nitrates, the latter being a significant drinking water contaminant. In addition to community testing, data researchers used CCRs as a proxy for well water quality and CAFO location data to pinpoint potential problem areas and sources.
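
The Texas example above combines three per-utility signals: the share of the population in disadvantaged tracts, the count of health-based violations, and whether the utility received SRF funding. A minimal sketch of that kind of screen follows. The thresholds, field names, and records are assumptions made for illustration, not EPIC's actual criteria or data.

```python
def flag_unfunded_high_need(systems, min_dac_share=0.5, min_violations=2):
    """Return IDs of systems that look high-need but received no funding.

    systems: {pwsid: {"dac_share": float,
                      "health_based_violations": int,
                      "funded": bool}}
    The default thresholds are arbitrary placeholders.
    """
    return sorted(
        pwsid
        for pwsid, record in systems.items()
        if record["dac_share"] >= min_dac_share
        and record["health_based_violations"] >= min_violations
        and not record["funded"]
    )


# Hypothetical per-utility records, keyed by a made-up system ID.
systems = {
    "PWS_A": {"dac_share": 0.8, "health_based_violations": 4, "funded": False},
    "PWS_B": {"dac_share": 0.7, "health_based_violations": 3, "funded": True},
    "PWS_C": {"dac_share": 0.1, "health_based_violations": 5, "funded": False},
    "PWS_D": {"dac_share": 0.9, "health_based_violations": 0, "funded": False},
}

flagged = flag_unfunded_high_need(systems)
```

In practice the `dac_share` field would come from crosswalking CEJST tracts to service area boundaries, and the violation counts from SDWIS records joined on the system identifier.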

Conclusion

While this article doesn’t attempt to provide a comprehensive list of all relevant data sources, it hopefully provides a framework for better categorizing data across entities and sources.

The examples we’ve sketched highlight what data might be utilized, as well as when and how approaching explicit data questions linked to water access and quality can generate insights and minimize gaps. EPIC’s work to date on drinking water data has informed an understanding of where those important data gaps exist, i.e., where we are not collecting data in the right quantity and format. Data like the service area boundaries, for example, do have known shortcomings—particularly for rural, lower-income, and minority service areas. And while we are working to improve this data with partners, other data sources like SDWIS or Lead Service Line Inventories also require significant resources and coordination to improve. Those challenges notwithstanding, advances across the sector are being made to clarify who does and does not have access to safe, affordable, drinking water—and that goal must remain our collective priority.

Are you interested in this work? Want to learn more or give us feedback? Don’t hesitate to reach out!
