New Data to Answer Questions about Drinking Water Access, Affordability, and Quality
Using spatial data to identify disparities in water user experience. Want access? See our GitHub for methods and data.
Roughly 300 million people in the United States (almost 90% of the population) are served by one of nearly 50,000 community water systems. However, these water systems are not all providing the same level of service. For example, did you know that systems where more than 25% of users are Native American typically have a history of 31 more violations overall than systems with smaller Native American populations?
Using an early map of water systems, a peer-reviewed analysis found that “higher proportions of minority groups are more likely to experience initial and recurring drinking water violations.” While salient, this insight required omitting 42% of drinking water systems and many tribal systems due to insufficient data quality from publicly available sources.
With modeled drinking water system boundaries released by the EPA, we can now begin to include the omitted 42% of systems and guide decisions about drinking water investment prioritization for a wider swath of drinking water system users. To lower the barrier to entry for analysis, we harmonized socioeconomic and environmental data with drinking water boundaries and made the result free for the research community to use. With this more complete dataset, we are enabling an improved data-driven assessment of drinking water user experience.
Comparing Apples and Oranges
Building the bridge between water utility data and community characteristics proved to be complex, and we found ourselves comparing apples to oranges. Water systems can serve a fraction of a rural neighborhood or a sprawling urban center, so some estimation methods are more suitable than others. For example, most of Houston, TX is served by a single water system (see figure; green polygon), but there are quite a few systems that cover a single neighborhood (see figure; blue polygon). In addition, since we lack a reliable number that tells us how many people a water system serves, we don’t have a benchmark for comparing methods, just a couple of different estimates.
But the good news is we did the hard data work of turning oranges into apples so you don’t have to. To handle the diverse sizes of water system boundaries, our method intersects and weights data for larger systems and uses a block-parcel crosswalk for smaller systems. With this tiered approach, we can tailor our methods to a specific water system for apples-to-apples comparisons. With this process, we estimated 78 socioeconomic variables for 98.82% of community water systems across the U.S. In addition to socioeconomic data, we also crunched drinking water violations and utility information from the Safe Drinking Water Information System (SDWIS), and the watersheds of wells and intake locations for utilities. This dataset is available on GitHub, along with a public function that pulls in a user-generated spreadsheet to automatically crosswalk any kind of census variable to water system boundaries.
Dive Deeper with Us
If you’re still here and interested in diving deeper, here is how the tiers work. The first tier of our method interpolates data using census block populations or houses (depending on the specific census variable) as weights to address the non-uniformity of communities across a landscape. Our second tier uses a block-parcel crosswalk developed by BlueConduit to weight census data to water system boundaries. Because we found that this method often overestimated populations in comparison to other methods when scaled to the tract level (the scale at which most of our census variables are available), we capped total populations to those reported by SDWIS and scaled count data accordingly.
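For readers who think in code, here is a minimal sketch of the two ideas above: spreading a tract-level count across census blocks in proportion to block population, and scaling estimates down when the estimated population exceeds the SDWIS-reported one. It is an illustration under stated assumptions, not our production code. The `Block` class, the `share_in_system` field, the function names, and all of the numbers are hypothetical, and the BlueConduit block-parcel crosswalk itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tract_id: str
    population: float       # census block population, used as the weight
    share_in_system: float  # hypothetical: fraction of the block inside the boundary (0 to 1)


def interpolate_count(blocks, tract_counts):
    """Estimate a tract-level count for one water system.

    `blocks` must list every block of each tract the system touches,
    including blocks fully outside the boundary (share_in_system = 0),
    so that each block's weight is its share of the tract population.
    """
    tract_pop = {}
    for b in blocks:
        tract_pop[b.tract_id] = tract_pop.get(b.tract_id, 0.0) + b.population

    estimate = 0.0
    for b in blocks:
        if tract_pop[b.tract_id] == 0:
            continue  # an unpopulated tract contributes nothing
        weight = b.population / tract_pop[b.tract_id]
        estimate += tract_counts[b.tract_id] * weight * b.share_in_system
    return estimate


def cap_to_reported(estimates, pop_key, reported_pop):
    """Scale all count estimates down if estimated population exceeds the reported one."""
    est_pop = estimates[pop_key]
    if est_pop == 0 or est_pop <= reported_pop:
        return dict(estimates)
    factor = reported_pop / est_pop
    return {name: value * factor for name, value in estimates.items()}


# Made-up example: two tracts, four blocks, one water system.
blocks = [
    Block("T1", 100, 1.0),
    Block("T1", 300, 0.0),
    Block("T2", 50, 0.5),
    Block("T2", 50, 1.0),
]
tract_population = {"T1": 400, "T2": 100}
tract_households_in_poverty = {"T1": 40, "T2": 20}

estimates = {
    "population": interpolate_count(blocks, tract_population),
    "households_in_poverty": interpolate_count(blocks, tract_households_in_poverty),
}
# Pretend the reported population for this system is 140.
capped = cap_to_reported(estimates, "population", reported_pop=140)
```

In this toy case the interpolated population is 175, so every count is multiplied by 140 / 175 = 0.8. Scaling all counts by the same factor keeps ratios such as the share of households in poverty unchanged.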
In our dataset, 91.72% of water system boundaries are estimated using weighted interpolation (tier one), and 7.09% are estimated using a block-parcel crosswalk (tier two; 4.61% of these were capped to the population reported by SDWIS). Only 1.18% of water system boundaries could not be estimated with the methods described above.
Help Improve this Dataset!
We’re interested in hearing from you and other data geeks in the drinking water community to improve this method and the resulting dataset. Understanding the characteristics of communities served by water systems is critical for identifying where systems fall short of providing safe, reliable drinking water service and for taking steps to address those gaps. If you’re interested in submitting feedback or contributing to this project, please get in touch.