It is possible to work quickly, iteratively and in community to tackle gaps in water data.

Let’s bust the myth that environmental data is “too complicated and expensive” - Water Service Area Boundaries are just one example of how the quality and affordability of creating and maintaining this data have changed significantly in the last five years.


I’ve spent the better part of my career using data to plan, evaluate and monitor environmental programs. Very early on I noticed, as many do, that the vast majority of the information available to me was out of date and often not at a scale useful for the decisions it was meant to inform. For example, the land use and impervious surface data I used to inform engineering plans for climate adaptation in Massachusetts was 15 years out of date; the wetlands data I used for planning restoration projects was 10-30 years out of date, depending on where in the country we were working; and most recently, data showing which communities lack access to safe drinking water simply did not exist for 38 states, even though several government agency staff told me they had wanted this data for their entire careers (some since 1994!).


A little less than a year ago, I knew nothing about water utility service area boundaries - the mouthful of a name for the dataset that indicates who serves drinking water to whom within the US. When colleagues highlighted the importance of this dataset for identifying communities that should be prioritized for water infrastructure improvements, tracking Justice40 spending and adapting to a changing climate, my team set out to develop this data.


In eight months, we built a high-quality understanding of where 157 million people - more than half of the US population - get their water. In the first two months of the project, we built the first unified national approximation of this data in partnership with SimpleLab and the Internet of Water. Following the initial release, SimpleLab added data for four additional states (Arkansas, Rhode Island, Utah and Illinois), and EPIC continued to improve the dataset through targeted outreach to water systems that served over 100,000 people and were not in states that maintained service area boundaries. Within three months of outreach following our initial data publication, we have added boundaries for 50 large water systems, resulting in a more precise understanding of water service for ~17 million people! And in the next two months, our partner, the Internet of Water, will release a simple, low-cost, easy-to-use tool for communities who want to create and share this data layer (stay tuned). This information is essential for federal agencies, state agencies, tribes and local systems, and we will share more in additional blogs this fall.

This map shows the quality of service area boundary data in each state, based on the tier of the majority of that state’s water systems - see the full dataset here.

I’m proud of how much we’ve been able to accomplish in this short amount of time. This speed was possible because of the principles we followed to demonstrate the value - and feasibility - of quick, iterative, open and community-oriented data development.

  1. Multidisciplinary team: As the saying goes: if you want to go fast, go alone; if you want to go far, go together. We built strong collaborations at the onset of this project such that our core team was composed of water and policy experts, programmers, statisticians and community liaisons. This multidisciplinary team set the foundation for us to move with the speed, accuracy and intention that we did. 

  2. Stakeholder engagement:  It is imperative to take the time to engage with folks throughout the project and not develop critical datasets in isolation. At the onset of the project and throughout the development of the dataset, we met with ~150 stakeholders across academia, technology companies, government agencies and community groups that 1) had already worked on this issue, 2) could provide insight into the development, or 3) would be impacted by the use of the data. We knew we didn't have all the answers (and likely never will!) - we wanted and needed input from others. Not only does an engaged community of practice lead to a better and more accurate national dataset, it also creates the foundation for improvements and maintenance of the data over time. 

  3. User-centered and FAIR data: Through the methodology and publication of this data, we followed FAIR principles - Findable, Accessible, Interoperable, and Reusable - and took it one step further to embody user-centered design principles. This builds upon the themes mentioned above, but in practice it meant that we made the data and the insights from it available in a variety of formats so that, irrespective of their familiarity with data and technology, users and beneficiaries of the information could access the aspects of our work most useful to them. From interactive websites, to free downloads of the full dataset, to write-ups of its uses and insights - we tried to, and will continue to, make this information as accessible as possible to the plethora of stakeholders who could use it. 

  4. Continuous improvement: We know this first map is not perfect, but it was a starting point for an iterative, continuous process to develop this data. By publishing the data along the way (with clear documentation!), we had folks across the US point us to improvements that we could quickly incorporate and share back out. We developed a three-tier system, with Tier 1 being the highest quality data and Tier 3 needing the most improvement. By creating a simple, understandable ranking system, users of the data can use their discretion about what is best suited to their analysis, AND we can easily track how we have improved the dataset over time. For reference, we started data development in February, and when we first released the data in April, we had 12 states representing ~40% of the population (~122 million people) in Tier 1. Five months later, we have added 5 additional states and ~70 of the largest water systems in the US, expanding Tier 1 coverage by ~50 million people. And given the open, iterative nature of development, we know this will continue to improve and build an ecosystem of engaged contributors along the way. 
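The tier-based progress tracking described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration - the data structure and population figures are taken from the numbers quoted in this post, not from the actual dataset or its schema:

```python
# Hypothetical sketch of tracking Tier 1 coverage across releases.
# Figures come from the post's narrative; the real dataset is structured differently.
US_POPULATION = 333_000_000  # approximate US population

# population served by Tier 1 (highest quality) boundaries at each release
tier1_population = {
    "initial release (April)": 122_000_000,   # 12 states, ~40% of population
    "five months later": 172_000_000,         # +5 states, ~70 large systems
}

for release, pop in tier1_population.items():
    share = pop / US_POPULATION
    print(f"{release}: {pop / 1e6:.0f}M people ({share:.0%}) in Tier 1")
```

A simple roll-up like this is what makes the tier system useful for tracking: each improvement to the underlying boundaries moves population between tiers, and progress can be reported as a single coverage number.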



There is immense value in the quick, iterative development of data like these, which have a profound impact on our ability to manage natural resources and provide safe drinking water to all. Without better-quality information easily available to agency staff and policymakers, we will keep investing in the places that have the resources to get themselves on the map in the first place, and those that have historically been excluded will continue to be left behind. We can and we must do better. With sound, efficient design and data development principles - what else can we accomplish together?

You can read more about our Technology Principles here, and explore the service area boundaries project here.

Support for this work was received from the Bezos Earth Fund.
