We Are Still Hunting for Treasure in a Sea of Environmental Information   

Collecting sea shells is my favorite activity when I go to the beach. In the midst of searching for treasure, I often find myself miles down the beach with my arms full of shark teeth, small coins, and shells of all shapes and sizes to add to my collection. In data science, I’ve noticed that finding datasets to answer some of our most pressing questions is similar to collecting sea shells. Sometimes you find “buried treasure” and locate a well-documented dataset that holds the key to answering your question. But most of the time, you’re sifting through rubble and seaweed and coming up completely empty-handed. Data are either not available in an easily accessible format, deeply embedded across multiple websites, private, not at the right scale (temporal, spatial, or spectral), outdated, or all of the above. For example, the National Land Cover Database is updated every 10 years (and is therefore outdated by the time it is published), which means the data are really only fit for large-scale analyses at the state or national level. Barriers such as these make it impossible to answer some of the most fundamental questions about our nation’s environmental progress or whether certain communities have access to basic resources. 

To increase our chances of finding buried treasure and accelerate positive environmental outcomes, we need to invest in data and data policy. The Fiscal Year 2023 budget allocates $32.3 billion to water resources, $50.9 billion to agriculture, and $25.9 billion to conservation and land management. But without investments in data, digital infrastructure, and data-oriented teams, it’s difficult to: 1) properly prioritize where to invest, 2) optimize how we work, 3) evaluate whether we were successful, and 4) make timely corrections if we weren’t. From our work at EPIC, here are some examples of hard-hitting policy questions that cannot be answered due to poor data quality:

Question: Does everyone have access to clean drinking water? 

  • Although this may seem like a simple question, we don’t actually have an accurate dataset showing where everyone gets their drinking water. We don’t know who gets water from a private well, and for the majority of Americans who are served by a water utility, the boundaries of the area that utility serves are often not publicly reported. So, we have to provisionally match utilities to jurisdictional boundaries or estimate their service areas from population data. In addition to this incomplete understanding of where people get their water, water quantity and quality data are fragmented across 25 different federal entities on 57 data platforms that hold 462 different data types. With this disparate landscape of water data and inconsistencies in the quality of the data that are reported, it’s challenging to collate the information that does exist to determine which communities lack access to safe drinking water. Considering that as many as 12.8 million toxic lead service lines may exist in unknown locations across the United States, this information is absolutely critical. We need to invest in quality water data not only to meet the promise of clean drinking water for all, but also to inform climate disaster planning and identify communities impacted by water quality violations. 


Question: Are we achieving no net loss of wetlands? 

  • Wetlands provide a variety of ecosystem services; recreation in coastal wetlands alone has an estimated annual economic value of $20-60 billion. While the Army Corps of Engineers is responsible for ensuring no net loss of wetlands, the data needed to track this span many agencies and fall into three broad categories: 1) where the wetlands are, which is managed by the Fish and Wildlife Service, is 40-50 years old in some states, and could cost upwards of $10 million per state to update; 2) where impacts to wetlands are located, which comes from several agencies and is not centrally located; and 3) where restoration projects are occurring, which also comes from several agencies and is likewise not centrally located. What’s more, our research has shown that it takes more than three years on average to approve restoration projects, in part due to fragmented IT systems for permitting and project permissions. Wetland activities logged through NEPA could help track impacts and restoration efforts, but those data are deeply buried within PDFs and fragmented across multiple agencies (DOI, USACE, DOT, EPA, and more). While NEPAccess has made environmental impact statements searchable, extracting quantitative data from PDFs and combining data across federal agencies requires extensive technical development. We need to invest in processes that create high-quality geospatial data from the start, rather than further perpetuating the need for complex data scraping exercises. Increasing interoperability and strengthening connections across government platforms to collect all wetland data (or even all environmental data) would help us better evaluate the health of our most sensitive ecosystems, and protect and restore them so that people and wildlife can thrive. 


Question: Are we delivering on the Justice40 initiative?

  • The Justice40 initiative, introduced in 2021, requires that 40% of the benefits of certain federal investments flow to disadvantaged communities that are marginalized, underserved, and overburdened by pollution. While some screening tools already existed to assess environmental justice, many more have been created in the wake of Justice40. However, these tools define disadvantaged communities inconsistently, which makes it difficult for community members and program administrators to navigate this ecosystem of resources (for example, which tool should be used, and when?). Moreover, the reporting data required to track progress on this initiative take substantial effort to locate and extract. Data are often reported at different spatial scales (e.g., the location of the office that applied versus where the work is happening, or the point location of a project rather than its full extent), and the lack of a standardized unit for reporting makes it impossible to evaluate whether the disadvantaged communities this initiative was meant to serve are actually getting the resources they were promised. Eventually, we’ll be asking: where did we spend all that money, and was it where we wanted to spend it? The time is now to invest in the standards, guidance, and systems needed to effectively evaluate where money is being spent and which communities are benefiting. 

Question: Are we successfully managing invasive species? 

  • Invasive species cost the U.S. approximately $26 billion per year and have contributed to the extinction of nearly 31.5% of the listed species that have disappeared since the 1500s. With an extinction rate of possibly three species per hour, time is of the essence to curb a looming sixth mass extinction. While multiple agencies manage invasive species, differences in data standards make long-term monitoring data exceptionally challenging to collate, standardize, and compare. As a result, it’s difficult to understand what kinds of invasive species management activities are occurring and where. For example, unreliable data make it impossible to evaluate whether watercraft decontamination stations have been effective in preventing the spread of invasive quagga and zebra mussels, which have impacted nearly every major river basin in the U.S. Adopting new technology (such as automated eDNA monitoring stations) and coordinating across agencies on a centralized invasive species database would support better management strategies moving forward. 


We’ve been searching for buried treasure to answer these important questions, but all we’ve been finding is seaweed and rubble. These are just four of the dozens of pressing policy questions that cannot be adequately answered due to poor data management from the local to the national scale. Although building quality data takes time and effort, investing in data infrastructure, strong data-oriented teams, and well-maintained systems is necessary to accelerate equitable environmental progress. Answering these questions would help us address climate justice issues and ensure the preservation of our ecosystems for all future generations to enjoy.
