|
Persistent Identifier
|
doi:10.13021/ORC2020/FDBTQM |
|
Publication Date
|
2026-03-27 |
|
Title
| Replication Data: Long-term criteria and toxic pollutants trends and community exposures over the Marcellus Shale region in the U.S. |
|
Author
| Baek, Bok Haeng (George Mason University) - ORCID: 0000-0003-1054-6325 |
|
Point of Contact
|
Baek, Bok Haeng (George Mason University) |
|
Description
| This dataset supports a comprehensive assessment of long-term air quality impacts associated with unconventional oil and gas development (UOGD) in the Marcellus Shale region of the United States, covering Pennsylvania, West Virginia, and Ohio from 2002–2021. The project integrates emissions inventory development, ambient monitoring data, chemical transport modeling, and deep learning–based simulations to evaluate trends in criteria air pollutants (CAPs) and hazardous air pollutants (HAPs) and to assess potential community exposure disparities. The dataset includes processed emissions inventories, modeled air pollutant concentrations, and supporting data products generated using the Comprehensive Air Quality Model with Extensions (CAMx) and the deep learning chemical transport modeling framework DeepCTM. The integrated modeling framework was used to evaluate the influence of UOGD-related emissions on ozone (O₃), nitrogen dioxide (NO₂), formaldehyde (HCHO), and PM₂.₅, and to characterize long-term spatial and temporal patterns of air quality. These data products enable replication of key analyses in the associated study and support further research on emissions trends, atmospheric chemistry, and community exposure assessments in regions impacted by unconventional energy development. The archived materials include processed datasets, metadata, and documentation necessary to interpret and reproduce the modeling results. |
|
Subject
| Earth and Environmental Sciences |
|
Related Publication
| https://doi.org/10.5194/essd-15-5261-2023 |
|
Producer
| Bok H. Baek (George Mason University) (GMU) https://cser.science.gmu.edu |
|
Production Date
| 2026-03-01 |
|
Production Location
| George Mason University |
|
Contributor
| Project Leader : Bok H. Baek |
|
Funding Information
| Health Effects Institute: https://www.healtheffects.org |
|
Distribution Date
| 2026-03-27 |
|
Depositor
| Baek, Bok Haeng |
|
Deposit Date
| 2026-03-19 |
|
Time Period
| Start Date: 2002-01-01 ; End Date: 2021-12-31 |
|
Date of Collection
| Start Date: 2024-05-01 ; End Date: 2025-12-31 |
|
Data Type
| Air quality datasets for O3, PM2.5 and Toxics (SBTEX) |
|
Series
| # Dataset Description

This dataset contains air quality model outputs and machine learning model predictions developed for the HEI-funded study on long-term air pollutant simulations over the Marcellus Shale region.

All files are compressed and split using tar and pigz. Each folder contains an `extract.sh` script to reconstruct and extract the corresponding `.tar.gz` archive. To extract the data, navigate to a folder (e.g. ./Package_CAMx_HAPs) and run:

    chmod +x extract.sh
    ./extract.sh

If `pigz` is not available on your system, the script will automatically fall back to standard gzip decompression.

# Data Contents

- CAMx_HAPs: SBTEX hazardous air pollutant (HAPs) concentration data simulated using CAMx RTRAC (IOAPI NetCDF format).
- CAMx_CAPs: Criteria air pollutant (CAPs) concentration outputs from CAMx (IOAPI NetCDF format).
- DCTM_CAPs: CAPs predictions from the DeepCTM model (NumPy array format, *.npy).
- DCTM_HAPs: Benzene concentration predictions from the DeepCTM model (NumPy array format, *.npy).

---

# Data Format Notes

- NetCDF files follow the IOAPI convention commonly used in air quality modeling.
- NumPy arrays (*.npy) are structured for machine learning model input/output.
- Data are organized by pollutant category and modeling framework (CAMx vs DeepCTM).

    HEI_Data_Pack_for_GMU_Dataverse
    ├── Package_CAMx_CAPs
    │   ├── CAMx_CAPS.tar.gz.part-aa
    │   ├── CAMx_CAPS.tar.gz.part-ab
    │   ├── CAMx_CAPS.tar.gz.part-ac
    │   ├── CAMx_CAPS.tar.gz.part-ad
    │   ├── CAMx_CAPS.tar.gz.part-ae
    │   ├── CAMx_CAPS.tar.gz.part-af
    │   ├── CAMx_CAPS.tar.gz.part-ag
    │   ├── ......
    │   └── extract.sh
    ├── Package_CAMx_HAPs
    │   ├── CAMx_HAPS.tar.gz.part-aa
    │   ├── CAMx_HAPS.tar.gz.part-ab
    │   ├── CAMx_HAPS.tar.gz.part-ac
    │   ├── CAMx_HAPS.tar.gz.part-ad
    │   ├── CAMx_HAPS.tar.gz.part-ae
    │   └── extract.sh
    ├── Package_DCTM_CAPs
    │   ├── DCTM_CAPS.backup.tar.gz.part-aa
    │   ├── DCTM_CAPS.backup.tar.gz.part-ab
    │   ├── DCTM_CAPS.backup.tar.gz.part-ac
    │   ├── DCTM_CAPS.backup.tar.gz.part-ad
    │   ├── DCTM_CAPS.backup.tar.gz.part-ae
    │   ├── ......
    │   └── extract.sh
    ├── Package_DCTM_HAPs
    │   ├── DCTM_HAPS.backup.tar.gz.part-aa
    │   ├── DCTM_HAPS.backup.tar.gz.part-ab
    │   ├── DCTM_HAPS.backup.tar.gz.part-ac
    │   ├── ......
    │   └── extract.sh
    └── READ.me

# Data Availability Statement

The dataset includes model outputs (CAMx simulations), machine learning predictions (DeepCTM), and processed data files used in the analysis. All data included in this repository are available to the public for research purposes. The data are provided in standard formats (IOAPI NetCDF and NumPy arrays) to facilitate reuse.

Due to the large size of the dataset and storage constraints, intermediate files, raw emission inventories, and certain preprocessing scripts are not included in this repository. These materials may be made available upon reasonable request to the corresponding author.

---

# Intended Use and Limitations

This dataset is intended for research and educational purposes, particularly for studies related to air quality modeling, emissions analysis, and machine learning applications in atmospheric science.

Uncertainties in the dataset may arise from:

- Emissions inventory limitations, particularly for hazardous air pollutants (HAPs)
- Spatial and temporal resolution constraints
- Model assumptions in chemical transport and machine learning frameworks

Users are advised to consider these limitations when interpreting the data.

# License and Data Use

This dataset is provided for non-commercial research and educational use only. Proper citation of the associated study is required in any publications or presentations. |
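
For users who cannot run the bundled shell script (for example on a system without a POSIX shell), the reassembly step described above can be sketched in Python using only the standard library. This is an illustrative sketch, not the contents of `extract.sh`: it assumes the `.part-aa`, `.part-ab`, ... files are plain byte-wise splits of a single gzip-compressed tar archive, and the function name `reassemble_and_extract` and its parameters are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble_and_extract(package_dir, archive_stem, dest_dir):
    """Join `<archive_stem>.part-*` pieces in order and extract the result.

    Assumption: the parts are byte-wise splits of one .tar.gz archive
    (e.g. CAMx_HAPS.tar.gz.part-aa, CAMx_HAPS.tar.gz.part-ab, ...).
    """
    package_dir = Path(package_dir)
    dest_dir = Path(dest_dir)

    # Equal-length suffixes such as -aa, -ab, -ac sort correctly
    # in plain lexicographic order.
    parts = sorted(package_dir.glob(f"{archive_stem}.part-*"))
    if not parts:
        raise FileNotFoundError(
            f"no files matching {archive_stem}.part-* in {package_dir}"
        )

    dest_dir.mkdir(parents=True, exist_ok=True)
    joined = dest_dir / archive_stem

    # Concatenate the parts into a single archive file, streaming each one
    # so that large parts are never held fully in memory.
    with open(joined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

    # Decompress with the standard gzip reader and unpack the tar archive.
    with tarfile.open(joined, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)

    # Remove the temporary joined archive to save disk space.
    joined.unlink()
    return sorted(p for p in dest_dir.rglob("*") if p.is_file())
```

A call such as `reassemble_and_extract("Package_CAMx_HAPs", "CAMx_HAPS.tar.gz", "CAMx_HAPs_extracted")` would then unpack that package into the chosen output folder (the output folder name here is arbitrary). Note that the joined archive is written to disk before extraction, so roughly the compressed size of the package is needed as extra free space. Once extracted, `.npy` files can generally be read with `numpy.load`; the array shapes and axis order are not documented here and should be checked against the dataset's own documentation.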
|
Software
| CAMx & DeepCTM, Version: 7.0 |
|
Related Material
| DeepCTM (AI Chemical Transport Model) emulator, trained on CAMx version 7.0 |
|
Other Reference
| https://doi.org/10.1016/j.atmosres.2021.105919 |