Beyond the Historical Record: How I Simulated Seattle’s Future Climate
Raw weather data is often incomplete or inconsistent. For this project, rather than manually retrieving data, we established an automated pipeline to identify, download, and validate the necessary records.
Using PrecipGen, we automated the station search. We targeted Sea-Tac Airport and filtered for stations within a 25km radius containing at least 15 years of historical data.
```python
import pandas as pd
import precipgen as pg

# Sea-Tac Airport coordinates
LAT = 47.4502
LON = -122.3088

# Find nearby stations with sufficient data
stations = pg.find_nearby_stations(
    latitude=LAT,
    longitude=LON,
    radius_km=25,
    min_years=15
)

# Display results as a DataFrame
stations_df = pd.DataFrame(stations)
display(stations_df[['id', 'name', 'distance_km', 'years_available']])
```
| | id | name | distance_km | years_available |
|---|---|---|---|---|
| 0 | USW00024233 | SEATTLE TACOMA AP | 0.74 | 78 |
| 1 | USC00454169 | KENT | 6.14 | 112 |
| 2 | US1WAKG0081 | RENTON 0.5 SSW | 8.50 | 18 |
| 3 | USW00094248 | RENTON MUNI AP | 8.67 | 28 |
| 4 | US1WAKG0041 | RENTON 3.6 SSE | 10.58 | 18 |
The script identified Station USW00024233 (Seattle Tacoma Intl AP) as the optimal source. We downloaded the station's daily records (.dly file) and parsed the text format into a clean dataset.
```python
# Download and parse in one operation
station_id = 'USW00024233'
dly_path = pg.download_station(station_id, cache_dir='data')
parser = pg.GHCNParser(dly_path)
ghcn_data = parser.parse_dly_file(dly_path)

# Extract precipitation and convert to mm
precip_data = parser.extract_precipitation(ghcn_data)
print(f"Extracted {len(precip_data)} daily records.")
```
To ensure data integrity, we ran a quality check using the library's standard validator, which assesses the dataset's completeness to prevent simulation errors caused by missing values.
```python
# Built-in quality validation with fallback logic
validator = pg.DataValidator(pg.QualityPresets.standard())
quality_report = validator.assess_with_fallback(
    precip_data,
    quality_flags=ghcn_data['quality_flag'],
    site_id=station_id
)
print(f"Data Completeness: {quality_report.completeness_percentage:.1f}%")
print(f"Is Acceptable: {quality_report.is_acceptable}")
```
```
2025-12-22 16:27:58,078 - precipgen.data_validator - INFO - assess_data_quality:375 - Starting comprehensive quality assessment for USW00024233
2025-12-22 16:27:58,084 - precipgen.data_validator - INFO - assess_data_quality:464 - Data quality assessment passed for USW00024233
2025-12-22 16:27:58,085 - precipgen.data_validator - INFO - assess_with_fallback:529 - Data acceptable with 'permissive' quality standards
Data Completeness: 100.0%
Is Acceptable: True
```
With the data validated, we visualized summary statistics to confirm the precipitation patterns were consistent with the region, then saved the cleaned dataset to CSV.
Above: Time series plots showing daily and annual historical precipitation totals.
| | Metric | Value |
|---|---|---|
| 0 | Total records | 28,443 |
| 1 | Wet-day count | 12,113 (42.59%) |
| 2 | Dry-day fraction | 57.41% |
| 3 | Wet-day mean (mm) | 6.32 |
| 4 | Wet-day std (mm) | 7.97 |
| 5 | Min (mm) | 0.30 |
| 6 | 25th percentile (mm) | 1.30 |
| 7 | Median (mm) | 3.60 |
| 8 | 75th percentile (mm) | 8.40 |
| 9 | Max (mm) | 127.50 |
Parameter Estimation and Trend Analysis
Once the data was prepared, we used the AnalyticalEngine to calculate stochastic parameters for each month. The engine identifies two primary patterns:
- The Markov Chain: the probability of precipitation occurring given the previous day's state (wet/dry), which captures storm persistence.
- The Gamma Distribution: the distribution of rainfall intensity on wet days.
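These two patterns combine into a single daily step: draw wet/dry from the chain conditioned on yesterday, then draw an intensity from the Gamma if wet. A minimal standalone sketch (the parameters below are made-up January-like values for illustration, not output from PrecipGen):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch only)
P_WW = 0.75             # P(wet today | wet yesterday)
P_WD = 0.35             # P(wet today | dry yesterday)
ALPHA, BETA = 0.7, 9.0  # Gamma shape and scale (mm) for wet-day amounts

def step(prev_wet: bool) -> float:
    """One day of the Markov-chain/Gamma model: decide wet or dry
    from the transition probability, then sample an intensity if wet."""
    p_wet = P_WW if prev_wet else P_WD
    if rng.random() < p_wet:
        return rng.gamma(ALPHA, BETA)  # wet day: intensity in mm
    return 0.0                         # dry day

# Simulate one month starting from a dry day
series, wet = [], False
for _ in range(31):
    amount = step(wet)
    wet = amount > 0
    series.append(amount)
print(f"Wet days: {sum(a > 0 for a in series)} / 31")
```

The persistence terms (P_WW high in winter, low in summer) are what let the model reproduce multi-day storm sequences rather than independent coin flips.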
```python
# Initialize analytical engine with a 0.1 mm wet-day threshold
engine = pg.AnalyticalEngine(precip_data, wet_day_threshold=0.1)

# Calculate Markov chain probabilities and Gamma parameters for each month
monthly_params = engine.calculate_monthly_parameters()

# Visualize the climate "fingerprint"
viz.plot_monthly_parameters(monthly_params)
```
Above: Monthly P(Wet|Wet) transition probabilities and Gamma Alpha/Beta parameters.
The resulting charts act as a statistical profile of the location, distinguishing between the high persistence of winter storms and the fleeting nature of summer precipitation.
To account for potential climate shifts, we performed a Trend Analysis using a 30-year sliding window. This determines if annual rainfall or storm frequency is changing over time, allowing us to decide whether to simulate a stationary past or a non-stationary future.
```python
# Perform 30-year sliding window analysis
window_analysis = engine.perform_sliding_window_analysis(window_years=30)
trend_results = engine.extract_trends(window_analysis)

# Display trend significance by season
for season, metrics in trend_results.seasonal_slopes.items():
    print(f"\nSeason: {season}")
    for param, slope in metrics.items():
        sig = trend_results.trend_confidence[season].get(param, 'N/A')
        print(f"  {param:<10}: {slope:+.4f}/year ({sig})")
```
Finally, we consolidated these statistical probabilities, intensities, and trends into a single JSON "manifest" to drive the simulation engine.
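Such a manifest might be structured as follows; the field names here are illustrative assumptions, not PrecipGen's actual schema:

```python
import json

# Illustrative manifest layout (field names are assumptions,
# not PrecipGen's real schema)
manifest = {
    "station_id": "USW00024233",
    "wet_day_threshold_mm": 0.1,
    "monthly_parameters": {
        "1": {"p_ww": 0.75, "p_wd": 0.35, "alpha": 0.7, "beta": 9.0},
        # ... one entry per month
    },
    "trends": {"annual_total_slope_mm_per_year": -0.4},
}

# Persist so the simulation engine can be driven from a single file
with open("sim_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```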
Monte Carlo Simulation
In this phase, we input the parameter manifest into the SimulationEngine. To quantify uncertainty and risk, we ran a Monte Carlo Simulation, generating 1000 independent realizations of the coming year using parallel processing for efficiency.
```python
import multiprocessing as mp
import pandas as pd
from datetime import datetime

# Configuration
n_realizations = 1000
start_date = datetime(2025, 1, 1)
days_to_simulate = 365

# Prepare one task per realization
tasks = [
    (manifest, False, i, start_date, days_to_simulate)
    for i in range(n_realizations)
]

# Execute on multiple cores (io is the project's helper module
# exposing run_simulation_worker)
with mp.Pool(processes=mp.cpu_count() - 1) as pool:
    results_list = pool.map(io.run_simulation_worker, tasks)

# Assemble into a DataFrame, one column per realization
dates = pd.date_range(start=start_date, periods=days_to_simulate, freq='D')
mc_df = pd.DataFrame(
    dict(zip([f"Run_{i}" for i in range(n_realizations)], results_list)),
    index=dates
)
```
Above: The envelope plot. Each gray line represents one possible future. The blue band displays the 10th-to-90th percentile range, offering a clear visualization of uncertainty.
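The band itself is just a row-wise quantile of the realization matrix. A self-contained sketch on toy data (standing in for the `mc_df` built above):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Monte Carlo DataFrame:
# 365 days x 100 runs of simulated daily precipitation (mm)
rng = np.random.default_rng(1)
dates = pd.date_range("2025-01-01", periods=365, freq="D")
wet = rng.random((365, 100)) < 0.4
amounts = rng.gamma(0.7, 9.0, size=(365, 100))
runs = pd.DataFrame(wet * amounts, index=dates)

# Running annual totals per realization, then the percentile envelope
cumulative = runs.cumsum()
p10 = cumulative.quantile(0.10, axis=1)   # lower edge of the band
p90 = cumulative.quantile(0.90, axis=1)   # upper edge of the band
median = cumulative.quantile(0.50, axis=1)
print(f"Year-end 10th-90th percentile total: "
      f"{p10.iloc[-1]:.0f}-{p90.iloc[-1]:.0f} mm")
```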
Using 100 years of synthetic data allows for the analysis of extremes that may be rare in the historical record. We examined the distribution of total annual rainfall to understand the spread between drought conditions and high-precipitation years.
Above: The distribution of total rainfall across 100 simulated years.
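Annual totals come from grouping the daily series by calendar year; a quick sketch on toy data standing in for the long synthetic record:

```python
import numpy as np
import pandas as pd

# Toy 10-year daily precipitation series (mm)
rng = np.random.default_rng(2)
dates = pd.date_range("2000-01-01", periods=10 * 365, freq="D")
wet = rng.random(len(dates)) < 0.4
daily = pd.Series(wet * rng.gamma(0.7, 9.0, size=len(dates)), index=dates)

# One total per simulated calendar year (last year may be partial)
annual_totals = daily.groupby(daily.index.year).sum()
print(annual_totals.describe().loc[["mean", "min", "max"]])
```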
We verified the model's performance by checking seasonality patterns and analyzing extreme storm risks.
Above: Seasonality check—confirming the model captures the region's wet/dry cycle.
Above: Risk analysis showing the probability range for the year's most intense storm.
Finally, we compared several stochastic simulation methods head to head:
Long-term Precipitation Simulation Method Comparison
To properly evaluate precipitation simulation methods, we conducted an extensive comparison using 100-year simulations (10 runs each) to capture the full range of extreme events that occur in the 78-year historical record. This approach addresses a critical limitation of short-term simulations: they cannot adequately represent rare extreme events that may occur only once every few decades.
Methods Compared
- WGEN (Stationary): Traditional Markov-chain gamma model with fixed parameters
- WGEN + Trends (Non-stationary): Enhanced model incorporating long-term parameter evolution
- Random Bootstrap: Direct resampling from historical data (baseline reference)
Performance Metrics
| Method | Annual Total (mm) | Bias vs Historical | Annual Max (mm) | Bias vs Historical | Extreme Max (mm) | Extreme Bias |
|---|---|---|---|---|---|---|
| Historical | 981.7 ± 165.5 | — | 50.9 ± 17.7 | — | 127.5 | — |
| WGEN | 912.9 ± 133.2 | -7.0% | 45.0 ± 11.5 | -11.5% | 106.1 | -16.8% |
| WGEN + Trends | 867.7 ± 141.3 | -11.6% | 47.6 ± 12.8 | -6.5% | 134.6 | +5.6% |
| Random Bootstrap | 979.5 ± 163.8 | -0.2% | 51.3 ± 18.5 | +0.8% | 127.5 | 0.0% |
Findings
Extreme Event Reproduction: The most significant finding is that WGEN + Trends successfully captures and even exceeds historical extreme events (134.6 mm vs 127.5 mm historical maximum), while standard WGEN significantly underestimates extremes (-16.8%). This demonstrates the critical importance of incorporating non-stationary behavior for extreme event modeling.
Statistical Fidelity: Random Bootstrap provides near-perfect reproduction of historical statistics (as expected), serving as our benchmark. The coefficient of variation analysis shows:
- Random Bootstrap: CV ratio = 0.99 (annual totals), 1.04 (annual maxes)
- WGEN + Trends: CV ratio = 0.97 (annual totals), 0.77 (annual maxes)
- Standard WGEN: CV ratio = 0.87 (annual totals), 0.73 (annual maxes)
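The CV ratio is simply the simulated coefficient of variation (std/mean) divided by the historical one. Recomputing it from the annual-total column of the table above reproduces the reported values:

```python
def cv_ratio(sim_mean, sim_std, hist_mean, hist_std):
    """Ratio of coefficients of variation (simulated / historical)."""
    return (sim_std / sim_mean) / (hist_std / hist_mean)

# Annual-total values (mm) from the comparison table above
print(f"WGEN:          {cv_ratio(912.9, 133.2, 981.7, 165.5):.2f}")  # 0.87
print(f"WGEN + Trends: {cv_ratio(867.7, 141.3, 981.7, 165.5):.2f}")  # 0.97
print(f"Bootstrap:     {cv_ratio(979.5, 163.8, 981.7, 165.5):.2f}")  # 0.99
```

A ratio below 1 means the method under-disperses, i.e. produces less year-to-year variability than the historical record.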
Simulation Length Impact: This analysis demonstrates why short-term simulations (1 year) fail to capture extreme events. The 100-year simulations reveal that WGEN + Trends can generate events beyond the historical record through its dynamic parameter evolution, with parameters changing by 20-65% over the simulation period.
Practical Applications
Recommendation Summary:
- Climate Impact Studies: Use Random Bootstrap for most accurate historical representation
- Extreme Event Analysis: Use WGEN + Trends with long simulations (50+ years) to capture rare events
- Future Climate Projections: WGEN + Trends is the only method capable of generating novel extreme events beyond historical observations
- Risk Assessment: Consider ensemble approaches combining multiple methods for comprehensive uncertainty quantification
The analysis encompassed 1,000 simulated years per method (10 runs × 100 years each), providing robust statistical power to evaluate rare extreme events that define flood risk and infrastructure design criteria.
Future Projections
Finally, we enabled trend_mode=True to simulate 50 years into the future, extending the climate trends identified in the analysis phase.
```python
# Initialize future climate simulation with trends
sim_future = pg.SimulationEngine(manifest, trend_mode=True, random_seed=99)
start_future = datetime(2025, 1, 1)
sim_future.initialize(start_future)

# Simulate 50 years of evolving climate
days = 50 * 365
future_values = [sim_future.step() for _ in range(days)]
future_dates = pd.date_range(start=start_future, periods=days, freq='D')
future_series = pd.Series(future_values, index=future_dates)

# Visualize long-term trends
viz.plot_future_projection(future_series)
viz.plot_seasonal_trends(future_series)
```
Above: A 50-year projection showing the drift in rainfall patterns over time.
Above: Breakdown of future trends by season.
Model Verification
To validate the model, we compared a 100-year synthetic history (using trend_mode=False) against the original 78-year historical record. Three specific tests were conducted:
```python
# Generate a synthetic "history" for validation
years_to_sim = 100
sim = pg.SimulationEngine(manifest, trend_mode=False, random_seed=42)
sim.initialize(datetime(2000, 1, 1))
synth_values = [sim.step() for _ in range(years_to_sim * 365)]
synth_dates = pd.date_range(start='2000-01-01', periods=len(synth_values), freq='D')
synth_series = pd.Series(synth_values, index=synth_dates)

# Run validation tests
viz.plot_validation_climatology(hist_series, synth_series)  # Seasonality
viz.plot_validation_dry_spells(hist_series, synth_series)   # Drought persistence
viz.plot_validation_qq(hist_series, synth_series)           # Intensity distribution
```
1. Seasonality Test
We overlaid synthetic monthly averages (blue line) against historical data (gray bars) to verify that the model correctly reproduces the region's seasonal characteristics.
Above: The synthetic data tracks historical seasonality, capturing the wet winters and dry summers.
2. Drought Persistence Test
We verified the model's ability to simulate dry spells by comparing the frequency of drought durations (e.g., 2-day, 10-day, 20-day) using a log-scale plot.
Above: The Dry Spell Duration plot. The overlap indicates accurate reproduction of drought persistence.
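Dry-spell durations can be extracted by counting consecutive runs of below-threshold days; a simple sketch of that counting logic (not the library's implementation):

```python
import pandas as pd

def dry_spell_lengths(daily, threshold=0.1):
    """Lengths of consecutive runs of days below the wet-day threshold."""
    lengths, run = [], 0
    for is_dry in (daily < threshold):
        if is_dry:
            run += 1           # extend the current dry spell
        elif run:
            lengths.append(run)  # a wet day ends the spell
            run = 0
    if run:
        lengths.append(run)      # spell still open at series end
    return lengths

# Tiny example: rain on days 0, 4, and 5 (mm)
s = pd.Series([2.0, 0.0, 0.0, 0.0, 1.5, 3.0, 0.0])
print(dry_spell_lengths(s))  # → [3, 1]
```

Tallying these lengths for both series and plotting their frequencies on a log scale yields the comparison shown above.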
3. Intensity Test (Q-Q Plot)
A Quantile-Quantile (Q-Q) plot was used to compare the distribution of historical storms versus synthetic ones. Alignment with the red diagonal line confirms a distributional match.
Above: The Q-Q Plot. Points aligned with the red line indicate the synthetic storm intensity distribution matches historical records.
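A Q-Q comparison reduces to evaluating matched quantiles of the two samples; a sketch on toy Gamma-distributed wet-day intensities:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy stand-ins for historical and synthetic wet-day intensities (mm)
hist = rng.gamma(0.7, 9.0, size=5000)
synth = rng.gamma(0.7, 9.0, size=5000)

# Matched quantiles: if the distributions agree, each (hist_q, synth_q)
# pair falls on the 1:1 diagonal of the Q-Q plot
qs = np.linspace(0.01, 0.99, 99)
hist_q = np.quantile(hist, qs)
synth_q = np.quantile(synth, qs)
print(f"Median intensity: hist {hist_q[49]:.2f} mm, synth {synth_q[49]:.2f} mm")
```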
Advantages for PrecipGen Applications
- Automated Quality Control: Built-in data validation prevents simulation failures from incomplete datasets
- Parallel Processing: Monte Carlo simulations scale efficiently across multiple CPU cores
- Trend-Aware Modeling: Distinguishes between stationary historical patterns and non-stationary future projections
- Comprehensive Validation: Three-tier testing (seasonality, persistence, intensity) ensures model fidelity
Recommendations for Enhancement
- Spatial Interpolation: Extend to multi-station networks for regional coverage
- Sub-daily Resolution: Incorporate hourly precipitation patterns for flood modeling
- Extreme Value Theory: Add specialized handling for rare events (>100-year storms)
- Ensemble Forecasting: Interface with numerical weather prediction models
Current Limitations
- Point-Scale Only: Current implementation assumes spatial homogeneity
- Linear Trends: Climate change may exhibit non-linear acceleration
- Independence Assumption: Does not model teleconnections (El Niño, PDO)
- Validation Period: 78-year record may not capture full climate variability
Conclusion
This case study demonstrates PrecipGen's capability to transform raw weather station data into actionable stochastic simulations. The library's automated workflow (from station discovery through Monte Carlo analysis) reduces the technical barrier for climate risk assessment while maintaining scientific rigor through comprehensive validation protocols.
For operational applications, the combination of parallel processing, trend analysis, and uncertainty quantification provides a robust foundation for infrastructure planning, agricultural modeling, and hydrological forecasting. The modular design allows researchers to extend functionality while preserving the core statistical framework.
View Source Code & Documentation →