Beyond the Historical Record: How I Simulated Seattle’s Future Climate
Raw weather data is often incomplete or inconsistent. For this project, rather than manually retrieving data, we established an automated pipeline to identify, download, and validate the necessary records.
Using PrecipGen, we automated the station search. We targeted Sea-Tac Airport and filtered for stations within a 25km radius containing at least 15 years of historical data.
```python
import pandas as pd
import precipgen as pg

# Sea-Tac Airport coordinates
LAT = 47.4502
LON = -122.3088

# Find nearby stations with sufficient data
stations = pg.find_nearby_stations(
    latitude=LAT,
    longitude=LON,
    radius_km=25,
    min_years=15
)

# Display results as a DataFrame
stations_df = pd.DataFrame(stations)
display(stations_df[['id', 'name', 'distance_km', 'years_available']])
```
| | id | name | distance_km | years_available |
|---|---|---|---|---|
| 0 | USW00024233 | SEATTLE TACOMA AP | 0.74 | 78 |
| 1 | USC00454169 | KENT | 6.14 | 112 |
| 2 | US1WAKG0081 | RENTON 0.5 SSW | 8.50 | 18 |
| 3 | USW00094248 | RENTON MUNI AP | 8.67 | 28 |
| 4 | US1WAKG0041 | RENTON 3.6 SSE | 10.58 | 18 |
The script identified Station USW00024233 (Seattle Tacoma Intl AP) as the optimal source. We downloaded the station's daily records (.dly file) and parsed the text format into a clean dataset.
```python
# Download and parse in one operation
station_id = 'USW00024233'
dly_path = pg.download_station(station_id, cache_dir='data')
parser = pg.GHCNParser(dly_path)
ghcn_data = parser.parse_dly_file(dly_path)

# Extract precipitation and convert to mm
precip_data = parser.extract_precipitation(ghcn_data)
print(f"Extracted {len(precip_data)} daily records.")
```
To ensure data integrity, we ran a quality check using the library's standard validator, which assesses the dataset's completeness to prevent simulation errors caused by missing values.
```python
# Built-in quality validation with fallback logic
validator = pg.DataValidator(pg.QualityPresets.standard())
quality_report = validator.assess_with_fallback(
    precip_data,
    quality_flags=ghcn_data['quality_flag'],
    site_id=station_id
)
print(f"Data Completeness: {quality_report.completeness_percentage:.1f}%")
print(f"Is Acceptable: {quality_report.is_acceptable}")
```
```
2025-12-22 16:27:58,078 - precipgen.data_validator - INFO - assess_data_quality:375 - Starting comprehensive quality assessment for USW00024233
2025-12-22 16:27:58,084 - precipgen.data_validator - INFO - assess_data_quality:464 - Data quality assessment passed for USW00024233
2025-12-22 16:27:58,085 - precipgen.data_validator - INFO - assess_with_fallback:529 - Data acceptable with 'permissive' quality standards
Data Completeness: 100.0%
Is Acceptable: True
```
With the data validated, we visualized summary statistics to confirm the precipitation patterns were consistent with the region, then saved the cleaned dataset to CSV.
Above: Time series plots showing daily and annual historical precipitation totals.
| | Metric | Value |
|---|---|---|
| 0 | Total records | 28,443 |
| 1 | Wet-day count | 12,113 (42.59%) |
| 2 | Dry-day fraction | 57.41% |
| 3 | Wet-day mean (mm) | 6.32 |
| 4 | Wet-day std (mm) | 7.97 |
| 5 | Min (mm) | 0.30 |
| 6 | 25th percentile (mm) | 1.30 |
| 7 | Median (mm) | 3.60 |
| 8 | 75th percentile (mm) | 8.40 |
| 9 | Max (mm) | 127.50 |
Parameter Estimation and Trend Analysis
Once the data was prepared, we used the AnalyticalEngine to calculate stochastic parameters for each month. The engine identifies two primary patterns:
- The Markov Chain: the probability of precipitation occurring given the previous day's state (wet/dry), which captures storm persistence.
- The Gamma Distribution: the distribution of rainfall intensity on wet days.
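These two patterns combine into a single daily step: draw wet/dry from the chain conditioned on yesterday, then draw an intensity from the Gamma if wet. A minimal standalone sketch (the parameters below are made-up January-like values for illustration, not output from PrecipGen):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch only)
P_WW = 0.75             # P(wet today | wet yesterday)
P_WD = 0.35             # P(wet today | dry yesterday)
ALPHA, BETA = 0.7, 9.0  # Gamma shape and scale (mm) for wet-day amounts

def step(prev_wet: bool) -> float:
    """One day of the Markov-chain/Gamma model: decide wet or dry
    from the transition probability, then sample an intensity if wet."""
    p_wet = P_WW if prev_wet else P_WD
    if rng.random() < p_wet:
        return rng.gamma(ALPHA, BETA)  # wet day: intensity in mm
    return 0.0                         # dry day

# Simulate one month starting from a dry day
series, wet = [], False
for _ in range(31):
    amount = step(wet)
    wet = amount > 0
    series.append(amount)
print(f"Wet days: {sum(a > 0 for a in series)} / 31")
```

The persistence terms (P_WW high in winter, low in summer) are what let the model reproduce multi-day storm sequences rather than independent coin flips.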
```python
# Initialize analytical engine with a 0.1 mm wet-day threshold
engine = pg.AnalyticalEngine(precip_data, wet_day_threshold=0.1)

# Calculate Markov chain probabilities and Gamma parameters for each month
monthly_params = engine.calculate_monthly_parameters()

# Visualize the climate "fingerprint"
viz.plot_monthly_parameters(monthly_params)
```
Above: Monthly P(Wet|Wet) transition probabilities and Gamma Alpha/Beta parameters.
The resulting charts act as a statistical profile of the location, distinguishing between the high persistence of winter storms and the fleeting nature of summer precipitation.
To account for potential climate shifts, we performed a Trend Analysis using a 30-year sliding window. This determines if annual rainfall or storm frequency is changing over time, allowing us to decide whether to simulate a stationary past or a non-stationary future.
```python
# Perform 30-year sliding window analysis
window_analysis = engine.perform_sliding_window_analysis(window_years=30)
trend_results = engine.extract_trends(window_analysis)

# Display trend significance by season
for season, metrics in trend_results.seasonal_slopes.items():
    print(f"\nSeason: {season}")
    for param, slope in metrics.items():
        sig = trend_results.trend_confidence[season].get(param, 'N/A')
        print(f"  {param:<10}: {slope:+.4f}/year ({sig})")
```
Finally, we consolidated these statistical probabilities, intensities, and trends into a single JSON "manifest" to drive the simulation engine.
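Such a manifest might be structured as follows; the field names here are illustrative assumptions, not PrecipGen's actual schema:

```python
import json

# Illustrative manifest layout (field names are assumptions,
# not PrecipGen's real schema)
manifest = {
    "station_id": "USW00024233",
    "wet_day_threshold_mm": 0.1,
    "monthly_parameters": {
        "1": {"p_ww": 0.75, "p_wd": 0.35, "alpha": 0.7, "beta": 9.0},
        # ... one entry per month
    },
    "trends": {"annual_total_slope_mm_per_year": -0.4},
}

# Persist so the simulation engine can be driven from a single file
with open("sim_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```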
Monte Carlo Simulation
In this phase, we input the parameter manifest into the SimulationEngine. To quantify uncertainty and risk, we ran a Monte Carlo Simulation, generating 1000 independent realizations of the coming year using parallel processing for efficiency.
```python
import multiprocessing as mp
import pandas as pd
from datetime import datetime

# Configuration
n_realizations = 1000
start_date = datetime(2025, 1, 1)
days_to_simulate = 365

# Prepare one task per realization
tasks = [
    (manifest, False, i, start_date, days_to_simulate)
    for i in range(n_realizations)
]

# Execute on multiple cores (io is the project's helper module
# exposing run_simulation_worker)
with mp.Pool(processes=mp.cpu_count() - 1) as pool:
    results_list = pool.map(io.run_simulation_worker, tasks)

# Assemble into a DataFrame, one column per realization
dates = pd.date_range(start=start_date, periods=days_to_simulate, freq='D')
mc_df = pd.DataFrame(
    dict(zip([f"Run_{i}" for i in range(n_realizations)], results_list)),
    index=dates
)
```
Above: The envelope plot. Each gray line represents one possible future. The blue band displays the 10th-to-90th percentile range, offering a clear visualization of uncertainty.
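The band itself is just a row-wise quantile of the realization matrix. A self-contained sketch on toy data (standing in for the `mc_df` built above):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Monte Carlo DataFrame:
# 365 days x 100 runs of simulated daily precipitation (mm)
rng = np.random.default_rng(1)
dates = pd.date_range("2025-01-01", periods=365, freq="D")
wet = rng.random((365, 100)) < 0.4
amounts = rng.gamma(0.7, 9.0, size=(365, 100))
runs = pd.DataFrame(wet * amounts, index=dates)

# Running annual totals per realization, then the percentile envelope
cumulative = runs.cumsum()
p10 = cumulative.quantile(0.10, axis=1)   # lower edge of the band
p90 = cumulative.quantile(0.90, axis=1)   # upper edge of the band
median = cumulative.quantile(0.50, axis=1)
print(f"Year-end 10th-90th percentile total: "
      f"{p10.iloc[-1]:.0f}-{p90.iloc[-1]:.0f} mm")
```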
Using 100 years of synthetic data allows for the analysis of extremes that may be rare in the historical record. We examined the distribution of total annual rainfall to understand the spread between drought conditions and high-precipitation years.
Above: The distribution of total rainfall across 100 simulated years.
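Annual totals come from grouping the daily series by calendar year; a quick sketch on toy data standing in for the long synthetic record:

```python
import numpy as np
import pandas as pd

# Toy 10-year daily precipitation series (mm)
rng = np.random.default_rng(2)
dates = pd.date_range("2000-01-01", periods=10 * 365, freq="D")
wet = rng.random(len(dates)) < 0.4
daily = pd.Series(wet * rng.gamma(0.7, 9.0, size=len(dates)), index=dates)

# One total per simulated calendar year (last year may be partial)
annual_totals = daily.groupby(daily.index.year).sum()
print(annual_totals.describe().loc[["mean", "min", "max"]])
```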
We verified the model's performance by checking seasonality patterns and analyzing extreme storm risks.
Above: Seasonality check—confirming the model captures the region's wet/dry cycle.
Above: Risk analysis showing the probability range for the year's most intense storm.
Finally, we compared several stochastic simulation methods head to head:
Long-term Precipitation Simulation Method Comparison
To properly evaluate precipitation simulation methods, we conducted an extensive comparison using 100-year simulations (10 runs each) to capture the full range of extreme events that occur in the 78-year historical record. This approach addresses a critical limitation of short-term simulations: they cannot adequately represent rare extreme events that may occur only once every few decades.
Methods Compared
- WGEN (Stationary): Traditional Markov-chain gamma model with fixed parameters
- WGEN + Trends (Non-stationary): Enhanced model incorporating long-term parameter evolution
- Random Bootstrap: Direct resampling from historical data (baseline reference)
Performance Metrics
| Method | Annual Total (mm) | Bias vs Historical | Annual Max (mm) | Bias vs Historical | Extreme Max (mm) | Extreme Bias |
|---|---|---|---|---|---|---|
| Historical | 981.7 ± 165.5 | — | 50.9 ± 17.7 | — | 127.5 | — |
| WGEN | 912.9 ± 133.2 | -7.0% | 45.0 ± 11.5 | -11.5% | 106.1 | -16.8% |
| WGEN + Trends | 867.7 ± 141.3 | -11.6% | 47.6 ± 12.8 | -6.5% | 134.6 | +5.6% |
| Random Bootstrap | 979.5 ± 163.8 | -0.2% | 51.3 ± 18.5 | +0.8% | 127.5 | 0.0% |
Findings
Extreme Event Reproduction: The most significant finding is that WGEN + Trends successfully captures and even exceeds historical extreme events (134.6 mm vs 127.5 mm historical maximum), while standard WGEN significantly underestimates extremes (-16.8%). This demonstrates the critical importance of incorporating non-stationary behavior for extreme event modeling.
Statistical Fidelity: Random Bootstrap provides near-perfect reproduction of historical statistics (as expected), serving as our benchmark. The coefficient of variation analysis shows:
- Random Bootstrap: CV ratio = 0.99 (annual totals), 1.04 (annual maxes)
- WGEN + Trends: CV ratio = 0.97 (annual totals), 0.77 (annual maxes)
- Standard WGEN: CV ratio = 0.87 (annual totals), 0.73 (annual maxes)
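The CV ratio is simply the simulated coefficient of variation (std/mean) divided by the historical one. Recomputing it from the annual-total column of the table above reproduces the reported values:

```python
def cv_ratio(sim_mean, sim_std, hist_mean, hist_std):
    """Ratio of coefficients of variation (simulated / historical)."""
    return (sim_std / sim_mean) / (hist_std / hist_mean)

# Annual-total values (mm) from the comparison table above
print(f"WGEN:          {cv_ratio(912.9, 133.2, 981.7, 165.5):.2f}")  # 0.87
print(f"WGEN + Trends: {cv_ratio(867.7, 141.3, 981.7, 165.5):.2f}")  # 0.97
print(f"Bootstrap:     {cv_ratio(979.5, 163.8, 981.7, 165.5):.2f}")  # 0.99
```

A ratio below 1 means the method under-disperses, i.e. produces less year-to-year variability than the historical record.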
Simulation Length Impact: This analysis demonstrates why short-term simulations (1 year) fail to capture extreme events. The 100-year simulations reveal that WGEN + Trends can generate events beyond the historical record through its dynamic parameter evolution, with parameters changing by 20-65% over the simulation period.
Practical Applications
Recommendation Summary:
- Climate Impact Studies: Use Random Bootstrap for most accurate historical representation
- Extreme Event Analysis: Use WGEN + Trends with long simulations (50+ years) to capture rare events
- Future Climate Projections: WGEN + Trends is the only method capable of generating novel extreme events beyond historical observations
- Risk Assessment: Consider ensemble approaches combining multiple methods for comprehensive uncertainty quantification
The analysis encompassed 1,000 simulated years per method (10 runs × 100 years each), providing robust statistical power to evaluate rare extreme events that define flood risk and infrastructure design criteria.
Future Projections
Finally, we enabled trend_mode=True to simulate 50 years into the future, extending the climate trends identified in the analysis phase.
```python
# Initialize future climate simulation with trends
sim_future = pg.SimulationEngine(manifest, trend_mode=True, random_seed=99)
start_future = datetime(2025, 1, 1)
sim_future.initialize(start_future)

# Simulate 50 years of evolving climate
days = 50 * 365
future_values = [sim_future.step() for _ in range(days)]
future_dates = pd.date_range(start=start_future, periods=days, freq='D')
future_series = pd.Series(future_values, index=future_dates)

# Visualize long-term trends
viz.plot_future_projection(future_series)
viz.plot_seasonal_trends(future_series)
```
Above: A 50-year projection showing the drift in rainfall patterns over time.
Above: Breakdown of future trends by season.
Model Verification
To validate the model, we compared a 100-year synthetic history (using trend_mode=False) against the original 78-year historical record. Three specific tests were conducted:
```python
# Generate a synthetic "history" for validation
years_to_sim = 100
sim = pg.SimulationEngine(manifest, trend_mode=False, random_seed=42)
sim.initialize(datetime(2000, 1, 1))
synth_values = [sim.step() for _ in range(years_to_sim * 365)]
synth_dates = pd.date_range(start='2000-01-01', periods=len(synth_values), freq='D')
synth_series = pd.Series(synth_values, index=synth_dates)

# Run validation tests
viz.plot_validation_climatology(hist_series, synth_series)  # Seasonality
viz.plot_validation_dry_spells(hist_series, synth_series)   # Drought persistence
viz.plot_validation_qq(hist_series, synth_series)           # Intensity distribution
```
1. Seasonality Test
We overlaid synthetic monthly averages (blue line) against historical data (gray bars) to verify that the model correctly reproduces the region's seasonal characteristics.
Above: The synthetic data tracks historical seasonality, capturing the wet winters and dry summers.
2. Drought Persistence Test
We verified the model's ability to simulate dry spells by comparing the frequency of drought durations (e.g., 2-day, 10-day, 20-day) using a log-scale plot.
Above: The Dry Spell Duration plot. The overlap indicates accurate reproduction of drought persistence.
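Dry-spell durations can be extracted by counting consecutive runs of below-threshold days; a simple sketch of that counting logic (not the library's implementation):

```python
import pandas as pd

def dry_spell_lengths(daily, threshold=0.1):
    """Lengths of consecutive runs of days below the wet-day threshold."""
    lengths, run = [], 0
    for is_dry in (daily < threshold):
        if is_dry:
            run += 1           # extend the current dry spell
        elif run:
            lengths.append(run)  # a wet day ends the spell
            run = 0
    if run:
        lengths.append(run)      # spell still open at series end
    return lengths

# Tiny example: rain on days 0, 4, and 5 (mm)
s = pd.Series([2.0, 0.0, 0.0, 0.0, 1.5, 3.0, 0.0])
print(dry_spell_lengths(s))  # → [3, 1]
```

Tallying these lengths for both series and plotting their frequencies on a log scale yields the comparison shown above.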
3. Intensity Test (Q-Q Plot)
A Quantile-Quantile (Q-Q) plot was used to compare the distribution of historical storms versus synthetic ones. Alignment with the red diagonal line confirms a distributional match.
Above: The Q-Q Plot. Points aligned with the red line indicate the synthetic storm intensity distribution matches historical records.
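A Q-Q comparison reduces to evaluating matched quantiles of the two samples; a sketch on toy Gamma-distributed wet-day intensities:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy stand-ins for historical and synthetic wet-day intensities (mm)
hist = rng.gamma(0.7, 9.0, size=5000)
synth = rng.gamma(0.7, 9.0, size=5000)

# Matched quantiles: if the distributions agree, each (hist_q, synth_q)
# pair falls on the 1:1 diagonal of the Q-Q plot
qs = np.linspace(0.01, 0.99, 99)
hist_q = np.quantile(hist, qs)
synth_q = np.quantile(synth, qs)
print(f"Median intensity: hist {hist_q[49]:.2f} mm, synth {synth_q[49]:.2f} mm")
```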
Advantages for PrecipGen Applications
- Automated Quality Control: Built-in data validation prevents simulation failures from incomplete datasets
- Parallel Processing: Monte Carlo simulations scale efficiently across multiple CPU cores
- Trend-Aware Modeling: Distinguishes between stationary historical patterns and non-stationary future projections
- Comprehensive Validation: Three-tier testing (seasonality, persistence, intensity) ensures model fidelity
Recommendations for Enhancement
- Spatial Interpolation: Extend to multi-station networks for regional coverage
- Sub-daily Resolution: Incorporate hourly precipitation patterns for flood modeling
- Extreme Value Theory: Add specialized handling for rare events (>100-year storms)
- Ensemble Forecasting: Interface with numerical weather prediction models
Current Limitations
- Point-Scale Only: Current implementation assumes spatial homogeneity
- Linear Trends: Climate change may exhibit non-linear acceleration
- Independence Assumption: Does not model teleconnections (El Niño, PDO)
- Validation Period: 78-year record may not capture full climate variability
Conclusion
This case study demonstrates PrecipGen's capability to transform raw weather station data into actionable stochastic simulations. The library's automated workflow (from station discovery through Monte Carlo analysis) reduces the technical barrier for climate risk assessment while maintaining scientific rigor through comprehensive validation protocols.
For operational applications, the combination of parallel processing, trend analysis, and uncertainty quantification provides a robust foundation for infrastructure planning, agricultural modeling, and hydrological forecasting. The modular design allows researchers to extend functionality while preserving the core statistical framework.
View Source Code & Documentation →