Seasonal trends with the GOZCARDS Dataset
Here we calculate seasonal trends using the GOZCARDS dataset by regressing the VMR monthly zonal means against predictors that include seasonal terms.
[2]:
import xarray as xr
import numpy as np
from LOTUS_regression.regression import regress_all_bins
from LOTUS_regression.predictors.seasonal import add_seasonal_components
from LOTUS_regression.predictors import load_data
The GOZCARDS data is spread across multiple NetCDF4 files. Load them all and concatenate along the time dimension, keeping only values in the latitude range -60 to 60.
[3]:
GOZCARDS_FILES = r'/home/runner/work/lotus-regression/lotus-regression/test_data/GOZCARDS/*.nc4'
data = xr.decode_cf(xr.open_mfdataset(GOZCARDS_FILES, combine='nested', concat_dim='time', group='Merged', engine='netcdf4'))
data = data.loc[dict(lat=slice(-60, 60))]
print(data)
<xarray.Dataset> Size: 47MB
Dimensions: (time: 384, data_source: 6, overlap: 6, lev: 25,
lat: 12)
Coordinates:
* lat (lat) float32 48B -55.0 -45.0 -35.0 ... 35.0 45.0 55.0
* lev (lev) float32 100B 1e+03 681.3 464.2 ... 0.1468 0.1
* time (time) datetime64[ns] 3kB 1979-01-15 ... 2012-12-15
* data_source (data_source) int32 24B 1 2 3 4 5 6
* overlap (overlap) int32 24B 1 2 3 4 5 6
Data variables: (12/13)
data_source_name (time, data_source) |S20 46kB dask.array<chunksize=(12, 6), meta=np.ndarray>
overlap_start_time (time, overlap) datetime64[ns] 18kB dask.array<chunksize=(12, 6), meta=np.ndarray>
overlap_end_time (time, overlap) datetime64[ns] 18kB dask.array<chunksize=(12, 6), meta=np.ndarray>
overlap_used_source (time, overlap, data_source) int8 14kB dask.array<chunksize=(12, 6, 6), meta=np.ndarray>
overlap_source_total (time, overlap, data_source, lev, lat) float64 33MB dask.array<chunksize=(12, 6, 6, 25, 12), meta=np.ndarray>
average (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
... ...
std_dev (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
std_error (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
minimum (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
maximum (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
offset (time, data_source, lev, lat) float32 3MB dask.array<chunksize=(12, 6, 25, 12), meta=np.ndarray>
offset_std_error (time, data_source, lev, lat) float32 3MB dask.array<chunksize=(12, 6, 25, 12), meta=np.ndarray>
Attributes:
OriginalFileVersion: 1.01
OriginalFileProcessingCenter: Jet Propulsion Laboratory / California Ins...
InputOriginalFile: All available original source data for the...
DataQuality: All data screening according to the specif...
DataSource: SAGE-I v5.9_rev, SAGE-II v6.2, HALOE v19, ...
DataSubset: Full Dataset
Load some standard predictors. These include a constant term.
[4]:
predictors = load_data('pred_baseline_pwlt.csv')
print(predictors.columns)
Index(['enso', 'solar', 'qboA', 'qboB', 'aod', 'linear_pre', 'linear_post',
'constant'],
dtype='object')
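As an aside, the linear_pre and linear_post columns form a piecewise linear trend (PWLT). The sketch below illustrates how such a pair could be constructed, assuming a hypothetical 1997 turnaround date and units of decades (consistent with the values printed above, but the actual definition lives in the predictor file, not here).

```python
import numpy as np
import pandas as pd

# Hypothetical turnaround date for illustration only; the real date is
# defined by the baseline predictor file.
times = pd.date_range('1979-01-01', '2012-12-01', freq='MS')
turnaround = pd.Timestamp('1997-01-01')

# Time relative to the turnaround, expressed in decades
decades = (times - turnaround).days / 365.25 / 10

# linear_pre is negative before the turnaround and zero afterwards;
# linear_post is zero before the turnaround and grows afterwards.
linear_pre = np.minimum(decades, 0.0)
linear_post = np.maximum(decades, 0.0)

pwlt = pd.DataFrame({'linear_pre': linear_pre,
                     'linear_post': linear_post}, index=times)
```

With this construction the fitted coefficients on linear_pre and linear_post are directly interpretable as trends per decade before and after the turnaround.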
Currently our predictors have no seasonal dependence. Add in some seasonal terms with different numbers of Fourier components.
[5]:
predictors = add_seasonal_components(predictors, {'constant': 4, 'linear_pre': 4, 'linear_post': 4, 'qboA': 2, 'qboB': 2})
print(predictors[:5])
enso solar qboA qboB aod linear_pre \
time
1979-01-01 0.534698 1.578955 -1.067789 0.952231 -0.411760 -1.800000
1979-02-01 0.351147 1.633527 -0.831698 0.892204 -0.412330 -1.791667
1979-03-01 0.004440 1.318208 -1.168007 0.628544 -0.413462 -1.783333
1979-04-01 0.269569 1.129935 -1.229261 0.561389 -0.416360 -1.775000
1979-05-01 0.330752 1.002783 -1.226434 0.052783 -0.420368 -1.766667
linear_post constant qboA_sin0 qboA_cos0 ... \
time ...
1979-01-01 0.0 1.0 -0.000000 -1.067789 ...
1979-02-01 0.0 1.0 -0.422799 -0.716214 ...
1979-03-01 0.0 1.0 -0.992164 -0.616320 ...
1979-04-01 0.0 1.0 -1.228947 -0.027752 ...
1979-05-01 0.0 1.0 -1.080099 0.580970 ...
linear_post_sin3 linear_post_cos3 constant_sin0 constant_cos0 \
time
1979-01-01 0.0 0.0 0.000000 1.000000
1979-02-01 0.0 -0.0 0.508356 0.861147
1979-03-01 -0.0 -0.0 0.849450 0.527668
1979-04-01 -0.0 0.0 0.999745 0.022576
1979-05-01 0.0 -0.0 0.880683 -0.473706
constant_sin1 constant_cos1 constant_sin2 constant_cos2 \
time
1979-01-01 0.000000 1.000000 0.000000 1.000000
1979-02-01 0.875539 0.483147 0.999579 -0.029025
1979-03-01 0.896456 -0.443132 0.096613 -0.995322
1979-04-01 0.045141 -0.998981 -0.997707 -0.067683
1979-05-01 -0.834370 -0.551205 -0.090190 0.995925
constant_sin3 constant_cos3
time
1979-01-01 0.000000 1.000000
1979-02-01 0.846029 -0.533137
1979-03-01 -0.794497 -0.607268
1979-04-01 -0.090190 0.995925
1979-05-01 0.919817 -0.392347
[5 rows x 40 columns]
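The idea behind the expansion is to multiply each predictor by sine and cosine harmonics of the annual cycle, so that its fitted coefficient can vary with season. A minimal sketch of this for the constant term is below, assuming the phase convention shown (which approximately reproduces the constant_* columns above, but is not necessarily the library's exact implementation).

```python
import numpy as np
import pandas as pd

times = pd.date_range('1979-01-01', periods=5, freq='MS')
# Fraction of the year elapsed at each time (assumed phase convention)
frac_year = (times.dayofyear - 1) / 365.25

seasonal = {}
n_components = 4
for k in range(n_components):
    # (k+1)-th harmonic of the annual cycle: k=0 annual, k=1 semiannual, ...
    seasonal[f'constant_sin{k}'] = np.sin(2 * np.pi * (k + 1) * frac_year)
    seasonal[f'constant_cos{k}'] = np.cos(2 * np.pi * (k + 1) * frac_year)

expansion = pd.DataFrame(seasonal, index=times)
print(expansion.round(6))
```

For a non-constant predictor such as qboA, each harmonic column is additionally multiplied by the predictor's own time series, e.g. qboA_sin0 = qboA * sin(2π * frac_year).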
Perform the regression and convert the coefficients to approximate percent anomaly. We pass include_monthly_fits = True so that the seasonal fits are used to calculate monthly trends. The results will include ‘linear_post_monthly’ and ‘linear_post_monthly_std’, which are the monthly trend terms and their standard errors, respectively.
[6]:
data = data.sel(lev=slice(100, 0.2)).sel(lat=slice(-65, 65)) # remove bins without data
results = regress_all_bins(predictors, data['average'], tolerance=0.1, include_monthly_fits=True)
# Convert to ~ percent
results /= data['average'].mean(dim='time')
results *= 100
print(results)
<xarray.Dataset> Size: 327kB
Dimensions: (lev: 17, lat: 12, month: 12)
Coordinates:
* lev (lev) float32 68B 100.0 68.13 ... 0.3162 0.2154
* lat (lat) float32 48B -55.0 -45.0 -35.0 ... 45.0 55.0
* month (month) int64 96B 1 2 3 4 5 6 7 8 9 10 11 12
Data variables: (12/90)
enso (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
enso_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
solar (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
solar_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
qboA (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
qboA_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
... ...
constant_cos2 (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_cos2_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_sin3 (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_sin3_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_cos3 (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_cos3_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
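Conceptually, a monthly trend is the mean trend coefficient plus its seasonal harmonics evaluated at that month. The sketch below illustrates this reconstruction with hypothetical coefficient values (illustrative only; the library computes these internally, and the exact phase convention here is an assumption).

```python
import numpy as np

# Hypothetical fitted coefficients for the post-turnaround linear term
# and its four pairs of seasonal harmonics (illustrative values only)
linear_post = 1.2
sin_coeffs = [0.3, -0.1, 0.05, 0.02]    # linear_post_sin0..sin3
cos_coeffs = [-0.2, 0.15, -0.04, 0.01]  # linear_post_cos0..cos3

months = np.arange(1, 13)
frac_year = (months - 0.5) / 12  # mid-month phase (assumed convention)

# Monthly trend = mean trend + seasonal modulation evaluated each month
monthly = linear_post + sum(
    s * np.sin(2 * np.pi * (k + 1) * frac_year)
    + c * np.cos(2 * np.pi * (k + 1) * frac_year)
    for k, (s, c) in enumerate(zip(sin_coeffs, cos_coeffs))
)

print(monthly.round(3))
```

Because the harmonics average to zero over twelve equally spaced months, the mean of the twelve monthly trends recovers the mean trend coefficient.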
[7]:
import LOTUS_regression.plotting.trends as trends
import matplotlib.pyplot as plt
# Plot the seasonal post trends at the level closest to 2 hPa (2.15 hPa)
trends.plot_with_confidence(results.sel(lev=2, method='nearest'), 'linear_post_monthly', x='lat', y='month')
plt.xlabel('Latitude')
plt.ylabel('Month')
[7]:
Text(0, 0.5, 'Month')