Trends with the GOZCARDS Dataset
Here we calculate trends from the GOZCARDS dataset by regressing the VMR monthly zonal means against a set of predictors that includes seasonal terms.
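Concretely, each latitude/pressure bin is fit with a multiple linear regression of the schematic form (a reminder of the standard MLR setup, not a statement of the package's exact internals):

$$y(t) = \sum_k \beta_k\, X_k(t) + \epsilon(t)$$

where $y(t)$ is the monthly zonal-mean VMR in the bin, the $X_k$ are the predictors (ENSO, solar, QBO, AOD, and piecewise linear trend terms, expanded below with seasonal Fourier components), the $\beta_k$ are the fitted coefficients, and $\epsilon(t)$ is the residual.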
[2]:
import xarray as xr
import numpy as np
from LOTUS_regression.regression import regress_all_bins
from LOTUS_regression.predictors.seasonal import add_seasonal_components
from LOTUS_regression.predictors import load_data
The GOZCARDS data is split across multiple NetCDF4 files. Load them all and concatenate along the time dimension, keeping only values in the latitude range -60 to 60.
[3]:
GOZCARDS_FILES = r'/home/runner/work/lotus-regression/lotus-regression/test_data//GOZCARDS/*.nc4'
data = xr.decode_cf(xr.open_mfdataset(GOZCARDS_FILES, combine='nested', concat_dim='time', group='Merged', engine='netcdf4'))
data = data.loc[dict(lat=slice(-60, 60))]
print(data)
<xarray.Dataset> Size: 47MB
Dimensions: (time: 384, data_source: 6, overlap: 6, lev: 25,
lat: 12)
Coordinates:
* lat (lat) float32 48B -55.0 -45.0 -35.0 ... 35.0 45.0 55.0
* lev (lev) float32 100B 1e+03 681.3 464.2 ... 0.1468 0.1
* time (time) datetime64[ns] 3kB 1979-01-15 ... 2012-12-15
* data_source (data_source) int32 24B 1 2 3 4 5 6
* overlap (overlap) int32 24B 1 2 3 4 5 6
Data variables: (12/13)
data_source_name (time, data_source) |S20 46kB dask.array<chunksize=(12, 6), meta=np.ndarray>
overlap_start_time (time, overlap) datetime64[ns] 18kB dask.array<chunksize=(12, 6), meta=np.ndarray>
overlap_end_time (time, overlap) datetime64[ns] 18kB dask.array<chunksize=(12, 6), meta=np.ndarray>
overlap_used_source (time, overlap, data_source) int8 14kB dask.array<chunksize=(12, 6, 6), meta=np.ndarray>
overlap_source_total (time, overlap, data_source, lev, lat) float64 33MB dask.array<chunksize=(12, 6, 6, 25, 12), meta=np.ndarray>
average (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
... ...
std_dev (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
std_error (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
minimum (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
maximum (time, lev, lat) float32 461kB dask.array<chunksize=(12, 25, 12), meta=np.ndarray>
offset (time, data_source, lev, lat) float32 3MB dask.array<chunksize=(12, 6, 25, 12), meta=np.ndarray>
offset_std_error (time, data_source, lev, lat) float32 3MB dask.array<chunksize=(12, 6, 25, 12), meta=np.ndarray>
Attributes:
OriginalFileVersion: 1.01
OriginalFileProcessingCenter: Jet Propulsion Laboratory / California Ins...
InputOriginalFile: All available original source data for the...
DataQuality: All data screening according to the specif...
DataSource: SAGE-I v5.9_rev, SAGE-II v6.2, HALOE v19, ...
DataSubset: Full Dataset
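As a quick optional sanity check, the coverage shown in the printout above (monthly means from 1979-01 to 2012-12, 384 time steps) can be confirmed programmatically:

# Confirm the merged record spans 1979-2012 at monthly resolution (384 steps)
print(data.time.values[0], data.time.values[-1], data.sizes['time'])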
Load a standard set of predictors, which already includes a constant term.
[4]:
predictors = load_data('pred_baseline_pwlt.csv')
print(predictors.columns)
Index(['enso', 'solar', 'qboA', 'qboB', 'aod', 'linear_pre', 'linear_post',
'constant'],
dtype='object')
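The 'linear_pre'/'linear_post' pair forms the piecewise linear trend (PWLT). Judging from the values printed in the next cell (linear_pre steps by 1/120 per month, i.e. per-decade units) and the Pre/Post 1997 plot titles at the end, the inflection sits at January 1997. A minimal sketch of how such a pair can be constructed under those assumptions (pwlt_terms is a hypothetical helper, not part of the package):

import numpy as np
import pandas as pd

def pwlt_terms(times, inflection='1997-01-01'):
    # Hypothetical sketch: time in decades relative to the inflection point,
    # with each term zeroed on the other side of the inflection
    decades = (pd.DatetimeIndex(times) - pd.Timestamp(inflection)) / pd.Timedelta(days=3652.5)
    decades = np.asarray(decades)
    pre = np.where(decades < 0, decades, 0.0)
    post = np.where(decades > 0, decades, 0.0)
    return pre, post

pre, post = pwlt_terms(pd.date_range('1979-01-01', '2012-12-01', freq='MS'))
print(pre[0], post[-1])  # ~ -1.8 decades at 1979-01, ~ +1.6 decades at 2012-12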
Currently our predictors have no seasonal dependence, so add seasonal terms, with a different number of Fourier components for each predictor.
[5]:
predictors = add_seasonal_components(predictors, {'constant': 4, 'linear_pre': 2, 'linear_post': 2, 'qboA': 2, 'qboB': 2})
print(predictors[:5])
enso solar qboA qboB aod linear_pre \
time
1979-01-01 0.534698 1.578955 -1.067789 0.952231 -0.411760 -1.800000
1979-02-01 0.351147 1.633527 -0.831698 0.892204 -0.412330 -1.791667
1979-03-01 0.004440 1.318208 -1.168007 0.628544 -0.413462 -1.783333
1979-04-01 0.269569 1.129935 -1.229261 0.561389 -0.416360 -1.775000
1979-05-01 0.330752 1.002783 -1.226434 0.052783 -0.420368 -1.766667
linear_post constant qboA_sin0 qboA_cos0 ... \
time ...
1979-01-01 0.0 1.0 -0.000000 -1.067789 ...
1979-02-01 0.0 1.0 -0.422799 -0.716214 ...
1979-03-01 0.0 1.0 -0.992164 -0.616320 ...
1979-04-01 0.0 1.0 -1.228947 -0.027752 ...
1979-05-01 0.0 1.0 -1.080099 0.580970 ...
linear_post_sin1 linear_post_cos1 constant_sin0 constant_cos0 \
time
1979-01-01 0.0 0.0 0.000000 1.000000
1979-02-01 0.0 0.0 0.508356 0.861147
1979-03-01 0.0 -0.0 0.849450 0.527668
1979-04-01 0.0 -0.0 0.999745 0.022576
1979-05-01 -0.0 -0.0 0.880683 -0.473706
constant_sin1 constant_cos1 constant_sin2 constant_cos2 \
time
1979-01-01 0.000000 1.000000 0.000000 1.000000
1979-02-01 0.875539 0.483147 0.999579 -0.029025
1979-03-01 0.896456 -0.443132 0.096613 -0.995322
1979-04-01 0.045141 -0.998981 -0.997707 -0.067683
1979-05-01 -0.834370 -0.551205 -0.090190 0.995925
constant_sin3 constant_cos3
time
1979-01-01 0.000000 1.000000
1979-02-01 0.846029 -0.533137
1979-03-01 -0.794497 -0.607268
1979-04-01 -0.090190 0.995925
1979-05-01 0.919817 -0.392347
[5 rows x 32 columns]
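The expanded columns follow the pattern visible above: each requested predictor X gains pairs X_sinN = X(t)·sin(2π(N+1)t) and X_cosN = X(t)·cos(2π(N+1)t), with t the fractional position within the year. That reading (an inference from the printed table, with a 365.25-day year assumed) can be spot-checked for 1979-02-01:

import numpy as np

# Spot-check the Fourier expansion for 1979-02-01 (day 31 of the year;
# the 365.25-day year is an assumption that reproduces the table to ~1e-4)
t = 31 / 365.25
print(np.sin(2 * np.pi * t))              # ~0.5084  -> constant_sin0
print(np.sin(4 * np.pi * t))              # ~0.8756  -> constant_sin1
print(-0.831698 * np.cos(2 * np.pi * t))  # ~-0.7162 -> qboA_cos0 (qboA * cos)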
Perform the regression and convert the coefficients to percent anomaly.
[6]:
data = data.sel(lev=slice(100, 0.2)).sel(lat=slice(-65, 65)) # remove bins without data
results = regress_all_bins(predictors, data['average'], tolerance=0.1)
# Convert to ~ percent
results /= data['average'].mean(dim='time')
results *= 100
print(results)
<xarray.Dataset> Size: 105kB
Dimensions: (lev: 17, lat: 12)
Coordinates:
* lev (lev) float32 68B 100.0 68.13 46.42 ... 0.3162 0.2154
* lat (lat) float32 48B -55.0 -45.0 -35.0 ... 35.0 45.0 55.0
Data variables: (12/64)
enso (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
enso_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
solar (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
solar_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
qboA (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
qboA_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
... ...
constant_cos2 (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_cos2_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_sin3 (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_sin3_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_cos3 (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
constant_cos3_std (lev, lat) float64 2kB dask.array<chunksize=(17, 12), meta=np.ndarray>
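Since the linear predictors are in per-decade units, the converted coefficients read as approximate percent per decade. A sketch of pulling out the post-1997 trend with a rough 2-sigma significance mask (assuming the _std naming seen above extends to the linear terms, consistent with the 64-variable count):

import numpy as np

# Post-1997 trend in ~% / decade, masked where |trend| < 2 * std error
post_trend = results['linear_post']
significant = np.abs(post_trend) > 2 * results['linear_post_std']
print(post_trend.where(significant).sel(lat=45.0, lev=2.154, method='nearest').values)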
[7]:
import LOTUS_regression.plotting.trends as trends
trends.pre_post_with_confidence(results, x='lat', y='lev', ylim=(100, 0.5), log_y=True, figsize=(16, 6),
                                x_label=r'Latitude [$^\circ$]', y_label='Pressure [hPa]', pre_title='Pre 1997',
                                post_title='Post 1997')
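The plotting helpers render with matplotlib (figsize above is a matplotlib-style argument), so, assuming the call draws on the active figure, the result can be saved afterwards:

import matplotlib.pyplot as plt

# Assumption: pre_post_with_confidence draws on the current matplotlib figure
plt.savefig('gozcards_trends_pre_post.png', dpi=150, bbox_inches='tight')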