Univariate linear regression#
In this section, we introduce the default PSM in cfr
based on univariate linear regression. It can be applied to any proxy type that is believed to have a univariate linear relationship with a certain climate variable. It also supports a seasonality searching procedure to help determine the seasonality of a specific site.
For instance, this PSM can be applied to the tree.MXD
records, which we believe have high linear correlation with the local temperature condition over a growing season.
[5]:
%load_ext autoreload
%autoreload 2
import cfr
print(cfr.__version__)
import numpy as np
import os
os.chdir('/glade/u/home/fengzhu/Github/cfr/docsrc/notebooks')
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
2024.8.1
Data preparation#
Proxy#
[2]:
pdb = cfr.ProxyDatabase().fetch('PAGES2kv2')
[3]:
pobj = pdb.records['NAm_164']
fig, ax = pobj.plot()
Model#
[6]:
model_tas = cfr.ClimateField().fetch('iCESM_past1000historical/tas')
model_pr = cfr.ClimateField().fetch('iCESM_past1000historical/pr')
>>> The target file seems existed at: ./data/tas_sfc_Amon_iCESM_past1000historical_085001-200512.nc . Loading from it instead of downloading ...
>>> The target file seems existed at: ./data/pr_sfc_Amon_iCESM_past1000historical_085001-200512.nc . Loading from it instead of downloading ...
[7]:
model_tas.da
[7]:
<xarray.DataArray 'tas' (time: 13872, lat: 96, lon: 144)> Size: 767MB [191766528 values with dtype=float32] Coordinates: * time (time) object 111kB 0850-01-17 00:00:00 ... 2005-12-17 00:00:00 * lat (lat) float32 384B -90.0 -88.11 -86.21 -84.32 ... 86.21 88.11 90.0 * lon (lon) float32 576B 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5 Attributes: long_name: Reference height temperature units: K
[8]:
print(np.median(np.diff(model_tas.da.lat)))
print(np.median(np.diff(model_tas.da.lon)))
1.8947372
2.5
Instrumental observations#
[9]:
obs_tas = cfr.ClimateField().fetch('CRUTSv4.07/tas', vn='tmp')
obs_pr = cfr.ClimateField().fetch('CRUTSv4.07/pr', vn='pre')
>>> The target file seems existed at: ./data/cru_ts4.07.1901.2022.tmp.dat.nc.gz . Loading from it instead of downloading ...
>>> The target file seems existed at: ./data/cru_ts4.07.1901.2022.pre.dat.nc.gz . Loading from it instead of downloading ...
[10]:
obs_tas = obs_tas.rename('tas')
obs_pr = obs_pr.rename('pr')
[11]:
fig, ax = obs_tas.plot(levels=np.linspace(-40, 40, 11))
[12]:
obs_tas_new = obs_tas.wrap_lon()
[13]:
obs_tas_new.da.coords
[13]:
Coordinates:
* lat (lat) float32 1kB -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* time (time) object 12kB 1901-01-16 00:00:00 ... 2022-12-16 00:00:00
* lon (lon) float32 3kB 0.25 0.75 1.25 1.75 ... 358.2 358.8 359.2 359.8
[14]:
fig, ax = obs_tas_new.plot(levels=np.linspace(-40, 40, 11))
[14]:
obs_pr.da.coords
[14]:
Coordinates:
* lon (lon) float32 0.25 0.75 1.25 1.75 2.25 ... 358.2 358.8 359.2 359.8
* lat (lat) float32 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* time (time) object 1901-01-16 00:00:00 ... 2022-12-16 00:00:00
[15]:
obs_pr_new = obs_pr.wrap_lon()
[16]:
obs_pr_new.da.coords
[16]:
Coordinates:
* lon (lon) float32 0.25 0.75 1.25 1.75 2.25 ... 358.2 358.8 359.2 359.8
* lat (lat) float32 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* time (time) object 1901-01-16 00:00:00 ... 2022-12-16 00:00:00
Get climate data for a specific ProxyRecord
#
[17]:
%%time
pobj.del_clim()
pobj.get_clim(model_tas, tag='model')
pobj.get_clim(model_pr, tag='model')
pobj.get_clim(obs_tas_new, tag='obs')
pobj.get_clim(obs_pr_new, tag='obs')
CPU times: user 9.65 ms, sys: 198 ms, total: 207 ms
Wall time: 2.3 s
[18]:
pobj.clim['obs.tas'].da
[18]:
<xarray.DataArray 'tas' (time: 1464)> array([ -9.5 , -11.8 , -8.1 , ..., 2.5 , -6.3 , -9.400001], dtype=float32) Coordinates: lon float32 254.2 lat float32 40.25 * time (time) object 1901-01-16 00:00:00 ... 2022-12-16 00:00:00 Attributes: long_name: near-surface temperature units: degrees Celsius correlation_decay_distance: 1200.0
Create a PSM object#
[19]:
lr_mdl = cfr.psm.Linear(pobj)
[20]:
%%time
sn_list = [
[1,2,3,4,5,6,7,8,9,10,11,12],
[6,7,8],
[3,4,5,6,7,8],
[6,7,8,9,10,11],
[-12,1,2],
[-9,-10,-11,-12,1,2],
[-12,1,2,3,4,5]
]
lr_mdl.calibrate(season_list=sn_list)
CPU times: user 121 ms, sys: 29.3 ms, total: 150 ms
Wall time: 152 ms
[21]:
lr_mdl.calib_details
[21]:
{'df': proxy tas
time
1901.0 1.013 10.933333
1902.0 1.038 9.900001
1903.0 1.014 9.733334
1904.0 0.935 8.933333
1905.0 1.009 10.166667
... ... ...
1978.0 1.001 10.666667
1979.0 0.959 10.366667
1980.0 0.987 11.700000
1981.0 0.947 11.266666
1982.0 1.019 10.433333
[82 rows x 2 columns],
'nobs': 82.0,
'fitR2adj': 0.20442391329166198,
'PSMresid': time
1901.0 -0.004534
1902.0 0.050338
1903.0 0.031156
1904.0 -0.024717
1905.0 0.013629
...
1978.0 -0.008825
1979.0 -0.042152
1980.0 -0.052697
1981.0 -0.080170
1982.0 0.015920
Length: 82, dtype: float64,
'PSMmse': 0.0019213288992947159,
'SNR': 0.5221711278110613,
'seasonality': [6, 7, 8]}
[22]:
%%time
pp = lr_mdl.forward()
CPU times: user 121 ms, sys: 42 ms, total: 163 ms
Wall time: 159 ms
[23]:
fig, ax = pp.plot()
[ ]: