API Reference#
Proxy#
- class cfr.proxy.ProxyRecord(pid=None, time=None, value=None, lat=None, lon=None, elev=None, ptype=None, climate=None, tags=None, value_name=None, value_unit=None, time_name=None, time_unit=None, seasonality=None)#
The class for a proxy record.
- annualize(months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], force=False, verbose=False)#
Annualize/seasonalize the proxy record based on a list of months.
- Parameters:
months (list) – the months over which to annualize; e.g., [6, 7, 8] means JJA annualization.
force (bool) – if True, perform a calendar year annualization if the given months cannot be applied to the data due to missing months in the data. Defaults to False.
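A minimal sketch of what annualizing over a month list means, written in plain Python rather than with cfr itself. The helper `annualize_by_months` and the `(year, month, value)` tuple layout are hypothetical. The sketch averages the selected months within each calendar year and ignores seasons that straddle the year boundary (e.g., DJF), which the library may handle differently.

```python
from collections import defaultdict

def annualize_by_months(samples, months):
    """Average the values of the selected months within each calendar year.

    samples: iterable of (year, month, value) tuples (hypothetical layout).
    months:  list of month numbers, e.g. [6, 7, 8] for JJA.
    """
    wanted = set(months)
    by_year = defaultdict(list)
    for year, month, value in samples:
        if month in wanted:
            by_year[year].append(value)
    # One mean value per year that has at least one selected month.
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}

# Two years of monthly data where the value equals the month number.
samples = [(yr, m, float(m)) for yr in (2000, 2001) for m in range(1, 13)]
jja = annualize_by_months(samples, [6, 7, 8])
print(jja)
```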
- center(ref_period=None, thresh=5, force=False, verbose=False)#
Center the proxy time series against a reference period.
- Parameters:
ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
thresh (int) – the minimum number of data points required to perform the centering. If not satisfied, and force=False, the record will not be centered.
force (bool) – if True, the record will be centered regardless of the number of data points. Defaults to False.
verbose (bool, optional) – print verbose information. Defaults to False.
- Returns:
contains the centered values.
- Return type:
new (ProxyRecord)
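The interplay of ref_period, thresh, and force can be sketched as follows. This is an illustration of the documented behavior, not the library's implementation: the standalone `center` function, the inclusive period bounds, and the use of the whole series when ref_period is None are assumptions.

```python
def center(times, values, ref_period=None, thresh=5, force=False):
    """Subtract the reference-period mean from every value.

    Returns the input values unchanged when fewer than `thresh` points fall
    inside `ref_period` and `force` is False.
    """
    if ref_period is None:
        ref_vals = list(values)  # assumption: use the whole series
    else:
        start, end = ref_period
        ref_vals = [v for t, v in zip(times, values) if start <= t <= end]
    if not ref_vals or (len(ref_vals) < thresh and not force):
        return list(values)  # not enough data: leave the record as is
    ref_mean = sum(ref_vals) / len(ref_vals)
    return [v - ref_mean for v in values]

times = [1950, 1951, 1952, 1953]
values = [1.0, 2.0, 3.0, 4.0]
skipped = center(times, values, ref_period=(1951, 1952))             # 2 points < thresh
forced = center(times, values, ref_period=(1951, 1952), force=True)  # mean 2.5 removed
```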
- concat(rec_list)#
Concatenate the record with a list of other records assuming they share the same location and other metadata.
- Parameters:
rec_list (list or ProxyRecord) – a list of ProxyRecord objects.
- copy()#
Make a deepcopy of the object.
- correct_elev_tas(t_rate=-9.8, verbose=False)#
Correct the tas with t_rate = -9.8 degC/km upward by default.
- Parameters:
t_rate (float) – the temperature adjustment rate based on elevation bias.
verbose (bool, optional) – print verbose information. Defaults to False.
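A rough sketch of a lapse-rate correction with the default t_rate of -9.8 degC/km. The function below is standalone and hypothetical; in particular, the argument names `site_elev_m` and `grid_elev_m` and the sign convention (site minus grid elevation) are assumptions, not taken from the library.

```python
def correct_elev_tas(tas, site_elev_m, grid_elev_m, t_rate=-9.8):
    """Adjust temperature for an elevation mismatch.

    tas:         temperature at the grid-cell elevation, in degC.
    site_elev_m: elevation of the proxy site, in metres (assumed input).
    grid_elev_m: elevation of the climate grid cell, in metres (assumed input).
    t_rate:      temperature change per km of ascent, in degC/km.
    """
    elev_diff_km = (site_elev_m - grid_elev_m) / 1000.0
    return tas + t_rate * elev_diff_km

# A site 500 m above its grid cell comes out about 4.9 degC colder.
corrected = correct_elev_tas(15.0, site_elev_m=1500.0, grid_elev_m=1000.0)
```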
- dashboard(figsize=[10, 8], ms=200, stock_img=True, edge_clr='w', self_lb='real', psd_lb='PSD', pseudo_lb='pseudo', wspace=0.1, hspace=0.3, spec_method='wwz', spec_settings=None, pseudo_clr=None, **kwargs)#
Plot a dashboard of the proxy/pseudoproxy.
- Parameters:
figsize (list or tuple) – the figure size.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
spec_method (str) – the spectral analysis method to apply.
spec_settings (dict) – the dictionary of the keyword arguments for the specified spectral analysis method.
pseudo_clr (str) – the color for the pseudoproxy.
pseudo_lb (str) – the label for the pseudoproxy.
self_lb (str) – the label for the self ProxyRecord.
- dashboard_clim(clim_units=None, clim_colors=None, figsize=[14, 8], scaled_pr=False, ms=200, stock_img=True, edge_clr='w', wspace=0.3, hspace=0.5, spec_method='wwz', **kwargs)#
Plot a dashboard of the proxy/pseudoproxy along with the climate signal.
- Parameters:
figsize (list or tuple) – the figure size.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
spec_method (str) – the spectral analysis method to apply.
pseudo_clr (str) – the color for the pseudoproxy.
clim_units (dict, optional) – the dictionary of units for climate signals. Defaults to None.
clim_colors (dict, optional) – the dictionary of colors for climate signals. Defaults to None.
scaled_pr (bool) – scale the precipitation values.
- del_clim(verbose=False)#
Delete the “clim” attribute of the ProxyRecord object.
- Parameters:
verbose (bool, optional) – print verbose information. Defaults to False.
- del_pseudo(verbose=False)#
Delete the pseudo attribute of the ProxyRecord object.
- Parameters:
verbose (bool, optional) – print verbose information. Defaults to False.
- from_da(da)#
Get the time and value axis from the given xarray.DataArray
- Parameters:
da (xarray.DataArray) – the xarray.DataArray object to load from.
- get_clim(fields, tag=None, verbose=False, search_dist=5, load=True, **kwargs)#
Get the nearest climate from climate fields
- Parameters:
fields (list of cfr.climate.ClimateField) – the climate fields
tag (str) – the tag to put on the obtained climate field, which will be named in the format of “tag.variable_name”.
search_dist (float) – the farthest distance to search for climate data, in degrees.
verbose (bool, optional) – print verbose information. Defaults to False.
load (bool) – if True, the list of climate fields will be loaded into the memory instead of lazy loading.
- get_pseudo(psm=None, signal=None, calibrate=True, add_noise=False, noise='white', SNR=10, seed=None, match_mean=False, match_var=False, verbose=False, calib_kws=None, forward_kws=None, colored_noise_kws=None)#
Generate the pseudoproxy
- Parameters:
psm (object) – the PSM objects in cfr.psm
signal (cfr.ProxyRecord) – the signal part for the pseudoproxy. If not provided, it will be generated using the specified psm; if provided, the PSM part will be skipped.
calibrate (bool) – if True and the PSM supports calibration, then the PSM will be calibrated.
add_noise (bool) – if True, noise will be added onto the signal.
noise (str) – noise type; supports “white” for white noise and “colored” for colored noise.
colored_noise_kws (dict) – the dictionary of the keyword arguments for colored noise generation.
match_mean (bool) – match the mean of the pseudoproxy to the real record.
match_var (bool) – match the variance of the pseudoproxy to the real record.
verbose (bool, optional) – print verbose information. Defaults to False.
calib_kws (dict) – the dictionary of the keyword arguments for the calibration step of the PSMs.
forward_kws (dict) – the dictionary of the keyword arguments for the forward step of the PSMs.
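To illustrate the add_noise, SNR, and seed options, here is a hedged sketch of adding white noise at a target signal-to-noise ratio. It assumes SNR means std(signal) / std(noise); the reference above does not define it, so treat that as an assumption. The helper `add_white_noise` is hypothetical and uses only the standard library.

```python
import random
import statistics

def add_white_noise(signal, snr=10, seed=None):
    """Add Gaussian white noise scaled so that std(signal) / std(noise) == snr."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the draw so the realized noise std matches the target exactly.
    target_std = statistics.pstdev(signal) / snr
    scale = target_std / statistics.pstdev(raw)
    mean_raw = statistics.fmean(raw)
    noise = [(x - mean_raw) * scale for x in raw]
    return [s + n for s, n in zip(signal, noise)], noise

signal = [float(i % 7) for i in range(200)]
pseudo, noise = add_white_noise(signal, snr=10, seed=42)
realized_snr = statistics.pstdev(signal) / statistics.pstdev(noise)
```

Fixing the seed makes the noise realization reproducible, which is the usual reason for exposing a seed argument.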
- load_nc(path, **kwargs)#
Load the record from a netCDF file.
- Parameters:
path (str) – the path of the netCDF file to load.
- plot(figsize=[12, 4], legend=False, ms=200, stock_img=True, edge_clr='w', wspace=0.1, hspace=0.1, plot_map=True, p=<class 'cfr.visual.STYLE'>, **kwargs)#
Visualize the ProxyRecord
- Parameters:
figsize (list or tuple) – the figure size.
legend (bool) – if True, plot the legend.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
plot_map (bool) – if True, plot the record on a map. Defaults to True.
- plot_compare(ref, label=None, title=None, ref_label=None, ref_color=None, ref_zorder=2, figsize=[12, 4], legend=False, ms=200, stock_img=True, edge_clr='w', wspace=0.1, hspace=0.1, plot_map=True, lgd_kws=None, **kwargs)#
Plot against another reference record.
- Parameters:
ref (cfr.proxy.ProxyRecord) – the reference record.
label (str) – the label of the self record.
ref_label (str) – the label of the reference record.
ref_color (str) – the color to visualize the reference record.
ref_zorder (int) – the z-axis ordering of the reference record.
title (str) – the title of the figure.
figsize (list or tuple) – the figure size.
legend (bool) – if True, plot the legend.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
plot_map (bool) – if True, plot the record on a map.
lgd_kws (dict) – the dictionary of keyword arguments for the legend.
- plot_dups(figsize=[12, 4], legend=False, ms=200, stock_img=True, edge_clr='w', wspace=0.1, hspace=0.1, plot_map=True, lgd_kws=None, **kwargs)#
Plot the record against its duplicated records.
- Parameters:
figsize (list or tuple) – the figure size.
legend (bool) – if True, plot the legend.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
plot_map (bool) – if True, plot the record on a map.
lgd_kws (dict) – the dictionary of keyword arguments for the legend.
- plotly(**kwargs)#
Visualize the ProxyRecord with plotly
- slice(timespan)#
Slice the time series with a timespan (tuple or list).
- Parameters:
timespan (tuple or list) – The list of time points for slicing, whose length must be even. When there are n time points, the output Series includes n/2 segments. For example, if timespan = [a, b], then the sliced output includes one segment [a, b]; if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].
- Returns:
The sliced Series object.
- Return type:
- standardize(ref_period=None, thresh=5, force=False, verbose=False)#
Standardizes the record. If the record is constant, a vector of 0s is returned.
- Parameters:
ref_period (list, optional) – [min_time, max_time]. The default is None.
- Returns:
contains standardized values.
- Return type:
new (ProxyRecord)
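A small sketch of the standardization rule stated above, including the constant-record case. The standalone `standardize` function is hypothetical, and the use of the population standard deviation is an assumption of this sketch.

```python
import statistics

def standardize(values):
    """Return z-scores; a constant series maps to a list of zeros."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; an assumption of this sketch
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

z = standardize([1.0, 2.0, 3.0])
flat = standardize([4.0, 4.0, 4.0])
```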
- to_da()#
Convert to xarray.DataArray for computation purposes
- to_nc(path, verbose=True, **kwargs)#
Convert the record to a netCDF file.
- Parameters:
path (str) – the path to save the file.
verbose (bool, optional) – print verbose information. Defaults to True.
- class cfr.proxy.ProxyDatabase(records=None, source=None)#
The class for a proxy database.
- annualize(months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], force=False, verbose=False)#
Annualize the records in the proxy database.
- Parameters:
months (list) – the months over which to annualize; e.g., [6, 7, 8] means JJA annualization.
force (bool) – if True, perform a calendar year annualization if the given months cannot be applied to the data due to missing months in the data. Defaults to False.
- center(ref_period, force=False, thresh=5, verbose=False)#
Center the proxy timeseries against a reference time period.
- Parameters:
ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
force (bool) – if True, the records will be centered regardless of the number of data points. Defaults to False.
thresh (int) – the minimum number of data points to perform the processing.
- Returns
new (cfr.ProxyDatabase)
- copy()#
Make a deepcopy of the object.
- correct_elev_tas(t_rate=-9.8, verbose=False)#
Correct the tas with t_rate = -9.8 degC/km upward by default.
- Parameters:
t_rate (float) – the temperature adjustment rate based on elevation bias.
verbose (bool, optional) – print verbose information. Defaults to False.
- count_availability(year=array([0, 1, 2, ..., 1998, 1999, 2000]))#
Count the proxy availability in time
- Parameters:
year (array-like) – count the proxy availability based on the given years.
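Counting availability amounts to asking, for each year, how many records have data. A minimal sketch under a hypothetical data layout (a dict mapping proxy IDs to the years they cover); it does not use cfr objects.

```python
def count_availability(records, years):
    """Count, for each year, how many records have a value in that year.

    records: dict mapping a proxy ID to the years it covers
             (a hypothetical layout for this sketch).
    """
    coverage = {pid: set(yrs) for pid, yrs in records.items()}
    return {yr: sum(yr in yrs for yrs in coverage.values()) for yr in years}

records = {
    "tree_01": range(1900, 1950),
    "coral_01": range(1940, 2000),
}
counts = count_availability(records, years=[1899, 1920, 1945, 1990])
```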
- del_clim(verbose=False)#
Delete the nearest climate data for the records in the proxy database.
- Parameters:
verbose (bool, optional) – print verbose information. Defaults to False.
- fetch(name=None, **kwargs)#
Fetch a proxy database from cloud
- Parameters:
name (str) – a predefined database name or a URL starting with “http”
- filter(by, keys, mode='fuzzy')#
Filter the proxy database according to given ptype list.
- Parameters:
by (str) – filter by a keyword {‘ptype’, ‘pid’, ‘dt’, ‘lat’, ‘lon’, ‘loc’, ‘tag’}
keys (set) –
a set of keywords
For by = ‘ptype’ or ‘pid’, keys take a fuzzy match
For by = ‘dt’ or ‘lat’ or ‘lon’, keys = (min, max)
For by = ‘loc-square’, keys = (lat_min, lat_max, lon_min, lon_max)
For by = ‘loc-circle’, keys = (center_lat, center_lon, distance)
For by = ‘tag’, keys should be a list of tags
mode (str) – ‘fuzzy’ or ‘exact’ search when by = ‘ptype’ or ‘pid’
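The difference between fuzzy and exact matching, and the (min, max) form of keys, can be sketched on plain dictionaries. Everything here is illustrative: `filter_records` is a hypothetical helper, and treating a fuzzy match as a substring test is an assumption.

```python
def filter_records(records, by, keys, mode="fuzzy"):
    """Filter a list of record dicts by proxy type/ID or by a numeric range."""
    if by in ("ptype", "pid"):
        if mode == "exact":
            return [r for r in records if r[by] in keys]
        # Fuzzy (assumed): keep a record if any key is a substring of the field.
        return [r for r in records if any(k in r[by] for k in keys)]
    if by in ("lat", "lon"):
        lo, hi = keys
        return [r for r in records if lo <= r[by] <= hi]
    raise ValueError(f"unsupported filter: {by!r}")

records = [
    {"pid": "A", "ptype": "coral.d18O", "lat": -5.0, "lon": 160.0},
    {"pid": "B", "ptype": "coral.SrCa", "lat": 10.0, "lon": 200.0},
    {"pid": "C", "ptype": "tree.TRW", "lat": 45.0, "lon": 250.0},
]
corals = filter_records(records, by="ptype", keys=["coral"])
exact = filter_records(records, by="ptype", keys=["coral"], mode="exact")
tropics = filter_records(records, by="lat", keys=(-20, 20))
```

With these inputs the fuzzy search keeps both coral records, while the exact search keeps none because no ptype equals "coral".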
- find_duplicates(r_thresh=0.9, time_period=[0, 2000])#
Find duplicated proxy records based on a correlation threshold.
- Parameters:
r_thresh (float) – the correlation threshold to determine if two records are duplicated. Defaults to 0.9.
time_period (tuple or list) – the timespan over which to compare two records. Defaults to [0, 2000].
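A hedged sketch of duplicate detection by correlation threshold. It represents each record as a hypothetical `{year: value}` dict, compares the years the two records share within time_period, and uses a hand-written Pearson correlation. The minimum-overlap rule is a choice of this sketch, not a documented behavior.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def is_duplicate(rec_a, rec_b, r_thresh=0.9, time_period=(0, 2000)):
    """Flag two {year: value} records as duplicates if they correlate strongly."""
    start, end = time_period
    common = sorted(t for t in rec_a if t in rec_b and start <= t <= end)
    if len(common) < 3:
        return False  # too little overlap to judge (a choice of this sketch)
    r = pearson_r([rec_a[t] for t in common], [rec_b[t] for t in common])
    return r >= r_thresh

base = {yr: math.sin(yr / 5.0) for yr in range(1900, 2000)}
rescaled = {yr: 2.0 * v + 1.0 for yr, v in base.items()}        # same shape
shifted = {yr: math.cos(yr / 5.0) for yr in range(1900, 2000)}  # different phase
```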
- from_df(df, pid_column='paleoData_pages2kID', lat_column='geo_meanLat', lon_column='geo_meanLon', elev_column='geo_meanElev', time_column='year', value_column='paleoData_values', proxy_type_column='paleoData_proxy', archive_type_column='archiveType', ptype_column='ptype', value_name_column='paleoData_variableName', value_unit_column='paleoData_units', R_column='R', climate_column='climateInterpretation_variable', verbose=False)#
Load database from a pandas.DataFrame. Note that in most cases, the column names have to be specified.
- Parameters:
df (pandas.DataFrame) – a pandas DataFrame that includes at least lat, lon, time, value, proxy_type
pid_column (str) – the column name for proxy ID.
lat_column (str) – the column name for latitude.
lon_column (str) – the column name for longitude.
elev_column (str) – the column name for elevation.
time_column (str) – the column name for time axis.
value_column (str) – the column name for value axis.
proxy_type_column (str) – the column name for proxy type information.
archive_type_column (str) – the column name for archive type information.
ptype_column (str) – the column name for proxy type information in format “archive.proxy”.
value_name_column (str) – the column name for proxy variable name.
value_unit_column (str) – the column name for proxy variable unit.
verbose (bool, optional) – print verbose information. Defaults to False.
- from_ds(ds)#
Load the proxy database from an xarray.Dataset
- Parameters:
ds (xarray.Dataset) – the xarray.Dataset to load from
- get_clim(field, tag=None, verbose=False, load=True, **kwargs)#
Get the nearest climate data for the records in the proxy database.
- Parameters:
fields (list of cfr.climate.ClimateField) – the climate fields
tag (str) – the tag to put on the obtained climate field, which will be named in the format of “tag.variable_name”.
verbose (bool, optional) – print verbose information. Defaults to False.
load (bool) – if True, the list of climate fields will be loaded into the memory instead of lazy loading.
- load_multi_nc(dirpath, nproc=None)#
Load from multiple netCDF files.
- Parameters:
dirpath (str) – the directory path of the multiple .nc files
nproc (int) – the number of processors for loading. Defaults to multiprocessing.cpu_count().
- load_nc(path, use_cftime=True, **kwargs)#
Load the database from a netCDF file.
- Parameters:
path (str) – the path of the netCDF file to load.
use_cftime (bool) – if True, use the cftime convention. Defaults to True.
- make_composite(obs=None, obs_nc_path=None, vn='tas', lat_name=None, lon_name=None, bin_width=10, n_bootstraps=1000, qs=(0.025, 0.975), stat_func=<function nanmean>, anom_period=[1951, 1980])#
Make composites of the records in the proxy database.
- Parameters:
obs (cfr.climate.ClimateField) – the observation field as a reference for scaling the proxy values.
obs_nc_path (str) – the path of the netCDF file of the reference observation.
vn (str) – the variable name of the referenced observation.
lat_name (str) – the name of the latitude dimension in the referenced observation.
lon_name (str) – the name of the longitude dimension in the referenced observation.
bin_width (int) – the width for binning.
n_bootstraps (int) – the number of bootstraps for uncertainty quantification.
qs (list or tuple) – the quantiles to plot.
stat_func (function) – the function to apply for the calculation of the binned value.
anom_period (list or tuple) – the time period over which to calculate the anomaly.
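The roles of bin_width, n_bootstraps, and qs can be illustrated with a generic binning and bootstrap sketch. This is not the library's compositing procedure (which also scales against observations); `bin_mean` and `bootstrap_interval` are hypothetical helpers, and the quantile indexing is a simple approximation.

```python
import random
import statistics

def bin_mean(times, values, bin_width=10):
    """Average values within consecutive bins of `bin_width` time units."""
    bins = {}
    for t, v in zip(times, values):
        start = (t // bin_width) * bin_width
        bins.setdefault(start, []).append(v)
    return {start: statistics.fmean(vs) for start, vs in sorted(bins.items())}

def bootstrap_interval(samples, n_bootstraps=1000, qs=(0.025, 0.975), seed=0):
    """Bootstrap quantile interval for the mean of `samples`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_bootstraps)
    )
    lo = means[int(qs[0] * (n_bootstraps - 1))]
    hi = means[int(qs[1] * (n_bootstraps - 1))]
    return lo, hi

binned = bin_mean([1900, 1905, 1910, 1919], [1.0, 3.0, 5.0, 7.0], bin_width=10)
lo, hi = bootstrap_interval([float(i) for i in range(10)], seed=0)
```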
- nrec_tags(keys)#
Check the number of tagged records.
- Parameters:
keys (list) – list of tag strings
- plot(**kws)#
Visualize the proxy database. See cfr.visual.plot_proxies() for more information.
- plot_composite(figsize=[10, 4], clr_proxy=None, clr_count='tab:gray', clr_obs='tab:red', left_ylim=[-2, 2], right_ylim=None, ylim_num=5, xlim=[0, 2000], base_n=60, ax=None)#
Plot the composites of the records in the proxy database.
- Parameters:
figsize (list or tuple) – the figure size.
clr_proxy (str) – the color to visualize the proxy composite curve.
clr_count (str) – the color to visualize the record count.
clr_obs (str) – the color to visualize the referenced observation.
left_ylim (list) – the limit for the left y-axis.
right_ylim (list) – the limit for the right y-axis.
ylim_num (int) – the number of ticks for the left y-axis
xlim (list) – the limit for the x-axis.
base_n (int) – the number to determine the upper bound for the record count.
ax (object, optional) – matplotlib.axes. Defaults to None.
- plotly(**kwargs)#
Plot the database on an interactive map utilizing Plotly
- plotly_concise(**kwargs)#
Plot the database on an interactive map utilizing Plotly
- plotly_count(**kwargs)#
Plot the database number-counting on an interactive map utilizing Plotly
- refresh()#
Refresh a bunch of attributes.
- slice(timespan)#
Slice the records in the proxy database.
- Parameters:
timespan (tuple or list) – The list of time points for slicing, whose length must be even. When there are n time points, the output Series includes n/2 segments. For example, if timespan = [a, b], then the sliced output includes one segment [a, b]; if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].
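The pairing rule for timespan can be sketched in a few lines. `slice_by_timespan` is a hypothetical standalone helper operating on plain lists; treating both segment endpoints as inclusive is an assumption.

```python
def slice_by_timespan(times, values, timespan):
    """Keep the points that fall inside any [start, end] pair of `timespan`."""
    if len(timespan) % 2 != 0:
        raise ValueError("timespan must contain an even number of time points")
    # Pair up consecutive entries: [a, b, c, d] -> [(a, b), (c, d)].
    segments = list(zip(timespan[0::2], timespan[1::2]))
    return [
        (t, v)
        for t, v in zip(times, values)
        if any(start <= t <= end for start, end in segments)
    ]

times = list(range(1000, 1010))
values = [t * 0.1 for t in times]
kept = slice_by_timespan(times, values, [1001, 1003, 1007, 1008])
kept_times = [t for t, _ in kept]
```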
- squeeze_dups(pids_to_keep=None)#
Remove the duplicated records and keep only one.
- Parameters:
pids_to_keep (list) – a list of proxy IDs forced to keep.
- standardize(ref_period, force=False, thresh=5, verbose=False)#
Standardize elements of a proxy database against a reference time period.
Elements that have no values over the reference period are dropped.
- Parameters:
ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
force (bool) – if True, the records will be standardized regardless of the number of data points. Defaults to False.
thresh (int) – the minimum number of data points to perform the processing.
- Returns
new (cfr.ProxyDatabase)
- to_df()#
Convert the proxy database to a pandas.DataFrame.
- to_ds(annualize=False, months=None, verbose=True)#
Convert the proxy database to an xarray.Dataset
- Parameters:
annualize (bool) – annualize the proxy records with months
months (list) – months for annualization
verbose (bool, optional) – print verbose information. Defaults to True.
- to_multi_nc(dirpath, verbose=True, compress_params={'zlib': True})#
Convert the proxy database to multiple netCDF files. One for each record.
- Parameters:
dirpath (str) – the directory path of the multiple .nc files
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
verbose (bool, optional) – print verbose information. Defaults to True.
- to_nc(path, annualize=False, months=None, verbose=True, compress_params={'zlib': True})#
Convert the database to a netCDF file.
- Parameters:
path (str) – the path to save the file.
annualize (bool) – annualize the proxy records with months
months (list) – months for annualization
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
verbose (bool, optional) – print verbose information. Defaults to True.
Climate#
- class cfr.climate.ClimateField(da=None)#
The class for the gridded climate field data.
- annualize(months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])#
Annualize/seasonalize the climate field based on a list of months.
- Parameters:
months (list) – the months over which to annualize; e.g., [6, 7, 8] means JJA annualization.
- center(ref_period=[1951, 1980])#
Center the climate field against a reference time period.
- Parameters:
ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
time_name (str) – name of the time dimension
- compare(ref, timespan=None, stat='corr', interp_target='ref', interp=True)#
Compare against a reference field.
- Parameters:
ref (cfr.climate.ClimateField) – the reference to compare against, assuming the first dimension to be time
timespan (tuple or list) – the timespan over which to compare two ClimateField objects.
interp_target (str, optional) –
the direction to interpolate the fields:
’ref’: interpolate from self to ref
’self’: interpolate from ref to self
stat (str) –
the statistic to calculate. Supported quantities:
’corr’: correlation coefficient
’R2’: coefficient of determination
’CE’: coefficient of efficiency
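The three statistics can differ sharply on the same pair of series. The sketch below uses textbook definitions on plain lists: Pearson correlation, R2 taken as the squared correlation (an assumption about how the library defines it), and CE as one minus the ratio of squared error to the variance of the reference. The function names are hypothetical.

```python
import math

def corr(x, ref):
    """Pearson correlation coefficient."""
    n = len(ref)
    mx, mr = sum(x) / n, sum(ref) / n
    cov = sum((a - mx) * (r - mr) for a, r in zip(x, ref))
    var_x = sum((a - mx) ** 2 for a in x)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_x * var_r)

def r2(x, ref):
    """Coefficient of determination, taken here as the squared correlation."""
    return corr(x, ref) ** 2

def ce(x, ref):
    """Coefficient of efficiency: 1 - SSE / sum of squares of ref about its mean."""
    mean_ref = sum(ref) / len(ref)
    sse = sum((r - a) ** 2 for a, r in zip(x, ref))
    sst = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - sse / sst

ref = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [v + 2.0 for v in ref]  # same shape, constant offset of 2
stats = {"corr": corr(biased, ref), "R2": r2(biased, ref), "CE": ce(biased, ref)}
```

A constant offset leaves the correlation at 1 but drives CE negative, which is why CE is the stricter skill measure.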
- copy()#
Make a deepcopy of the object.
- crop(lat_min=-90, lat_max=90, lon_min=0, lon_max=360)#
Crop the climate field based on the range of latitude and longitude.
Note that in cases when the crop range is crossing the 0 degree of longitude, lon_min should be less than 0.
- Parameters:
lat_min (float) – the lower bound of latitude to crop.
lat_max (float) – the upper bound of latitude to crop.
lon_min (float) – the lower bound of longitude to crop.
lon_max (float) – the upper bound of longitude to crop.
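The note about crossing the 0 degree meridian can be made concrete with a sketch on a list of grid longitudes. `crop_lons` is a hypothetical helper; it assumes the grid longitudes lie in [0, 360) and shifts them to a signed convention only when lon_min is negative.

```python
def crop_lons(lons, lon_min=0, lon_max=360):
    """Select longitudes within [lon_min, lon_max].

    Grid longitudes are assumed to lie in [0, 360). A negative lon_min
    signals a range that crosses the 0-degree meridian.
    """
    if lon_min < 0:
        # Shift to a signed convention so the requested range is contiguous.
        shifted = [lon - 360 if lon > 180 else lon for lon in lons]
        return sorted(lon for lon in shifted if lon_min <= lon <= lon_max)
    return [lon for lon in lons if lon_min <= lon <= lon_max]

grid_lons = list(range(0, 360, 10))
atlantic = crop_lons(grid_lons, lon_min=-20, lon_max=20)
pacific = crop_lons(grid_lons, lon_min=150, lon_max=180)
```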
- fetch(name=None, **load_nc_kws)#
Fetch a gridded climate field from cloud
- Parameters:
name (str) – a predefined name, or a URL starting with “http”, or a local file path. If not set, the method will return hints of available predefined names.
load_nc_kws (dict) – the dictionary of keyword arguments for loading a netCDF file.
- from_np(time, lat, lon, value)#
Load data from a numpy.ndarray.
- Parameters:
time (array-like) – the array of the time axis.
lat (array-like) – the array of the lat axis.
lon (array-like) – the array of the lon axis.
value (array-like) – the array of the values.
- geo_mean(lat_min=-90, lat_max=90, lon_min=0, lon_max=360)#
Calculate the geographical mean value of the climate field.
- Parameters:
lat_min (float) – the lower bound of latitude for the calculation.
lat_max (float) – the upper bound of latitude for the calculation.
lon_min (float) – the lower bound of longitude for the calculation.
lon_max (float) – the upper bound of longitude for the calculation.
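A geographical mean on a regular latitude-longitude grid is usually area-weighted. The sketch below uses cos(latitude) weights, which is a common convention and an assumption here; the reference above does not state which weighting the library applies. The standalone `geo_mean` function and the nested-list field layout are hypothetical.

```python
import math

def geo_mean(field, lats):
    """Area-weighted mean of a lat x lon grid using cos(latitude) weights.

    field: list of rows, one per latitude, each a list of values per longitude.
    """
    total, weight_sum = 0.0, 0.0
    for lat, row in zip(lats, field):
        w = math.cos(math.radians(lat))
        total += w * sum(row) / len(row)
        weight_sum += w
    return total / weight_sum

lats = [-60.0, 0.0, 60.0]
field = [[0.0, 0.0], [4.0, 4.0], [0.0, 0.0]]
weighted = geo_mean(field, lats)           # equatorial row counts double
unweighted = sum(map(sum, field)) / 6.0    # plain average of all cells
```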
- get_anom(ref_period=[1951, 1980])#
Get the anomaly against a reference time period.
- Parameters:
ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
- get_eof(n=1, time_period=None, verbose=False, flip=False)#
Get the EOF analysis result of the ClimateField object
- Parameters:
n (int) – perform EOF analysis and return the first n modes.
time_period (tuple or list) – the timespan over which to perform the EOF analysis.
verbose (bool, optional) – print verbose information. Defaults to False.
flip (bool, optional) – flip the sign of the field values. Defaults to False.
- index(name)#
Calculate the predefined indices.
- Parameters:
name (str) –
the predefined index name; supports the below:
’nino3.4’
’nino1+2’
’nino3’
’nino4’
’tpi’
’wp’
’dmi’
’iobw’
- load_nc(path, vn=None, time_name='time', lat_name='lat', lon_name='lon', load=False, return_ds=False, use_cftime=True, **kwargs)#
Load the climate field from a netCDF file.
- Parameters:
path (str) – the path where to load data from.
vn (str) – the variable name to load.
time_name (str) – the name for the time axis. Defaults to ‘time’.
lat_name (str) – the name for the lat axis. Defaults to ‘lat’.
lon_name (str) – the name for the lon axis. Defaults to ‘lon’.
load (bool) – if True, the netCDF file will be loaded into the memory; if False, lazy loading is used. Defaults to False.
return_ds (bool) – if True, will return an xarray.Dataset object instead of a ClimateField object. Defaults to False.
use_cftime (bool) – if True, use the cftime convention. Defaults to True.
- plot(**kwargs)#
Plot a climate field at a time point.
See also
cfr.visual.plot_field_map : Visualize a field on a map.
- plot_eof(n=1, eof_title=None, pc_title=None)#
Plot the EOF analysis result
- Parameters:
n (int) – plot the n-th mode.
eof_title (str) – the subplot title for the mode field.
pc_title (str) – the subplot title for the PC time series.
- plotly_grid(site_lats=None, site_lons=None, **kwargs)#
Plot the grid on an interactive map utilizing Plotly
- Parameters:
site_lats (list) – a list of the latitudes of the sites to plot
site_lons (list) – a list of the longitudes of the sites to plot
- regrid(lats, lons, periodic_lon=False)#
Regrid the climate field.
- rename(new_vn)#
Rename the variable name of the climate field.
- Parameters:
new_vn (str) – the new variable name.
- to_nc(path, verbose=True, compress_params=None)#
Convert the climate field to a netCDF file.
- Parameters:
path (str) – the path where the file is saved.
verbose (bool, optional) – print verbose information. Defaults to True.
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
- wrap_lon(mode='360')#
Convert the longitude values
- Parameters:
mode (str) – if ‘360’, convert the longitude values from the range (-180, 180) to (0, 360); if ‘180’, convert the longitude values from the range (0, 360) to (-180, 180).
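The two conversion modes reduce to modular arithmetic. A minimal sketch on scalar longitudes; the standalone `wrap_lon` function is illustrative and not the library method, and the handling of the boundary value 180 (mapped to -180 in mode '180') is a property of this sketch only.

```python
def wrap_lon(lon, mode="360"):
    """Convert a longitude between the (0, 360) and (-180, 180) conventions."""
    if mode == "360":
        return lon % 360
    if mode == "180":
        return (lon + 180) % 360 - 180
    raise ValueError("mode must be '360' or '180'")

east = wrap_lon(-90, mode="360")   # 90 W expressed on the 0..360 axis
west = wrap_lon(270, mode="180")   # and back again
```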
PSM#
- class cfr.psm.Linear(pobj=None, climate_required=['tas'])#
A PSM that is based on univariate linear regression.
- Parameters:
pobj (cfr.proxy.ProxyRecord) – the proxy record object
climate_required (cfr.climate.ClimateField) – the required climate field object for running this PSM
- class cfr.psm.Bilinear(pobj=None, climate_required=['tas', 'pr'])#
A PSM that is based on bivariate linear regression.
- Parameters:
pobj (cfr.proxy.ProxyRecord) – the proxy record object
climate_required (cfr.climate.ClimateField) – the required climate field object for running this PSM
- class cfr.psm.Ice_d18O(pobj=None, tas_name='model.tas', pr_name='model.pr', psl_name='model.psl', d18O_name='model.d18O', climate_required=['tas', 'pr', 'psl', 'd18O'])#
The ice core d18O model adopted from PRYSM (sylvia-dee/PRYSM)
- Parameters:
pobj (cfr.proxy.ProxyRecord) – the proxy record object
climate_required (cfr.climate.ClimateField) – the required climate field object for running this PSM
- forward(alt_diff=0, nproc=None)#
The ice d18O model
It takes model simulated monthly tas, pr, psl, d18O as input.
- class cfr.psm.Lake_VarveThickness(pobj=None, model_tas_name='model.tas', climate_required=['tas'])#
The varve thickness model.
It takes summer temperature as input (JJA for NH and DJF for SH).
- class cfr.psm.Coral_SrCa(pobj=None, model_tos_name='model.tos', climate_required='tos')#
The coral Sr/Ca model
- forward(b=10.553, a=None, seed=None)#
Sensor model for Coral Sr/Ca = a * tos + b
- Parameters:
tos (1-D array) – sea surface temperature in [degC]
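The sensor equation Sr/Ca = a * tos + b is a straight line in sea surface temperature. The sketch below evaluates it with the documented default intercept b = 10.553. The slope a = -0.06 is an illustrative value chosen for this example only; how the library picks a when it is left as None is not stated above. `coral_srca` is a hypothetical standalone function.

```python
def coral_srca(tos, a, b=10.553):
    """Linear sensor model: Sr/Ca = a * tos + b, with tos in degC."""
    return [a * t + b for t in tos]

tos = [26.0, 27.0, 28.0]
# a = -0.06 is an illustrative slope, not a documented library default.
srca = coral_srca(tos, a=-0.06)
```

With a negative slope, warmer water yields lower Sr/Ca values.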
- class cfr.psm.Coral_d18O(pobj=None, model_tos_name='model.tos', model_d18Osw_name='model.d18Osw', climate_required=['tos', 'd18Osw'], species='default')#
The PSM is based on the forward model published by Thompson et al. (2011): Thompson, D. M., T. R. Ault, M. N. Evans, J. E. Cole, and J. Emile-Geay (2011), Comparison of observed and simulated tropical climate trends using a forward model of coral d18O, Geophys. Res. Lett., 38, L14706, doi:10.1029/2011GL048224.
Returns a numpy array that is the same size and shape as the input vectors for SST, SSS.
- class cfr.psm.VSLite(pobj=None, obs_tas_name='obs.tas', obs_pr_name='obs.pr', model_tas_name='model.tas', model_pr_name='model.pr', climate_required=['tas', 'pr'])#
The VS-Lite tree-ring width model that takes monthly tas, pr as input.
DA#
- class cfr.da.EnKF(prior, proxydb, seed=0, nens=100, recon_vars=['tas'])#
The class for ensemble Kalman filter.
- Parameters:
prior (dict) – a dictionary of cfr.climate.ClimateField
proxydb (cfr.proxy.ProxyDatabase) – the proxy database
seed (int, optional) – random seed. Defaults to 0.
nens (int, optional) – the ensemble size. Defaults to 100.
recon_vars (list, optional) – the list of variables to reconstruct. Defaults to [‘tas’].
ReconJob#
- class cfr.reconjob.ReconJob(configs=None, verbose=False)#
The class for a reconstruction Job.
- annualize_clim(tag, verbose=False, months=None)#
Annualize the gridded climate data, either model simulations or instrumental observations.
- Parameters:
tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
months (list) – the list of months for annualization.
verbose (bool, optional) – print verbose information. Defaults to False.
- annualize_proxydb(months=None, ptypes=None, inplace=True, verbose=False, **kwargs)#
Annualize the proxy database.
- Parameters:
months (list) – the list of months for annualization.
ptypes (list) – the list of proxy types.
inplace (bool) – if True, the annualized proxy database will replace the current self.proxydb.
verbose (bool, optional) – print verbose information. Defaults to False.
- calib_psms(ptype_psm_dict=None, ptype_season_dict=None, ptype_clim_dict=None, calib_period=None, use_predefined_R=False, verbose=False, **kwargs)#
Calibrate the PSMs.
- Parameters:
ptype_psm_dict (dict) – the dictionary to denote the PSM for each proxy type; ‘Linear’ for all by default.
ptype_season_dict (dict) – the dictionary to denote the seasonality for each proxy type; calendar annual for all by default.
ptype_clim_dict (dict) – the dictionary to denote the required climate variables for each proxy type; [‘tas’] for all by default.
calib_period (tuple or list) – the time period for calibration.
use_predefined_R (bool) – use the predefined observation error covariance instead of by calibration.
verbose (bool, optional) – print verbose information. Defaults to False.
- center_proxydb(ref_period=None, inplace=True, verbose=False)#
Center the proxy timeseries against a reference time period.
- Parameters:
ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
inplace (bool) – if True, the centered proxy database will replace the current self.proxydb.
verbose (bool, optional) – print verbose information. Defaults to False.
- clear_proxydb_tags(verbose=False)#
Clear the tags for each proxy record in the proxy database.
- Parameters:
verbose (bool, optional) – print verbose information. Defaults to False.
- copy()#
Make a deep copy of the object itself.
- crop_clim(tag, lat_min=None, lat_max=None, lon_min=None, lon_max=None, verbose=False)#
Crop the gridded climate data, either model simulations or instrumental observations.
- Parameters:
tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
lat_min (float) – the minimum latitude of the cropped grid.
lat_max (float) – the maximum latitude of the cropped grid.
lon_min (float) – the minimum longitude of the cropped grid.
lon_max (float) – the maximum longitude of the cropped grid.
verbose (bool, optional) – print verbose information. Defaults to False.
- erase_cfg(keys, verbose=False)#
Erase configuration items from self.configs.
- Parameters:
keys (list) – a list of configuration item strings.
verbose (bool, optional) – print verbose information. Defaults to False.
- filter_proxydb(*args, inplace=True, verbose=False, **kwargs)#
Filter the proxy database.
- Parameters:
inplace (bool) – if True, the filtered proxy database will replace the current self.proxydb.
verbose (bool, optional) – print verbose information. Defaults to False.
See cfr.proxy.ProxyDatabase.filter() for more information.
- forward_psms(verbose=False, ptype_forward_dict=None)#
Forward the PSMs.
- Parameters:
verbose (bool, optional) – print verbose information. Defaults to False.
- graphem_kcv(cv_time, ctrl_params, graph_type='neighborhood', stat='MSE', n_splits=5)#
k-fold cross-validation
- Parameters:
cv_time (array-like, 1d) – cross validation time points
ctrl_params (array-like, 1d) – array of control parameters to try
graph_type (str) – type of graph. Either “neighborhood” or “glasso”
stat (str) – name of objective function. Choices are “MSE”, “RE”, “CE” or “R2”.
n_splits (int) – number of splits (default = 5)
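The general idea of k-fold cross-validation over a list of control parameters can be sketched as follows. This is a generic stdlib illustration, not the GraphEM procedure: the helpers kfold_indices and kcv_select are hypothetical, the "model" is a toy that scales the training mean by the control parameter, and only the MSE objective is shown.

```python
def kfold_indices(n, n_splits=5):
    """Split range(n) into n_splits contiguous folds of near-equal size."""
    sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def kcv_select(values, ctrl_params, n_splits=5):
    """Return (best_param, scores); scores maps each parameter to its mean held-out MSE."""
    folds = kfold_indices(len(values), n_splits)
    scores = {}
    for param in ctrl_params:
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [v for i, v in enumerate(values) if i not in held_out]
            # Toy "model": scale the training mean by the control parameter.
            pred = [param * sum(train) / len(train)] * len(test_idx)
            fold_scores.append(mse(pred, [values[i] for i in test_idx]))
        scores[param] = sum(fold_scores) / len(fold_scores)
    return min(scores, key=scores.get), scores


values = [2.0] * 10  # stand-in for data at the cross-validation time points
best, scores = kcv_select(values, ctrl_params=[0.5, 1.0, 1.5], n_splits=5)
```

The parameter with the lowest mean held-out error is selected; for skill scores such as RE or CE, where larger is better, the selection would use the maximum instead.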
- io_cfg(k, v, default=None, verbose=False)#
Add-to or read-from configurations.
- Parameters:
k (str) – the name of a configuration item
v (object) – any value of the configuration item
default (object) – the default value of the configuration item
verbose (bool, optional) – print verbose information. Defaults to False.
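One plausible reading of "add-to or read-from" is sketched below with a plain dictionary. The function io_cfg_sketch is a hypothetical stand-in, and the precedence it uses (explicit value first, then the stored value, then the default) is an assumption rather than documented behavior.

```python
def io_cfg_sketch(configs, k, v, default=None):
    """Hypothetical stand-in: write v if given, else read configs[k], else use default."""
    if v is not None:
        configs[k] = v  # an explicit value is written to the configurations
    elif k not in configs:
        if default is None:
            raise KeyError(f"{k} is not set and no default was given")
        configs[k] = default  # assumption: the default is stored when nothing else is set
    return configs[k]


configs = {}
a = io_cfg_sketch(configs, "nens", None, default=100)  # falls back to the default
b = io_cfg_sketch(configs, "nens", 50)                 # explicit value overwrites
c = io_cfg_sketch(configs, "nens", None)               # reads the stored value back
```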
- load(save_dirpath=None, filename='job.pkl', verbose=False)#
Load a ReconJob object from a pickle file.
- Parameters:
save_dirpath (str) – the directory path where the cfr.ReconJob object was saved.
filename (str) – the filename of the saved cfr.ReconJob object.
verbose (bool, optional) – print verbose information. Defaults to False.
- load_clim(tag, path_dict=None, rename_dict=None, anom_period=None, time_name=None, load=False, lat_name=None, lon_name=None, verbose=False)#
Load gridded climate data, either model simulations or instrumental observations.
- Parameters:
tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
path_dict (dict) – the dictionary of paths of climate data files, keyed by variable name, e.g., ‘tas’ and ‘pr’.
rename_dict (dict) – the dictionary for renaming the variable names in the climate data files.
anom_period (tuple or list) – the time period for computing the anomaly.
time_name (str) – the name of the time dimension in the climate data files.
load (bool) – if True, the data will be loaded into the memory instead of lazy-loading.
lat_name (str) – the name of the latitude dimension in the climate data files.
lon_name (str) – the name of the longitude dimension in the climate data files.
verbose (bool, optional) – print verbose information. Defaults to False.
- load_proxydb(path=None, verbose=False, **kwargs)#
Load the proxy database from a pandas.DataFrame.
- Parameters:
path (str, optional) – the path to the pickle file of the pandas.DataFrame. Defaults to None.
verbose (bool, optional) – print verbose information. Defaults to False.
- mark_pids(verbose=False)#
Mark proxy IDs to self.configs.
- Parameters:
verbose (bool, optional) – print verbose information. Defaults to False.
- prep_da_cfg(cfg_path, seeds=None, save_job=False, verbose=False)#
Prepare the configuration items.
- Parameters:
cfg_path (str) – the path of the configuration YAML file.
seeds (list, optional) – the list of random seeds.
save_job (bool, optional) – if True, export the job object to a file.
verbose (bool, optional) – print verbose information. Defaults to False.
- prep_graphem(recon_time=None, calib_time=None, recon_period=None, recon_timescale=None, calib_period=None, uniform_pdb=None, verbose=False)#
A shortcut of the steps for GraphEM data preparation
- Parameters:
recon_time (array list, optional) – the time points to reconstruct
calib_time (array list, optional) – the time points for calibration
recon_period (list or tuple, optional) – the reconstruction timespan. Effective when recon_time or calib_time is None. Defaults to (1001, 2000).
recon_timescale (float, optional) – the reconstruction timescale. Effective when recon_time or calib_time is None. Defaults to 1 (annual).
calib_period (list or tuple, optional) – the calibration timespan. Defaults to (1850, 2000).
uniform_pdb (bool, optional) – if True, filter the proxy database to make it more uniform in length. Defaults to True.
verbose (bool, optional) – print verbose information. Defaults to False.
- regrid_clim(tag, verbose=False, lats=None, lons=None, nlat=None, nlon=None, periodic_lon=True)#
Regrid the gridded climate data, either model simulations or instrumental observations.
- Parameters:
tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
lats (list or numpy.array) – the latitudes of the regridded grid.
lons (list or numpy.array) – the longitudes of the regridded grid.
nlat (int) – the number of latitudes of the regridded grid; effective when lats = None.
nlon (int) – the number of longitudes of the regridded grid; effective when lons = None.
periodic_lon (bool) – if True, then assume the original longitudes form a loop.
- run_da(recon_period=None, recon_loc_rad=None, recon_timescale=None, recon_sampling_mode=None, recon_sampling_dist=None, recon_vars=None, normal_sampling_sigma=None, normal_sampling_cutoff_factor=None, trim_prior=None, nens=None, seed=0, verbose=False, debug=False, allownan=None)#
Run the data assimilation workflows.
- Parameters:
recon_period (tuple or list) – the time period for reconstruction.
recon_loc_rad (float) – the localization radius; unit: km.
recon_timescale (int or float) – the timescale for reconstruction.
recon_sampling_mode (str) – ‘fixed’ or ‘rolling’ window for prior sampling.
recon_sampling_dist (str) – ‘normal’ or ‘uniform’ distribution for prior sampling.
recon_vars (list) – the list of variables to reconstruct. Defaults to [‘tas’].
normal_sampling_sigma (str) – the standard deviation of the normal distribution for prior sampling.
normal_sampling_cutoff_factor (int) – the cutoff factor for the window for prior sampling.
allownan (bool) – if True, NaNs in the prior are allowed.
nens (int) – the ensemble size.
seed (int) – the random seed.
verbose (bool, optional) – print verbose information. Defaults to False.
debug (bool) – if True, the debug mode is turned on and more information will be printed out.
- run_da_cfg(cfg_path, load_precalculated=False, seeds=None, run_mc=True, verbose=False)#
Running DA according to a configuration YAML file.
- Parameters:
cfg_path (str) – the path of the configuration YAML file.
load_precalculated (bool, optional) – load the precalculated job object. Defaults to False.
run_mc (bool) – if False, the reconstruction part will not be executed, which is convenient for checking the preparation part.
seeds (list, optional) – the list of random seeds.
verbose (bool, optional) – print verbose information. Defaults to False.
- run_da_mc(recon_period=None, recon_loc_rad=None, recon_timescale=None, nens=None, output_full_ens=None, recon_sampling_mode=None, recon_sampling_dist=None, recon_vars=None, normal_sampling_sigma=None, normal_sampling_cutoff_factor=None, trim_prior=None, recon_seeds=None, assim_frac=None, save_dirpath=None, compress_params=None, allownan=None, output_indices=None, verbose=False)#
Run the Monte-Carlo iterations of data assimilation workflows.
- Parameters:
recon_period (tuple or list) – the time period for reconstruction.
recon_loc_rad (float) – the localization radius; unit: km.
recon_timescale (int or float) – the timescale for reconstruction.
recon_sampling_mode (str) – ‘fixed’ or ‘rolling’ window for prior sampling.
recon_sampling_dist (str) – ‘normal’ or ‘uniform’ distribution for prior sampling.
recon_vars (list) – the list of variables to reconstruct. Defaults to [‘tas’].
normal_sampling_sigma (str) – the standard deviation of the normal distribution for prior sampling.
normal_sampling_cutoff_factor (int) – the cutoff factor for the window for prior sampling.
output_full_ens (bool) – if True, the full ensemble fields will be stored to netCDF files.
nens (int) – the ensemble size.
recon_seeds (list) – the list of random seeds.
allownan (bool) – if True, NaNs in the prior are allowed.
assim_frac (float, optional) – the fraction of proxies for assimilation. Defaults to None.
verbose (bool, optional) – print verbose information. Defaults to False.
save_dirpath (str) – the directory path for saving the reconstruction results.
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
output_indices (list) –
the list of indices to output; supported indices:
‘nino3.4’
‘nino1+2’
‘nino3’
‘nino4’
‘tpi’
‘wp’
‘dmi’
‘iobw’
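To make concrete what an index such as ‘nino3.4’ represents, the sketch below averages a toy anomaly field over the Niño 3.4 box (5°S to 5°N, 170°W to 120°W) with cosine-latitude weights. The helper box_mean and the toy grid are hypothetical; whether cfr computes its indices with exactly this recipe is not stated here and should be treated as an assumption.

```python
import math


def box_mean(lats, lons, field, lat_min, lat_max, lon_min, lon_max):
    """Cosine-latitude weighted mean of field[i][j] over a lat/lon box (lons in 0-360)."""
    num = den = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        w = math.cos(math.radians(lat))  # grid cells shrink toward the poles
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max:
                num += w * field[i][j]
                den += w
    return num / den


lats = [-10.0, -2.5, 2.5, 10.0]
lons = [180.0, 200.0, 220.0, 250.0]
# Toy SST anomaly field: 1.0 inside the 5S-5N band, 0.0 elsewhere.
field = [[1.0 if abs(lat) <= 5 else 0.0 for _ in lons] for lat in lats]

# Nino 3.4 region: 5S-5N, 170W-120W, i.e. 190E-240E on a 0-360 longitude axis.
nino34 = box_mean(lats, lons, field, -5, 5, 190, 240)
```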
- run_graphem(save_recon=True, save_dirpath=None, save_filename=None, load_precalc_solver=False, solver_save_path=None, compress_params=None, verbose=False, output_indices=None, **fit_kws)#
Run the GraphEM solver, essentially the GraphEM.solver.GraphEM.fit method
Note that the arguments for GraphEM.solver.GraphEM.fit can be appended in the argument list of this function directly. For instance, to pass a pre-calculated graph, use estimate_graph=False and graph=g.adj, where g is the Graph object.
- Parameters:
save_dirpath (str) – the path to save the related results
save_filename (str) – the filename to save the reconstruction file. Defaults to “job_r01_recon.nc”.
solver_save_path (str) – the path to save the solver object.
load_precalc_solver (bool, optional) – load the precalculated Graph object. Defaults to False.
verbose (bool, optional) – print verbose information. Defaults to False.
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
output_indices (list) –
the list of indices to output; supported indices:
‘nino3.4’
‘nino1+2’
‘nino3’
‘nino4’
‘tpi’
‘wp’
‘dmi’
‘iobw’
fit_kws (dict) – the arguments for GraphEM.solver.GraphEM.fit. The most important one is “graph_method”; available options include “neighborhood”, “glasso”, and “hybrid”, where “hybrid” means running “neighborhood” first with default cutoff_radius=1500 to infill the data matrix and then running “glasso” with default sp_FF=3, sp_FP=3, sp_PP=3 to improve the result further.
See also
cfr.graphem.solver.GraphEM.fit : fitting the GraphEM method
- run_graphem_cfg(cfg_path, verbose=False)#
Running GraphEM according to a configuration YAML file.
- Parameters:
cfg_path (str) – the path of the configuration YAML file.
verbose (bool, optional) – print verbose information. Defaults to False.
- save(save_dirpath=None, filename='job.pkl', verbose=False)#
Save the ReconJob object to a pickle file.
- Parameters:
save_dirpath (str) – the directory path for saving the cfr.ReconJob object.
filename (str) – the filename of the to-be-saved cfr.ReconJob object.
verbose (bool, optional) – print verbose information. Defaults to False.
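The pickle round trip that save and load describe can be sketched with the standard library alone. The helpers save_job and load_job are hypothetical, and a plain dictionary stands in for the job object; the real methods operate on a cfr.ReconJob instance.

```python
import os
import pickle
import tempfile


def save_job(job, save_dirpath, filename="job.pkl"):
    """Hypothetical helper: pickle `job` to save_dirpath/filename and return the path."""
    os.makedirs(save_dirpath, exist_ok=True)
    path = os.path.join(save_dirpath, filename)
    with open(path, "wb") as f:
        pickle.dump(job, f)
    return path


def load_job(save_dirpath, filename="job.pkl"):
    """Hypothetical helper: read the pickled object back from save_dirpath/filename."""
    with open(os.path.join(save_dirpath, filename), "rb") as f:
        return pickle.load(f)


toy_job = {"configs": {"nens": 100}}  # stand-in for a job object's state

with tempfile.TemporaryDirectory() as tmpdir:
    saved_path = save_job(toy_job, tmpdir)
    restored = load_job(tmpdir)
```

As with any pickle file, loading should be limited to files from a trusted source.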
- save_cfg(save_dirpath=None, verbose=False)#
Save self.configs to a directory.
- Parameters:
save_dirpath (str) – the directory path for saving self.configs. The filename will be configs.yml.
verbose (bool, optional) – print verbose information. Defaults to False.
- save_recon(save_path, compress_params=None, verbose=False, output_full_ens=False, mark_assim_pids=False, output_indices=None, grid='prior')#
Save the reconstruction results.
- Parameters:
tag (str) – ‘da’ or ‘graphem’
save_path (str) – the path for saving the reconstruction results.
verbose (bool, optional) – print verbose information. Defaults to False.
output_full_ens (bool) – if True, the full ensemble fields will be stored to netCDF files.
output_indices (list) –
the list of indices to output; supported indices:
‘nino3.4’
‘nino1+2’
‘nino3’
‘nino4’
‘tpi’
‘wp’
‘dmi’
‘iobw’
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
- slice_proxydb(timespan=None, inplace=True, verbose=False)#
Slice the proxy database over a timespan.
- Parameters:
timespan (list or tuple) – the timespan over which to slice the proxy database.
inplace (bool) – if True, the sliced proxy database will replace the current self.proxydb.
verbose (bool, optional) – print verbose information. Defaults to False.
- split_proxydb(tag='calibrated', assim_frac=None, seed=0, verbose=False)#
Split the proxy database.
- Parameters:
tag (str, optional) – the tag for filtering the proxy database. Defaults to ‘calibrated’.
assim_frac (float, optional) – the fraction of proxies for assimilation. Defaults to None.
seed (int, optional) – random seed. Defaults to 0.
verbose (bool, optional) – print verbose information. Defaults to False.
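A seeded random split by fraction, which is the kind of operation assim_frac and seed suggest, might look like the sketch below. The helper split_pids is hypothetical, the rounding of the assimilated count with int() is an assumption, and what happens to the remaining records (for example, holding them out for independent evaluation) is likewise assumed.

```python
import random


def split_pids(pids, assim_frac, seed=0):
    """Hypothetical helper: randomly pick a fraction of IDs to assimilate; hold out the rest."""
    rng = random.Random(seed)  # a local generator keeps the split reproducible
    n_assim = int(len(pids) * assim_frac)
    chosen = set(rng.sample(pids, n_assim))
    assim = [p for p in pids if p in chosen]
    held_out = [p for p in pids if p not in chosen]
    return assim, held_out


pids = [f"rec_{i:02d}" for i in range(20)]
assim, held_out = split_pids(pids, assim_frac=0.75, seed=0)
again, _ = split_pids(pids, assim_frac=0.75, seed=0)  # same seed, same split
```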
- write_cfg(k, v, verbose=False)#
Write a configuration item to self.configs.
- Parameters:
k (str) – the name of a configuration item
v (object) – any value of the configuration item
verbose (bool, optional) – print verbose information. Defaults to False.
ReconRes#
- class cfr.reconres.ReconRes(dirpath, load_num=None, verbose=False)#
The class for reconstruction results
- load(vn_list, verbose=False)#
Load reconstruction results.
- Parameters:
vn_list (list) –
list of variable names; supported names, taking ‘tas’ as an example:
ensemble fields: ‘tas’
ensemble timeseries: ‘tas_gm’, ‘tas_nhm’, ‘tas_shm’
verbose (bool, optional) – print verbose information. Defaults to False.
- plot_valid(recon_name_dict=None, target_name_dict=None, valid_ts_kws=None, valid_fd_kws=None)#
Plot the validation result
- Parameters:
recon_name_dict (dict) – the dictionary for variable names in the reconstruction. For example, {‘tas’: ‘LMR/tas’, ‘nino3.4’: ‘NINO3.4 [K]’}.
target_name_dict (dict) – the dictionary for variable names in the validation target. For example, {‘tas’: ‘20CRv3’, ‘nino3.4’: ‘BC09’}.
valid_ts_kws (dict) – the dictionary of keyword arguments for validating the timeseries.
valid_fd_kws (dict) – the dictionary of keyword arguments for validating the field.
- valid(target_dict, stat=['corr'], timespan=None, verbose=False)#
Validate against a target dictionary
- Parameters:
target_dict (dict) – a dictionary of multiple variables for validation.
stat (list of str) –
the statistics to calculate. Supported quantities:
‘corr’: correlation coefficient
‘R2’: coefficient of determination
‘CE’: coefficient of efficiency
timespan (list or tuple) – the timespan over which to perform the validation.
verbose (bool, optional) – print verbose information. Defaults to False.
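For orientation, two of the statistics above can be written out in a few lines of plain Python. The definitions below are the commonly used ones (Pearson correlation, and CE as one minus the ratio of squared error to the target's variance about its own mean); they are given as a sketch and are not taken from the cfr source. ‘R2’ is omitted because its exact definition in the package is not stated here.

```python
import math


def corr(recon, target):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(target)
    mr, mt = sum(recon) / n, sum(target) / n
    cov = sum((r - mr) * (t - mt) for r, t in zip(recon, target))
    var_r = sum((r - mr) ** 2 for r in recon)
    var_t = sum((t - mt) ** 2 for t in target)
    return cov / math.sqrt(var_r * var_t)


def ce(recon, target):
    """Coefficient of efficiency: 1 - SSE / sum of squared target deviations from its mean."""
    mt = sum(target) / len(target)
    sse = sum((t - r) ** 2 for r, t in zip(recon, target))
    return 1 - sse / sum((t - mt) ** 2 for t in target)


target = [0.0, 1.0, 2.0, 3.0]
perfect = list(target)
biased = [t + 1.0 for t in target]  # same shape as the target, constant offset of +1

corr_biased = corr(biased, target)
ce_biased = ce(biased, target)
ce_perfect = ce(perfect, target)
```

The biased series keeps a perfect correlation but a much lower CE, which is why the two statistics are often reported together.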
EnsTS#
- class cfr.ts.EnsTS(time=None, value=None, value_name=None)#
The class for ensemble timeseries
The ensembles variable should be in shape of (nt, nEns), where nt is the number of years, and nEns is the number of ensemble members.
- Parameters:
time (numpy.array) – the time axis of the series
value (numpy.array) – the value axis of the series
value_name (str) – the name of value axis; will be used as ylabel in plots
- nt#
the size of the time axis
- Type:
int
- nEns#
the size of the ensemble
- Type:
int
- median#
the median of the ensemble timeseries
- Type:
numpy.array
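The relationship between the (nt, nEns) layout and these attributes can be illustrated with a nested list, where each row is one time step and each column one ensemble member. This is a stdlib sketch of the shapes involved, not the class's own code, which works on numpy arrays.

```python
from statistics import median

# Toy ensemble with nt = 3 time steps and nEns = 4 members; rows are time steps.
value = [
    [0.1, 0.3, 0.2, 0.4],
    [1.0, 1.2, 0.8, 1.4],
    [2.0, 2.0, 2.5, 1.5],
]

nt = len(value)        # size of the time axis
nEns = len(value[0])   # size of the ensemble
ens_median = [median(row) for row in value]  # one median per time step
```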
- compare(ref=None, ref_time=None, ref_value=None, ref_name='reference', stats=['corr', 'CE'], timespan=None)#
Compare against a reference timeseries.
- Parameters:
ref (cfr.ts.EnsTS) – the reference time series object
ref_time (numpy.array) – the time axis of the reference timeseries
ref_value (numpy.array) – the value axis of the reference timeseries
stats (list, optional) – the list of validation statistics to calculate. Defaults to [‘corr’, ‘CE’].
timespan (tuple, optional) – the time period for validation. Defaults to None.
- copy()#
Make a deepcopy of the object.
- fetch(name=None, **from_df_kws)#
Fetch a proxy database from the cloud
- Parameters:
name (str) – a predefined database name or a URL starting with “http”
- from_df(df, time_column='time', value_columns=None)#
Load data from a pandas.DataFrame
- Parameters:
df (pandas.DataFrame) – The pandas.DataFrame object.
time_column (str) – The label of the column for the time axis.
value_columns (list of str) – The list of the labels for the value axis of the ensemble members.
- line_density(figsize=[12, 4], cmap='plasma', color_scale='linear', bins=None, num_fine=None, xlabel='Year (CE)', ylabel=None, title=None, ylim=None, xlim=None, title_kws=None, ax=None, **pcolormesh_kwargs)#
Plot the timeseries 2-D histogram
- Parameters:
cmap (str) – The colormap for the histogram.
color_scale (str) – The scale of the colorbar; should be either ‘linear’ or ‘log’.
bins (list or tuple) – The number of bins for each axis: nx, ny = bins.
- plot(figsize=[12, 4], color='indianred', xlabel='Year (CE)', ylabel=None, title=None, ylim=None, xlim=None, lgd_kws=None, title_kws=None, plot_valid=True, ax=None, **plot_kws)#
Plot the raw values (multiple series).
- Parameters:
plot_valid (bool, optional) – If True, will plot the validation target series if it exists. Defaults to True.
- plot_qs(figsize=[12, 4], qs=[0.025, 0.25, 0.5, 0.75, 0.975], color='indianred', xlabel='Year (CE)', ylabel=None, title=None, ylim=None, xlim=None, alphas=[0.5, 0.1], lgd_kws=None, title_kws=None, ax=None, plot_valid=True, **plot_kws)#
Plot the quantiles
- Parameters:
figsize (list, optional) – The size of the figure. Defaults to [12, 4].
qs (list, optional) – The list to denote the quantiles plotted. Defaults to [0.025, 0.25, 0.5, 0.75, 0.975].
color (str, optional) – The basic color for the quantile envelopes. Defaults to ‘indianred’.
xlabel (str, optional) – The label for the x-axis. Defaults to ‘Year (CE)’.
ylabel (str, optional) – The label for the y-axis. Defaults to None.
title (str, optional) – The title of the figure. Defaults to None.
ylim (tuple or list, optional) – The limit of the y-axis. Defaults to None.
xlim (tuple or list, optional) – The limit of the x-axis. Defaults to None.
alphas (list, optional) – The alphas for the quantile envelopes. Defaults to [0.5, 0.1].
lgd_kws (dict, optional) – The keyword arguments for the ax.legend() function. Defaults to None.
title_kws (dict, optional) – The keyword arguments for the ax.title() function. Defaults to None.
ax (matplotlib.axes, optional) – The matplotlib.axes object. If set, the image will be plotted in the existing ax. Defaults to None.
plot_valid (bool, optional) – If True, will plot the validation target series if it exists. Defaults to True.
**plot_kws (dict, optional) – The keyword arguments for the ax.plot() function. Defaults to None.
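The quantities behind a quantile plot can be sketched without any plotting: for each time step, compute the requested quantiles across the ensemble, take the middle one as the median line, and pair the remaining ones from the outside in to form the envelopes. The helper quantile uses linear interpolation, and the outside-in pairing of qs is an assumption about how the envelopes are formed.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


qs = [0.025, 0.25, 0.5, 0.75, 0.975]
# Toy ensemble with nt = 2 time steps and nEns = 5 members.
ensemble = [[4.0, 0.0, 2.0, 1.0, 3.0], [14.0, 10.0, 12.0, 11.0, 13.0]]

per_time = [[quantile(sorted(row), q) for q in qs] for row in ensemble]
median_series = [row[len(qs) // 2] for row in per_time]
# Pair quantiles from the outside in: (0.025, 0.975), then (0.25, 0.75).
envelopes = [[(row[i], row[-1 - i]) for row in per_time] for i in range(len(qs) // 2)]
```

Each entry of envelopes is a list of (lower, upper) bounds per time step, which is the shape a shaded band between two curves needs.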
- to_df(time_column=None, value_column='ens')#
Convert an EnsTS to a pandas.DataFrame
- Parameters:
time_column (str) – The label of the column for the time axis.
value_column (str) – The base column label for the ensemble members. By default, the columns for the members will be labeled as “ens.0”, “ens.1”, “ens.2”, etc.
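The column labeling described for value_column can be illustrated with a dictionary of columns. The helper ens_to_columns is hypothetical and returns a plain dict rather than a pandas.DataFrame; the time column label "time" used here is also an assumption, since the method's own default for time_column is None.

```python
def ens_to_columns(time, value, time_column="time", value_column="ens"):
    """Hypothetical helper: map an (nt, nEns) nested list to a dict of labeled columns."""
    n_ens = len(value[0])
    columns = {time_column: list(time)}
    for k in range(n_ens):
        # Member k becomes the column "<value_column>.<k>", e.g. "ens.0".
        columns[f"{value_column}.{k}"] = [row[k] for row in value]
    return columns


cols = ens_to_columns([2000, 2001], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```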