API Reference

Proxy#

class cfr.proxy.ProxyRecord(pid=None, time=None, value=None, lat=None, lon=None, elev=None, ptype=None, climate=None, tags=None, value_name=None, value_unit=None, time_name=None, time_unit=None, seasonality=None)#

The class for a proxy record.

annualize(months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], force=False, verbose=False)#

Annualize/seasonalize the proxy record based on a list of months.

Parameters:

months (list) – the months over which to annualize; e.g., [6, 7, 8] gives a JJA (June to August) annualization.
force (bool) – if True, perform a calendar year annualization if the given months cannot be applied to the data due to missing months in the data. Defaults to False.
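
The effect of the months argument can be illustrated with a small stand-alone sketch. It is not cfr's implementation: the helper name annualize_by_months, the (year, month) time representation, and the rule of dropping years with missing months are all assumptions made for illustration, and seasons that cross the year boundary (such as DJF) are not handled.

```python
from collections import defaultdict

def annualize_by_months(times, values, months=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)):
    """Average monthly values over the selected months of each calendar year.

    `times` holds (year, month) pairs. Hypothetical helper, not part of cfr.
    """
    buckets = defaultdict(list)
    for (year, month), value in zip(times, values):
        if month in months:
            buckets[year].append(value)
    # Keep only the years in which every requested month is present.
    return {
        year: sum(vals) / len(vals)
        for year, vals in sorted(buckets.items())
        if len(vals) == len(months)
    }

# Year 2000 is complete; year 2001 stops in July, so its JJA mean is dropped.
times = [(2000, m) for m in range(1, 13)] + [(2001, m) for m in range(1, 8)]
values = [float(m) for _, m in times]
jja = annualize_by_months(times, values, months=[6, 7, 8])
```

Here the value of each month equals its month number, so the JJA mean for 2000 is the average of 6, 7, and 8.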

center(ref_period=None, thresh=5, force=False, verbose=False)#

Center the proxy timeseries against a reference period.

Parameters:

ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
thresh (int) – the minimum number of data points required to perform the centering. If not satisfied, and force=False, the record will not be centered.
force (bool) – if True, the record will be centered regardless of the number of data points. Defaults to False.
verbose (bool, optional) – print verbose information. Defaults to False.

Returns:

contains the centered values.

Return type:

new (ProxyRecord)
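
The interplay of ref_period, thresh, and force described above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the library's code: the helper center_values is hypothetical, the reference period is treated as inclusive at both ends, and an uncentered record is returned as an unchanged copy.

```python
def center_values(times, values, ref_period=None, thresh=5, force=False):
    """Subtract the mean over ref_period; skip when too few points and not forced."""
    if ref_period is None:
        ref = list(values)
    else:
        start, end = ref_period
        ref = [v for t, v in zip(times, values) if start <= t <= end]
    if not ref or (len(ref) < thresh and not force):
        return list(values)  # not centered
    mean = sum(ref) / len(ref)
    return [v - mean for v in values]

times = list(range(1940, 1950))
values = [float(t - 1940) for t in times]  # 0.0, 1.0, ..., 9.0

centered = center_values(times, values, ref_period=(1940, 1944))            # 5 points
skipped = center_values(times, values, ref_period=(1940, 1942))             # 3 points
forced = center_values(times, values, ref_period=(1940, 1942), force=True)  # 3 points
```

With the default thresh=5, the three-point reference period leaves the series untouched unless force=True is passed.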

concat(rec_list)#

Concatenate the record with a list of other records assuming they share the same location and other metadata.

Parameters:: rec_list (list or ProxyRecord) – a list of ProxyRecord objects.

copy()#: Make a deepcopy of the object.

correct_elev_tas(t_rate=-9.8, verbose=False)#

Correct the tas with t_rate = -9.8 degC/km upward by default.

Parameters:

t_rate (float) – the temperature adjustment rate based on elevation bias.
verbose (bool, optional) – print verbose information. Defaults to False.
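
A lapse-rate adjustment of this kind can be sketched as below. The sign convention (site elevation minus grid-cell elevation), the elevation units (metres), and the helper name adjust_tas_for_elevation are assumptions for illustration; the reference above only states the default rate of -9.8 degC/km.

```python
def adjust_tas_for_elevation(tas, site_elev_m, grid_elev_m, t_rate=-9.8):
    """Shift temperatures by t_rate (degC/km) times the elevation difference.

    Hypothetical helper: a site above the grid cell gets colder values
    when t_rate is negative.
    """
    dz_km = (site_elev_m - grid_elev_m) / 1000.0
    return [t + t_rate * dz_km for t in tas]

# A site 1 km above the grid-cell elevation is cooled by 9.8 degC.
corrected = adjust_tas_for_elevation([15.0, 16.0], site_elev_m=2000.0, grid_elev_m=1000.0)
```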

dashboard(figsize=[10, 8], ms=200, stock_img=True, edge_clr='w', self_lb='real', psd_lb='PSD', pseudo_lb='pseudo', wspace=0.1, hspace=0.3, spec_method='wwz', spec_settings=None, pseudo_clr=None, **kwargs)#

Plot a dashboard of the proxy/pseudoproxy.

Parameters:

figsize (list or tuple) – the figure size.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
spec_method (str) – the spectral analysis method to apply.
spec_settings (dict) – the dictionary of the keyword arguments for the specified spectral analysis method.
pseudo_clr (str) – the color for the pseudoproxy.
pseudo_lb (str) – the label for the pseudoproxy.
self_lb (str) – the label for the self ProxyRecord.

dashboard_clim(clim_units=None, clim_colors=None, figsize=[14, 8], scaled_pr=False, ms=200, stock_img=True, edge_clr='w', wspace=0.3, hspace=0.5, spec_method='wwz', **kwargs)#

Plot a dashboard of the proxy/pseudoproxy along with the climate signal.

Parameters:

figsize (list or tuple) – the figure size.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
spec_method (str) – the spectral analysis method to apply.
pseudo_clr (str) – the color for the pseudoproxy.
clim_units (dict, optional) – the dictionary of units for climate signals. Defaults to None.
clim_colors (dict, optional) – the dictionary of colors for climate signals. Defaults to None.
scaled_pr (bool) – scale the precipitation values.

del_clim(verbose=False)#

Delete the “clim” attribute of the ProxyRecord object.

Parameters:: verbose (bool, optional) – print verbose information. Defaults to False.

del_pseudo(verbose=False)#

Delete the pseudo attribute of the ProxyRecord object.

Parameters:: verbose (bool, optional) – print verbose information. Defaults to False.

from_da(da)#

Get the time and value axis from the given xarray.DataArray

Parameters:: da (xarray.DataArray) – the xarray.DataArray object to load from.

get_clim(fields, tag=None, verbose=False, search_dist=5, load=True, **kwargs)#

Get the nearest climate from climate fields

Parameters:

fields (list of cfr.climate.ClimateField) – the climate fields
tag (str) – the tag to put on the obtained climate field, which will be named in the format of “tag.variable_name”.
search_dist (float) – the farthest distance to search for climate data, in degrees
verbose (bool, optional) – print verbose information. Defaults to False.
load (bool) – if True, the list of climate fields will be loaded into the memory instead of lazy loading.

get_pseudo(psm=None, signal=None, calibrate=True, add_noise=False, noise='white', SNR=10, seed=None, match_mean=False, match_var=False, verbose=False, calib_kws=None, forward_kws=None, colored_noise_kws=None)#

Generate the pseudoproxy

Parameters:

psm (object) – the PSM objects in cfr.psm
signal (cfr.ProxyRecord) – the signal part for the pseudoproxy. If not provided, it will be generated using the specified psm; if provided, the PSM part will be skipped.
calibrate (bool) – if True and the PSM supports calibration, then the PSM will be calibrated.
add_noise (bool) – if True, noise will be added onto the signal.
noise (str) – noise type; supports “white” for white noise and “colored” for colored noise.
colored_noise_kws (dict) – the dictionary of the keyword arguments for colored noise generation.
match_mean (bool) – match the mean of the pseudoproxy to the real record.
match_var (bool) – match the variance of the pseudoproxy to the real record.
verbose (bool, optional) – print verbose information. Defaults to False.
calib_kws (dict) – the dictionary of the keyword arguments for the calibration step of the PSMs.
forward_kws (dict) – the dictionary of the keyword arguments for the forward step of the PSMs.
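
The roles of add_noise, SNR, and seed can be illustrated with a stand-alone white-noise sketch. It assumes SNR is the ratio of the signal's standard deviation to the noise's standard deviation, which the reference above does not state; the helper add_white_noise is hypothetical and does not call cfr.

```python
import random
import statistics

def add_white_noise(signal, snr=10, seed=None):
    """Add Gaussian white noise whose standard deviation is std(signal) / snr."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # Normalize the draw so the noise has exactly zero mean and the target spread.
    raw_mean = statistics.fmean(raw)
    raw_std = statistics.pstdev(raw)
    target_std = statistics.pstdev(signal) / snr
    noise = [(r - raw_mean) / raw_std * target_std for r in raw]
    return [s + n for s, n in zip(signal, noise)]

signal = [float(i % 7) for i in range(200)]
pseudo_a = add_white_noise(signal, snr=10, seed=0)
pseudo_b = add_white_noise(signal, snr=10, seed=0)  # same seed, same result
```

Fixing the seed makes the pseudoproxy reproducible, which is useful when comparing experiments that differ only in SNR.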

load_nc(path, **kwargs)#

Load the record from a netCDF file.

Parameters:: path (str) – the path of the file to load.

plot(figsize=[12, 4], legend=False, ms=200, stock_img=True, edge_clr='w', wspace=0.1, hspace=0.1, plot_map=True, p=<class 'cfr.visual.STYLE'>, **kwargs)#

Visualize the ProxyRecord

Parameters:

figsize (list or tuple) – the figure size.
legend (bool) – if True, plot the legend.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
plot_map (bool) – if True, plot the record on a map. Defaults to True.

plot_compare(ref, label=None, title=None, ref_label=None, ref_color=None, ref_zorder=2, figsize=[12, 4], legend=False, ms=200, stock_img=True, edge_clr='w', wspace=0.1, hspace=0.1, plot_map=True, lgd_kws=None, **kwargs)#

Plot against another reference record.

Parameters:

ref (cfr.proxy.ProxyRecord) – the reference record.
label (str) – the label of the self record.
ref_label (str) – the label of the reference record.
ref_color (str) – the color to visualize the reference record.
ref_zorder (int) – the z-axis ordering of the reference record.
title (str) – the title of the figure.
figsize (list or tuple) – the figure size.
legend (bool) – if True, plot the legend.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
plot_map (bool) – if True, plot the record on a map.
lgd_kws (dict) – the dictionary of keyword arguments for the legend.

plot_dups(figsize=[12, 4], legend=False, ms=200, stock_img=True, edge_clr='w', wspace=0.1, hspace=0.1, plot_map=True, lgd_kws=None, **kwargs)#

Plot the record against its duplicated records.

Parameters:

figsize (list or tuple) – the figure size.
legend (bool) – if True, plot the legend.
ms (int) – marker size.
stock_img (bool) – if True, use the stock image background of Cartopy. Defaults to True.
edge_clr (str) – the edge color of the record on the map.
wspace (float) – the width spacing between the subplots.
hspace (float) – the height spacing between the subplots.
plot_map (bool) – if True, plot the record on a map.
lgd_kws (dict) – the dictionary of keyword arguments for the legend.

plotly(**kwargs)#: Visualize the ProxyRecord with plotly

slice(timespan)#

Slice the timeseries with a timespan (tuple or list).

Parameters:: timespan (tuple or list) – The list of time points for slicing, whose length must be even. When there are n time points, the output Series includes n/2 segments. For example, if timespan = [a, b], then the sliced output includes one segment [a, b]; if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].
Returns:: The sliced Series object.
Return type:: ProxyRecord
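
The pairing rule for timespan can be made concrete with a short sketch. It works on plain lists rather than a ProxyRecord, the helper slice_segments is hypothetical, and treating both segment bounds as inclusive is an assumption.

```python
def slice_segments(times, values, timespan):
    """Keep the points that fall inside any [start, end] pair of timespan."""
    if len(timespan) % 2 != 0:
        raise ValueError("timespan must contain an even number of time points")
    # [a, b, c, d] -> [(a, b), (c, d)]
    pairs = list(zip(timespan[0::2], timespan[1::2]))
    kept = [
        (t, v)
        for t, v in zip(times, values)
        if any(start <= t <= end for start, end in pairs)
    ]
    return [t for t, _ in kept], [v for _, v in kept]

times = list(range(1000, 1010))
values = [t * 0.1 for t in times]
t_sub, v_sub = slice_segments(times, values, [1001, 1003, 1006, 1007])
```

Four time points yield two segments, so t_sub covers 1001 to 1003 and 1006 to 1007.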

standardize(ref_period=None, thresh=5, force=False, verbose=False)#

Standardizes the record. If the record is constant, a vector of 0s is returned.

Parameters:: ref_period (list, optional) – [min_time, max_time]. The default is None.
Returns:: contains standardized values.
Return type:: new (ProxyRecord)
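
The constant-record rule can be sketched as follows. This illustration standardizes over the whole series (ref_period is omitted), uses the population standard deviation, and the helper standardize_values is hypothetical; which standard deviation cfr uses is not stated above.

```python
import statistics

def standardize_values(values):
    """Return z-scores; a constant series maps to a vector of zeros."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

z = standardize_values([1.0, 2.0, 3.0])
flat = standardize_values([3.0, 3.0, 3.0])  # constant record
```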

to_da()#: Convert to Xarray.DataArray for computation purposes

to_nc(path, verbose=True, **kwargs)#

Convert the record to a netCDF file.

Parameters:

path (str) – the path to save the file.
verbose (bool, optional) – print verbose information. Defaults to True.

class cfr.proxy.ProxyDatabase(records=None, source=None)#

The class for a proxy database.

annualize(months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], force=False, verbose=False)#

Annualize the records in the proxy database.

Parameters:

months (list) – the months over which to annualize; e.g., [6, 7, 8] gives a JJA (June to August) annualization.
force (bool) – if True, perform a calendar year annualization if the given months cannot be applied to the data due to missing months in the data. Defaults to False.

center(ref_period, force=False, thresh=5, verbose=False)#

Center the proxy timeseries against a reference time period.

Parameters:

ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
force (bool) – if True, center each record regardless of the number of data points in the reference period. Defaults to False.
thresh (int) – the minimum number of data points to perform the processing.

Returns: new (cfr.ProxyDatabase)

copy()#: Make a deepcopy of the object.

correct_elev_tas(t_rate=-9.8, verbose=False)#

Correct the tas with t_rate = -9.8 degC/km upward by default.

Parameters:

t_rate (float) – the temperature adjustment rate based on elevation bias.
verbose (bool, optional) – print verbose information. Defaults to False.

count_availability(year=array([0, 1, 2, ..., 1998, 1999, 2000], shape=(2001,)))#

Count the proxy availability in time

Parameters:: year (array-like) – count the proxy availability based on the given years.

del_clim(verbose=False)#

Delete the nearest climate data for the records in the proxy database.

Parameters:: verbose (bool, optional) – print verbose information. Defaults to False.

fetch(name=None, **kwargs)#

Fetch a proxy database from cloud

Parameters:: name (str) – a predefined database name or a URL starting with “http”

filter(by, keys, mode='fuzzy')#

Filter the proxy database according to given ptype list.

Parameters:

by (str) – filter by a keyword {‘ptype’, ‘pid’, ‘dt’, ‘lat’, ‘lon’, ‘loc’, ‘tag’}
keys (set) –
a set of keywords
- For by = ‘ptype’ or ‘pid’, keys take a fuzzy match
- For by = ‘dt’ or ‘lat’ or ‘lon’, keys = (min, max)
- For by = ‘loc-square’, keys = (lat_min, lat_max, lon_min, lon_max)
- For by = ‘loc-circle’, keys = (center_lat, center_lon, distance)
- For by = ‘tag’, keys should be a list of tags
mode (str) – ‘fuzzy’ or ‘exact’ search when by = ‘ptype’ or ‘pid’
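
The difference between the key styles can be sketched on a toy list of records. The sketch assumes that a fuzzy match means "any key is a substring of the value" and that range bounds are inclusive; neither is confirmed by the reference above. The dictionaries and helper names are hypothetical stand-ins for ProxyRecord objects.

```python
def match_text(value, keys, mode="fuzzy"):
    """Exact mode: value must equal a key. Fuzzy mode: a key must occur in value."""
    if mode == "exact":
        return value in keys
    return any(key in value for key in keys)

def in_range(value, keys):
    """Range filter with keys = (min, max), inclusive at both ends."""
    lower, upper = keys
    return lower <= value <= upper

records = [
    {"pid": "A1", "ptype": "coral.d18O", "lat": -5.0},
    {"pid": "A2", "ptype": "coral.SrCa", "lat": 12.0},
    {"pid": "B1", "ptype": "tree.TRW", "lat": 45.0},
]

corals = [r["pid"] for r in records if match_text(r["ptype"], ["coral"])]
srca = [r["pid"] for r in records if match_text(r["ptype"], ["coral.SrCa"], mode="exact")]
tropics = [r["pid"] for r in records if in_range(r["lat"], (-20, 20))]
```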

find_duplicates(r_thresh=0.9, time_period=[0, 2000])#

Find duplicated proxy records based on a correlation threshold.

Parameters:

r_thresh (float) – the correlation threshold to determine if two records are duplicated. Defaults to 0.9.
time_period (tuple or list) – the timespan over which to compare two records. Defaults to [0, 2000].

from_df(df, pid_column='paleoData_pages2kID', lat_column='geo_meanLat', lon_column='geo_meanLon', elev_column='geo_meanElev', time_column='year', value_column='paleoData_values', proxy_type_column='paleoData_proxy', archive_type_column='archiveType', ptype_column='ptype', value_name_column='paleoData_variableName', value_unit_column='paleoData_units', R_column='R', climate_column='climateInterpretation_variable', verbose=False)#

Load database from a pandas.DataFrame. Note that in most cases, the column names have to be specified.

Parameters:

df (pandas.DataFrame) – a pandas DataFrame that includes at least lat, lon, time, value, and proxy_type
pid_column (str) – the column name for proxy ID.
lat_column (str) – the column name for latitude.
lon_column (str) – the column name for longitude.
elev_column (str) – the column name for elevation.
time_column (str) – the column name for time axis.
value_column (str) – the column name for value axis.
proxy_type_column (str) – the column name for proxy type information.
archive_type_column (str) – the column name for archive type information.
ptype_column (str) – the column name for proxy type information in format “archive.proxy”.
value_name_column (str) – the column name for proxy variable name.
value_unit_column (str) – the column name for proxy variable unit.
verbose (bool, optional) – print verbose information. Defaults to False.

from_ds(ds)#

Load the proxy database from an xarray.Dataset

Parameters:: ds (xarray.Dataset) – the xarray.Dataset to load from

get_clim(field, tag=None, verbose=False, load=True, **kwargs)#

Get the nearest climate data for the records in the proxy database.

Parameters:

fields (list of cfr.climate.ClimateField) – the climate fields
tag (str) – the tag to put on the obtained climate field, which will be named in the format of “tag.variable_name”.
verbose (bool, optional) – print verbose information. Defaults to False.
load (bool) – if True, the list of climate fields will be loaded into the memory instead of lazy loading.

load_multi_nc(dirpath, nproc=None)#

Load from multiple netCDF files.

Parameters:

dirpath (str) – the directory path of the multiple .nc files
nproc (int) – the number of processors for loading. Defaults to multiprocessing.cpu_count().

load_nc(path, use_cftime=True, **kwargs)#

Load the database from a netCDF file.

Parameters:

path (str) – the path of the file to load.
use_cftime (bool) – if True, use the cftime convention. Defaults to True.

make_composite(obs=None, obs_nc_path=None, vn='tas', lat_name=None, lon_name=None, bin_width=10, n_bootstraps=1000, qs=(0.025, 0.975), stat_func=<function nanmean>, anom_period=[1951, 1980])#

Make composites of the records in the proxy database.

Parameters:

obs (cfr.climate.ClimateField) – the observation field as a reference for scaling the proxy values.
obs_nc_path (str) – the path of the netCDF file of the reference observation.
vn (str) – the variable name of the referenced observation.
lat_name (str) – the name of the latitude dimension in the referenced observation.
lon_name (str) – the name of the longitude dimension in the referenced observation.
bin_width (int) – the width for binning.
n_bootstraps (int) – the number of bootstraps for uncertainty quantification.
qs (list or tuple) – the quantiles to plot.
stat_func (function) – the function to apply for the calculation of the binned value.
anom_period (list or tuple) – the time period over which to calculate the anomaly.

nrec_tags(keys)#

Check the number of tagged records.

Parameters:: keys (list) – list of tag strings

plot(**kws)#: Visualize the proxy database. See cfr.visual.plot_proxies() for more information.

plot_composite(figsize=[10, 4], clr_proxy=None, clr_count='tab:gray', clr_obs='tab:red', left_ylim=[-2, 2], right_ylim=None, ylim_num=5, xlim=[0, 2000], base_n=60, ax=None)#

Plot the composites of the records in the proxy database.

Parameters:

figsize (list or tuple) – the figure size.
clr_proxy (str) – the color to visualize the proxy composite curve.
clr_count (str) – the color to visualize the record count.
clr_obs (str) – the color to visualize the referenced observation.
left_ylim (list) – the limit for the left y-axis.
right_ylim (list) – the limit for the right y-axis.
ylim_num (int) – the number of ticks for the left y-axis
xlim (list) – the limit for the x-axis.
base_n (int) – the number to determine the upper bound for the record count.
ax (object, optional) – matplotlib.axes. Defaults to None.

plotly(**kwargs)#: Plot the database on an interactive map utilizing Plotly

plotly_concise(**kwargs)#: Plot the database on an interactive map utilizing Plotly

plotly_count(**kwargs)#: Plot the database number-counting on an interactive map utilizing Plotly

refresh()#: Refresh a bunch of attributes.

slice(timespan)#

Slice the records in the proxy database.

Parameters:: timespan (tuple or list) – The list of time points for slicing, whose length must be even. When there are n time points, the output Series includes n/2 segments. For example, if timespan = [a, b], then the sliced output includes one segment [a, b]; if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].

squeeze_dups(pids_to_keep=None)#

Remove the duplicated records and keep only one.

Parameters:: pids_to_keep (list) – a list of proxy IDs forced to keep.

standardize(ref_period, force=False, thresh=5, verbose=False)#

Standardize elements of a proxy database against a reference time period. Elements that have no values over the reference period are dropped.

Parameters:

ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
force (bool) – if True, standardize each record regardless of the number of data points in the reference period. Defaults to False.
thresh (int) – the minimum number of data points to perform the processing.

Returns: new (cfr.ProxyDatabase)

to_df()#: Convert the proxy database to a pandas.DataFrame.

to_ds(annualize=False, months=None, verbose=True)#

Convert the proxy database to an xarray.Dataset

Parameters:

annualize (bool) – annualize the proxy records with months
months (list) – months for annualization
verbose (bool, optional) – print verbose information. Defaults to True.

to_multi_nc(dirpath, verbose=True, compress_params={'zlib': True})#

Convert the proxy database to multiple netCDF files. One for each record.

Parameters:

dirpath (str) – the directory path of the multiple .nc files
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
verbose (bool, optional) – print verbose information. Defaults to True.

to_nc(path, annualize=False, months=None, verbose=True, compress_params={'zlib': True})#

Convert the database to a netCDF file.

Parameters:

path (str) – the path to save the file.
annualize (bool) – annualize the proxy records with months
months (list) – months for annualization
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
verbose (bool, optional) – print verbose information. Defaults to True.

Climate#

class cfr.climate.ClimateField(da=None)#

The class for the gridded climate field data.

annualize(months=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])#

Annualize/seasonalize the climate field based on a list of months.

Parameters:: months (list) – the months over which to annualize; e.g., [6, 7, 8] gives a JJA (June to August) annualization.

center(ref_period=[1951, 1980])#

Center the climate field against a reference time period.

Parameters:

ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
time_name (str) – name of the time dimension

compare(ref, timespan=None, stat='corr', interp_target='ref', interp=True)#

Compare against a reference field.

Parameters:

ref (cfr.climate.ClimateField) – the reference to compare against, assuming the first dimension to be time
timespan (tuple or list) – the timespan over which to compare two ClimateField objects.
interp_target (str, optional) –
the direction to interpolate the fields:
- ’ref’: interpolate from self to ref
- ’self’: interpolate from ref to self
stat (str) –
the statistic to calculate. Supported quantities:
- ’corr’: correlation coefficient
- ’R2’: coefficient of determination
- ’CE’: coefficient of efficiency
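
The three statistics behave differently when a field is biased, which a one-dimensional sketch can show. The formulas below are the commonly used definitions (Pearson correlation, its square, and CE = 1 - SSE / SST with SST taken about the reference mean); whether cfr computes them exactly this way is an assumption, and the helper names are hypothetical.

```python
import math

def corr(test, ref):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(ref)
    mean_t, mean_r = sum(test) / n, sum(ref) / n
    cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(test, ref))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in test))
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in ref))
    return cov / (std_t * std_r)

def r2(test, ref):
    """Coefficient of determination, taken here as the squared correlation."""
    return corr(test, ref) ** 2

def ce(test, ref):
    """Coefficient of efficiency: 1 - SSE / SST, with SST about the mean of ref."""
    mean_r = sum(ref) / len(ref)
    sse = sum((r - t) ** 2 for t, r in zip(test, ref))
    sst = sum((r - mean_r) ** 2 for r in ref)
    return 1.0 - sse / sst

ref = [0.0, 1.0, 2.0, 3.0, 4.0]
biased = [v + 1.0 for v in ref]  # same shape as ref, constant offset of 1

stats = {"corr": corr(biased, ref), "R2": r2(biased, ref), "CE": ce(biased, ref)}
```

The constant offset leaves corr and R2 at 1 but lowers CE to 0.5, which is why CE is the stricter skill measure.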

copy()#: Make a deepcopy of the object.

crop(lat_min=-90, lat_max=90, lon_min=0, lon_max=360)#

Crop the climate field based on the range of latitude and longitude.

Note that in cases when the crop range is crossing the 0 degree of longitude, lon_min should be less than 0.

Parameters:

lat_min (float) – the lower bound of latitude to crop.
lat_max (float) – the upper bound of latitude to crop.
lon_min (float) – the lower bound of longitude to crop.
lon_max (float) – the upper bound of longitude to crop.

fetch(name=None, **load_nc_kws)#

Fetch a gridded climate field from cloud

Parameters:

name (str) – a predefined name, or an URL starting with “http”, or a local file path. If not set, the method will return hints of available predefined names.
load_nc_kws (dict) – the dictionary of keyword arguments for loading a netCDF file.

from_np(time, lat, lon, value)#

Load data from a numpy.ndarray.

Parameters:

time (array-like) – the array of the time axis.
lat (array-like) – the array of the lat axis.
lon (array-like) – the array of the lon axis.
value (array-like) – the array of the values.

geo_mean(lat_min=-90, lat_max=90, lon_min=0, lon_max=360)#

Calculate the geographical mean value of the climate field.

Parameters:

lat_min (float) – the lower bound of latitude for the calculation.
lat_max (float) – the upper bound of latitude for the calculation.
lon_min (float) – the lower bound of longitude for the calculation.
lon_max (float) – the upper bound of longitude for the calculation.
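
A geographical mean on a regular latitude/longitude grid is usually weighted by the cosine of latitude, because grid cells shrink toward the poles. The sketch below assumes that weighting and inclusive bounds; it uses nested lists instead of a ClimateField, and the helper weighted_geo_mean is hypothetical.

```python
import math

def weighted_geo_mean(field, lats, lons, lat_min=-90, lat_max=90, lon_min=0, lon_max=360):
    """Cosine-latitude-weighted mean of field[i][j] located at (lats[i], lons[j])."""
    total, weight_sum = 0.0, 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max:
                total += weight * field[i][j]
                weight_sum += weight
    return total / weight_sum

lats = [0.0, 60.0]
lons = [0.0, 180.0]
field = [[1.0, 1.0], [3.0, 3.0]]  # 1.0 at the equator, 3.0 at 60N

global_mean = weighted_geo_mean(field, lats, lons)
tropical_mean = weighted_geo_mean(field, lats, lons, lat_max=30)
```

The row at 60N carries half the weight of the equatorial row, so the weighted mean is 5/3 rather than the unweighted 2.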

get_anom(ref_period=[1951, 1980])#

Get the anomaly against a reference time period.

Parameters:: ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)

get_eof(n=1, time_period=None, verbose=False, flip=False)#

Get the EOF analysis result of the ClimateField object

Parameters:

n (int) – perform EOF analysis and return the first n modes.
time_period (tuple or list) – the timespan over which to perform the EOF analysis.
verbose (bool, optional) – print verbose information. Defaults to False.
flip (bool, optional) – flip the sign of the field values. Defaults to False.

index(name)#

Calculate the predefined indices.

Parameters:

name (str) –

the predefined index name; supports the below:

’nino3.4’
’nino1+2’
’nino3’
’nino4’
’tpi’
’wp’
’dmi’
’iobw’

load_nc(path, vn=None, time_name='time', lat_name='lat', lon_name='lon', load=True, return_ds=False, use_cftime=True, **kwargs)#

Load the climate field from a netCDF file.

Parameters:

path (str) – the path where to load data from.
vn (str) – the variable name to load.
time_name (str) – the name for the time axis. Defaults to ‘time’.
lat_name (str) – the name for the lat axis. Defaults to ‘lat’.
lon_name (str) – the name for the lon axis. Defaults to ‘lon’.
load (bool) – if True, the netCDF file is loaded into memory; if False, lazy loading is used. Defaults to True.
return_ds (bool) – if True, return an xarray.Dataset object instead of a ClimateField object. Defaults to False.
use_cftime (bool) – if True, use the cftime convention. Defaults to True.

plot(**kwargs)#: Plot a climate field at a time point.

See also

cfr.visual.plot_field_map : Visualize a field on a map.

plot_eof(n=1, eof_title=None, pc_title=None)#

Plot the EOF analysis result

Parameters:

n (int) – plot the n-th mode.
eof_title (str) – the subplot title for the mode field.
pc_title (str) – the subplot title for the PC time series.

plotly_grid(site_lats=None, site_lons=None, **kwargs)#

Plot the grid on an interactive map utilizing Plotly

Parameters:

site_lats (list) – a list of the latitudes of the sites to plot
site_lons (list) – a list of the longitudes of the sites to plot

regrid(lats, lons, periodic_lon=False)#: Regrid the climate field.

rename(new_vn)#

Rename the variable name of the climate field.

Parameters:: new_vn (str) – the new variable name.

to_nc(path, verbose=True, compress_params=None)#

Convert the climate field to a netCDF file.

Parameters:

path (str) – the path where to save
verbose (bool, optional) – print verbose information. Defaults to True.
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.

wrap_lon(mode='360')#

Convert the longitude values

Parameters:: mode (str) – if ‘360’, convert the longitude values from the range (-180, 180) to (0, 360); if ‘180’, convert the longitude values from the range (0, 360) to (-180, 180);
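
The two conversions amount to modular arithmetic on the longitude values, as the following sketch shows. It operates on a plain list, the helper convert_lon is hypothetical, and the treatment of the boundary value (180 maps to -180 in ‘180’ mode) is a choice made here rather than documented behavior.

```python
def convert_lon(lons, mode="360"):
    """Map longitudes to [0, 360) for mode '360' or to [-180, 180) for mode '180'."""
    if mode == "360":
        return [lon % 360 for lon in lons]
    if mode == "180":
        return [((lon + 180) % 360) - 180 for lon in lons]
    raise ValueError("mode must be '360' or '180'")

east_positive = convert_lon([-170.0, -10.0, 0.0, 20.0], mode="360")
round_trip = convert_lon(east_positive, mode="180")
```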

PSM#

class cfr.psm.Linear(pobj=None, climate_required=['tas'])#

A PSM that is based on univariate linear regression.

Parameters:

pobj (cfr.proxy.ProxyRecord) – the proxy record object
climate_required (cfr.climate.ClimateField) – the required climate field object for running this PSM

class cfr.psm.Bilinear(pobj=None, climate_required=['tas', 'pr'])#

A PSM that is based on bivariate linear regression.

Parameters:

pobj (cfr.proxy.ProxyRecord) – the proxy record object
climate_required (cfr.climate.ClimateField) – the required climate field object for running this PSM

class cfr.psm.Ice_d18O(pobj=None, tas_name='model.tas', pr_name='model.pr', psl_name='model.psl', d18O_name='model.d18O', climate_required=['tas', 'pr', 'psl', 'd18O'])#

The ice core d18O model adopted from PRYSM (sylvia-dee/PRYSM)

Parameters:

pobj (cfr.proxy.ProxyRecord) – the proxy record object
climate_required (cfr.climate.ClimateField) – the required climate field object for running this PSM

forward(alt_diff=0, nproc=None)#

The ice d18O model

It takes model-simulated monthly tas, pr, psl, and d18O as input.

class cfr.psm.Lake_VarveThickness(pobj=None, model_tas_name='model.tas', climate_required=['tas'])#

The varve thickness model.

It takes summer temperature as input (JJA for NH and DJF for SH).

class cfr.psm.Coral_SrCa(pobj=None, model_tos_name='model.tos', climate_required='tos')#

The coral Sr/Ca model

forward(b=10.553, a=None, seed=None)#

Sensor model for Coral Sr/Ca = a * tos + b

Parameters:: tos (1-D array) – sea surface temperature in [degC]
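
The sensor equation is a straight line in tos, which the sketch below evaluates directly. The intercept b = 10.553 is the documented default; the slope a = -0.06 is an illustrative assumption only, since the documented default for a is None, and the helper coral_srca does not call cfr.

```python
def coral_srca(tos, a=-0.06, b=10.553):
    """Evaluate Sr/Ca = a * tos + b for each sea surface temperature in degC.

    The slope `a` is an assumed value chosen for illustration.
    """
    return [a * t + b for t in tos]

tos = [24.0, 25.0, 26.0]
srca = coral_srca(tos)
```

With a negative slope, warmer water gives lower Sr/Ca values.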

class cfr.psm.Coral_d18O(pobj=None, model_tos_name='model.tos', model_d18Osw_name='model.d18Osw', climate_required=['tos', 'd18Osw'], species='default')#

The PSM is based on the forward model published by Thompson et al. (2011): Thompson, D. M., T. R. Ault, M. N. Evans, J. E. Cole, and J. Emile-Geay (2011), Comparison of observed and simulated tropical climate trends using a forward model of coral d18O, Geophys. Res. Lett., 38, L14706, doi:10.1029/2011GL048224.

Returns a numpy array that is the same size and shape as the input vectors for SST, SSS.

class cfr.psm.VSLite(pobj=None, obs_tas_name='obs.tas', obs_pr_name='obs.pr', model_tas_name='model.tas', model_pr_name='model.pr', climate_required=['tas', 'pr'])#: The VS-Lite tree-ring width model that takes monthly tas, pr as input.

DA#

class cfr.da.EnKF(prior, proxydb, seed=0, nens=100, recon_vars=['tas'])#

The class for ensemble Kalman filter.

Parameters:

prior (dict) – a dictionary of cfr.climate.ClimateField
proxydb (cfr.proxy.ProxyDatabase) – the proxy database
seed (int, optional) – random seed. Defaults to 0.
nens (int, optional) – the ensemble size. Defaults to 100.
recon_vars (list, optional) – the list of variables to reconstruct. Defaults to [‘tas’].

ReconJob#

class cfr.reconjob.ReconJob(configs=None, verbose=False)#

The class for a reconstruction Job.

annualize_clim(tag, verbose=False, months=None)#

Annualize the gridded climate data, either model simulations or instrumental observations.

Parameters:

tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
months (list) – the list of months for annualization.
verbose (bool, optional) – print verbose information. Defaults to False.

annualize_proxydb(months=None, ptypes=None, inplace=True, verbose=False, **kwargs)#

Annualize the proxy database.

Parameters:

months (list) – the list of months for annualization.
ptypes (list) – the list of proxy types.
inplace (bool) – if True, the annualized proxy database will replace the current self.proxydb.
verbose (bool, optional) – print verbose information. Defaults to False.

calib_psms(ptype_psm_dict=None, ptype_season_dict=None, ptype_clim_dict=None, calib_period=None, use_predefined_R=False, verbose=False, **kwargs)#

Calibrate the PSMs.

Parameters:

ptype_psm_dict (dict) – the dictionary to denote the PSM for each proxy type; ‘Linear’ for all by default.
ptype_season_dict (dict) – the dictionary to denote the seasonality for each proxy type; calendar annual for all by default.
ptype_clim_dict (dict) – the dictionary to denote the required climate variables for each proxy type; [‘tas’] for all by default.
calib_period (tuple or list) – the time period for calibration.
use_predefined_R (bool) – use the predefined observation error covariance instead of by calibration.
verbose (bool, optional) – print verbose information. Defaults to False.

center_proxydb(ref_period=None, inplace=True, verbose=False)#

Center the proxy timeseries against a reference time period.

Parameters:

ref_period (tuple or list) – the reference time period in the form of (start_yr, end_yr)
inplace (bool) – if True, the centered proxy database will replace the current self.proxydb.
verbose (bool, optional) – print verbose information. Defaults to False.

clear_proxydb_tags(verbose=False)#

Clear the tags for each proxy record in the proxy database.

Parameters:: verbose (bool, optional) – print verbose information. Defaults to False.

copy()#: Make a deep copy of the object itself.

crop_clim(tag, lat_min=None, lat_max=None, lon_min=None, lon_max=None, verbose=False)#

Crop the gridded climate data, either model simulations or instrumental observations.

Parameters:

tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
lat_min (float) – the minimum latitude of the cropped grid.
lat_max (float) – the maximum latitude of the cropped grid.
lon_min (float) – the minimum longitude of the cropped grid.
lon_max (float) – the maximum longitude of the cropped grid.
verbose (bool, optional) – print verbose information. Defaults to False.

erase_cfg(keys, verbose=False)#

Erase configuration items from self.configs.

Parameters:

keys (list) – a list of configuration item strings.
verbose (bool, optional) – print verbose information. Defaults to False.

filter_proxydb(*args, inplace=True, verbose=False, **kwargs)#

Filter the proxy database.

Parameters:

inplace (bool) – if True, the filtered proxy database will replace the current self.proxydb.
verbose (bool, optional) – print verbose information. Defaults to False.

See cfr.proxy.ProxyDatabase.filter() for more information.

forward_psms(verbose=False, ptype_forward_dict=None)#

Forward the PSMs.

Parameters:: verbose (bool, optional) – print verbose information. Defaults to False.

graphem_kcv(cv_time, ctrl_params, graph_type='neighborhood', stat='MSE', n_splits=5)#

k-fold cross-validation

Parameters:

cv_time (array-like, 1d) – cross validation time points
ctrl_params (array-like, 1d) – array of control parameters to try
graph_type (str) – type of graph. Either “neighborhood” or “glasso”
stat (str) – name of objective function. Choices are “MSE”, “RE”, “CE” or “R2”.
n_splits (int) – number of splits (default = 5)
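
The general idea of k-fold cross-validation over a set of control parameters can be sketched as follows. This is a generic, self-contained illustration rather than the GraphEM implementation: the fit_and_score callback, the contiguous-block splitting, and the toy error function are all assumptions made for the example.

```python
def kfold_splits(points, n_splits=5):
    """Split a list into n_splits contiguous folds; yield (train, test) pairs."""
    n = len(points)
    # Spread the remainder over the first folds so sizes differ by at most 1
    sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = points[start:start + size]
        train = points[:start] + points[start + size:]
        yield train, test
        start += size

def select_param(cv_time, ctrl_params, fit_and_score, n_splits=5):
    """Return the control parameter with the lowest mean score (e.g., MSE)."""
    mean_scores = {}
    for p in ctrl_params:
        scores = [fit_and_score(p, train, test)
                  for train, test in kfold_splits(cv_time, n_splits)]
        mean_scores[p] = sum(scores) / len(scores)
    best = min(mean_scores, key=mean_scores.get)
    return best, mean_scores

# Toy scorer (hypothetical): pretends the error is smallest at p = 1000
def toy_score(p, train, test):
    return (p - 1000) ** 2 / 1e6 + 1.0 / len(train)

cv_time = list(range(1850, 1900))
best, scores = select_param(cv_time, [500, 1000, 1500], toy_score, n_splits=5)
print(best)
```

For an error measure such as MSE the smallest mean score wins; for a skill score such as RE, CE, or R2 the selection would use max instead of min.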

io_cfg(k, v, default=None, verbose=False)#

Add-to or read-from configurations.

Parameters:

k (str) – the name of a configuration item
v (object) – any value of the configuration item
default (object) – the default value of the configuration item
verbose (bool, optional) – print verbose information. Defaults to False.

load(save_dirpath=None, filename='job.pkl', verbose=False)#

Load a ReconJob object from a pickle file.

Parameters:

save_dirpath (str) – the directory path where the cfr.ReconJob object was saved.
filename (str) – the filename of the saved cfr.ReconJob object.
verbose (bool, optional) – print verbose information. Defaults to False.
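
The pickle round trip behind a save/load pair can be sketched with the standard library alone. The helpers save_obj and load_obj and the dictionary standing in for a job object are hypothetical; the snippet only illustrates the mechanism and makes no claim about what cfr stores in job.pkl.

```python
import os
import pickle
import tempfile

def save_obj(obj, save_dirpath, filename='job.pkl'):
    """Pickle obj to save_dirpath/filename and return the full path."""
    os.makedirs(save_dirpath, exist_ok=True)
    path = os.path.join(save_dirpath, filename)
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    return path

def load_obj(save_dirpath, filename='job.pkl'):
    """Load a pickled object from save_dirpath/filename."""
    with open(os.path.join(save_dirpath, filename), 'rb') as f:
        return pickle.load(f)

# A plain dict stands in for the job object in this sketch
job_like = {'configs': {'recon_period': (1001, 2000)}}

with tempfile.TemporaryDirectory() as tmp:
    save_obj(job_like, tmp)
    restored = load_obj(tmp)

print(restored == job_like)
```

Unpickling executes code embedded in the file, so pickle files should only be loaded from trusted sources.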

load_clim(tag, path_dict=None, rename_dict=None, anom_period=None, time_name=None, load=False, lat_name=None, lon_name=None, verbose=False)#

Load gridded climate data, either model simulations or instrumental observations.

Parameters:

tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
path_dict (dict) – the dictionary of paths of climate data files with keys to be the variable names, e.g., ‘tas’ and ‘pr’, etc.
rename_dict (dict) – the dictionary for renaming the variable names in the climate data files.
anom_period (tuple or list) – the time period for computing the anomaly.
time_name (str) – the name of the time dimension in the climate data files.
load (bool) – if True, the data will be loaded into the memory instead of lazy-loading.
lat_name (str) – the name of the latitude dimension in the climate data files.
lon_name (str) – the name of the longitude dimension in the climate data files.
verbose (bool, optional) – print verbose information. Defaults to False.

load_proxydb(path=None, verbose=False, **kwargs)#

Load the proxy database from a pandas.DataFrame.

Parameters:

path (str, optional) – the path to the pickle file of the pandas.DataFrame. Defaults to None.
verbose (bool, optional) – print verbose information. Defaults to False.

mark_pids(verbose=False)#

Mark proxy IDs to self.configs.

Parameters:: verbose (bool, optional) – print verbose information. Defaults to False.

prep_da_cfg(cfg_path, seeds=None, save_job=False, verbose=False)#

Prepare the configuration items.

Parameters:

cfg_path (str) – the path of the configuration YAML file.
seeds (list, optional) – the list of random seeds.
save_job (bool, optional) – if True, export the job object to a file.
verbose (bool, optional) – print verbose information. Defaults to False.

prep_graphem(recon_time=None, calib_time=None, recon_period=None, recon_timescale=None, calib_period=None, uniform_pdb=None, verbose=False)#

A shortcut of the steps for GraphEM data preparation

Parameters:

recon_time (array list, optional) – the time points to reconstruct
calib_time (array list, optional) – the time points for calibration
recon_period (list or tuple, optional) – the reconstruction timespan. Effective when recon_time or calib_time is None. Defaults to (1001, 2000).
recon_timescale (float, optional) – the reconstruction timescale. Effective when recon_time or calib_time is None. Defaults to 1 (annual).
calib_period (list or tuple, optional) – the calibration timespan. Defaults to (1850, 2000).
uniform_pdb (bool, optional) – if True, filter the proxy database to make it more uniform in length. Defaults to True.
verbose (bool, optional) – print verbose information. Defaults to False.

regrid_clim(tag, verbose=False, lats=None, lons=None, nlat=None, nlon=None, periodic_lon=True)#

Regrid the gridded climate data, either model simulations or instrumental observations.

Parameters:

tag (str) – the tag to denote identity; either ‘prior’ or ‘obs’.
lats (list or numpy.array) – the latitudes of the regridded grid.
lons (list or numpy.array) – the longitudes of the regridded grid.
nlat (int) – the number of latitudes of the regridded grid; effective when lats = None.
nlon (int) – the number of longitudes of the regridded grid; effective when lons = None.
periodic_lon (bool) – if True, then assume the original longitudes form a loop.

run_da(recon_period=None, recon_loc_rad=None, recon_timescale=None, recon_sampling_mode=None, recon_sampling_dist=None, recon_vars=None, normal_sampling_sigma=None, normal_sampling_cutoff_factor=None, trim_prior=None, nens=None, seed=0, verbose=False, debug=False, allownan=None)#

Run the data assimilation workflows.

Parameters:

recon_period (tuple or list) – the time period for reconstruction.
recon_loc_rad (float) – the localization radius; unit: km.
recon_timescale (int or float) – the timescale for reconstruction.
recon_sampling_mode (str) – ‘fixed’ or ‘rolling’ window for prior sampling.
recon_sampling_dist (str) – ‘normal’ or ‘uniform’ distribution for prior sampling.
recon_vars (list) – the list of variables to reconstruct. Defaults to [‘tas’].
normal_sampling_sigma (str) – the standard deviation of the normal distribution for prior sampling.
normal_sampling_cutoff_factor (int) – the cutoff factor for the window for prior sampling.
allownan (bool) – if True, NaNs in the prior are allowed.
nens (int) – the ensemble size.
seed (int) – the random seed.
verbose (bool, optional) – print verbose information. Defaults to False.
debug (bool) – if True, the debug mode is turned on and more information will be printed out.

run_da_cfg(cfg_path, load_precalculated=False, seeds=None, run_mc=True, verbose=False)#

Run DA according to a configuration YAML file.

Parameters:

cfg_path (str) – the path of the configuration YAML file.
load_precalculated (bool, optional) – load the precalculated job object. Defaults to False.
run_mc (bool) – if False, the reconstruction part will not be executed, which is convenient for checking the preparation part.
seeds (list, optional) – the list of random seeds.
verbose (bool, optional) – print verbose information. Defaults to False.

run_da_mc(recon_period=None, recon_loc_rad=None, recon_timescale=None, nens=None, output_full_ens=None, recon_sampling_mode=None, recon_sampling_dist=None, recon_vars=None, normal_sampling_sigma=None, normal_sampling_cutoff_factor=None, trim_prior=None, recon_seeds=None, assim_frac=None, save_dirpath=None, compress_params=None, allownan=None, output_indices=None, verbose=False)#

Run the Monte-Carlo iterations of data assimilation workflows.

Parameters:

recon_period (tuple or list) – the time period for reconstruction.
recon_loc_rad (float) – the localization radius; unit: km.
recon_timescale (int or float) – the timescale for reconstruction.
recon_sampling_mode (str) – ‘fixed’ or ‘rolling’ window for prior sampling.
recon_sampling_dist (str) – ‘normal’ or ‘uniform’ distribution for prior sampling.
recon_vars (list) – the list of variables to reconstruct. Defaults to [‘tas’].
normal_sampling_sigma (str) – the standard deviation of the normal distribution for prior sampling.
normal_sampling_cutoff_factor (int) – the cutoff factor for the window for prior sampling.
output_full_ens (bool) – if True, the full ensemble fields will be stored to netCDF files.
nens (int) – the ensemble size.
recon_seeds (list) – the list of random seeds.
allownan (bool) – if True, NaNs in the prior are allowed.
assim_frac (float, optional) – the fraction of proxies for assimilation. Defaults to None.
verbose (bool, optional) – print verbose information. Defaults to False.
save_dirpath (str) – the directory path for saving the reconstruction results.
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
output_indices (list) –
the list of indices to output; supported indices:
- ‘nino3.4’
- ‘nino1+2’
- ‘nino3’
- ‘nino4’
- ‘tpi’
- ‘wp’
- ‘dmi’
- ‘iobw’
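
One way to picture the role of the seeds and assim_frac is a seeded random split of the proxy IDs, repeated once per Monte-Carlo iteration. The helper split_proxies and the proxy IDs below are hypothetical, and the sketch is an assumption about the general idea rather than the cfr implementation.

```python
import random

def split_proxies(pids, assim_frac, seed):
    """Randomly pick a fraction of proxy IDs for assimilation; the rest are held out."""
    rng = random.Random(seed)  # a dedicated generator keeps each split reproducible
    n_assim = int(round(len(pids) * assim_frac))
    chosen = set(rng.sample(pids, n_assim))
    # Preserve the original ordering within each group
    assim_pids = [p for p in pids if p in chosen]
    eval_pids = [p for p in pids if p not in chosen]
    return assim_pids, eval_pids

# Hypothetical proxy IDs
pids = [f'proxy_{i:02d}' for i in range(20)]

for seed in [0, 1, 2]:
    assim_pids, eval_pids = split_proxies(pids, assim_frac=0.75, seed=seed)
    print(seed, len(assim_pids), len(eval_pids))
```

Because each split depends only on its seed, rerunning with the same list of seeds reproduces the same assimilated and withheld subsets.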

run_graphem(save_recon=True, save_dirpath=None, save_filename=None, load_precalc_solver=False, solver_save_path=None, compress_params=None, verbose=False, output_indices=None, **fit_kws)#

Run the GraphEM solver, essentially the GraphEM.solver.GraphEM.fit method

Note that the arguments for GraphEM.solver.GraphEM.fit can be appended in the argument list of this function directly. For instance, to pass a pre-calculated graph, use estimate_graph=False and graph=g.adj, where g is the Graph object.

Parameters:

save_dirpath (str) – the path to save the related results
save_filename (str) – the filename to save the reconstruction file. Defaults to “job_r01_recon.nc”.
solver_save_path (str) – the path to save the solver object.
load_precalc_solver (bool, optional) – load the precalculated solver object. Defaults to False.
verbose (bool, optional) – print verbose information. Defaults to False.
compress_params (dict) – the parameters for compression when storing the reconstruction results to netCDF files.
output_indices (list) –
the list of indices to output; supported indices:
- ‘nino3.4’
- ‘nino1+2’
- ‘nino3’
- ‘nino4’
- ‘tpi’
- ‘wp’
- ‘dmi’
- ‘iobw’
fit_kws (dict) – the arguments for GraphEM.solver.GraphEM.fit. The most important one is “graph_method”; available options include “neighborhood”, “glasso”, and “hybrid”, where “hybrid” means running “neighborhood” first with the default cutoff_radius=1500 to infill the data matrix and then running “glasso” with the defaults sp_FF=3, sp_FP=3, sp_PP=3 to improve the result further.

ReconRes#

class cfr.reconres.ReconRes(dirpath, load_num=None, verbose=False)#

The class for reconstruction results

indpdt_verif(job_path, verbose=False, calib_period=(1850, 2000), min_verif_len=10)#

Perform independent verification.

Parameters:

job_path (str) – the path to the job.
verbose (bool, optional) – print verbose information. Defaults to False.

load(vn_list, verbose=False)#

Load reconstruction results.

Parameters:

vn_list (list) –
list of variable names; supported names, taking ‘tas’ as an example:
- ensemble fields: ‘tas’
- ensemble timeseries: ‘tas_gm’, ‘tas_nhm’, ‘tas_shm’
verbose (bool, optional) – print verbose information. Defaults to False.

load_proxylabels(verbose=False)#: Load proxy labels from the reconstruction results. Proxy with “assim” means it is assimilated. Proxy with “eval” means it is used for evaluation.

plot_indpdt_verif()#: Plot the independent verification results.

plot_valid(recon_name_dict=None, target_name_dict=None, valid_ts_kws=None, valid_fd_kws=None)#

Plot the validation result

Parameters:

recon_name_dict (dict) – the dictionary for variable names in the reconstruction. For example, {‘tas’: ‘LMR/tas’, ‘nino3.4’: ‘NINO3.4 [K]’}.
target_name_dict (dict) – the dictionary for variable names in the validation target. For example, {‘tas’: ‘20CRv3’, ‘nino3.4’: ‘BC09’}.
valid_ts_kws (dict) – the dictionary of keyword arguments for validating the timeseries.
valid_fd_kws (dict) – the dictionary of keyword arguments for validating the field.

valid(target_dict, stat=['corr'], timespan=None, verbose=False)#

Validate against a target dictionary

Parameters:

target_dict (dict) – a dictionary of multiple variables for validation.
stat (list of str) –
the statistics to calculate. Supported quantities:
- ‘corr’: correlation coefficient
- ‘R2’: coefficient of determination
- ‘CE’: coefficient of efficiency
timespan (list or tuple) – the timespan over which to perform the validation.
verbose (bool, optional) – print verbose information. Defaults to False.
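
The three statistics can be written out in a few lines of plain Python. The definitions below are common ones and are assumptions for this sketch (R2 taken as the squared Pearson correlation, CE as one minus the ratio of the squared error to the variance of the target about its own mean); cfr may compute them differently in detail.

```python
import math

def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r2(x, y):
    """Coefficient of determination, taken here as the squared correlation."""
    return corr(x, y) ** 2

def ce(recon, target):
    """Coefficient of efficiency: 1 - SSE / sum of squares about the target mean."""
    mt = sum(target) / len(target)
    sse = sum((t - r) ** 2 for r, t in zip(recon, target))
    sst = sum((t - mt) ** 2 for t in target)
    return 1.0 - sse / sst

target = [0.0, 1.0, 2.0, 3.0, 4.0]
recon = [1.0, 2.0, 3.0, 4.0, 5.0]  # perfectly correlated, but biased by +1

print(corr(recon, target), r2(recon, target), ce(recon, target))
```

In this toy case the correlation is 1 while CE is only 0.5, which shows that CE penalizes a mean bias that the correlation ignores.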

EnsTS#

class cfr.ts.EnsTS(time=None, value=None, value_name=None)#

The class for ensemble timeseries

The ensembles variable should be in shape of (nt, nEns), where nt is the number of years, and nEns is the number of ensemble members.

Parameters:

time (numpy.array) – the time axis of the series
value (numpy.array) – the value axis of the series
value_name (str) – the name of value axis; will be used as ylabel in plots

nt#

the size of the time axis

Type:: int

nEns#

the size of the ensemble

Type:: int

median#

the median of the ensemble timeseries

Type:: numpy.array

compare(ref=None, ref_time=None, ref_value=None, ref_name='reference', stats=['corr', 'CE'], timespan=None)#

Compare against a reference timeseries.

Parameters:

ref (cfr.ts.EnsTS) – the reference time series object
ref_time (numpy.array) – the time axis of the reference timeseries
ref_value (numpy.array) – the value axis of the reference timeseries
stats (list, optional) – the list of validation statistics to calculate. Defaults to [‘corr’, ‘CE’].
timespan (tuple, optional) – the time period for validation. Defaults to None.
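
A comparison against a reference series involves aligning the two time axes, optionally restricting them to a timespan, and then computing a statistic. The sketch below does this for the ensemble median with plain Python lists; compare_median, pearson, and the toy numbers are hypothetical and do not reproduce the EnsTS.compare implementation.

```python
import math
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def compare_median(time, value, ref_time, ref_value, timespan=None):
    """Correlate the ensemble median with a reference over their common years."""
    med = {t: median(row) for t, row in zip(time, value)}  # value has shape (nt, nEns)
    ref = dict(zip(ref_time, ref_value))
    common = [t for t in time if t in ref]
    if timespan is not None:
        start, end = timespan
        common = [t for t in common if start <= t <= end]
    r = pearson([med[t] for t in common], [ref[t] for t in common])
    return common, r

# Toy ensemble: 5 years, 3 members
time = [1990, 1991, 1992, 1993, 1994]
value = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.9], [0.2, 0.3, 0.4],
         [0.6, 0.7, 0.8], [0.8, 0.9, 1.0]]
ref_time = [1991, 1992, 1993, 1994, 1995]
ref_value = [0.6, 0.2, 0.8, 1.1, 1.0]

common, r = compare_median(time, value, ref_time, ref_value, timespan=(1991, 1994))
print(common, round(r, 3))
```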

copy()#: Make a deepcopy of the object.

fetch(name=None, **from_df_kws)#

Fetch a proxy database from cloud

Parameters:: name (str) – a predefined database name or a URL starting with “http”

from_df(df, time_column='time', value_columns=None)#

Load data from a pandas.DataFrame

Parameters:

df (pandas.DataFrame) – The pandas.DataFrame object.
time_column (str) – The label of the column for the time axis.
value_columns (list of str) – The list of the labels for the value axis of the ensemble members.

line_density(figsize=[12, 4], cmap='plasma', color_scale='linear', bins=None, num_fine=None, xlabel='Year (CE)', ylabel=None, title=None, ylim=None, xlim=None, title_kws=None, ax=None, **pcolormesh_kwargs)#

Plot the timeseries 2-D histogram

Parameters:

cmap (str) – The colormap for the histogram.
color_scale (str) – The scale of the colorbar; should be either ‘linear’ or ‘log’.
bins (list or tuple) – The number of bins for each axis: nx, ny = bins.

References:: https://matplotlib.org/3.5.0/gallery/statistics/time_series_histogram.html

plot(figsize=[12, 4], color='indianred', xlabel='Year (CE)', ylabel=None, title=None, ylim=None, xlim=None, lgd_kws=None, title_kws=None, plot_valid=True, ax=None, **plot_kws)#

Plot the raw values (multiple series).

Parameters:: plot_valid (bool, optional) – If True, will plot the validation target series if it exists. Defaults to True.

plot_qs(figsize=[12, 4], qs=[0.025, 0.25, 0.5, 0.75, 0.975], color='indianred', xlabel='Year (CE)', ylabel=None, title=None, ylim=None, xlim=None, alphas=[0.5, 0.1], lgd_kws=None, title_kws=None, ax=None, plot_valid=True, **plot_kws)#

Plot the quantiles

Parameters:

figsize (list, optional) – The size of the figure. Defaults to [12, 4].
qs (list, optional) – The list to denote the quantiles plotted. Defaults to [0.025, 0.25, 0.5, 0.75, 0.975].
color (str, optional) – The basic color for the quantile envelopes. Defaults to ‘indianred’.
xlabel (str, optional) – The label for the x-axis. Defaults to ‘Year (CE)’.
ylabel (str, optional) – The label for the y-axis. Defaults to None.
title (str, optional) – The title of the figure. Defaults to None.
ylim (tuple or list, optional) – The limit of the y-axis. Defaults to None.
xlim (tuple or list, optional) – The limit of the x-axis. Defaults to None.
alphas (list, optional) – The alphas for the quantile envelopes. Defaults to [0.5, 0.1].
lgd_kws (dict, optional) – The keyword arguments for the ax.legend() function. Defaults to None.
title_kws (dict, optional) – The keyword arguments for the ax.title() function. Defaults to None.
ax (matplotlib.axes, optional) – The matplotlib.axes object. If set, the image will be plotted in the existing ax. Defaults to None.
plot_valid (bool, optional) – If True, will plot the validation target series if it exists. Defaults to True.
**plot_kws (dict, optional) – The keyword arguments for the ax.plot() function. Defaults to None.
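
The quantile envelopes are built from per-time-step quantiles across the ensemble members. A minimal sketch of that computation, assuming a linear-interpolation quantile and a nested list of shape (nt, nEns), is shown below; the helpers quantile and ens_quantiles are hypothetical and the plotting itself is omitted.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of already-sorted values, with 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def ens_quantiles(value, qs):
    """Map each q to a list of length nt holding that quantile at every time step."""
    out = {q: [] for q in qs}
    for row in value:  # one row per time step, one entry per ensemble member
        s = sorted(row)
        for q in qs:
            out[q].append(quantile(s, q))
    return out

# Toy ensemble: 2 time steps, 5 members
value = [[0.0, 1.0, 2.0, 3.0, 4.0],
         [10.0, 14.0, 12.0, 11.0, 13.0]]
qs = [0.025, 0.25, 0.5, 0.75, 0.975]

envelopes = ens_quantiles(value, qs)
print(envelopes[0.5])
```

With the default qs, a natural reading is that the outer pair (0.025, 0.975) and the inner pair (0.25, 0.75) each bound one shaded envelope and 0.5 gives the median line, which would match the two entries in alphas; that pairing is an inference from the defaults, not something stated above.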

to_df(time_column=None, value_column='ens')#

Convert an EnsTS to a pandas.DataFrame

Parameters:

time_column (str) – The label of the column for the time axis.
value_column (str) – The base column label for the ensemble members. By default, the columns for the members will be labeled as “ens.0”, “ens.1”, “ens.2”, etc.
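
The column labeling described above can be illustrated without pandas by building a plain dictionary of columns. The helper ens_to_columns is hypothetical, and the explicit 'time' label is an assumption for the example, since the default for time_column is None in the signature.

```python
def ens_to_columns(time, value, time_column='time', value_column='ens'):
    """Build a column dict from a time axis and values of shape (nt, nEns)."""
    n_ens = len(value[0])
    columns = {time_column: list(time)}
    for j in range(n_ens):
        # One column per ensemble member: "ens.0", "ens.1", ...
        columns[f'{value_column}.{j}'] = [row[j] for row in value]
    return columns

time = [2000, 2001]
value = [[0.1, 0.2, 0.3],
         [0.4, 0.5, 0.6]]

columns = ens_to_columns(time, value)
print(list(columns))
```

A dictionary of this shape can be passed directly to pandas.DataFrame to obtain one row per time step and one column per ensemble member.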
