Database Filtering#

In this section, we illustrate how to filter the pseudoPAGES2k dataset in various ways with cfr.

Required data to complete this tutorial:

[1]:
%load_ext autoreload
%autoreload 2

import cfr
print(cfr.__version__)
2023.7.21

Load the pseudoPAEGS2k dataset#

[2]:
# load the pseudoPAGES2k database from a netCDF file

# load from a local copy
# pdb = cfr.ProxyDatabase().load_nc('./data/ppwn_SNRinf_rta.nc')

# load from the cloud
pdb = cfr.ProxyDatabase().fetch('pseudoPAGES2k/ppwn_SNRinf_rta')

# plot to have a check
fig, ax = pdb.plot()
../_images/notebooks_pp2k-pdb-filter_3_0.png

Filter the pseudoPAGES2k dataset#

The cfr.ProxyDatabase class comes with a .filter() method that can help us filter the database in various ways.

By proxy types#

The most common way to filter a proxy database is by the proxy types. For instance, to get a subset of the database for TRW records:

[3]:
pdb_trw = pdb.filter(by='ptype', keys='tree.TRW')
fig, ax =  pdb_trw.plot()
../_images/notebooks_pp2k-pdb-filter_6_0.png

The method supports fuzzy search, so the below works as well:

[4]:
pdb_trw = pdb.filter(by='ptype', keys='RW')
fig, ax =  pdb_trw.plot()
../_images/notebooks_pp2k-pdb-filter_8_0.png

With this feature, we may search multiple types:

[5]:
pdb_tree = pdb.filter(by='ptype', keys='tree')
fig, ax =  pdb_tree.plot()
../_images/notebooks_pp2k-pdb-filter_10_0.png
[6]:
pdb_d18O = pdb.filter(by='ptype', keys='d18O')
fig, ax =  pdb_d18O.plot()
../_images/notebooks_pp2k-pdb-filter_11_0.png

To search arbitrary multiple types, simply use a list of the keys:

[7]:
pdb_mix = pdb.filter(by='ptype', keys=['tree.TRW', 'coral.d18O'])
fig, ax =  pdb_mix.plot()
../_images/notebooks_pp2k-pdb-filter_13_0.png

By proxy IDs#

In some cases, we would like to get a subset of the database consisting of certain records that we know the IDs ahead. For instance, we may list several proxy IDs (pid):

[8]:
pdb_sub = pdb.filter(by='pid', keys=['NAm_153', 'NAm_154'])
fig, ax = pdb_sub.plot()
../_images/notebooks_pp2k-pdb-filter_15_0.png

With the fuzzy search feature, we may get a subset of all the North America sites:

[9]:
pdb_NAm = pdb.filter(by='pid', keys='NAm')
fig, ax = pdb_NAm.plot()
../_images/notebooks_pp2k-pdb-filter_17_0.png

By a latitude range#

Sometimes, we only need to use the records within a latitude range. For instance, the tropical records:

[10]:
pdb_lat = pdb.filter(by='lat', keys=[-20, 20])
fig, ax = pdb_lat.plot()
../_images/notebooks_pp2k-pdb-filter_19_0.png

By a longitude range#

Similarly, we may filter the database by a longitude range. For instance:

[11]:
pdb_lon = pdb.filter(by='lon', keys=[100, 120])
fig, ax = pdb_lon.plot()
../_images/notebooks_pp2k-pdb-filter_21_0.png

By a square (latitude + longitude ranges)#

We may also filter the database by a square specified by the min & max of the lat & lon. In this case, the argument keys represents a list [lat_min, lat_max, lon_min, lon_max]. For instance:

[12]:
pdb_square = pdb.filter(by='loc-square', keys=[-20, 20, 100, 120])
fig, ax = pdb_square.plot()
../_images/notebooks_pp2k-pdb-filter_23_0.png

By a circle (center + distance)#

Sometimes, we would like to search for the records around a center location. In this case, the argument keys represents a list [lat, lon, distance]. For instance:

[13]:
pdb_circle = pdb.filter(by='loc-circle', keys=[15, 100, 3000])
fig, ax = pdb_circle.plot()
../_images/notebooks_pp2k-pdb-filter_25_0.png