Database Filtering#
In this section, we illustrate how to filter the pseudoPAGES2k dataset in various ways with cfr.
Required data to complete this tutorial:
pseudoPAGES2k: ppwn_SNRinf_rta.nc
[1]:
%load_ext autoreload
%autoreload 2
import cfr
print(cfr.__version__)
2023.7.21
Load the pseudoPAGES2k dataset#
[2]:
# load the pseudoPAGES2k database from a netCDF file
# load from a local copy
# pdb = cfr.ProxyDatabase().load_nc('./data/ppwn_SNRinf_rta.nc')
# load from the cloud
pdb = cfr.ProxyDatabase().fetch('pseudoPAGES2k/ppwn_SNRinf_rta')
# plot the database as a quick check
fig, ax = pdb.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_3_0.png]
Filter the pseudoPAGES2k dataset#
The cfr.ProxyDatabase class comes with a .filter() method that can help us filter the database in various ways.
By proxy types#
The most common way to filter a proxy database is by proxy type. For instance, to get the subset of the database that contains only TRW records:
[3]:
pdb_trw = pdb.filter(by='ptype', keys='tree.TRW')
fig, ax = pdb_trw.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_6_0.png]
The method supports fuzzy search, so the following works as well:
[4]:
pdb_trw = pdb.filter(by='ptype', keys='RW')
fig, ax = pdb_trw.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_8_0.png]
With this feature, a single key can match multiple proxy types:
[5]:
pdb_tree = pdb.filter(by='ptype', keys='tree')
fig, ax = pdb_tree.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_10_0.png]
[6]:
pdb_d18O = pdb.filter(by='ptype', keys='d18O')
fig, ax = pdb_d18O.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_11_0.png]
To search for an arbitrary combination of types, pass a list of keys:
[7]:
pdb_mix = pdb.filter(by='ptype', keys=['tree.TRW', 'coral.d18O'])
fig, ax = pdb_mix.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_13_0.png]
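The matching behaviour in the cells above can be illustrated without cfr. The following is a minimal sketch, assuming that "fuzzy" means a key only needs to appear as a substring of the proxy type (as the 'RW' example suggests). The function `fuzzy_filter` and the sample records are hypothetical; this is not cfr's implementation.

```python
# Hypothetical sketch of substring-based ("fuzzy") filtering; not cfr's implementation.

def fuzzy_filter(records, keys):
    """Return the records whose 'ptype' contains at least one of the keys."""
    if isinstance(keys, str):
        keys = [keys]  # accept a single key as well as a list of keys
    return {
        pid: rec
        for pid, rec in records.items()
        if any(key in rec['ptype'] for key in keys)
    }

# Made-up records, for illustration only
records = {
    'rec_a': {'ptype': 'tree.TRW'},
    'rec_b': {'ptype': 'tree.MXD'},
    'rec_c': {'ptype': 'coral.d18O'},
    'rec_d': {'ptype': 'ice.d18O'},
}

print(sorted(fuzzy_filter(records, 'RW')))    # ['rec_a']
print(sorted(fuzzy_filter(records, 'tree')))  # ['rec_a', 'rec_b']
print(sorted(fuzzy_filter(records, 'd18O')))  # ['rec_c', 'rec_d']
print(sorted(fuzzy_filter(records, ['tree.TRW', 'coral.d18O'])))  # ['rec_a', 'rec_c']
```

Under this assumption, a list of keys selects the union of the records matched by each key.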
By proxy IDs#
In some cases, we would like to get a subset of the database consisting of records whose IDs we know ahead of time. For instance, we may list several proxy IDs (pid):
[8]:
pdb_sub = pdb.filter(by='pid', keys=['NAm_153', 'NAm_154'])
fig, ax = pdb_sub.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_15_0.png]
With the fuzzy search feature, we may get a subset of all the North American sites:
[9]:
pdb_NAm = pdb.filter(by='pid', keys='NAm')
fig, ax = pdb_NAm.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_17_0.png]
By a latitude range#
Sometimes we only need the records within a latitude range, for instance the tropical records between 20°S and 20°N:
[10]:
pdb_lat = pdb.filter(by='lat', keys=[-20, 20])
fig, ax = pdb_lat.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_19_0.png]
By a longitude range#
Similarly, we may filter the database by a longitude range. For instance:
[11]:
pdb_lon = pdb.filter(by='lon', keys=[100, 120])
fig, ax = pdb_lon.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_21_0.png]
By a square (latitude + longitude ranges)#
We may also filter the database by a square specified by the minimum and maximum latitude and longitude. In this case, the argument keys represents a list [lat_min, lat_max, lon_min, lon_max]. For instance:
[12]:
pdb_square = pdb.filter(by='loc-square', keys=[-20, 20, 100, 120])
fig, ax = pdb_square.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_23_0.png]
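To make the order of the four values in keys concrete, here is a minimal sketch of a box test in plain Python. It assumes inclusive bounds and that site longitudes use the same convention as the box; the helper `in_square` and the sample sites are hypothetical, and this is not cfr's implementation.

```python
# Hypothetical sketch of a latitude/longitude box test; not cfr's implementation.

def in_square(lat, lon, keys):
    """Check whether (lat, lon) lies inside [lat_min, lat_max, lon_min, lon_max]."""
    lat_min, lat_max, lon_min, lon_max = keys
    # Assumption: bounds are inclusive and lon uses the same convention as the box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

# Made-up site coordinates (lat, lon), for illustration only
sites = {
    'site_a': (10.0, 110.0),  # inside both ranges
    'site_b': (45.0, 110.0),  # latitude outside the range
    'site_c': (10.0, 250.0),  # longitude outside the range
}

box = [-20, 20, 100, 120]
selected = [name for name, (lat, lon) in sites.items() if in_square(lat, lon, box)]
print(selected)  # ['site_a']
```

A site is kept only when it satisfies both the latitude range and the longitude range.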
By a circle (center + distance)#
Sometimes, we would like to search for the records around a center location. In this case, the argument keys represents a list [lat, lon, distance]. For instance:
[13]:
pdb_circle = pdb.filter(by='loc-circle', keys=[15, 100, 3000])
fig, ax = pdb_circle.plot()
[Figure: ../_images/notebooks_pp2k-pdb-filter_25_0.png]
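A circle search of this kind is commonly built on a great-circle distance. The sketch below uses the haversine formula and assumes the distance is given in kilometres with a mean Earth radius of 6371 km. Neither the unit nor the formula is confirmed by this tutorial, and the helpers `great_circle_km` and `in_circle` are hypothetical, not part of cfr.

```python
import math

# Hypothetical sketch of a great-circle radius test; not cfr's implementation.
EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the kilometre unit is an assumption

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_circle(lat, lon, keys):
    """Check whether (lat, lon) is within `distance` of the center [lat, lon, distance]."""
    center_lat, center_lon, distance = keys
    return great_circle_km(lat, lon, center_lat, center_lon) <= distance

circle = [15, 100, 3000]
print(in_circle(15, 100, circle))  # the center itself is inside
print(in_circle(20, 110, circle))  # a nearby site
print(in_circle(60, 100, circle))  # 45 degrees of latitude away
```

With these assumptions, the first two calls return True and the last returns False, since 45 degrees along a meridian is roughly 5000 km.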