Running GraphEM with the Command Line Interface (CLI)#

In this tutorial, we first run through a demo case with the GraphEM method in a notebook style, after which we generate the configurations file configs.yml based on the demo case.

Then we repeat the case by calling the ReconJob.run_graphem_cfg() method using the configs.yml in a notebook style.

Finally, we repeat the case by calling the command line interface using the configs.yml.

A validation section is presented in the end to make sure the reconstruction job yields a reasonable result.

Generating the configs.yml file#

[1]:
%load_ext autoreload
%autoreload 2

import cfr
print(cfr.__version__)
2023.7.24
[2]:
job = cfr.ReconJob()
job.load_proxydb('PAGES2kv2')
[3]:
job.filter_proxydb(by='ptype', keys=['coral'])
job.annualize_proxydb(months=[12, 1, 2], ptypes=['coral'])

# obs: fetching & preprocessing
job.load_clim(tag='obs', path_dict={'tas': 'gistemp1200_GHCNv4_ERSSTv5'}, anom_period=[1951, 1980], rename_dict={'tas': 'tempanomaly'})

# obs: processing
job.annualize_clim(tag='obs', months=[12, 1, 2])
job.regrid_clim(tag='obs', nlat=42, nlon=63)
job.crop_clim(tag='obs', lat_min=-20, lat_max=20, lon_min=150, lon_max=260)
Annualizing ProxyDatabase: 100%|██████████| 104/104 [00:01<00:00, 74.12it/s]
>>> The target file seems existed at: ./data/gistemp1200_GHCNv4_ERSSTv5.nc.gz . Loading from it instead of downloading ...

[4]:
job.prep_graphem(
    recon_period=(1871, 2000),  # period to reconstruct
    calib_period=(1901, 2000),  # period for calibration
    uniform_pdb=True,           # filter the proxydb to be more uniform
    verbose=True,
)
>>> job.configs["recon_period"] = [1871, 2000]
>>> job.configs["recon_timescale"] = 1
>>> job.configs["calib_period"] = [1901, 2000]
>>> job.configs["uniform_pdb"] = True
>>> ProxyDatabase filtered to be more uniform. 54 records remaining.
>>> job.configs["proxydb_center_ref_period"] = [1901, 2000]

Centering each of the ProxyRecord: 100%|██████████| 54/54 [00:00<00:00, 1794.78it/s]
>>> job.proxydb updated
>>> job.graphem_params["recon_time"] created
>>> job.graphem_params["calib_time"] created


>>> job.graphem_params["field_obs"] created
>>> job.graphem_params["calib_idx"] created
>>> job.graphem_params["field"] created
>>> job.graphem_params["df_proxy"] created
>>> job.graphem_params["proxy"] created
>>> job.graphem_params["lonlat"] created

[5]:
%%time

# GraphEM
job.run_graphem(
    save_dirpath='./recons/test-run-graphem-cfg',
    graph_method='hybrid',
    cutoff_radius=1500,
    sp_FF=3, sp_FP=4,
    verbose=True,
)
>>> job.configs["compress_params"] = {'zlib': True}
>>> job.configs["save_dirpath"] = ./recons/test-run-graphem-cfg
>>> job.configs["save_filename"] = job_r01_recon.nc
>>> job.configs["graph_method"] = hybrid
>>> job.configs["cutoff_radius"] = 1500
>>> job.configs["sp_FF"] = 3
>>> job.configs["sp_FP"] = 4
Computing a neighborhood graph with R = 1500.0 km
Estimating graph using neighborhood method
Running GraphEM:

EM | dXmis: 0.0026; rdXmis: 0.0046:   4%|▍         | 9/200 [00:02<00:47,  4.06it/s]
GraphEM.EM(): Tolerance achieved.
Solving graphical LASSO using greedy search
graph_greedy_search | FF:  3.720; FP:  3.644; PP:  0.000:   3%|▎         | 17/500 [00:01<00:38, 12.61it/s]
Using specified graph
Running GraphEM:

EM | dXmis: 0.0021; rdXmis: 0.0049:   8%|▊         | 15/200 [00:05<01:06,  2.80it/s]
GraphEM.EM(): Tolerance achieved.
>>> job.graphem_solver created and saved to: None
>>> job.recon_fields created
>>> Reconstructed fields saved to: ./recons/test-run-graphem-cfg/job_r01_recon.nc
CPU times: user 1min 24s, sys: 16.9 s, total: 1min 41s
Wall time: 9.24 s

[6]:
job.save_cfg('./recons/test-run-graphem-cfg')

Testing running the reconstruction job based on the generated configs.yml file#

In a configs.yml file, the important parameters that might be considered to be modified by the users:

  • calib_period: the calibration period for the GraphEM algorithm

  • recon_period: the reconstruction period (i.e., the timespan for the output reconstruction)

  • annualize_proxydb_months: the seasonality for the reconstruction (e.g., calendar annual by default, or just a season average like [12, 1, 2] for winter)

  • filter_proxydb_kwargs: this controls what proxy records to use for the reconstruction

  • cutoff_radius: the cutoff radius for the GraphEM algorithm; reasonable values range from hundreds to thousands of kilometers; default value: 1500

  • sp_FF: target sparsity (in percentage) of the in-field part of the inverse covariance matrix; default value: 3

  • sp_FP: target sparsity (in percentage) of the climate field/proxy part of the inverse covariance matrix; default value: 3

[7]:
job_cfg = cfr.ReconJob()
job_cfg.run_graphem_cfg('./recons/test-run-graphem-cfg/configs.yml', verbose=True)
>>> job.configs["proxydb_path"] = PAGES2kv2
>>> 692 records loaded
>>> job.proxydb created
>>> job.configs["filter_proxydb_args"] = []
>>> job.configs["filter_proxydb_kwargs"] = {'by': 'ptype', 'keys': ['coral']}
>>> 104 records remaining
>>> job.proxydb updated

Annualizing ProxyDatabase: 100%|██████████| 104/104 [00:01<00:00, 70.70it/s]
Centering each of the ProxyRecord: 100%|██████████| 99/99 [00:00<00:00, 1094.88it/s]
>>> job.configs["obs_path"] = {'tas': 'gistemp1200_GHCNv4_ERSSTv5'}
>>> job.configs["obs_rename_dict"] = {'tas': 'tempanomaly'}
>>> job.configs["obs_anom_period"] = [1951, 1980]
>>> The target file seems existed at: ./data/gistemp1200_GHCNv4_ERSSTv5.nc.gz . Loading from it instead of downloading ...
>>> obs variables ['tas'] loaded
>>> job.obs created
>>> job.configs["obs_annualize_months"] = [12, 1, 2]
>>> Processing tas ...
>>> job.obs updated
>>> job.configs["obs_regrid_nlat"] = 42
>>> job.configs["obs_regrid_nlon"] = 63
>>> Processing tas ...
>>> job.configs["obs_lat_min"] = -20
>>> job.configs["obs_lat_max"] = 20
>>> job.configs["obs_lon_min"] = 150
>>> job.configs["obs_lon_max"] = 260
>>> Processing tas ...
>>> job.configs["recon_period"] = [1871, 2000]
>>> job.configs["calib_period"] = [1901, 2000]
>>> job.configs["uniform_pdb"] = True
>>> ProxyDatabase filtered to be more uniform. 52 records remaining.
>>> job.configs["proxydb_center_ref_period"] = [1901, 2000]

Centering each of the ProxyRecord: 100%|██████████| 52/52 [00:00<00:00, 1930.14it/s]
>>> job.proxydb updated
>>> job.graphem_params["recon_time"] created
>>> job.graphem_params["calib_time"] created
>>> job.graphem_params["field_obs"] created
>>> job.graphem_params["calib_idx"] created
>>> job.graphem_params["field"] created
>>> job.graphem_params["df_proxy"] created
>>> job.graphem_params["proxy"] created
>>> job.graphem_params["lonlat"] created
>>> job.configs["save_dirpath"] = ./recons/test-run-graphem-cfg
>>> job.configs saved to: ./recons/test-run-graphem-cfg/configs.yml
>>> job.configs["save_dirpath"] = ./recons/test-run-graphem-cfg
>>> job.configs["graph_method"] = hybrid
>>> job.configs["cutoff_radius"] = 1500
>>> job.configs["sp_FF"] = 3
>>> job.configs["sp_FP"] = 4
Computing a neighborhood graph with R = 1500.0 km
Estimating graph using neighborhood method
Running GraphEM:

EM | dXmis: 0.0026; rdXmis: 0.0046:   4%|▍         | 9/200 [00:02<00:57,  3.33it/s]
GraphEM.EM(): Tolerance achieved.
Solving graphical LASSO using greedy search
graph_greedy_search | FF:  3.720; FP:  3.644; PP:  0.000:   3%|▎         | 17/500 [00:01<00:41, 11.57it/s]
Using specified graph
Running GraphEM:

EM | dXmis: 0.0021; rdXmis: 0.0049:   8%|▊         | 15/200 [00:05<01:06,  2.80it/s]
GraphEM.EM(): Tolerance achieved.
>>> job.graphem_solver created and saved to: None
>>> job.recon_fields created
>>> Reconstructed fields saved to: ./recons/test-run-graphem-cfg/job_r01_recon.nc


Leveraging the CLI#

[8]:
!cfr -h
/Users/fengzhu/Apps/miniconda3/envs/py310/bin/cfr:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('cfr==2023.7.24')
usage: cfr [-h] [-v] {da,graphem} ...

========================================================================================
 cfr: a scripting system for CFR
----------------------------------------------------------------------------------------
 Usage example for DA:
    cfr da -c config.yml -vb -s 1 2 -r
    # -c config.yml: run the reconstruction job according to config.yml
    # -vb: output the verbose runtime information
    # -s 1 2: set seeds as integers from 1 to 2
    # -r: run the Monte-Carlo iterations for PDA

 Usage example for GraphEM:
    cfr graphem -c config.yml -vb
    # -c config.yml: run the reconstruction job according to config.yml
    # -vb: output the verbose runtime information
========================================================================================


positional arguments:
  {da,graphem}   running mode
    da           run a DA-based reconstruction
    graphem      run a GraphEM-based reconstruction

options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit
[9]:
!cfr graphem -c ./recons/test-run-graphem-cfg/configs.yml -vb
/Users/fengzhu/Apps/miniconda3/envs/py310/bin/cfr:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('cfr==2023.7.24')
>>> job.configs["proxydb_path"] = PAGES2kv2
>>> 692 records loaded
>>> job.proxydb created
>>> job.configs["filter_proxydb_args"] = []
>>> job.configs["filter_proxydb_kwargs"] = {'by': 'ptype', 'keys': ['coral']}
>>> 104 records remaining
>>> job.proxydb updated
Annualizing ProxyDatabase: 100%|██████████████| 104/104 [00:01<00:00, 77.58it/s]
Centering each of the ProxyRecord: 100%|██████| 99/99 [00:00<00:00, 1373.28it/s]
>>> job.configs["obs_path"] = {'tas': 'gistemp1200_GHCNv4_ERSSTv5'}
>>> job.configs["obs_rename_dict"] = {'tas': 'tempanomaly'}
>>> job.configs["obs_anom_period"] = [1951, 1980]
>>> The target file seems existed at: ./data/gistemp1200_GHCNv4_ERSSTv5.nc.gz . Loading from it instead of downloading ...
>>> obs variables ['tas'] loaded
>>> job.obs created
>>> job.configs["obs_annualize_months"] = [12, 1, 2]
>>> Processing tas ...
>>> job.obs updated
>>> job.configs["obs_regrid_nlat"] = 42
>>> job.configs["obs_regrid_nlon"] = 63
>>> Processing tas ...
>>> job.configs["obs_lat_min"] = -20
>>> job.configs["obs_lat_max"] = 20
>>> job.configs["obs_lon_min"] = 150
>>> job.configs["obs_lon_max"] = 260
>>> Processing tas ...
>>> job.configs["recon_period"] = [1871, 2000]
>>> job.configs["calib_period"] = [1901, 2000]
>>> job.configs["uniform_pdb"] = True
>>> ProxyDatabase filtered to be more uniform. 52 records remaining.
>>> job.configs["proxydb_center_ref_period"] = [1901, 2000]
Centering each of the ProxyRecord: 100%|██████| 52/52 [00:00<00:00, 2433.33it/s]
>>> job.proxydb updated
>>> job.graphem_params["recon_time"] created
>>> job.graphem_params["calib_time"] created
>>> job.graphem_params["field_obs"] created
>>> job.graphem_params["calib_idx"] created
>>> job.graphem_params["field"] created
>>> job.graphem_params["df_proxy"] created
>>> job.graphem_params["proxy"] created
>>> job.graphem_params["lonlat"] created
>>> job.configs["save_dirpath"] = ./recons/test-run-graphem-cfg
>>> job.configs saved to: ./recons/test-run-graphem-cfg/configs.yml
>>> job.configs["save_dirpath"] = ./recons/test-run-graphem-cfg
>>> job.configs["graph_method"] = hybrid
>>> job.configs["cutoff_radius"] = 1500
>>> job.configs["sp_FF"] = 3
>>> job.configs["sp_FP"] = 4
Computing a neighborhood graph with R = 1500.0 km
Estimating graph using neighborhood method
Running GraphEM:

EM | dXmis: 0.0026; rdXmis: 0.0046:   4%|▎      | 9/200 [00:02<00:52,  3.61it/s]GraphEM.EM(): Tolerance achieved.
EM | dXmis: 0.0026; rdXmis: 0.0046:   4%|▎      | 9/200 [00:02<00:54,  3.50it/s]
Solving graphical LASSO using greedy search
graph_greedy_search | FF:  3.720; FP:  3.644; PP:  0.000:   3%| | 17/500 [00:01<
Using specified graph
Running GraphEM:

EM | dXmis: 0.0021; rdXmis: 0.0049:   8%|▍     | 15/200 [00:05<01:06,  2.79it/s]GraphEM.EM(): Tolerance achieved.
EM | dXmis: 0.0021; rdXmis: 0.0049:   8%|▍     | 15/200 [00:05<01:08,  2.72it/s]
>>> job.graphem_solver created and saved to: None
>>> job.recon_fields created
>>> Reconstructed fields saved to: ./recons/test-run-graphem-cfg/job_r01_recon.nc

Validation#

[10]:
res = cfr.ReconRes('./recons/test-run-graphem-cfg', verbose=True)
res.load(['tas', 'nino3.4'], verbose=True)
>>> res.paths:
['./recons/test-run-graphem-cfg/job_r01_recon.nc']
>>> ReconRes.recons["tas"] created
>>> ReconRes.da["tas"] created
>>> ReconRes.recons["nino3.4"] created
>>> ReconRes.da["nino3.4"] created

[11]:
tas_20CRv3 = cfr.ClimateField().fetch('20CRv3/tas', vn='air').rename('tas').get_anom((1951, 1980)).annualize(months=[12, 1, 2])
tas_20CRv3 = tas_20CRv3.crop(lat_min=-20, lat_max=20, lon_min=150, lon_max=260)
nino34_bc09 = cfr.EnsTS().fetch('BC09_NINO34').annualize(months=[12, 1, 2])
>>> The target file seems existed at: ./data/air.2m.mon.mean.nc . Loading from it instead of downloading ...

[12]:
res.valid(
    target_dict={'tas':  tas_20CRv3, 'nino3.4': nino34_bc09},
    timespan=(1874, 1900), verbose=True,
    stat=['corr', 'CE'],
)

fig, ax = res.plot_valid(
    target_name_dict={'tas': '20CRv3', 'nino3.4': 'BC09'},
    recon_name_dict={'tas': 'GraphEM/tas', 'nino3.4': 'NINO3.4 [K]'},
    valid_fd_kws=dict(
        projection='PlateCarree',
        latlon_range=(-20, 20, 150, 256),
        plot_cbar=True,
    ),
    valid_ts_kws=dict(
        xlim = (1870, 1900),
        ylim = (-3, 4),
    )
)
>>> ReconRes.recons["tas"] created
>>> ReconRes.da["tas"] created
>>> ReconRes.recons["nino3.4"] created
>>> ReconRes.da["nino3.4"] created
>>> Validating variable: tas ...
>>> ReconRes.valid_fd[tas_corr] created
>>> ReconRes.valid_fd[tas_CE] created
>>> Validating variable: nino3.4 ...
>>> ReconRes.valid_ts[nino3.4] created

../_images/notebooks_graphem-cli_16_1.png
../_images/notebooks_graphem-cli_16_2.png
../_images/notebooks_graphem-cli_16_3.png
[ ]: