{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# High-level workflow in LMRt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Expected time to run through: 5 mins**\n",
    "\n",
    "\n",
    "In this tutorial, we demonstrate how to reproduce our 1st LMR reconstruction in the previous notebook with the high-level workflow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test data preparation\n",
    "\n",
    "Again, if you haven't done yet, please prepare test data following the steps:\n",
    "1. Download the test case named \"PAGES2k_CCSM4_GISTEMP\" with this [link](https://drive.google.com/drive/folders/1UGn-LNd_tGSjPUKa52E6ffEM-ms2VD-N?usp=sharing).\n",
    "2. Create a directory named \"testcases\" in the same directory of this notebook.\n",
    "3. Put the unzipped direcotry \"PAGES2k_CCSM4_GISTEMP\" into \"testcases\".\n",
    "\n",
    "Below, we first load some useful packages, including our `LMRt`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "import LMRt\n",
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import xarray as xr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load configurations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's again create the `job` object and load the configuration YAML file.\n",
    "To distinguish between the 1st reconstruction, let's specify the `job_dirpath` in the call, which indicates the path where we store our results, to another location."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[36mLMRt: job.load_configs() >>> loading reconstruction configurations from: ./testcases/PAGES2k_CCSM4_GISTEMP/configs.yml\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.load_configs() >>> job.configs created\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.load_configs() >>> job.configs[\"job_dirpath\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.load_configs() >>> /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high created\u001b[0m\n",
      "{'anom_period': [1951, 1980],\n",
      " 'job_dirpath': '/Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high',\n",
      " 'job_id': 'LMRt_quickstart',\n",
      " 'obs_path': {'tas': './data/obs/gistemp1200_ERSSTv4.nc'},\n",
      " 'obs_varname': {'tas': 'tempanomaly'},\n",
      " 'prior_path': {'tas': './data/prior/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-184912.nc'},\n",
      " 'prior_regrid_ntrunc': 42,\n",
      " 'prior_season': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],\n",
      " 'prior_varname': {'tas': 'TREFHT'},\n",
      " 'proxy_frac': 0.75,\n",
      " 'proxydb_path': './data/proxy/pages2k_dataset.pkl',\n",
      " 'psm_calib_period': [1850, 2015],\n",
      " 'ptype_psm': {'coral.SrCa': 'linear',\n",
      "               'coral.calc': 'linear',\n",
      "               'coral.d18O': 'linear'},\n",
      " 'ptype_season': {'coral.SrCa': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],\n",
      "                  'coral.calc': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],\n",
      "                  'coral.d18O': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]},\n",
      " 'recon_loc_rad': 25000,\n",
      " 'recon_nens': 100,\n",
      " 'recon_period': [0, 2000],\n",
      " 'recon_seeds': [0,\n",
      "                 1,\n",
      "                 2,\n",
      "                 3,\n",
      "                 4,\n",
      "                 5,\n",
      "                 6,\n",
      "                 7,\n",
      "                 8,\n",
      "                 9,\n",
      "                 10,\n",
      "                 11,\n",
      "                 12,\n",
      "                 13,\n",
      "                 14,\n",
      "                 15,\n",
      "                 16,\n",
      "                 17,\n",
      "                 18,\n",
      "                 19],\n",
      " 'recon_timescale': 1,\n",
      " 'recon_vars': 'tas'}\n"
     ]
    }
   ],
   "source": [
    "job = LMRt.ReconJob()\n",
    "job.load_configs(\n",
    "    job_dirpath='./testcases/PAGES2k_CCSM4_GISTEMP/recon_high',\n",
    "    cfg_path='./testcases/PAGES2k_CCSM4_GISTEMP/configs.yml',\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's prepare our `job` with the `.prepare()` method, which will take care of the steps prior to the \"Data assimilation\" section in the previous notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare the `job`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[36mLMRt: job.load_proxydb() >>> job.configs[\"proxydb_path\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/data/proxy/pages2k_dataset.pkl\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.load_proxydb() >>> 692 records loaded\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.load_proxydb() >>> job.proxydb created\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.filter_proxydb() >>> filtering proxy records according to: ['coral.d18O', 'coral.SrCa', 'coral.calc']\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.filter_proxydb() >>> 95 records remaining\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.seasonalize_proxydb() >>> seasonalizing proxy records according to: {'coral.d18O': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'coral.SrCa': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'coral.calc': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.seasonalize_proxydb() >>> 95 records remaining\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.seasonalize_proxydb() >>> job.proxydb updated\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.load_prior() >>> loading model prior fields from: {'tas': '/Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/data/prior/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-184912.nc'}\u001b[0m\n",
      "Time axis not overlap with the reference period [1951, 1980]; use its own time period as reference [850.08, 1850.00].\n",
      "\u001b[1m\u001b[30mLMRt: job.load_prior() >>> raw prior\u001b[0m\n",
      "Dataset Overview\n",
      "-----------------------\n",
      "\n",
      "     Name:  tas\n",
      "   Source:  /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/data/prior/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-184912.nc\n",
      "    Shape:  time:12000, lat:96, lon:144\n",
      "\u001b[1m\u001b[32mLMRt: job.load_prior() >>> job.prior created\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.load_obs() >>> loading instrumental observation fields from: {'tas': '/Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/data/obs/gistemp1200_ERSSTv4.nc'}\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.load_obs() >>> job.obs created\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.calibrate_psm() >>> job.configs[\"precalc\"][\"seasonalized_prior_path\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/seasonalized_prior.pkl\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.calibrate_psm() >>> job.configs[\"precalc\"][\"seasonalized_obs_path\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/seasonalized_obs.pkl\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.calibrate_psm() >>> job.configs[\"precalc\"][\"prior_loc_path\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/prior_loc.pkl\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.calibrate_psm() >>> job.configs[\"precalc\"][\"obs_loc_path\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/obs_loc.pkl\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.calibrate_psm() >>> job.configs[\"precalc\"][\"calibed_psm_path\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/calibed_psm.pkl\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.seasonalize_ds_for_psm() >>> job.configs[\"ptype_season\"] = {'coral.d18O': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'coral.SrCa': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'coral.calc': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.seasonalize_ds_for_psm() >>> Seasonalizing variables from prior with season: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Searching nearest location:  13%|█▎        | 12/95 [00:00<00:00, 112.97it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[32mLMRt: job.seasonalize_ds_for_psm() >>> job.seasonalized_prior created\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Searching nearest location: 100%|██████████| 95/95 [00:00<00:00, 107.38it/s]\n",
      "/Users/fzhu/Github/LMRt/LMRt/utils.py:243: RuntimeWarning: Mean of empty slice\n",
      "  tmp = np.nanmean(var[inds, ...], axis=0)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[32mLMRt: job.proxydb.find_nearest_loc() >>> job.proxydb.prior_lat_idx & job.proxydb.prior_lon_idx created\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.proxydb.get_var_from_ds() >>> job.proxydb.records[pid].prior_time & job.proxydb.records[pid].prior_value created\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.seasonalize_ds_for_psm() >>> job.configs[\"ptype_season\"] = {'coral.d18O': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'coral.SrCa': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 'coral.calc': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.seasonalize_ds_for_psm() >>> Seasonalizing variables from obs with season: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Searching nearest location:  11%|█         | 10/95 [00:00<00:00, 97.04it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[32mLMRt: job.seasonalize_ds_for_psm() >>> job.seasonalized_obs created\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Searching nearest location: 100%|██████████| 95/95 [00:00<00:00, 99.05it/s] \n",
      "Calibrating PSM:   9%|▉         | 9/95 [00:00<00:00, 86.19it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[32mLMRt: job.proxydb.find_nearest_loc() >>> job.proxydb.obs_lat_idx & job.proxydb.obs_lon_idx created\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.proxydb.get_var_from_ds() >>> job.proxydb.records[pid].obs_time & job.proxydb.records[pid].obs_value created\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.proxydb.init_psm() >>> job.proxydb.records[pid].psm initialized\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.calibrate_psm() >>> PSM calibration period: [1850, 2015]\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Calibrating PSM:  62%|██████▏   | 59/95 [00:00<00:00, 92.65it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The number of overlapped data points is 0 < 25. Skipping ...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Calibrating PSM: 100%|██████████| 95/95 [00:01<00:00, 91.55it/s]\n",
      "Forwarding PSM: 100%|██████████| 95/95 [00:00<00:00, 1516.82it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[32mLMRt: job.proxydb.calib_psm() >>> job.proxydb.records[pid].psm calibrated\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.proxydb.calib_psm() >>> job.proxydb.calibed created\u001b[0m\n",
      "\u001b[1m\u001b[32mLMRt: job.proxydb.forward_psm() >>> job.proxydb.records[pid].psm forwarded\u001b[0m\n",
      "\u001b[1m\u001b[30mLMRt: job.seasonalize_prior() >>> seasonalized prior w/ season [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]\u001b[0m\n",
      "Dataset Overview\n",
      "-----------------------\n",
      "\n",
      "     Name:  tas\n",
      "   Source:  /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/data/prior/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-184912.nc\n",
      "    Shape:  time:1001, lat:96, lon:144\n",
      "\u001b[1m\u001b[32mLMRt: job.seasonalize_ds_for_psm() >>> job.prior updated\u001b[0m\n",
      "\u001b[1m\u001b[30mLMRt: job.regrid_prior() >>> regridded prior\u001b[0m\n",
      "Dataset Overview\n",
      "-----------------------\n",
      "\n",
      "     Name:  tas\n",
      "   Source:  /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/data/prior/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-184912.nc\n",
      "    Shape:  time:1001, lat:42, lon:63\n",
      "\u001b[1m\u001b[32mLMRt: job.regrid_prior() >>> job.prior updated\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.prepare() >>> Prepration data saved to: /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/job.pkl\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.prepare() >>> job.configs[\"precalc\"][\"prep_savepath\"] = /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/job.pkl\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "job.prepare(verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run the `job`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Okay, now we are ready to run the job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "KF updating:   2%|▏         | 49/2001 [00:00<00:04, 482.46it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[36mLMRt: job.run() >>> job.configs[\"recon_seeds\"] = [0]\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.run() >>> job.configs[\"save_settings\"] = {'compress_dict': {'zlib': True, 'least_significant_digit': 1}, 'output_geo_mean': False, 'target_lats': [], 'target_lons': [], 'output_full_ens': False, 'dtype': 32}\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.run() >>> job.configs saved to: /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/job_configs.yml\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.run() >>> seed: 0 | max: 0\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.run() >>> randomized indices for prior and proxies saved to: /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/job_r00_idx.pkl\u001b[0m\n",
      "Proxy Database Overview\n",
      "-----------------------\n",
      "     Source:        /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/data/proxy/pages2k_dataset.pkl\n",
      "       Size:        70\n",
      "Proxy types:        {'coral.calc': 6, 'coral.SrCa': 19, 'coral.d18O': 45}\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "KF updating: 100%|██████████| 2001/2001 [01:09<00:00, 28.68it/s] \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m\u001b[36mLMRt: job.save_recon() >>> Reconstructed fields saved to: /Users/fzhu/Github/LMRt/docsrc/tutorial/testcases/PAGES2k_CCSM4_GISTEMP/recon_high/job_r00_recon.nc\u001b[0m\n",
      "\u001b[1m\u001b[36mLMRt: job.run() >>> DONE!\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "job.run(recon_seeds=np.arange(1), verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once done, we will get the struture below in the \"recon_high\" directory:\n",
    "```\n",
    ".\n",
    "├── calibed_psm.pkl\n",
    "├── job_configs.yml\n",
    "├── job_r00_idx.pkl\n",
    "├── job_r00_recon.nc\n",
    "├── job.pkl\n",
    "├── obs_loc.pkl\n",
    "├── prior_loc.pkl\n",
    "├── seasonalized_obs.pkl\n",
    "└── seasonalized_prior.pkl\n",
    "```\n",
    "\n",
    "For the visualization of the results, please move on to the tutorial regarding visualizations."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}