01 - Compute Long Term Averages

For each case, compute the global 3D (nlat, nlon, z_t) mean

Imports

We include the lines at the beginning to make sure that any updates we make to the analysis_config.yml file are reflected in real time for this notebook

%load_ext autoreload
%autoreload 2

import intake
import ast
import yaml
from distributed import Client
from ncar_jobqueue import NCARCluster
import xarray as xr
from config import analysis_config

Spin up a Dask Cluster

cluster = NCARCluster()
cluster.scale(20)
client = Client(cluster)
client

Client

Client-1dcbcc90-1b50-11ec-a00d-3cecef1aca66

Connection method: Cluster object Cluster type: dask_jobqueue.PBSCluster
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/mgrover/proxy/8787/status

Cluster Info

cluster

Open the Intake-ESM Catalog

In the first notebook, we created a an intake-esm catalog which provides a means of accessing our data.

data_catalog = intake.open_esm_datastore(
    analysis_config["catalog_json"],
    csv_kwargs={"converters": {"variables": ast.literal_eval}},
    sep="/",
)
data_catalog

None catalog with 12 dataset(s) from 11103 asset(s):

unique
component 1
stream 4
date 2501
case 3
member_id 2
frequency 4
variables 545
path 11103
data_catalog_subset = data_catalog.search(
    frequency='month_1',
)
data_catalog_subset

None catalog with 3 dataset(s) from 3600 asset(s):

unique
component 1
stream 1
date 1200
case 3
member_id 2
frequency 1
variables 434
path 3600

Subset the last 20 years of data

dates = sorted(data_catalog_subset.df.date.unique())
data_catalog_subset = data_catalog_subset.search(variables=analysis_config['variables'],
                                                 date=dates[-240:])

Read in our dataset using to_dataset_dict()

dsets = data_catalog_subset.to_dataset_dict(cdf_kwargs={'use_cftime': True, 'chunks': {'time': 60}})
--> The keys in the returned dictionary of datasets are constructed as follows:
	'component/stream/case'
100.00% [3/3 00:02<00:00]
dsets[f"ocn/pop.h/{analysis_config['reference_case_name']}"]
<xarray.Dataset>
Dimensions:           (time: 240, nlat: 384, nlon: 320, z_t: 60, z_t_150m: 15)
Coordinates:
  * time              (time) object 0081-02-01 00:00:00 ... 0101-01-01 00:00:00
  * z_t               (z_t) float32 500.0 1.5e+03 ... 5.125e+05 5.375e+05
  * z_t_150m          (z_t_150m) float32 500.0 1.5e+03 ... 1.35e+04 1.45e+04
    ULONG             (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
    ULAT              (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
    TLONG             (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
    TLAT              (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
Dimensions without coordinates: nlat, nlon
Data variables: (12/27)
    SFWF              (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    photoC_TOT_zint   (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    SiO3              (time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
    BSF               (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    photoC_sp_zint    (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    photoC_diat_zint  (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    ...                ...
    CaCO3_FLUX_100m   (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    diaz_Nfix         (time, z_t_150m, nlat, nlon) float32 dask.array<chunksize=(1, 15, 384, 320), meta=np.ndarray>
    SHF               (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    DOCr              (time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
    SHF_QSW           (time, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    NH4               (time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
Attributes:
    cell_methods:            cell_methods = time: mean ==> the variable value...
    contents:                Diagnostic and Prognostic Variables
    title:                   b1850.f19_g17.validation_mct.004
    revision:                $Id$
    time_period_freq:        month_1
    Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
    calendar:                All years have exactly  365 days.
    source:                  CCSM POP2, the CCSM Ocean Component
    model_doi_url:           https://doi.org/10.5065/D67H1H0V
    intake_esm_varname:      SFWF\nphotoC_TOT_zint\nSiO3\nBSF\nphotoC_sp_zint...
    history:                 none
    intake_esm_dataset_key:  ocn/pop.h/b1850.f19_g17.validation_mct.004

Loop through the data and compute!

We are computing the average over time, and merging into a single dataset, subsetting for the variables specified in the analysis_config.yml file

xr.set_options(keep_attrs=True)
<xarray.core.options.set_options at 0x2ac18cd7f990>
ds_list = []
for key in dsets.keys():
    ds = dsets[key]
    mean = ds.mean(dim='time')
    ds_list.append(mean)
merged_ds = xr.concat(ds_list, dim='case')

We also want to make sure that we keep the title, or case information

cases = []
for ds in ds_list:
    cases.append(ds.title)
merged_ds['case'] = cases
merged_ds
<xarray.Dataset>
Dimensions:           (case: 3, nlat: 384, nlon: 320, z_t: 60, z_t_150m: 15)
Coordinates:
  * z_t               (z_t) float32 500.0 1.5e+03 ... 5.125e+05 5.375e+05
  * z_t_150m          (z_t_150m) float32 500.0 1.5e+03 ... 1.35e+04 1.45e+04
    ULONG             (nlat, nlon) float64 321.1 322.3 323.4 ... 319.6 320.0
    ULAT              (nlat, nlon) float64 -78.95 -78.95 -78.95 ... 72.41 72.41
    TLONG             (nlat, nlon) float64 320.6 321.7 322.8 ... 319.4 319.8
    TLAT              (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.19 72.19
  * case              (case) <U34 'b1850.f19_g17.validation_mct.004' ... 'b18...
Dimensions without coordinates: nlat, nlon
Data variables: (12/27)
    SFWF              (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    photoC_TOT_zint   (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    SiO3              (case, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
    BSF               (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    photoC_sp_zint    (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    photoC_diat_zint  (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    ...                ...
    CaCO3_FLUX_100m   (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    diaz_Nfix         (case, z_t_150m, nlat, nlon) float32 dask.array<chunksize=(1, 15, 384, 320), meta=np.ndarray>
    SHF               (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    DOCr              (case, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
    SHF_QSW           (case, nlat, nlon) float32 dask.array<chunksize=(1, 384, 320), meta=np.ndarray>
    NH4               (case, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 60, 384, 320), meta=np.ndarray>
Attributes:
    cell_methods:            cell_methods = time: mean ==> the variable value...
    contents:                Diagnostic and Prognostic Variables
    title:                   b1850.f19_g17.validation_mct.004
    revision:                $Id$
    time_period_freq:        month_1
    Conventions:             CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netcdf...
    calendar:                All years have exactly  365 days.
    source:                  CCSM POP2, the CCSM Ocean Component
    model_doi_url:           https://doi.org/10.5065/D67H1H0V
    intake_esm_varname:      SFWF\nphotoC_TOT_zint\nSiO3\nBSF\nphotoC_sp_zint...
    history:                 none
    intake_esm_dataset_key:  ocn/pop.h/b1850.f19_g17.validation_mct.004

Export our data

We output our dataset to zarr!

merged_ds.to_zarr('cached_output/averages_year_081_100.zarr', mode='w')
<xarray.backends.zarr.ZarrStore at 0x2ac1d5273550>