00 - Build the Intake-ESM CatalogΒΆ

We can build an intake-esm catalog directly from the CESM history files; for this analysis, we do not convert the history output to timeseries files first.

from ecgtools import Builder
from ecgtools.parsers.cesm import parse_cesm_history
from config import analysis_config
import pandas as pd
analysis_config['case_data_paths']
['/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.004/ocn/hist',
 '/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.002/ocn/hist',
 '/glade/scratch/hannay/archive/b1850.f19_g17.validation_nuopc.004_copy2/ocn/hist']
b = Builder(
    # Directories with the output
    analysis_config['case_data_paths'],
    # Depth of 1 since we are pointing directly at the history directories
    depth=1,
    # Exclude the timeseries and restart directories
    exclude_patterns=["*/tseries/*", "*/rest/*"],
    # Number of parallel jobs (-1 uses all available cores)
    njobs=-1,
)
b.build(parse_cesm_history)
<class 'list'>
[PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.004/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.002/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_nuopc.004_copy2/ocn/hist')]
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done   3 out of   3 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done   3 out of   3 | elapsed:    0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done  82 tasks      | elapsed:    6.8s
[Parallel(n_jobs=-1)]: Done 208 tasks      | elapsed:   10.0s
[Parallel(n_jobs=-1)]: Done 370 tasks      | elapsed:   14.2s
[Parallel(n_jobs=-1)]: Done 568 tasks      | elapsed:   19.4s
[Parallel(n_jobs=-1)]: Done 802 tasks      | elapsed:   25.5s
[Parallel(n_jobs=-1)]: Done 1072 tasks      | elapsed:   32.4s
[Parallel(n_jobs=-1)]: Done 1378 tasks      | elapsed:   39.7s
[Parallel(n_jobs=-1)]: Done 1720 tasks      | elapsed:   48.1s
[Parallel(n_jobs=-1)]: Done 2098 tasks      | elapsed:   57.4s
[Parallel(n_jobs=-1)]: Done 2512 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done 2962 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done 3448 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done 3970 tasks      | elapsed:  1.7min
[Parallel(n_jobs=-1)]: Done 4528 tasks      | elapsed:  1.9min
[Parallel(n_jobs=-1)]: Done 5122 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 5752 tasks      | elapsed:  2.5min
[Parallel(n_jobs=-1)]: Done 6418 tasks      | elapsed:  2.8min
[Parallel(n_jobs=-1)]: Done 7120 tasks      | elapsed:  2.9min
[Parallel(n_jobs=-1)]: Done 7858 tasks      | elapsed:  3.2min
[Parallel(n_jobs=-1)]: Done 8632 tasks      | elapsed:  3.5min
[Parallel(n_jobs=-1)]: Done 9442 tasks      | elapsed:  3.9min
[Parallel(n_jobs=-1)]: Done 10288 tasks      | elapsed:  4.1min
[Parallel(n_jobs=-1)]: Done 11106 out of 11106 | elapsed:  4.2min finished
/glade/work/mgrover/git_repos/ecgtools/ecgtools/builder.py:193: UserWarning: Unable to parse 3 assets/files. A list of these assets can be found in `.invalid_assets` attribute.
  parsing_func, parsing_func_kwargs
Builder(root_path=[PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.004/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.002/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_nuopc.004_copy2/ocn/hist')], extension='.nc', depth=1, exclude_patterns=['*/tseries/*', '*/rest/*'], njobs=-1)
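Note the warning above: three files could not be parsed, and they are recorded in the builder's `.invalid_assets` attribute rather than silently dropped. The `parse_cesm_history` parser derives catalog columns (case, stream, date, path, ...) from each history-file name. As a rough, hypothetical sketch of that kind of filename parsing (not the actual ecgtools implementation), a history filename such as `b1850.f19_g17.validation_mct.004.pop.h.0001-01.nc` can be split with a regex:

```python
import re
from pathlib import Path

# Hypothetical sketch of CESM history filename parsing; the real parser is
# ecgtools.parsers.cesm.parse_cesm_history, which also opens each file to
# record its variables. Pattern: <case>.<model>.<stream>.<date>.nc
HIST_RE = re.compile(
    r"^(?P<case>.+)\.(?P<model>pop|cam|clm2|cice|mosart)"
    r"\.(?P<stream>h\d*[a-z]*)\.(?P<date>[\d-]+)\.nc$"
)

def parse_history_path(path):
    """Return a dict of catalog columns for one history file, or None."""
    m = HIST_RE.match(Path(path).name)
    if m is None:
        return None  # files like this end up in `.invalid_assets`
    info = m.groupdict()
    info["path"] = str(path)
    return info

parsed = parse_history_path(
    "/glade/scratch/.../ocn/hist/b1850.f19_g17.validation_mct.004.pop.h.0001-01.nc"
)
print(parsed["case"], parsed["stream"], parsed["date"])
```

After a build, `b.df` holds the parsed rows as a pandas DataFrame and `b.invalid_assets` lists the files that failed parsing, so it is worth inspecting both before saving.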
b.save(
    # File path - can be .csv (uncompressed) or .csv.gz (compressed)
    analysis_config["catalog_csv"],
    # Column name containing the file path
    path_column_name='path',
    # Column name containing the list of variables
    variable_column_name='variables',
    # Data file format - could be netcdf or zarr (in this case, netcdf)
    data_format="netcdf",
    # Attributes to group by when loading data with intake-esm
    groupby_attrs=["component", "stream", "case"],
    # Aggregations passed to xarray when combining files with intake-esm
    aggregations=[
        {
            "type": "join_existing",
            "attribute_name": "date",
            "options": {"dim": "time", "coords": "minimal", "compat": "override"},
        }
    ],
)
Saved catalog location: ../data/cesm-validation-catalog.json and ../data/cesm-validation-catalog.csv
/glade/work/mgrover/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/ipykernel_launcher.py:17: UserWarning: Unable to parse 3 assets/files. A list of these assets can be found in ../data/invalid_assets_cesm-validation-catalog.csv.
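The saved catalog can now be opened with `intake.open_esm_datastore("../data/cesm-validation-catalog.json")`. Conceptually, intake-esm forms one dataset key per unique combination of the `groupby_attrs` above and then joins the files within each group along `time`, per the `join_existing` aggregation. A minimal pandas sketch of that grouping, using a toy catalog with hypothetical rows:

```python
import pandas as pd

# Toy stand-in for the saved catalog CSV (hypothetical rows, with the same
# columns used as groupby_attrs in b.save above).
df = pd.DataFrame(
    {
        "component": ["ocn", "ocn", "ocn", "ocn"],
        "stream": ["pop.h", "pop.h", "pop.h.nday1", "pop.h"],
        "case": [
            "b1850.f19_g17.validation_mct.004",
            "b1850.f19_g17.validation_mct.004",
            "b1850.f19_g17.validation_mct.004",
            "b1850.f19_g17.validation_mct.002",
        ],
        "date": ["0001-01", "0001-02", "0001-01-01", "0001-01"],
        "path": ["a.nc", "b.nc", "c.nc", "d.nc"],
    }
)

# One dataset key per unique (component, stream, case); the files within
# each group are what the join_existing aggregation concatenates over time.
groups = df.groupby(["component", "stream", "case"])["path"].agg(list)
for key, paths in groups.items():
    print(".".join(key), "->", paths)
```

This is why `groupby_attrs` should uniquely identify a logical dataset: anything not in the grouping (here, `date`) must be reconcilable by an aggregation.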