00 - Build the Intake-ESM Catalog
We can build an intake-esm catalog directly from the CESM history files. This analysis reads the history files as they are; it does not first convert them to timeseries files.
from ecgtools import Builder
from ecgtools.parsers.cesm import parse_cesm_history
from config import analysis_config
import pandas as pd
analysis_config['case_data_paths']
['/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.004/ocn/hist',
'/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.002/ocn/hist',
'/glade/scratch/hannay/archive/b1850.f19_g17.validation_nuopc.004_copy2/ocn/hist']
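The `config` module imported above is not shown in this notebook. As a rough sketch only, assuming it exposes a plain dictionary, it might look like the following. The paths are copied from the output above and the `catalog_csv` value is inferred from the "Saved catalog location" message printed later in this notebook; the real module may be structured differently (for example, it may load these values from a YAML file).

```python
# config.py (hypothetical sketch; the real module is not shown in this notebook)
analysis_config = {
    # Ocean history directories for the three validation cases
    "case_data_paths": [
        "/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.004/ocn/hist",
        "/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.002/ocn/hist",
        "/glade/scratch/hannay/archive/b1850.f19_g17.validation_nuopc.004_copy2/ocn/hist",
    ],
    # Catalog CSV location (assumed from the save message in this notebook)
    "catalog_csv": "../data/cesm-validation-catalog.csv",
}
```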
b = Builder(
    # Directories with the output
    analysis_config['case_data_paths'],
    # Depth of 1 since we are pointing directly at the case output directories
    depth=1,
    # Exclude the timeseries and restart directories
    exclude_patterns=["*/tseries/*", "*/rest/*"],
    # Number of parallel jobs - should match the number of threads available (-1 uses all of them)
    njobs=-1,
)
b.build(parse_cesm_history)
<class 'list'>
[PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.004/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.002/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_nuopc.004_copy2/ocn/hist')]
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 0.2s remaining: 0.0s
[Parallel(n_jobs=-1)]: Done 3 out of 3 | elapsed: 0.2s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 40 concurrent workers.
[Parallel(n_jobs=-1)]: Done 82 tasks | elapsed: 6.8s
[Parallel(n_jobs=-1)]: Done 208 tasks | elapsed: 10.0s
[Parallel(n_jobs=-1)]: Done 370 tasks | elapsed: 14.2s
[Parallel(n_jobs=-1)]: Done 568 tasks | elapsed: 19.4s
[Parallel(n_jobs=-1)]: Done 802 tasks | elapsed: 25.5s
[Parallel(n_jobs=-1)]: Done 1072 tasks | elapsed: 32.4s
[Parallel(n_jobs=-1)]: Done 1378 tasks | elapsed: 39.7s
[Parallel(n_jobs=-1)]: Done 1720 tasks | elapsed: 48.1s
[Parallel(n_jobs=-1)]: Done 2098 tasks | elapsed: 57.4s
[Parallel(n_jobs=-1)]: Done 2512 tasks | elapsed: 1.2min
[Parallel(n_jobs=-1)]: Done 2962 tasks | elapsed: 1.4min
[Parallel(n_jobs=-1)]: Done 3448 tasks | elapsed: 1.5min
[Parallel(n_jobs=-1)]: Done 3970 tasks | elapsed: 1.7min
[Parallel(n_jobs=-1)]: Done 4528 tasks | elapsed: 1.9min
[Parallel(n_jobs=-1)]: Done 5122 tasks | elapsed: 2.2min
[Parallel(n_jobs=-1)]: Done 5752 tasks | elapsed: 2.5min
[Parallel(n_jobs=-1)]: Done 6418 tasks | elapsed: 2.8min
[Parallel(n_jobs=-1)]: Done 7120 tasks | elapsed: 2.9min
[Parallel(n_jobs=-1)]: Done 7858 tasks | elapsed: 3.2min
[Parallel(n_jobs=-1)]: Done 8632 tasks | elapsed: 3.5min
[Parallel(n_jobs=-1)]: Done 9442 tasks | elapsed: 3.9min
[Parallel(n_jobs=-1)]: Done 10288 tasks | elapsed: 4.1min
[Parallel(n_jobs=-1)]: Done 11106 out of 11106 | elapsed: 4.2min finished
/glade/work/mgrover/git_repos/ecgtools/ecgtools/builder.py:193: UserWarning: Unable to parse 3 assets/files. A list of these assets can be found in `.invalid_assets` attribute.
parsing_func, parsing_func_kwargs
Builder(root_path=[PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.004/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_mct.002/ocn/hist'), PosixPath('/glade/scratch/hannay/archive/b1850.f19_g17.validation_nuopc.004_copy2/ocn/hist')], extension='.nc', depth=1, exclude_patterns=['*/tseries/*', '*/rest/*'], njobs=-1)
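To see which files the `exclude_patterns` argument would skip, here is a small sketch that assumes the patterns are applied as shell-style globs in the manner of Python's `fnmatch`. That matching rule is an assumption about ecgtools' behavior, and the file paths below are made up for illustration; `is_excluded` is a hypothetical helper, not part of ecgtools.

```python
from fnmatch import fnmatch

# Same patterns as passed to the Builder above
exclude_patterns = ["*/tseries/*", "*/rest/*"]


def is_excluded(path, patterns=exclude_patterns):
    """Return True if the path matches any glob-style exclude pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


# Hypothetical paths, for illustration only
candidates = [
    "/archive/case/ocn/hist/case.pop.h.0001-01.nc",
    "/archive/case/ocn/tseries/case.pop.h.TEMP.nc",
    "/archive/case/rest/0002-01-01/case.pop.r.nc",
]

# Keep only the paths that no exclude pattern matches
kept = [p for p in candidates if not is_excluded(p)]
print(kept)
```

Under this assumption only the file in the `hist` directory survives, since `*` in `fnmatch` also matches path separators.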
b.save(
    # File path - could save as .csv (uncompressed csv) or .csv.gz (compressed csv)
    analysis_config["catalog_csv"],
    # Name of the column containing the file paths
    path_column_name='path',
    # Name of the column containing the variables
    variable_column_name='variables',
    # Data file format - could be netcdf or zarr (in this case, netcdf)
    data_format="netcdf",
    # Which attributes to group by when reading in variables using intake-esm
    groupby_attrs=["component", "stream", "case"],
    # Aggregations which are fed into xarray when reading in data using intake-esm
    aggregations=[
        {
            "type": "join_existing",
            "attribute_name": "date",
            "options": {"dim": "time", "coords": "minimal", "compat": "override"},
        }
    ],
)
Saved catalog location: ../data/cesm-validation-catalog.json and ../data/cesm-validation-catalog.csv
/glade/work/mgrover/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/ipykernel_launcher.py:17: UserWarning: Unable to parse 3 assets/files. A list of these assets can be found in ../data/invalid_assets_cesm-validation-catalog.csv.
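The save step writes a JSON description alongside the CSV. As a rough sketch of how the arguments passed to `b.save` might map onto that description, the dictionary below uses field names from the ESM collection specification as I understand it. This is an assumption, not a copy of the file ecgtools writes; the actual JSON likely contains further fields (such as an identifier, a description, and a list of column attributes) and may differ in detail.

```python
import json

# Approximate shape of the catalog description (assumed, not read from disk)
catalog_description = {
    "esmcat_version": "0.1.0",
    # CSV that holds one row per history file
    "catalog_file": "cesm-validation-catalog.csv",
    # From path_column_name and data_format
    "assets": {"column_name": "path", "format": "netcdf"},
    "aggregation_control": {
        # From variable_column_name
        "variable_column_name": "variables",
        # From groupby_attrs
        "groupby_attrs": ["component", "stream", "case"],
        # From aggregations: files are joined along the existing time dimension
        "aggregations": [
            {
                "type": "join_existing",
                "attribute_name": "date",
                "options": {"dim": "time", "coords": "minimal", "compat": "override"},
            }
        ],
    },
}

print(json.dumps(catalog_description, indent=2))
```

Opening `../data/cesm-validation-catalog.json` in a text editor is the reliable way to check the real contents.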