4. ESDC Generation
This section explains how an ESDC is generated and how it can be extended with new variables.
4.1. Command-Line Tool
To generate new data cubes or to update existing ones, the dedicated command-line tool cube-gen is used.
After installing cablab-core as described in section Installation, try:
$ cube-gen --help
CAB-LAB command-line interface, version 0.2.2
usage: cube-gen [-h] [-l] [-G] [-c CONFIG] [TARGET] [SOURCE [SOURCE ...]]

Generates a new CAB-LAB data cube or updates an existing one.

positional arguments:
  TARGET                data cube root directory
  SOURCE                <provider name>:dir=<directory>, use -l to list source
                        provider names

optional arguments:
  -h, --help            show this help message and exit
  -l, --list            list all available source providers
  -G, --dont-clear-cache
                        do not clear data cache before updating the cube
                        (faster)
  -c CONFIG, --cube-conf CONFIG
                        data cube configuration file
The -l (--list) option lists all currently installed source data providers:
$ cube-gen --list
ozone -> cablab.providers.ozone.OzoneProvider
net_ecosystem_exchange -> cablab.providers.mpi_bgc.MPIBGCProvider
air_temperature -> cablab.providers.air_temperature.AirTemperatureProvider
interception_loss -> cablab.providers.gleam.GleamProvider
transpiration -> cablab.providers.gleam.GleamProvider
open_water_evaporation -> cablab.providers.gleam.GleamProvider
...
Source data providers are the pluggable software components used by cube-gen to read data from a source directory and transform it into a common data cube structure. The list above shows the mapping from the short names used on the cube-gen command line to the actual Python code, e.g. for ozone, the OzoneProvider class of the cablab/providers/ozone.py module is used.
The common cube structure is established by a cube configuration file provided via the -c (--cube-conf) option.
Here is the configuration file used to produce the low-resolution ESDC. It produces a global cube at 0.25 degrees resolution whose source data are aggregated or interpolated to match 8-day periods and then resampled to match 1440 x 720 spatial grid cells:
model_version = '0.2.4'
spatial_res = 0.25
temporal_res = 8
grid_width = 1440
grid_height = 720
start_time = datetime.datetime(2001, 1, 1, 0, 0)
end_time = datetime.datetime(2012, 1, 1, 0, 0)
ref_time = datetime.datetime(2001, 1, 1, 0, 0)
calendar = 'gregorian'
file_format = 'NETCDF4_CLASSIC'
compression = False
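The grid dimensions in this configuration follow directly from the spatial resolution. The short sketch below reproduces that arithmetic for a global cube. The rounding-up of the number of 8-day periods per year is an assumption made for illustration, not something this section states.

```python
import math

spatial_res = 0.25   # degrees per grid cell, as in the configuration above
temporal_res = 8     # days per period, as in the configuration above

# A global grid spans 360 degrees of longitude and 180 degrees of latitude.
grid_width = round(360 / spatial_res)
grid_height = round(180 / spatial_res)

# Assumption: a partial last period of a 365-day year counts as a full period.
periods_per_year = math.ceil(365 / temporal_res)

print(grid_width, grid_height, periods_per_year)  # prints: 1440 720 46
```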
To create or update a cube, call the cube-gen tool with the configuration and the cube data provider(s).
The cube data providers can have parameters of their own. All current providers have the dir parameter indicating the source data directory, but this is not a rule. Providers that read from multivariate sources also have a var parameter to indicate which of the available variables should be used.
$ cube-gen mycube -c mycube.config ozone:dir=/path/to/ozone/netcdfs
will create the cube mycube in the current directory using the mycube.config configuration and add a single variable ozone from the source NetCDF files in /path/to/ozone/netcdfs.
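To make the SOURCE argument format concrete, here is a minimal sketch that splits a string of the form `<provider name>:<parameter>=<value>` into its parts. The function name parse_source_arg is hypothetical and this is not the parser used by cube-gen. It handles a single parameter only, because the syntax for passing several parameters is not described here.

```python
def parse_source_arg(source):
    """Split '<provider name>:<key>=<value>' into (name, {key: value}).

    Hypothetical helper for illustration; handles one parameter only.
    """
    # Split at the first ':' so that ':' inside the value is left untouched.
    name, sep, param = source.partition(':')
    if not name or not sep:
        raise ValueError('expected <provider name>:<key>=<value>, got %r' % source)
    # Split at the first '=' so that '=' inside the value is left untouched.
    key, eq, value = param.partition('=')
    if not key or not eq:
        raise ValueError('expected <key>=<value>, got %r' % param)
    return name, {key: value}


name, params = parse_source_arg('ozone:dir=/path/to/ozone/netcdfs')
print(name, params)
```

Splitting at the first separator only keeps paths that themselves contain `:` or `=` intact.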
Note that the GitHub repository cube-config is used to keep the configurations of the individual ESDC versions.
4.2. Writing a new Provider
To add new source data for which there is no source data provider yet, you can write your own.
Make sure cablab-core is installed as described in section Installation above.
If your source data is NetCDF, writing a new provider is easy. Just copy one of the existing providers, e.g. cablab/providers/ozone.py, and start adapting the code to your needs.
For source data other than NetCDF, you will have to write a provider from scratch by implementing the cablab.CubeSourceProvider interface or by extending the cablab.BaseCubeSourceProvider class, which is usually easier. Make sure you adhere to the contract described in the documentation of the respective class.
To run your provider, you will have to register it in the setup.py file. Assuming your provider is called sst and your provider class is SeaSurfaceTemperatureProvider located in myproviders.py, the entry_points section of the setup.py file should reflect this as follows:
entry_points={
    'cablab.source_providers': [
        'burnt_area = cablab.providers.burnt_area:BurntAreaProvider',
        'c_emissions = cablab.providers.c_emissions:CEmissionsProvider',
        'ozone = cablab.providers.ozone:OzoneProvider',
        ...
        'sst = myproviders:SeaSurfaceTemperatureProvider',
    ],
},
To run it:
$ cube-gen mycube -c mycube.config sst:dir=/path/to/sst/netcdfs
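Before running cube-gen, it can be useful to check that the registration took effect. The sketch below uses the standard library's importlib.metadata to list whatever is registered under the cablab.source_providers entry point group. The function name list_source_providers is hypothetical, and whether cube-gen discovers providers in exactly this way is an assumption.

```python
from importlib.metadata import entry_points


def list_source_providers(group='cablab.source_providers'):
    """Return {short name: 'module:Class'} for all entry points in a group."""
    eps = entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10 and later: selectable entry point collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


for name, target in sorted(list_source_providers().items()):
    print(name, '->', target)
```

If the package containing the setup.py has been installed, the sst entry should appear in this listing. With nothing installed, the loop prints nothing.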
4.3. Sharing a Provider
If you plan to distribute and share your provider, you should create your own Python module separate from cablab-core, with a dedicated setup.py that lists only your providers in the entry_points section.
Other users may then install your module on top of cablab-core to make use of your plugin.
4.4. Python Cube API Reference
Data Cube read-only access:
from cablab import Cube
from datetime import datetime
cube = Cube.open('./cablab-cube-v05')
data = cube.data.get(['LAI', 'Precip'], [datetime(2001, 6, 1), datetime(2012, 1, 1)], 53.2, 12.8)
Data Cube creation/update:
from cablab import Cube, CubeConfig
from datetime import datetime
# MyVar1SourceProvider and MyVar2SourceProvider stand for your own provider classes.
cube = Cube.create('./my-cablab-cube', CubeConfig(spatial_res=0.05))
cube.update(MyVar1SourceProvider(cube.config, './my-cube-sources/var1'))
cube.update(MyVar2SourceProvider(cube.config, './my-cube-sources/var2'))
class cablab.Cube(base_dir, config)
    Represents a data cube. Use the static open() or create() methods to obtain data cube objects.

    base_dir
        The cube’s base directory.

    closed
        Checks if the cube has been closed.

    config
        The cube’s configuration. See CubeConfig class.

    static create(base_dir, config=CubeConfig(spatial_res=0.250000, grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720, temporal_res=8, ref_time=datetime.datetime(2001, 1, 1, 0, 0)))
        Create a new data cube. Use the Cube.update(provider) method to add data to the cube via a source data provider.

        Parameters:
            - base_dir – The data cube’s base directory. Must not exist.
            - config – The data cube’s static information.
        Returns: A cube instance.

    data
        The cube’s data which is an instance of the CubeDataAccess class.

    info() → str
        Return a human-readable information string about this data cube (markdown formatted).

class cablab.CubeConfig(spatial_res=0.25, grid_x0=0, grid_y0=0, grid_width=1440, grid_height=720, temporal_res=8, calendar='gregorian', ref_time=datetime.datetime(2001, 1, 1, 0, 0), start_time=datetime.datetime(2001, 1, 1, 0, 0), end_time=datetime.datetime(2012, 1, 1, 0, 0), variables=None, file_format='NETCDF4_CLASSIC', compression=False, chunk_sizes=None, static_data=False, model_version='1.0.1')
    A data cube’s static configuration information.

    Parameters:
        - spatial_res – The spatial image resolution in degree.
        - grid_x0 – The fixed grid X offset (longitude direction).
        - grid_y0 – The fixed grid Y offset (latitude direction).
        - grid_width – The fixed grid width in pixels (longitude direction).
        - grid_height – The fixed grid height in pixels (latitude direction).
        - temporal_res – The temporal resolution in days.
        - ref_time – A datetime value which defines the units in which time values are given, namely ‘days since ref_time’.
        - start_time – The inclusive start time of the first image of any variable in the cube given as datetime value. None means unlimited.
        - end_time – The exclusive end time of the last image of any variable in the cube given as datetime value. None means unlimited.
        - variables – A list of variable names to be included in the cube.
        - file_format – The file format used. Must be one of ‘NETCDF4’, ‘NETCDF4_CLASSIC’, ‘NETCDF3_CLASSIC’ or ‘NETCDF3_64BIT’.
        - compression – Whether the data should be compressed.

    date2num(date) → float
        Return the number of days for the given date as a number in the time units given by the time_units property.

        Parameters: date – The date as a datetime.datetime value

    easting
        The longitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).

    geo_bounds
        The geographical boundary given as ((LL-lon, LL-lat), (UR-lon, UR-lat)).

    static load(path) → object
        Load a CubeConfig from a text file.

        Parameters: path – The file’s path name.
        Returns: A new CubeConfig instance

    northing
        The latitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).

    num_periods_per_year
        Return the integer number of target periods per year.

    time_units
        Return the time units used by the data cube as string using the format ‘days since ref_time’.
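
The derived CubeConfig properties can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the cablab implementation: the class name GridSketch is hypothetical, the grid origin is assumed to be at 180 degrees west and 90 degrees north, easting is treated as a longitude and northing as a latitude, and date2num is computed with plain datetime arithmetic.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GridSketch:
    """Hypothetical stand-in mirroring a few documented CubeConfig fields."""
    spatial_res: float = 0.25
    grid_x0: int = 0
    grid_y0: int = 0
    grid_width: int = 1440
    grid_height: int = 720
    ref_time: datetime = datetime(2001, 1, 1)

    @property
    def easting(self):
        # Assumption: grid column 0 starts at 180 degrees west.
        return -180.0 + self.grid_x0 * self.spatial_res

    @property
    def northing(self):
        # Assumption: grid row 0 starts at 90 degrees north.
        return 90.0 - self.grid_y0 * self.spatial_res

    @property
    def geo_bounds(self):
        # ((LL-lon, LL-lat), (UR-lon, UR-lat)), as in the reference above.
        lon_min = self.easting
        lat_max = self.northing
        lon_max = lon_min + self.grid_width * self.spatial_res
        lat_min = lat_max - self.grid_height * self.spatial_res
        return (lon_min, lat_min), (lon_max, lat_max)

    def date2num(self, date):
        # Days elapsed since ref_time, as a float.
        return (date - self.ref_time).total_seconds() / 86400.0


grid = GridSketch()
print(grid.geo_bounds)                      # ((-180.0, -90.0), (180.0, 90.0))
print(grid.date2num(datetime(2001, 1, 9)))  # 8.0
```

With the default values the bounds cover the whole globe, and the start of the second 8-day period lies exactly 8.0 days after ref_time.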