4. ESDC Generation
This section explains how an ESDC is generated and how it can be extended with new variables.
4.1. Command-Line Tool
To generate new data cubes or to update existing ones, the dedicated command-line tool cube-gen is used.
After installing esdl-core as described in section Installation, try:
$ cube-gen --help
ESDL command-line interface, version 0.2.2
usage: cube-gen [-h] [-l] [-G] [-c CONFIG] [TARGET] [SOURCE [SOURCE ...]]
Generates a new ESDL data cube or updates an existing one.
positional arguments:
TARGET data cube root directory
SOURCE <provider name>:dir=<directory>, use -l to list source
provider names
optional arguments:
-h, --help show this help message and exit
-l, --list list all available source providers
-G, --dont-clear-cache
do not clear data cache before updating the cube
(faster)
-c CONFIG, --cube-conf CONFIG
data cube configuration file
The -l (--list) option lists all currently installed source data providers:
$ cube-gen --list
ozone -> esdl.providers.ozone.OzoneProvider
net_ecosystem_exchange -> esdl.providers.mpi_bgc.MPIBGCProvider
air_temperature -> esdl.providers.air_temperature.AirTemperatureProvider
interception_loss -> esdl.providers.gleam.GleamProvider
transpiration -> esdl.providers.gleam.GleamProvider
open_water_evaporation -> esdl.providers.gleam.GleamProvider
...
Source data providers are the pluggable software components used by cube-gen to read data from a source directory and transform it into a common data cube structure. The list above shows the mapping from the short names used on the cube-gen command line to the actual Python code; e.g. for ozone, the OzoneProvider class of the esdl/providers/ozone.py module is used.
The common cube structure is established by a cube configuration file, which is passed via the -c (--cube-conf) option.
Here is the configuration file that is used to produce the low-resolution ESDC. It will produce a global cube at 0.25 degrees resolution whose source data will be aggregated/interpolated to match 8-day periods and then resampled to match 1440 x 720 spatial grid cells:
model_version = '0.2.4'
spatial_res = 0.25
temporal_res = 8
grid_width = 1440
grid_height = 720
start_time = datetime.datetime(2001, 1, 1, 0, 0)
end_time = datetime.datetime(2012, 1, 1, 0, 0)
ref_time = datetime.datetime(2001, 1, 1, 0, 0)
calendar = 'gregorian'
file_format = 'NETCDF4_CLASSIC'
compression = False
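The grid dimensions in this file follow from the spatial resolution, and the number of periods per year follows from the temporal resolution. As a rough check, the numbers can be reproduced as below. The global 360 x 180 degree extent and the 365-day year with the last partial period rounded up are assumptions of this sketch, not statements about how esdl-core computes them.

```python
import math

# Values taken from the configuration file above.
spatial_res = 0.25   # degrees per grid cell
temporal_res = 8     # days per period

# Assumption: the cube covers the whole globe (360 x 180 degrees).
grid_width = round(360 / spatial_res)
grid_height = round(180 / spatial_res)

# Assumption: a 365-day year; the last, shorter period counts as a full one.
periods_per_year = math.ceil(365 / temporal_res)

print(grid_width, grid_height, periods_per_year)
```

If spatial_res is changed, grid_width and grid_height in the configuration presumably need to change with it so that the three values stay consistent.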
To create or update a cube, call the cube-gen tool with the configuration and the cube data provider(s).
The cube data providers can have parameters of their own. All current providers have the dir parameter indicating the source data directory, but this is not a rule. Providers which read from multivariate sources also have a var parameter to indicate which of the available variables should be used.
$ cube-gen mycube -c mycube.config ozone:dir=/path/to/ozone/netcdfs
will create the cube mycube in the current directory using the mycube.config configuration and add a single variable ozone from source NetCDF files in /path/to/ozone/netcdfs.
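To make the SOURCE argument format more concrete, here is a minimal sketch of how a string such as ozone:dir=/path/to/ozone/netcdfs could be split into a provider name and its parameters. The helper parse_source_arg is hypothetical and not part of esdl-core, and the comma as separator between several parameters is an assumption; the help text only documents the single dir parameter.

```python
def parse_source_arg(source):
    """Split '<provider name>:key=value,...' into (name, params).

    Hypothetical helper for illustration only. The comma-separated
    parameter list is an assumption, not documented cube-gen behaviour.
    """
    # Split at the first colon only, so colons inside paths survive.
    name, sep, param_text = source.partition(':')
    if not name:
        raise ValueError('missing provider name in %r' % source)
    params = {}
    if sep and param_text:
        for item in param_text.split(','):
            key, eq, value = item.partition('=')
            if not eq or not key:
                raise ValueError('expected key=value, got %r' % item)
            params[key.strip()] = value.strip()
    return name, params


name, params = parse_source_arg('ozone:dir=/path/to/ozone/netcdfs')
print(name, params)
```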
Note, the GitHub repository cube-config is used to keep the configurations of individual ESDC versions.
4.2. Writing a new Provider
In order to add new source data for which there is no source data provider yet, you can write your own.
Make sure esdl-core is installed as described in section Installation.
If your source data is NetCDF, writing a new provider is easy. Just copy one of the existing providers, e.g. esdl/providers/ozone.py, and start adapting the code to your needs.
For source data other than NetCDF, you will have to write a provider from scratch by implementing the esdl.CubeSourceProvider interface or by extending the esdl.BaseCubeSourceProvider class, which is usually easier. Make sure you adhere to the contract described in the documentation of the respective class.
To run your provider you will have to register it in the setup.py file. Assuming your provider is called sst and your provider class is SeaSurfaceTemperatureProvider, located in myproviders.py, then the entry_points section of the setup.py file should reflect this as follows:
entry_points={
    'esdl.source_providers': [
        'burnt_area = esdl.providers.burnt_area:BurntAreaProvider',
        'c_emissions = esdl.providers.c_emissions:CEmissionsProvider',
        'ozone = esdl.providers.ozone:OzoneProvider',
        ...
        'sst = myproviders:SeaSurfaceTemperatureProvider',
    ],
},
To run it:
$ cube-gen mycube -c mycube.config sst:dir=/path/to/sst/netcdfs
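Each entry point string has the form name = module:ClassName. The following sketch shows how such a string could be resolved to a Python object by hand. It only illustrates the naming scheme: the helper resolve_entry_point is hypothetical, actual plugin discovery is done by the packaging machinery, and a standard-library class stands in for a provider class so that the example runs without esdl-core.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve 'name = module:attr' (or just 'module:attr') to (name, object)."""
    name, sep, target = spec.partition('=')
    if not sep:
        # No explicit name given, the whole string is the target.
        name, target = '', spec
    module_name, _, attr_path = target.strip().partition(':')
    obj = importlib.import_module(module_name.strip())
    if attr_path:
        # Walk dotted attribute paths such as 'Outer.Inner'.
        for part in attr_path.strip().split('.'):
            obj = getattr(obj, part)
    return name.strip(), obj


# 'collections:OrderedDict' is a stand-in for 'myproviders:SeaSurfaceTemperatureProvider'.
name, cls = resolve_entry_point('ordered = collections:OrderedDict')
print(name, cls.__name__)
```

If resolving your own entry fails with an import error in a sketch like this, the module named before the colon is most likely not importable from the current environment.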
4.3. Sharing a Provider
If you plan to distribute and share your provider, you should create your own Python module, separate from esdl-core, with a dedicated setup.py that lists only your providers in the entry_points section.
Other users may then install your module on top of esdl-core to make use of your plugin.
4.4. Python Cube API Reference
Data Cube read-only access:
from esdl import Cube
from datetime import datetime
cube = Cube.open('./esdl-cube-v05')
data = cube.data.get(['LAI', 'Precip'], [datetime(2001, 6, 1), datetime(2012, 1, 1)], 53.2, 12.8)
Data Cube creation/update:
from esdl import Cube, CubeConfig
from datetime import datetime
cube = Cube.create('./my-esdl-cube', CubeConfig(spatial_res=0.05))
cube.update(MyVar1SourceProvider(cube.config, './my-cube-sources/var1'))
cube.update(MyVar2SourceProvider(cube.config, './my-cube-sources/var2'))
class esdl.Cube(base_dir, config)
   Represents a data cube. Use the static open() or create() methods to obtain data cube objects.
   property base_dir
      The cube’s base directory.
   property closed
      Checks if the cube has been closed.
   property config
      The cube’s configuration. See CubeConfig class.
   static create(base_dir, config=CubeConfig(grid_width=1440, grid_height=720, temporal_res=8, ref_time=datetime.datetime(2001, 1, 1, 0, 0)))
      Create a new data cube. Use the Cube.update(provider) method to add data to the cube via a source data provider.
      Parameters
         base_dir – The data cube’s base directory. Must not exist.
         config – The data cube’s static information.
      Returns
         A cube instance.
   property data
      The cube’s data represented as an xarray dataset.
   info() → str
      Return a human-readable information string about this data cube (markdown formatted).
class esdl.CubeConfig(spatial_res=0.25, grid_x0=0, grid_y0=0, lon0=None, lon1=None, lat0=None, lat1=None, grid_width=1440, grid_height=720, temporal_res=8, calendar='gregorian', ref_time=datetime.datetime(2001, 1, 1, 0, 0), start_time=datetime.datetime(2001, 1, 1, 0, 0), end_time=datetime.datetime(2012, 1, 1, 0, 0), variables=None, file_format='NETCDF4_CLASSIC', chunk_sizes=None, compression=False, comp_level=5, static_data=False, model_version='2.0.2')
   A data cube’s static configuration information.
   Parameters
      spatial_res – The spatial image resolution in degrees.
      lon0 – Left border of the leftmost grid cell.
      lon1 – Right border of the rightmost grid cell.
      lat0 – Upper border of the uppermost grid cell.
      lat1 – Lower border of the lowermost grid cell.
      grid_width – The fixed grid width in pixels (longitude direction).
      grid_height – The fixed grid height in pixels (latitude direction).
      temporal_res – The temporal resolution in days.
      ref_time – A datetime value which defines the units in which time values are given, namely ‘days since ref_time’.
      start_time – The inclusive start time of the first image of any variable in the cube given as datetime value. None means unlimited.
      end_time – The exclusive end time of the last image of any variable in the cube given as datetime value. None means unlimited.
      variables – A list of variable names to be included in the cube.
      file_format – The file format used. Must be one of ‘NETCDF4’, ‘NETCDF4_CLASSIC’, ‘NETCDF3_CLASSIC’ or ‘NETCDF3_64BIT’.
      chunk_sizes – A mapping of dimension names to chunk size for encoding. Default is None.
      compression – Whether gzip compression is used for encoding. Default is False.
      comp_level – Integer between 1 and 9 describing the level of compression desired for encoding. Default is 5. Ignored if compression is False.
   date2num(date) → float
      Return the number of days for the given date as a number in the time units given by the time_units property.
      Parameters
         date – The date as a datetime.datetime value.
   property easting
      The longitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).
   property geo_bounds
      The geographical boundary given as ((LL-lon, LL-lat), (UR-lon, UR-lat)).
   static load(path) → object
      Load a CubeConfig from a text file.
      Parameters
         path – The file’s path name.
      Returns
         A new CubeConfig instance.
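The configuration file shown in section 4.1 consists of plain name = value lines in Python syntax, including datetime.datetime(...) expressions. One way such text could be read is sketched below. This is an assumption about the approach and not the implementation of CubeConfig.load; the function read_config_text is hypothetical.

```python
import datetime

# A shortened version of the configuration file from section 4.1.
CONFIG_TEXT = """
spatial_res = 0.25
temporal_res = 8
start_time = datetime.datetime(2001, 1, 1, 0, 0)
calendar = 'gregorian'
"""


def read_config_text(text):
    """Evaluate 'name = value' lines as Python and return them as a dict."""
    namespace = {'datetime': datetime}
    # Caution: exec runs arbitrary code, so only use it on trusted files.
    exec(text, namespace)
    return {key: value for key, value in namespace.items()
            if key != 'datetime' and not key.startswith('__')}


settings = read_config_text(CONFIG_TEXT)
print(sorted(settings))
```

The resulting dictionary could then be passed on as keyword arguments, since the names match the CubeConfig constructor parameters listed above.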
   property northing
      The latitude position of the upper-left-most corner of the upper-left-most grid cell given by (grid_x0, grid_y0).
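The relation between the grid offsets, the upper-left corner and the geographical boundary can be sketched as follows. The sketch treats easting as the longitude of the western edge and northing as the latitude of the northern edge, and it assumes that grid cell (0, 0) has its upper-left corner at longitude -180 and latitude 90. Both points are assumptions of this example; the function below is hypothetical and independent of esdl-core.

```python
def geo_bounds(spatial_res=0.25, grid_x0=0, grid_y0=0,
               grid_width=1440, grid_height=720):
    """Return ((LL-lon, LL-lat), (UR-lon, UR-lat)) for a grid.

    Assumption: grid cell (0, 0) has its upper-left corner at
    longitude -180, latitude 90.
    """
    easting = -180.0 + grid_x0 * spatial_res    # western edge (longitude)
    northing = 90.0 - grid_y0 * spatial_res     # northern edge (latitude)
    east = easting + grid_width * spatial_res
    south = northing - grid_height * spatial_res
    return (easting, south), (east, northing)


# Global grid with the default values from the CubeConfig signature.
print(geo_bounds())
# A regional subset: 80 x 40 cells starting at grid cell (720, 160).
print(geo_bounds(grid_x0=720, grid_y0=160, grid_width=80, grid_height=40))
```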
   property num_periods_per_year
      Return the integer number of target periods per year.
   property time_units
      Return the time units used by the data cube as string using the format ‘days since ref_time’.
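The 'days since ref_time' convention used by time_units and date2num can be illustrated with plain datetime arithmetic. The functions below are stand-ins written for this example; they ignore the calendar setting, and the exact formatting of the reference time in the units string is an assumption.

```python
from datetime import datetime

# Default reference time from the CubeConfig signature.
REF_TIME = datetime(2001, 1, 1)


def time_units(ref_time=REF_TIME):
    """Build a units string of the form 'days since <ref_time>'."""
    # Assumption: minute precision is sufficient for the reference time.
    return 'days since %s' % ref_time.strftime('%Y-%m-%d %H:%M')


def date2num(date, ref_time=REF_TIME):
    """Return the (fractional) number of days between ref_time and date."""
    return (date - ref_time).total_seconds() / 86400.0


print(time_units())
print(date2num(datetime(2001, 1, 9)))      # start of the second 8-day period
print(date2num(datetime(2001, 1, 1, 12)))  # half a day after ref_time
```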