Welcome to h5geo’s documentation!

To see C++ API documentation please refer to tierra-colada.github.io/h5geo/

About

C++17 and python API to work with geo-data (seismic, wells, maps, other in process) based on HDF5. Aimed at geoscientists and developers.

General

In geoscience there are different types of data and many file formats for storing these data. Also the data belong to some spatial reference system. In the same time the developper or scientist prefer not to use many libraries when working with data (i.e. reading, converting spatial reference system and units). All we want is convenient API, fast execution and availability in python. That’s what h5geo is aimed at.

Being written in C++ it uses pybind11 to make python bindings, HDF5 for storing data, GDAL for spatial reference conversion, units for units conversion and Eigen as container for storing numerics in memory.

For now h5geo provides API to work with:

  • seismic

  • wells

    • deviations

    • logs

    • welltops

  • maps

  • points

License

MIT

Build instructions

Pre-requisites

  • CMake

  • compiler with C++17 support

  • python (optional)

Dependencies

Build using prebuilt dependencies

git clone https://github.com/tierra-colada/h5geo.git
cd h5geo
mkdir build
cd build
cmake ..
  # find package dirs
  -DEigen3_ROOT:PATH=/path/to/eigen
  -Dmio_ROOT:PATH=/path/to/mio
  -DZLIB_ROOT:PATH=/path/to/zlib
  -DHDF5_ROOT:PATH=/path/to/hdf5
  -Dh5gt_ROOT:PATH=/path/to/h5gt
  -Dmagic_enum_ROOT:PATH=/path/to/magic_enum
  -Dunits_DIR:PATH=$/path/to/units/lib/cmake/units
  -DTBB_ROOT:PATH=/path/to/tbb
  -DGDAL_ROOT:PATH=/path/to/gdal
  -DPYTHON_EXECUTABLE=/path/to/python.exe
  # Lib settings
  -DH5GEO_SUPERBUILD:BOOL=OFF
  -DH5GEO_USE_THREADS:BOOL=ON
  -DH5GEO_USE_GDAL:BOOL=ON
  -DH5GEO_BUILD_SHARED_LIBS:BOOL=ON
  -DH5GEO_BUILD_TESTS:BOOL=OFF
  -DH5GEO_BUILD_h5geopy:BOOL=ON
  # to 'import h5geopy' to python runtime deps should be resolved
  -DH5GEOPY_RESOLVE_RUNTIME_DEPS:BOOL=ON
  -DH5GEOPY_COPY_RUNTIME_DEPS:BOOL=OFF
  # path to runtimes (used at h5geopy installation step)
  -DHDF5_RUNTIME_DIRS:PATH=/path/to/hdf5/bin
  -DZLIB_RUNTIME_DIRS:PATH=/path/to/zlib/bin
  -DUNITS_RUNTIME_DIRS:PATH=/path/to/units/bin
  -DTBB_RUNTIME_DIRS:PATH=/path/to/tbb/bin
  -DGDAL_RUNTIME_DIRS:PATH=/path/to/tbb/bin
  -DH5GEO_RUNTIME_DIRS:PATH=/path/to/h5geo/bin
  # CMake settings
  -DCMAKE_BUILD_TYPE=Release
  -G "some generator"
cmake --build . --config Release
cmake --install . --prefix /path/to/h5geo-install

Note

If you see that some of the dependencies were not resolved at installation step then change <lib>_RUNTIME_DIRS, rerun cmake & cmake install steps.

Warning

H5GEOPY_COPY_RUNTIME_DEPS copies all the resolved runtimes to site-packages/h5geopy dir. There might be OS-specific runtimes that nobody wants to copy. Thus this option is not recommended yet.

Build using superbuild

git clone https://github.com/tierra-colada/h5geo.git
cd h5geo
mkdir build
cd build
cmake ..
  -DCMAKE_INSTALL_PREFIX=/path/to/h5geo-install
  -DCMAKE_BUILD_TYPE=Release
  -DH5GEO_SUPERBUILD=ON
  -DH5GEO_BUILD_h5geopy=ON
  -DH5GEO_USE_THREADS=ON
  -DH5GEO_BUILD_SHARED_LIBS=ON
  -DH5GEO_USE_GDAL=ON
  -DH5GEO_BUILD_TESTS=OFF
  -DCOPY_H5GTPY_RUNTIME_DEPS=OFF
  -DRESOLVE_H5GTPY_RUNTIME_DEPS=ON
  -DH5GEOPY_COPY_RUNTIME_DEPS=OFF
  -DH5GEOPY_RESOLVE_RUNTIME_DEPS=ON
  -DPYTHON_EXECUTABLE=/path/to/python.exe

Note

h5gtpy and GDAL (osgeo) will be installed in site-packages as well.

Build using pip

pip install git+https://github.com/tierra-colada/h5geo.git@<tag> --verbose

where <tag> is git-tag.

Note

h5gtpy and GDAL (osgeo) will be installed in site-packages as well.

h5geo dependencies will be installed in site-packages/h5geopy.<some_postfix> dir.

Warning

No prebuilt wheels is supplied yet. pip install simply runs superbuild. Thus installation takes pretty much time. The verbose option helps you to see the build progress and catch errors if any.

Supported platforms

Windows, Linux, MacOS

API & Basic Usage

Description

h5geo follows interface strategy. That means API is exposed through implementation classes (suffixed with Impl). Also there is no public constructors. To instantiate a class one needs to use factory function.

The most basic class is H5Base. It can’t be instantiated explicitely.

The classes H5BaseContainer and H5BaseObject are direct successors of H5Base. The main difference between them is that H5BaseContainer is built around h5gt::File while H5BaseObject explores h5gt::Group. Thus container (HDF5 file) may store many geo-objects (HDF5 objects).

Next in the hierarchy follows more specific classes: H5SeisContainer/H5Seis, H5MapContainer/H5Map, H5WellContainer/H5Well, H5DevCurve, H5LogCurve, H5BasePoints, H5Points1 etc.

Note

A designated container may store only appropriated geo-objects. In that sense you can’t store H5Map in H5SeisContaner for example. But as you can see there is no container for H5Points. That means you are free to create and store points in any container.

Warning

h5geo works with column-major Eigen matrices only (the default Eigen storage order)!

Usage

containers

To create a container one should use one the following factory functions:

H5BaseContainer* createContainer(
   h5gt::File h5File, h5geo::ContainerType cntType, h5geo::CreationType createFlag);

H5BaseContainer* createContainerByName(
   std::string& fileName, h5geo::ContainerType cntType, h5geo::CreationType createFlag);

Using dynamic_cast<> one is able to cast returned container to the one defined with h5geo::ContainerType argument.

Note

There are helper functions to create specific containers: h5geo::createMapContainer, h5geo::createMapContainerByName that implicitely cast returned type to H5Map (most of geo-objects have such functions).

And to open a container:

H5BaseContainer* openContainer(
   h5gt::File h5File);

H5BaseContainer* openContainerByName(
   const std::string& fileName);

Note

As when creating object you may want to use helper functions like: h5geo::openMapContainer, h5geo::openMapContainerByName that implicitely cast returned type to H5Map (most of geo-objects have such functions).

geo-objects

All geo-objects are represented by h5gt::Group with assotiated and nested DataSets, Groups and Attributes.

To create a geo-object one must have an instance of an appropriate container (or parent geo-object: for example H5DevCurve and H5LogCurve have H5Well as parent). Thus to create H5Well one must have instance of H5WellContainer and use one of the following method:

H5Well* H5WellContainer::createWell(
   std::string& name,
   H5WellParam& p,
   h5geo::CreationType createFlag)

H5Well* H5WellContainer::createWell(
   h5gt::Group group,
   H5WellParam& p,
   h5geo::CreationType createFlag)

H5WellParam inherites H5BaseObjectParam and defines parameters like: spatial reference, length/temporal/angular/data units, null value, uwi, kelly bushing and well head coordinates.

Note

Most geo-objects may be created/opened with name or h5gt::Group object.

To open a geo-object one may use parent object instance:

H5Well* H5WellContainer::openWell(
   const std::string& name);
H5Well* H5WellContainer::openWell(
   h5gt::Group group);

Note

There are helper functions to open them without having parent object: h5geo::openWell, h5geo::openWellByName

Or more generally (use it in pair with dynamic_cast<>): h5geo::openObject, h5geo::openObjectByName

pointers

Currently only unique pointers are provided. They are named in the following manner: H5WellCnt_ptr and H5Well_ptr.

The preffered way to create objects:

#include <iostream>
#include <h5geo/h5wellcontainer.h>
#include <h5geo/h5well.h>

int main(){
   std::string fileName = "wells.h5";
   H5WellCnt_ptr wellCnt(h5geo::createWellContainerByName(
      fileName, h5geo::CreationType::OPEN_OR_CREATE));

   if (!wellCnt){
      std::cout << "Unable to open or create well container" << std::endl;
      return -1;
   }

   H5WellParam p;
   p.headX = 444363;
   p.headY = 7425880;
   p.kb = 50.88;
   p.uwi = "my_uwi";
   p.lengthUnits = "meter";

   std::string wellName = "myWell";
   H5Well_ptr well(wellCnt->createWell(
      wellName, p, h5geo::CreationType::OPEN_OR_CREATE));
   if (!well){
      std::cout << "Unable to open or create well" << std::endl;
      return -1;
   }

   return 0;
}

units & spatial reference

units

All geo objects have spatial reference and length/temporal/angular/data units. Not all them may be used by geo-object but the idea is: a geo-object must match to the units.

That means if for example H5Well has length units meter then all length units must be given in meters (header coordinates as well as kelly bushing for example). The same also concerns temporal, angular and data units.

Data units is units of Z-axis of H5Map object for example. Or units of H5Seis traces (maybe psi in case of marine seismic).

Every geo-object provides API to automatically convert units. For example when writing data to H5Map one is free to specify the data units that the data currently is in: bool H5Map::writeData(Eigen::Ref<Eigen::MatrixXd> M, "m")

The data will be converted from m to H5Map::getDataUnits.

And one needs to get data in some units: Eigen::MatrixXd H5Map::getData("cm")

Then the conversion is done in reverse order: from H5Map::getDataUnits to cm.

Note

No conversion is done if no units were specified.

Sometimes it is impossible to predict what units are going to be used. For example when working with seismic trace headers the API provides two arguments: unitsFrom and unitsTo. The conversion is done in direct order: unitsFrom will be converted to unitsTo.

One can check if the units are convertable through the web-service.

spatial reference

Spatial reference is given from PROJ-install/share/proj/proj.db. In projected_crs table find auth_name and code columns. Usually the spatial reference is shaped as: auth_name:code.

An example: EPSG:32056.

Basically OGRSpatialReference::SetFromUserInput function is used to create spatial reference object.

If you work with many objects that belong to different spatial reference you may want to set a spatial reference for the session and pass doCoordTransform as true when writing/getting the data. Take a look at h5core_sr_settings.h.

Tutorial

As h5geo consists of several independent geo-types then the tutorial is splitted in several parts.

Maps

Map is an object that is represented by a matrix with coordinates of upper-left matrix corner (origin X0, Y0), coordinates of upper-right matrix corner (point1 X1, Y1) and coordinates of lower-left matrix corner (point2 X2, Y2). Thus only single Z value corresponds for each XY point.

Create Map

Define map parameters for column-major Eigen matrix of size [20, 10] whose Z-values represent time (ms) and the coordinates are given in millimeters.

#include <iostream>
#include <h5geo/h5mapcontainer.h>
#include <h5geo/h5map.h>


int main(){
   std::string fileName = "maps.h5";
   H5MapCnt_ptr cnt(h5geo::createMapContainerByName(
      fileName, h5geo::CreationType::OPEN_OR_CREATE));
   if (!cnt){
      std::cout << "Unable to open or create map container" << std::endl;
      return -1;
   }

   H5MapParam p;
   p.nX = 10;
   p.nY = 20;
   p.X0 = 0;
   p.Y0 = 0;
   p.X1 = 100;
   p.Y1 = 0;
   p.X2 = 0;
   p.Y2 = 100;
   p.lengthUnits = "millimeter";
   p.dataUnits = "ms";
   p.xChunkSize = 5;
   p.yChunkSize = 5;
   p.spatialReference = "EPSG:8139";

   std::string mapName = "myMap";
   H5Map_ptr map(cnt->createMap(
      mapName, p, h5geo::CreationType::OPEN_OR_CREATE));
   if (!map){
      std::cout << "Unable to open or create map" << std::endl;
      return -1;
   }

   return 0;
}

Write/Read data

Let’s suppose that we have data in sec and we want to write it. To be sure that h5geo is able to convert sec to ms we can use the web-service.

Eigen::MatrixXd m = Eigen::MatrixXd::Random(p.nY, p.nX);
if (!map->writeData(m, "sec")){
   std::cout << "Unable to write data" << std::endl;
   return -1;
}

Now we want to read the data in sec:

Eigen::MatrixXd M = map->getData("sec");
if (M.size() < 1){
   std::cout << "Unable to read data" << std::endl;
   return -1;
}

Read data from GDAL supported file

There are useful methods aimed at reading files using GDAL API.

Before reading one should call GDALAllRegister() defined in gdal.h (the same also concerns when using h5geopy).

// don't forget to initialize GDAL readers first (maybe at application initialization time)
GDALAllRegister();

if (!map->readRasterCoordinates("data_file.zmap", "meter")){
   std::cout << "Unable to read coordinates from raster data" << std::endl;
   return -1;
}
if (!map->readRasterSpatialReference("data_file.zmap")){
   std::cout << "Unable to read spatial reference from raster data" << std::endl;
   return -1;
}
if (!map->readRasterLengthUnits("data_file.zmap")){
   std::cout << "Unable to read length from raster data" << std::endl;
   return -1;
}
if (!map->readRasterData("data_file.zmap")){
   std::cout << "Unable to read data from raster data" << std::endl;
   return -1;
}

Working with attribute map

Let’s suppose the created time-domain map has velocity attribute i.e. we have somehow sliced volume of velocities and kept the data. Lets generate such attribute map first:

H5MapParam p_attrMap = p;
p_attrMap.dataUnits = "feet/s"

std::string attrMapName = "myAttrMap";
H5Map_ptr attrMap(cnt->createMap(
   attrMapName, p_attrMap, h5geo::CreationType::OPEN_OR_CREATE));
if (!attrMap){
   std::cout << "Unable to open or create attribute map" << std::endl;
   return -1;
}

Eigen::MatrixXd v = Eigen::MatrixXd::Random(p_attrMap.nY, p_attrMap.nX);
if (!attrMap->writeData(v, "km/ms")){
   std::cout << "Unable to write data" << std::endl;
   return -1;
}

To add attribute map:

// addAttributeMap returns std::optional<h5gt::Group> of created map
if (!map->addAttributeMap(attrMap, "velocity").has_value()){
   std::cout << "Unable to add attribute map" << std::endl;
}

Then we can open the attribute and work with it as with usual map:

H5Map_ptr velocityMap(map->openAttributeMap("velocity"));
if (!velocityMap){
   std::cout << "Unable to open attribute map" < std::endl;
}

Finally to remove attribute map we can call the following method:

if (!map->removeAttributeMap("velocity")){
   std::cout << "Unable to remove attribute map" << std::endl;
}

Note

Attribute map is simply HDF5 soft link within H5Map object.

Wells

Well is an object that is responsible for well head coordinates, kelly bushing and manages deviations and log curves.

Create Well

We define well head coordinates, kelly bushing and uwi:

#include <iostream>
#include <h5geo/h5wellcontainer.h>
#include <h5geo/h5well.h>
#include <h5geo/h5devcurve.h>
#include <h5geo/h5logcurve.h>


int main(){
   std::string fileName = "wells.h5";
   H5WellCnt_ptr cnt(h5geo::createWellContainerByName(
      fileName, h5geo::CreationType::OPEN_OR_CREATE));
   if (!cnt){
      std::cout << "Unable to open or create well container" << std::endl;
      return -1;
   }

   H5WellParam p;
   p.headX = 444363;
   p.headY = 7425880;
   p.kb = 50.88;
   p.uwi = "my_uwi";
   p.lengthUnits = "meter";

   std::string wellName = "myWell";
   H5Well_ptr well(cnt->createWell(
      wellName, p, h5geo::CreationType::OPEN_OR_CREATE));
   if (!well){
      std::cout << "Unable to open or create well" << std::endl;
      return -1;
   }

   return 0;
}

Open well by uwi

The straight forward way to open well is by its name. Another way is by uwi (slower but useful):

H5Well_ptr wellByUwi(cnt->openWellByUWI("my_uwi"));
if (!wellByUwi || !well->isEqual(wellByUwi.get())){
   std::cout << "Unable to open well by UWI" << std::endl;
   return -1;
}

Create Dev Curve

Dev curve is responsible for trajectory storing and transformations using minimum curvature method implemented in h5deviation.h5.

H5DevCurveParam p_devCurve;
p_devCurve.lengthUnits = "meter";
p_devCurve.temporalUnits = "millisecond";
p_devCurve.angularUnits = "degree";

std::string devCurveName = "myDevCurve";
H5DevCurve_ptr devCurve(
   well->createDevCurve(
      devCurveName, p_devCurve, h5geo::CreationType::OPEN_OR_CREATE));
if (!devCurve){
   std::cout << "Unable to open or create dev curve" << std::endl;
   return -1;
}

Write/Read Dev Curve

H5DevCurve stores the following curves: MD,AZIM,INCL,DX,DY,TVD,OWT. All other curves are calculated based on these curves. That is done to prevent calculation errors i.e. everytime MD_AZIM_INCL is transformed to X_Y_TVD an error is accumulating. The same concerns when doing that in backward order: X_Y_TVD to MD_AZIM_INCL.

Eigen::VectorXd dx(3), dy(3), tvd(3);
dx << 0, 3, 5;
dy << 0, 0.3, 0.5;
tvd << 0, 1, 2;
if (!devCurve->writeDX(dx, "m") ||
    !devCurve->writeDY(dy, "m") ||
    !devCurve->writeTVD(tvd, "m")){
   std::cout << "Unable to write DX, DY, TVD" << std::endl;
   return -1;
}

// update is needed to calculate MD, AZIM, INCL based on DX, DY, TVD
devCurve->updateMdAzimIncl();

To get back data:

Eigen::VectorXd tvdss_out = devCurve->getCurve("TVDSS", "km");
if (tvdss_out.size() < 1){
   std::cout << "Unable to get TVDSS" << std::endl;
   return -1;
}

H5Well also provides API to work with active deviation. To set current deviation to be active:

if (!well->setActiveDevCurve(devCurve.get())){
   std::cout << "Unable to set active dev curve" << std::endl;
   return -1;
}

// or simply: devCurve->setActive();

To check is dev curve is active use H5DevCurve::isActive().

Note

Active dev curve is simply soft link to the real dev curve within HDF5 file.

Create Log Curve

Log curve is represented by [N, 2] matrix. The first column is MD and the second is VAL (value).

H5LogCurveParam p_logCurve;
p_logCurve.lengthUnits = "meter";
p_logCurve.dataUnits = "kg/m2";

std::string logCurveName = "myLogCurve";
H5LogCurve_ptr logCurve(
   well->createLogCurve(
      logCurveName, p_logCurve, h5geo::CreationType::OPEN_OR_CREATE));
if (!logCurve){
   std::cout << "Unable to open or create log curve" << std::endl;
   return -1;
}

Write/Read Log Curve

Write MD and VAL:

Eigen::VectorXd md(3), vals(3);
md << 0, 3, 5;
vals << 500, 700, 800;
if (!logCurve->writeCurve(h5geo::LogDataType::MD, md) ||
    !logCurve->writeCurve(h5geo::LogDataType::VAL, vals)){
   std::cout << "Unable to write MD and VALS" << std::endl;
   return -1;
}

And to read data:

Eigen::VectorXd md_out = logCurve->getCurve(h5geo::LogDataType::MD, "cm");
if (md_out.size() < 1){
   std::cout << "Unable to get MD" << std::endl;
   return -1;
}

Eigen::VectorXd vals_out = logCurve->getCurve("VAL", "g/cm2");
if (vals_out.size() < 1){
   std::cout << "Unable to get VAL" << std::endl;
   return -1;
}

Seismic

Seis is an object that is composed of several datasets and groups: traces are separated from trace headers. Trace headers are kept on double format while trace data is float. Seis is designed to provide high perfomance while keeping simple API.

Create Seis

Seismic parameters includes: survey type (2D or 3D), data type (stack or prestack), number of traces, number of samples etc:

#include <iostream>
#include <h5geo/h5seiscontainer.h>
#include <h5geo/h5seis.h>
#include <h5geo/h5horizon.h>


int main(){
   std::string fileName = "seis.h5";
   H5SeisCnt_ptr cnt(h5geo::createSeisContainerByName(
      fileName, h5geo::CreationType::OPEN_OR_CREATE));
   if (!cnt){
      std::cout << "Unable to open or create seis container" << std::endl;
      return -1;
   }

   H5SeisParam p;
   p.domain = h5geo::Domain::OWT;
   p.lengthUnits = "millimeter";
   p.temporalUnits = "millisecond";
   p.angularUnits = "degree";
   p.dataUnits = "psi";
   p.dataType = h5geo::SeisDataType::PRESTACK;
   p.surveyType = h5geo::SurveyType::TWO_D;
   p.nTrc = 30;
   p.nSamp = 10;
   p.srd = 20;
   p.spatialReference = "EPSG:8139";

   std::string seisName = "mySeis";
   H5Seis_ptr seis(cnt->createSeis(
      seisName, p, h5geo::CreationType::OPEN_OR_CREATE));
   if (!seis){
      std::cout << "Unable to open or create seis" << std::endl;
      return -1;
   }

   return 0;
}

Write/Read data

By analogy with SEGY format H5Seis contains: text header, binary header, traces headers, trace data.

Write/Read text header
std::vector<std::string> txtHdr;
for (size_t i = 0; i < 40; i++)
   txtHdr.push_back("Bart Simpson");
if (!seis->writeTextHeader(txtHdr)){
   std::cout << "Unable to write text header" << std::endl;
   return -1;
}
// another way is to write C-array: 'char txtHdr_c [40][80];'

Note

Text header is a dataset of size [40, 80]. Thus everything outside of this range will be lost.

To read text header:

std::vector<std::string> txtHdr_out =
   seis->getTextHeader();
if (txtHdr_out.size() < 1){
   std::cout << "Unable to read text header" << std::endl;
   return -1;
}
Write/Read binary header

The simplest way to write binary header is:

// convert 'seconds' to the temporal units of seis object
if (!seis->writeBinHeader("SAMP_RATE", 0.002, "sec", seis->getTemporalUnits())){
   std::cout << "Unable to write samp rate" << std::endl;
   return -1;
}

and to get it back:

double sampRate = seis->getBinHeader("SAMP_RATE", seis->getTemporalUnits(), "ms");
if (isnan(sampRate))
   std::cout << "Unable to get samp rate" << std::endl;
   return -1;
}

Note

List of binary header names is available through getBinHeaderNames function declared in h5core_util.h. Header names are consistent to those used in SEGY viewer SeiSee

Write/Read trace headers

There are many functions to do this. Here is one of them:

Eigen::MatrixXd cdp(3);
cdp << 1, 2, 3;
// write starting from 5th trace
if (!seis->writeTraceHeader("CDP", cdp, 5)){
   std::cout << "Unable to write CDP trace header from 5th trace" << std::endl;
   return -1;
}

and to get it back:

// get 'cdp' trace header from 3 traces starting from 5th trace
Eigen::MatrixXd cdp_out = seis->getTraceHeader("CDP", 5, 3);
if (cdp_out.size() < 1){
   std::cout << "Unable to get CDP trace header">> std::endl;
   return -1;
}

// update trace header limits is needed when trace headers are written
if (!seis->updateTraceHeaderLimits()){
   std::cout << "Unable to update trace header limits" << std::endl;
   return -1;
}

Note

List of trace header names is available through getTraceHeaderNames function declared in h5core_util.h. Header names are consistent to those used in SEGY viewer SeiSee

Warning

Call updateTraceHeaderLimits everytime when trace header min/max values changed.

Write/Read trace data

Once again there are many functions to do this, here are some:

Eigen::MatrixXd traces(p.nSamp, 3);
Eigen::MatrixXf traces = Eigen::MatrixXf::Random(
   seis->getNSamp(), seis->getNTrc());
// write starting from zero's trace
if (!seis->writeTrace(traces, 0)){
   std::cout << "Unable to write traces" << std::endl;
   return -1;
}

Get traces back:

// from 3rd trace, 10 traces, from 2nd sample, 5 samples
traces_out = seis->getTrace(3, 10, 2, 5);
if (traces_out.size() < 1){
   std::cout << "Unable to get traces">> std::endl;
   return -1;
}

Note

write/get trace headers and trace data have pretty wide opportunities including trace selection and working with sorted data. Take a look at seis.h to see all them.

Sorting

The idea behind sorting is to prepare sorting by primary keys (PKey). To accelerate the IO process the user need to add PKey sorting first addPKeySort and then use getSortedData function to retrieve the data. No need to manually resort data, h5geo only keeps indexes and unique values of prepared sortings. In theory this should make work with big data pretty effective.

For example there is widely used sorting CDP-OFFSET (OFFSET is called DSREG in h5geo). Add Pkey CDP and then you are free to retrieve any CDP-... sorted data.

if (!seis->addPKeySort("CDP")){
   std::cout << "Unable to add CDP PKey" << std::endl;
   return -1;
}

// then you are allowed to use convenient 'getSortedData' function
Eigen::MatrixXf trace_out;
Eigen::MatrixXd trc_header_out;
// from CDP 1 to 2, from DSREG 0 to 500
// 'trc_ind' - contains indexes of selected traces
Eigen::VectorX<size_t> trc_ind = seis->getSortedData(
   trace_out, trc_header_out, {"CDP", "DSREG"}, {1, 0}, {2, 500});

Note

Use updatePKeySort when data was mixed.

Sorting uses parallelization over the threads.

Warning

Sorting idea is effetive only if the chosen PKey has many repeating values.

Calculating XY boundary around the survey

There is a convenient function to calculate XY boundary around survey. For 3D and 2D prestack data it uses convex hull algorithm. For 2D stack data it simply shows CDP coordinates of traces.

 if (!seis->updateBoundary()){
    std::cout << "Unable to update boundary" << std::endl;
    return -1;
 }

 Eigen::MatrixXd boundary = seis->calcBoundary();
 if (!boundary.size() < 1){
    std::cout << "Unable to calculate boundary" << std::endl;
    return -1;
 }

 std::string horizonName = "boundary";
 H5HorizonParam p_hrz;
 p_hrz.components["X"] = 0;
 p_hrz.components["Y"] = 1;
 p_hrz.nPoints = 10;
 p_hrz.pointsChunkSize = 10;
 p_hrz.domain = h5geo::Domain::TWT;
 p_hrz.lengthUnits = p.lengthUnits;
 p_hrz.spatialReference = p.spatialReference;

 H5Horizon_ptr hrz(
    seis->createHorizon(
       horizonName, p_hrz, h5geo::CreationType::CREATE_OR_OVERWRITE));
ASSERT_TRUE(hrz != nullptr);
 if (!hrz){
    std::cout << "Unable to create horizon (boundary)" << std::endl;
    return -1;
 }

 if (!hrz->writeComponent("X", boundary.col(0))){
    std::cout << "Unable to write X to boundary" << std::endl;
    return -1;
 }

 if (!hrz->writeComponent("Y", boundary.col(1))){
    std::cout << "Unable to write Y to boundary" << std::endl;
    return -1;
 }

To get calculated values:

Eigen::VectorXd X = hrz->getComponent("X");
if (X.size() < 1){
   std::cout << "Unable to get X from boundary" << std::endl;
   return -1;
}

Eigen::VectorXd Y = hrz->getComponent("Y");
if (Y.size() < 1){
   std::cout << "Unable to get Y from boundary" << std::endl;
   return -1;
}

Read SEGY

Reading SEGY is pretty simple:

if (!seis->readSEGYTextHeader("file.sgy")){
   std::cout << "Unable to read segy text header" << std::endl;
   return -1;
}
if (!seis->readSEGYBinHeader("file.sgy")){
   std::cout << "Unable to read segy binary header" << std::endl;
   return -1;
}
// SEGY files will be concatenated
if (!seis->readSEGYTraces({"file1.sgy", "file2.sgy", "file3.sgy"})){
   std::cout << "Unable to read segy binary header" << std::endl;
   return -1;
}

Note

To read SEGY files h5geo uses memory-mapping technique and parallelization over the threads (OpenMP library). Thus it should work pretty fast but there is a limitation with memory-mapping: the SEGY files should be on the PC’s hard drive. See more on wiki.

Map SEGY

The user may want not to spend time on reading SEGY file but simply map it. In h5geo you are allowed to do this at H5Seis creation time:

H5SeisParam p_mapped = p;
p_mapped.mapSEGY = true;
p_mapped.segyFiles = {"file1.sgy", "file2.sgy", "file3.sgy"};

std::string mappedSeisName = "seisMapped";
H5Seis_ptr seisMapped(cnt->createSeis(
   mappedSeisName, p_mapped, h5geo::CreationType::OPEN_OR_CREATE));
if (!seisMapped){
   std::cout << "Unable to open or create mapped seis" << std::endl;
   return -1;
}

Then you are free to use it as with regular seis object but with some limitations:

  • probably it is impossible to resize file

  • data loss when writing to trace headers and binary header (double is casted to int and short)

  • only SEGY ieee-32 format are supported

indexes and tables