Class MultiDBD : reading multiple files

MultiDBD API

The MultiDBD class is designed to be used with multiple files, either dbd’s (sbd’s) or ebd’s (tbd’s), or a mixture.

As with single dbd files read using the dbdreader.DBD class, the behaviour of the binary reader can be modified with the skip_initial_line keyword of the constructor. See also Class DBD : reading single files.

class dbdreader.MultiDBD(filenames=None, pattern=None, cacheDir=None, complemented_files_only=False, complement_files=False, banned_missions=[], missions=[], max_files=None, skip_initial_line=True)

Opens multiple dbd files for reading

This class is intended for reading multiple dbd files and treating them as one.

Parameters:
  • filenames (list of str or None) – list of filenames to open

  • pattern (str or None) – search pattern as passed to glob

  • cacheDir (str or None) – path to directory with CAC cache files (None: the default directory is used)

  • complemented_files_only (bool) – if True, only those files are retained for which both engineering and science data files are available.

  • complement_files (bool) – If True automatically include matching [de]bd files

  • banned_missions (list of str) – List of mission names that should be disregarded.

  • missions (list of str) – List of mission names; if given, only these missions are considered.

  • max_files (int or None) –

    maximum number of files to be read, where

    >0: the first n files are read

    <0: the last n files are read.

  • skip_initial_line (bool (default: True)) – If True, the first data line in each dbd file (and friends) is not read.

Notes

Upon creating the dbd file, when starting a new mission or dive segment, all parameters are written and marked as updated. In reality, most parameters are NOT updated, and the value written is the value in memory, which may be several minutes old, or even older. It has been pointed out to me that a handful of parameters are set only once, before the dbd file is created. Since these parameters are not of interest for normal data processing, the first line of data is skipped by default, but can be read if required.

Changed in version 0.4.0: ensure_paired and included_paired keywords have been replaced by complemented_files_only and complement_files, respectively.
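The sign convention of max_files can be pictured as plain list slicing. The following is a minimal sketch only, not the library's implementation: the helper limit_files and the file names are hypothetical, and the assumption that files are sorted by name before the limit is applied is ours.

```python
def limit_files(filenames, max_files=None):
    """Keep the first n (max_files > 0) or the last n (max_files < 0) files."""
    # Assumption: files are put in order before the limit is applied.
    ordered = sorted(filenames)
    if not max_files:  # None or 0: no limit
        return ordered
    if max_files > 0:
        return ordered[:max_files]  # first n files
    return ordered[max_files:]      # last n files


# Hypothetical file names, for illustration only.
files = ["seg-0003.sbd", "seg-0001.sbd", "seg-0002.sbd", "seg-0004.sbd"]
first_two = limit_files(files, max_files=2)
last_one = limit_files(files, max_files=-1)
print(first_two)
print(last_one)
```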

close()

Close all open files

determine_ctd_type()

Determines CTD type installed from the presence of CTD specific name for the time stamp.

Returns:

{“ctd41cp”, “rbrctd”}. If unable to get a positive CTD identification, it is assumed the CTD installed is a Seabird CTD, returning “ctd41cp”.

Return type:

string

Notes

New in version 0.5.5.

get(*parameters, decimalLatLon=True, discardBadLatLon=True, return_nans=False, include_source=False, max_values_to_read=-1)

Returns time and value tuple(s) for requested parameter(s)

This method returns time and values tuples for a list of parameters.

Note that each parameter comes with its own time base. No interpolation is done. Use get_sync() for that instead.

Parameters:
  • *parameters (variable length list of str) – names of the parameters to be read

  • decimalLatLon (bool, optional) – If True (default), latitude and longitude related parameters are converted to decimal format, as opposed to nmea format.

  • discardBadLatLon (bool, optional) – If True (default), bogus latitude and longitude values are ignored.

  • return_nans (bool) – If True, nan’s are returned for those timestamps where no new value is available. Default value: False

  • include_source (bool, optional) –

    If True, a list is returned as well, holding for each data point a reference to the DBD object from which the data point originated. If called with a single parameter, a tuple of an Nx2 array with data and a list of N elements with references to a DBD object is returned. If called with more parameters, a list of such tuples is returned.

    Default value: False

  • max_values_to_read (int, optional) – if > 1, then reading is stopped after this many values have been read. Default value : -1

Returns:

  • (ndarray, ndarray) – for a single parameter

  • ((ndarray, ndarray), list) – for a single parameter, including source file list

  • [(ndarray, ndarray), (ndarray, ndarray), …] – for multiple parameters

  • [((ndarray, ndarray), list), ((ndarray, ndarray), list), …] – for multiple parameters, including source file list

Changed in version 0.5.5: For a single parameter request, the number of values to be read can be limited.
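To make the decimalLatLon option concrete, the sketch below shows the arithmetic of converting a position from nmea format to decimal degrees. It assumes that nmea format means ddmm.mmm (degrees times 100 plus decimal minutes); the helper nmea_to_decimal is hypothetical and is not claimed to be part of dbdreader.

```python
import math


def nmea_to_decimal(x):
    """Convert a ddmm.mmm value to decimal degrees."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    degrees = math.floor(x / 100.0)  # leading dd part
    minutes = x - degrees * 100.0    # remaining mm.mmm part
    return sign * (degrees + minutes / 60.0)


lat = nmea_to_decimal(5430.0)   # 54 degrees, 30 minutes north
lon = nmea_to_decimal(-7030.0)  # 70 degrees, 30 minutes west
print(lat, lon)
```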

get_CTD_sync(*parameters, decimalLatLon=True, discardBadLatLon=True)

Returns a list of values from CTD and optionally other parameters, all interpolated to the time base of the CTD timestamp.

Parameters:
  • *parameters (variable length list of str) – names of parameters to be read additionally

  • decimalLatLon (bool, optional) – If True (default), latitude and longitude related parameters are converted to decimal format, as opposed to nmea format.

  • discardBadLatLon (bool, optional) – If True (default), bogus latitude and longitude values are ignored.

Returns:

Time vector (of the CTD timestamp), C, T and P values, and interpolated values of subsequent parameters.

Return type:

(ndarray, ndarray, …)

Notes

New in version 0.4.0.

get_global_time_range(fmt='%d %b %Y %H:%M')

Returns start and end dates of data set (all files)

Parameters:

fmt (str) – String that determines how the time string is formatted.

Returns:

tuple with formatted time strings

Return type:

(str, str)
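The fmt argument uses the usual strftime directives. As a small illustration, the sketch below formats a timestamp given in seconds since the epoch with the default format string. The helper format_epoch is hypothetical, and the use of UTC is an assumption.

```python
from datetime import datetime, timezone


def format_epoch(t, fmt="%d %b %Y %H:%M"):
    """Format seconds since 1970-01-01 (assumed UTC) using strftime directives."""
    return datetime.fromtimestamp(t, tz=timezone.utc).strftime(fmt)


start = format_epoch(1406224800)
start_iso = format_epoch(1406224800, fmt="%Y-%m-%dT%H:%M")
print(start)
print(start_iso)
```

Note that %b yields a locale-dependent month abbreviation.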

get_sync(*parameters, decimalLatLon=True, discardBadLatLon=True)

Returns a list of values from parameters, all interpolated to the time base of the first parameter

This method is used if a number of parameters should be interpolated onto the same time base.

Parameters:
  • *parameters (variable length list of str) – parameter names. Minimal length is 2. The time base of the first parameter is used to interpolate all other parameters onto.

  • decimalLatLon (bool, optional) – If True (default), latitude and longitude related parameters are converted to decimal format, as opposed to nmea format.

  • discardBadLatLon (bool, optional) – If True (default), bogus latitude and longitude values are ignored.

Returns:

Time vector (of first parameter), values of first parameter, and interpolated values of subsequent parameters.

Return type:

(ndarray, ndarray, …)

Example: get_sync(‘m_water_pressure’, ‘m_water_cond’, ‘m_water_temp’)

Notes

Changed in version 0.4.0: The calling signature has changed: the parameters to synchronise are no longer passed as a list, but as separate arguments.
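What interpolating onto the time base of the first parameter means can be sketched with a small linear interpolation routine. This is an illustration only: the helper interp_onto and the sample values are hypothetical, and returning NaN outside the time range of the interpolated parameter is an assumption about the behaviour. On arrays, numpy.interp performs the same kind of linear interpolation.

```python
from bisect import bisect_right


def interp_onto(t_base, t, v):
    """Linearly interpolate the series (t, v) onto the time stamps in t_base.

    t must be strictly increasing and hold at least two values.
    Points outside the range of t yield NaN (an assumption of this sketch).
    """
    out = []
    for tb in t_base:
        if tb < t[0] or tb > t[-1]:
            out.append(float("nan"))
            continue
        j = min(bisect_right(t, tb), len(t) - 1)  # index of right neighbour
        i = j - 1                                 # index of left neighbour
        w = (tb - t[i]) / (t[j] - t[i])
        out.append(v[i] + w * (v[j] - v[i]))
    return out


# Hypothetical samples: depth and roll are recorded at different times.
t_depth = [0.0, 4.0, 8.0, 12.0]
t_roll = [2.0, 6.0, 10.0]
roll = [1.0, 2.0, 3.0]

roll_sync = interp_onto(t_depth, t_roll, roll)
print(roll_sync)
```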

get_time_range(fmt='%d %b %Y %H:%M')

Returns start and end dates of the selected time range

Parameters:

fmt (str) – String that determines how the time string is formatted

Returns:

Tuple with formatted time strings

Return type:

(str, str)

get_xy(parameter_x, parameter_y, decimalLatLon=True, discardBadLatLon=True)

Returns values of parameter_x and parameter_y

For parameters parameter_x and parameter_y this method returns a tuple with the values of both parameters. If necessary, the time base of parameter_y is interpolated onto the one of parameter_x.

Parameters:
  • parameter_x (str) – parameter name of x-parameter

  • parameter_y (str) – parameter name of y-parameter

  • decimalLatLon (bool, optional) – If True (default), latitude and longitude related parameters are converted to decimal format, as opposed to nmea format.

  • discardBadLatLon (bool, optional) – If True (default), bogus latitude and longitude values are ignored.

Returns:

tuple of value vectors

Return type:

(ndarray, ndarray)

has_parameter(parameter)

Has this file parameter?

Returns:

True if this instance has found parameter

Return type:

bool

classmethod isScienceDataFile(fn)

Is file a science file?

Parameters:

fn (str) – filename

Returns:

True if file fn is a science file

Return type:

bool
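The pairing of engineering and science files described for complement_files and complemented_files_only can be pictured with file extensions alone. The sketch below assumes, following the introduction of this section, that dbd and sbd files hold engineering data and ebd and tbd files hold the matching science data. The helpers and the file name are hypothetical, and other extensions are not handled.

```python
import os

# Assumption: dbd pairs with ebd, and sbd pairs with tbd.
COUNTERPART = {"dbd": "ebd", "ebd": "dbd", "sbd": "tbd", "tbd": "sbd"}
SCIENCE_EXTENSIONS = {"ebd", "tbd"}


def extension(fn):
    """Return the lower-case file extension without the leading dot."""
    return os.path.splitext(fn)[1].lstrip(".").lower()


def is_science_file(fn):
    """True if the extension marks fn as a science data file."""
    return extension(fn) in SCIENCE_EXTENSIONS


def counterpart(fn):
    """Return the name of the matching science or engineering file."""
    root, _ = os.path.splitext(fn)
    return root + "." + COUNTERPART[extension(fn)]


fn = "data/glider-2014-204-05-000.sbd"  # hypothetical file name
print(is_science_file(fn))
print(counterpart(fn))
```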

set_skip_initial_line(skip_initial_line)

Sets the reading mode of the binary reader to skip the initial data entry or not.

Parameters:

skip_initial_line (bool) – Sets the attribute skip_initial_line of each DBD instance, controlling the reading of the first data entry of each binary file.

set_time_limits(minTimeUTC=None, maxTimeUTC=None)

Set time limits for data to be returned by get() and friends.

Parameters:
  • minTimeUTC (str) – start time in UTC

  • maxTimeUTC (str) – end time in UTC

Notes

{minTimeUTC, maxTimeUTC} are expected in one of these formats:

“%d %b %Y” 3 Mar 2014

or

“%d %b %Y %H:%M” 4 Apr 2014 12:21
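Both accepted formats are ordinary strptime format strings. As a minimal sketch of how such a string maps to a point in time, the hypothetical helper parse_utc below tries the two formats in turn and interprets the result as UTC. It is not the library's own parser.

```python
from datetime import datetime, timezone

# The two formats listed above, most specific first.
FORMATS = ("%d %b %Y %H:%M", "%d %b %Y")


def parse_utc(s):
    """Parse a time string in one of the two accepted formats as UTC."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unsupported time string: %r" % s)


t_day = parse_utc("3 Mar 2014")
t_minute = parse_utc("4 Apr 2014 12:21")
print(t_day.isoformat())
print(t_minute.isoformat())
```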

MultiDBD Example

import numpy as np
import dbdreader


# This example shows how to deal with multiple dbd files. This can be
# an sbd and tbd file for a single segment, or a number of sbd files
# only, or a combination of both.
#
# This example uses the MultiDBD class. You can require that each sbd
# file must have its accompanying tbd file, or you can specify sbd
# files only and let MultiDBD look for the accompanying tbd files. You
# can limit the number of files processed to the first n or last n
# files, mainly for development purposes. See the doc string for pointers.
#
# You can specify the files to be opened as a list of filenames, or as
# a pattern using wild cards, using either the filenames=[...] keyword
# or the pattern='....' keyword.
#
# All files that match are used. You can narrow down your selection by
# setting the start and/or end times. These refer to the opening
# times of the files. (The reason for this is that only the header
# needs to be read, and not every possible variable, to find the start
# and end times of each file.)
#
# There are basically two ways of narrowing down the number of files
# processed. You can use the set_time_limits() method of the MultiDBD
# class or you can use DBDPatternSelect. The latter selects files
# according to a pattern, or as a list of files. The select() method
# returns a list of files that match from and until dates, which can
# be used to create a MultiDBD instance using the filenames=[]
# keyword.

# Below this is put to the test.


# open some files, using a pattern

dbd=dbdreader.MultiDBD(pattern="data/amadeus*.[st]bd")

# print what parameters are available:
print("we have the following science parameters:")
for i,p in enumerate(dbd.parameterNames['sci']):
    print("%2d: %s"%(i,p))
print("\n and engineering parameters:")
for i,p in enumerate(dbd.parameterNames['eng']):
    print("%2d: %s"%(i,p))

# get the measured depth

tm,depth=dbd.get("m_depth")

max_depth=depth.max()
print("\nmax depth found is %f m"%(max_depth))

# get lat lon
lat,lon=dbd.get_xy("m_lat","m_lon")

# interpolate roll and speed onto the depth time base
tm,depth,roll,speed=dbd.get_sync("m_depth","m_roll","m_speed")

print("\nmax speed %f m/s"%(speed.compress(np.isfinite(speed)).max()))


# print the time range of the files
tr=dbd.get_global_time_range()
# these are the opening times of the first and last files.
print("We have data from %s until %s"%tuple(tr))

# limit our data
print("we limit our data to include only files opened after 24 Jul 2014 18:00")
# use only data files that are opened after 6 pm on 24 Jul 2014
dbd.set_time_limits(minTimeUTC="24 Jul 2014 18:00")

tm1,depth1=dbd.get("m_depth")
print("start time full time range:")
print(dbdreader.epochToDateTimeStr(tm[0]))
print("start time reduced time range:")
print(dbdreader.epochToDateTimeStr(tm1[0]))


# the same time selection can be achieved in a different way, too.

pattern_selector=dbdreader.DBDPatternSelect()
pattern_selector.set_date_format("%d %b %Y %H")
selection=pattern_selector.select(pattern="data/amadeus*.[st]bd",from_date="24 Jul 2014 18")

print("full list of sbd files:")
for i,n in enumerate(dbd.filenames):
    if n.endswith("sbd"):
        print("%d: %s"%(int(i/2),n))
print("and...")
print("reduced list of sbd files:")
for i,n in enumerate(selection):
    if n.endswith("sbd"):
        print("%d: %s"%(int(i/2),n))