Cache files and sorting filenames

Cache files

The Slocum gliders store their binary data files (dbd, ebd and friends) in a compact form: the sensor list, along with the size and type of each sensor, is only written to the file’s header if it is not already known. If it is known, the header only references it by an ID. The full sensor list belonging to that ID is stored separately in a so-called CAC (cache) file. To be able to read a data file, dbdreader therefore also needs access to the matching CAC file(s).
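To make the lookup concrete, the sketch below checks which cache files are present for a given set of sensor-list IDs. It is illustrative only and not part of dbdreader: the helper missing_cac_files is hypothetical, the IDs are made up, and it assumes that each CAC file is named after its ID with a .cac extension, which you should verify against your own cache directory.

```python
import tempfile
from pathlib import Path


def missing_cac_files(cache_dir, sensor_list_ids):
    """Return the IDs for which no CAC file is found in cache_dir.

    Assumption: a CAC file is named '<id>.cac'; the comparison ignores case.
    """
    available = {p.stem.lower() for p in Path(cache_dir).glob("*")
                 if p.suffix.lower() == ".cac"}
    return [i for i in sensor_list_ids if i.lower() not in available]


# Demonstration in a throwaway directory with one made-up CAC file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "0a1b2c3d.cac").touch()
    missing = missing_cac_files(tmp, ["0A1B2C3D", "deadbeef"])

print(missing)
```

A check like this can be useful before processing a batch of files, because a data file whose sensor list is neither in its header nor in the cache cannot be decoded.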

By default, dbdreader keeps a copy of the CAC files it encounters in a single, fixed cache directory, so that you do not have to hunt down and supply CAC files yourself for every dbd/ebd file you want to read. The dbdreader.DBDCache class manages this directory.

The default cache directory is set automatically when dbdreader is imported:

  • On Linux: $HOME/.local/share/dbdreader

  • On Windows: $HOME/.dbdreader
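The platform-dependent default listed above could be reproduced roughly as follows. This is a minimal sketch, not dbdreader's own code: the function default_cache_dir is hypothetical, it assumes that $HOME corresponds to pathlib.Path.home(), and it assumes that every non-Windows platform follows the Linux layout.

```python
import sys
from pathlib import Path


def default_cache_dir(platform=sys.platform, home=None):
    """Return the default cache location described above (hypothetical helper)."""
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        # Windows: $HOME/.dbdreader
        return home / ".dbdreader"
    # Assumption: all other platforms use the Linux layout.
    return home / ".local" / "share" / "dbdreader"


print(default_cache_dir())
```

Printing the result is a quick way to see where to look for CAC files on your own machine.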

For most use cases you never need to interact with dbdreader.DBDCache directly, since dbdreader.DBD, dbdreader.MultiDBD and dbdreader.DBDPatternSelect all fall back to this default directory automatically when no cacheDir argument is given. There are, however, a few situations where you may want to change or query where CAC files are stored.

Example 1: overriding the default cache directory for the whole session

If you want all subsequently created DBD/MultiDBD objects to use a different cache directory (for example because you keep CAC files for a specific glider project in a dedicated folder), set it once at the start of your script:

import dbdreader

dbdreader.DBDCache.set_cachedir("/home/user/gliderdata/amadeus/cac")

# from here on, DBD and MultiDBD objects use this directory by default
dbd = dbdreader.DBD("data/amadeus-2014-204-05-000.sbd")

Note

set_cachedir() raises an error if the given directory does not exist. Use the class constructor (see Example 2) instead if you want the directory to be created automatically.

Example 2: creating the cache directory if it does not exist yet

Instantiating dbdreader.DBDCache with a path, rather than calling set_cachedir(), has the same effect, except that the target directory is created if it does not exist yet:

import dbdreader

dbdreader.DBDCache("/home/user/gliderdata/amadeus/cac")

Example 3: reading a single file from a non-default cache location

If you only need to read one or a few files using a cache directory that differs from your session-wide default, you do not need to touch dbdreader.DBDCache at all. Simply pass cacheDir to the constructor of dbdreader.DBD or dbdreader.MultiDBD for that call:

import dbdreader

dbd = dbdreader.DBD("data/amadeus-2014-204-05-000.sbd",
                     cacheDir="/home/user/gliderdata/other_project/cac")

This leaves the session-wide default cache directory (as managed by dbdreader.DBDCache) untouched for all other files.

Sorting filenames

Slocum data filenames identify each segment by four numeric fields, for example unit204-2014-212-0-3.dbd. Because these fields are not zero-padded, a plain alphabetical sort does not put the files in chronological order (unit204-2014-212-0-100.dbd would sort before unit204-2014-212-0-3.dbd, for instance). The dbdreader.DBDList class is a list subclass that fixes this by overriding sort() to compare files by their numeric fields instead of alphabetically.

import dbdreader

filenames = dbdreader.DBDList(
    ["unit204-2014-212-0-30.dbd",
     "unit204-2014-212-0-3.dbd",
     "unit204-2014-212-0-100.dbd"]
)
filenames.sort()
print(filenames)
# ['unit204-2014-212-0-3.dbd', 'unit204-2014-212-0-30.dbd', 'unit204-2014-212-0-100.dbd']

DBDList behaves like a regular list in every other respect (it can be indexed, iterated, appended to, and so on); only sort() is specialised.
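For readers curious how such an ordering can be achieved, the following sketch sorts filenames by their numeric fields using only the standard library. It is not DBDList's actual implementation: the function segment_key and its regular expression are assumptions made for illustration, and they expect names that end in four hyphen-separated integer fields followed by an extension.

```python
import re

# Assumed pattern: four hyphen-separated integer fields before the extension.
_FIELDS = re.compile(r"-(\d+)-(\d+)-(\d+)-(\d+)\.[^.]+$")


def segment_key(filename):
    """Sort key: the four numeric fields as a tuple of ints (hypothetical helper)."""
    match = _FIELDS.search(filename)
    if match is None:
        raise ValueError(f"unrecognised filename: {filename!r}")
    return tuple(int(field) for field in match.groups())


names = ["unit204-2014-212-0-30.dbd",
         "unit204-2014-212-0-3.dbd",
         "unit204-2014-212-0-100.dbd"]

print(sorted(names))                   # plain string sort: 100, 3, 30
print(sorted(names, key=segment_key))  # numeric sort: 3, 30, 100
```

Comparing tuples of integers rather than strings is what makes segment 3 come before segment 30 and segment 100, regardless of how many digits each field has.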