DUECA/DUSIME
Logging with DDFF

Introduction

DDFF, or Delft Data File Format, is a data file layout introduced for the DUECA/DUSIME record and replay facilities; it is also available and useful for generic logging. The low-level file format itself is based on the msgpack binary packing and unpacking format. To support recording and replay of a simulation (generally its inputs), the DDFF loggers provide real-time logging and reading of data. This logging backbone can be used in a number of different ways:

DUECA also provides facilities for logging in the HDF5 format. Unfortunately, logging real-time data in that format can require considerable processing power, and when logging from somewhere between 10 and 20 channels, the HDF5 logger may clog up your simulation. If that happens (or in any case), you can easily switch over to DDFF, since the modules have compatible configuration arguments. A Python-based program can convert your DDFF file into an HDF5 file that is very close to the type of file DUECA would produce directly, so this step has minimal impact on your data processing. The DDFF data is also directly readable from your own Python scripts.

Streams and segments

In its most basic form, a DDFF file provides several 'streams' of msgpack encoded data in a single file. However, normally DUECA logs to files with an inventory and segments.

The inventory is a description of the contents of the different streams. It is written as a series of msgpack objects into stream #0. This data contains:

The DCO object data is encoded in msgpack arrays. An outer array of three elements codes the time tick, the time span, and the DCO object contents as a nested array.

On top of that, the file can contain multiple segments. The segment information links a name (something like "recording1") with a stretch of data in the file. To this end, the file block offsets, the starting points in these blocks, and the end of the recording are stored in the segment information in stream #1. By decoding the segment information in stream #1, the start and end points in the file can be found for each recording segment. If not controlled through the log configuration (see below), a data file will simply contain a single segment.
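As an illustration of the record layout described above, the following sketch hand-assembles the msgpack bytes for one logged record: an outer array holding the time tick, the time span, and a nested array with the (here hypothetical) DCO member values. This is a minimal sketch covering small positive integers only; real files are of course written by the DUECA logging modules.

```python
# Hand-assemble the msgpack encoding of one logged DDFF record:
# [tick, span, [member values...]]. Only small positive integers
# (0..127) are handled; msgpack codes these as a single byte
# ("positive fixint"), and short arrays as 0x90 | length ("fixarray").

def pack_small_int(value):
    # msgpack positive fixint: 0..127 encode as the byte itself
    assert 0 <= value <= 127
    return bytes([value])

def pack_fixarray(elements):
    # msgpack fixarray: 0x90 | length, followed by the encoded elements
    assert len(elements) <= 15
    return bytes([0x90 | len(elements)]) + b"".join(elements)

tick, span = 100, 10   # integer time tick and validity span
members = [1, 2]       # hypothetical DCO member values
record = pack_fixarray([
    pack_small_int(tick),
    pack_small_int(span),
    pack_fixarray([pack_small_int(m) for m in members]),
])
print(record.hex())  # 93640a920102
```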

Logging

To add the logging module, add ddff to the DUECA components in your main CMakeLists.txt, for example:

set(DUECA_COMPONENTS ${SCRIPTLANG} extra dusime udp ddff)

The logging module can then be added to the dueca_mod.py configuration file. Logging works by subscribing to the specified channels. DCO objects in those channels need to be generated with the

(Option msgpack)

option. Preferably give the logger its own priority. Its timing does not need to be fast; a few invocations per second are enough. The logger simply collects the data from the specified channels and codes these in the DDFF logging format. Each channel or each entry is written into its own logging stream.

There are two common ways of specifying log entries.

The following snippet shows these both:

mymods.append(
    dueca.Module("ddff-logger", "", log_priority).param(
        ("set-timing", log_timing),
        ("log-entry",   # Vehicle output
         (f"VehicleCabPosition://{entity}",
          "VehicleCabPosition",
          "/data/vehicle")),
        ("watch-channel",
         ("BaseObjectMotion://world",
          "/data/traffic")),
        ("filename-template",
         "mydatalog-%Y%m%d_%H%M%S.ddff")))
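The filename-template above appears to use strftime-style time codes (%Y, %m, and so on); assuming that interpretation, the generated file names can be previewed in Python:

```python
from datetime import datetime

# Expand the template for a fixed (hypothetical) start time; the
# actual expansion is done by the logger when the file is opened.
template = "mydatalog-%Y%m%d_%H%M%S.ddff"
fname = datetime(2024, 3, 1, 13, 5, 30).strftime(template)
print(fname)  # mydatalog-20240301_130530.ddff
```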

Timing for the logged data can be influenced by a number of keywords:

The logger can also be controlled and monitored through DUECA channels. You can specify a configuration channel with DUECALogConfig events. These can be used to define sections in your file, or specify that you want to log to a new file. Logging status can be reported to a channel with DUECALogStatus events, giving information on file size and logging process.

        ...
        ("config-channel",
         f"DUECALogConfig://{entity}"),
        ("status-channel",
         f"DUECALogStatus://{entity}"),
        ...

Reading the data

The pyddff Python module is installed with a DUECA installation, and can also be separately downloaded and installed from PyPI.

> pip install --user pyddff

This also installs the program ddff-convert; see ddff-convert help for instructions. When working with Python and the pyddff module directly, you can efficiently convert the data to a convenient format.

from pyddff import DDFFSegments

# open a file
df = DDFFSegments("mylogfile.ddff")
# see what streams are defined
print(df.keys())
# see what segments are defined
print(df.tags())
# For stream data, it is useful to read streams into dictionaries of
# numpy arrays. For "simple" data (floats, doubles, ints, strings, or
# fixed-length arrays of these), this results in 1- or 2-dimensional
# numpy arrays.
# Without arguments get_data will get all data; with a period or index,
# it will collect only that segment.
# This returns time `tick` and `span` as numpy arrays, and `d`, a
# dictionary of numpy arrays with the data.
t, span, d = df['/data/traffic'].get_data()
# For events, it makes more sense to iterate over these; you will get
# the event data in a dictionary, one by one.
for t, event in df['/data/config'].items():
    print(t, event)

Record and replay

When using record and replay, DDFF files store the recorded data; one file is created per entity involved, for each node where record and replay is used. As an example, a control loading simulation may record all user input. For a replay, the desired data segment is loaded, and in Replay mode that data is read back, producing an exact copy of the output during the recording.

For control loading devices, it would of course be cool if the device moved as it did during the recording. For some other devices (buttons or the like), this may not be possible. To accommodate this, two ways of recording and replaying are implemented:

This uses the DataRecorder class.

/// Recorder for record and replay
DataRecorder my_recorder;

The DataRecorder constructor does not need arguments. However, in the isPrepared function, you need to link it to the channel you use, or give it a custom configuration:

// in isPrepared, after the token you are using is ok:
// in isPrepared, after the token you are using is ok:
if (res) {
  // you can call this any number of times
  my_recorder.complete(getEntity(), my_token);
  // check the recorder; if happy, it keeps res true:
  CHECK_RECORDER(my_recorder);
}

Alternatively, if you want to create the recorder with a custom configuration:

my_recorder.complete(getEntity(), "unique key", "MyDCOClass");

However you completed your recorder, you should now (try to) record whenever your module is in "Advance" mode. This may be custom data, or simply the data you sent over the channel(s), if that is enough for you:

DataWriter<MyData> dw(my_token, ts);
// write the data
...

// only when in advance, use the written data for the recording
if (getCurrentState() == SimulationState::Advance) {
  my_recorder.record(ts, dw.data());
}

There is a slight modification if you want to record event data. With that type of data, in each simulation update step there may be zero, one, or multiple events written (and thus recorded). Simply record as above, but indicate that it is for events, and mark each completed simulation period:

while (writing_my_event) {
  DataWriter<MyObject> dw(w_mytoken, ts);
  // write the data to dw.data(), etc. ...
  if (getCurrentState() == SimulationState::Advance) {
    my_recorder.record(ts.getValidityStart(), dw.data());
  }
}

// make sure that the recording is marked as complete for this update
if (getCurrentState() == SimulationState::Advance) {
  my_recorder.markRecord(ts);
}

It is also possible to record any other DCO class data; you do not need to do this with a DataWriter / write token.

For playback, only in Replay mode, either read directly from the recorder and use that data, or use the recorder to replay directly onto the channel:

// example of read and use
case Replay: {
  MyDCOClass obj;
  DataTimeSpec ts2;
  my_recorder.replay(ts, obj, ts2);
  // ... do stuff with the data
}

The .replay call returns true if there was data to replay for the given period; for event data you can test this in a while loop to get all recorded events.

If you don't need to see or check the data during replay, you can directly use the channelReplay method with the token:

if (getCurrentState() == SimulationState::Replay) {
  my_recorder.channelReplay(ts, w_mytoken);
}

When replaying a piece of recorded data, time specifications likely differ between the recorded and replayed stretch. All timing is automatically translated by the recorder.
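The translation mentioned above amounts to shifting the recorded time ticks so that the start of the recording lines up with the start of the replay. A minimal sketch of the idea (the recorder does this internally; the function and tick values here are illustrative):

```python
def translate_ticks(recorded_ticks, record_start, replay_start):
    """Shift recorded time ticks into the replay time frame."""
    offset = replay_start - record_start
    return [tick + offset for tick in recorded_ticks]

# a stretch recorded at ticks 1000..1020, replayed starting at tick 5000
print(translate_ticks([1000, 1010, 1020], 1000, 5000))
# [5000, 5010, 5020]
```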

A standard interface and record and replay controller are provided by DUSIME to help you manage and select recorded data.

Record and Replay configuration

To configure record and replay, you need to add a number of elements to the dueca_mod.py file. For each entity that you want to include in the record and replay, you need to add a replay controller, and it also makes sense to add a separate controller/handler for the initial model state (snapshots).

# suppose you named your entity
entity = 'PHLAB'
# the initial state inventory is part of the DUECA modules
DUECA_mods.append(
    dueca.Module("initials-inventory", entity, admin_priority).param(
        # reference_file=f"initials-{entity}.toml",
        store_file=f"initials-{entity}-new.toml"))
DUECA_mods.append(
    dueca.Module("replay-master", entity, admin_priority).param(
        # reference_files=f"recordings-{entity}.ddff",
        store_files=f"recordings-{entity}-new.ddff"))

This adds new windows to the view menu of the DUECA interface. If you have a reference file, the initial model states and recordings from the reference file will become available for use right away, and together with any new states or recordings you capture, these will be added to the store files.

After a run, a new initials file will be available on node 0 of your DUECA process. Files for the recordings are stored locally, so if you have multiple DUECA nodes, and record data on these, multiple recording files will be generated, each with its part of the data needed for replay.

In addition to the replay master, each DUECA node where recording or replay takes place needs a ReplayFiler object, one per entity serviced. Simply create it in the dueca_mod.py file, and assign it to a variable.

filer = dueca.ReplayFiler(entity)

The filer will connect to its corresponding master, and ensure that any recorders you use can store their data and retrieve replay data.

Relation to msgpack

The DDFF file format uses msgpack to convert the DUECA data to file. When logging a DCO object, msgpack codes it as an array, with each element of the array corresponding to a data member in the DCO object. The msgpack format can also be used in general inter-process communication; the websocket server can use this format, and in the past it was also used to communicate with the PupilLabs Core eye tracker. In those cases it is better to code the DCO objects as object structs (i.e., coding variable name + value), rather than as arrays. This way of coding is actually the default. Let's show this with a simple example, first a DCO object:

(Type int)
(Object MyObject
  (Option msgpack)
  (int a (Default 1))
  (int b (Default 2))
)
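For reference, here is a hand-coded sketch of what the two encodings discussed above look like at the byte level for MyObject with its default values (a=1, b=2): the compact array form used in DDFF log files, and the object/map form with member names used in inter-process communication. Illustrative only; the generated msgpack code does the real packing.

```python
# Array form, as used in DDFF streams: [1, 2]
array_form = bytes([0x92,         # fixarray, 2 elements
                    0x01, 0x02])  # a = 1, b = 2
print(array_form.hex())  # 920102

# Object/map form, with member names: {"a": 1, "b": 2}
map_form = bytes([0x82,                  # fixmap, 2 key/value pairs
                  0xA1, ord("a"), 0x01,  # "a": 1
                  0xA1, ord("b"), 0x02]) # "b": 2
print(map_form.hex())  # 82a16101a16202
```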

If you want to pack the data, you could use a DUECA messagebuffer:

dueca::MessageBuffer buf(200);
MyObject o1;  // will have default data
msgpack::packer<dueca::MessageBuffer> pk(buf);
pk.pack(o1);

In this case, the object will be packed as an object. The msgpack format is binary, but can be