MDZ_long_term_stats

Locale	Presentation string
en-GB	Long-term statistics

Overview

This extension is exposed when the user enables storage for saving performance and health data on the miner. In general, M8M does not provide any kind of persistent storage, as this could generate a potentially unbounded amount of data that is useful mostly to advanced users.

The main purpose of this extension is to define a generic framework for retrieval of miner-side data. Data might come from different sources and will most likely have different semantics. These different streams of data can be queried or maintained on disk independently of each other.

This extension does not define any stream itself. For the purpose of discussion, "stream" refers to a hypothetical stream of data defined by another extension. Streams are identified by a unique string, similar in concept to extension tokens. Whenever an extension is eligible for long-term monitoring, it must define a unique stream name, the semantics and the data format of its stream.

Interactions with other extensions

None.

Configuration file additions

A new value LTS is added. Its value can be:

true to enable data storage with no disk quota limit;
a non-zero unsigned integer to enable storage up to that limit, measured in MiB.

By default, LTS is false, so there is no need to disable this feature explicitly and no disk space is used.
When disk quotas are used, older samples are discarded first.
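The three accepted forms of LTS can be told apart as in the following minimal sketch. It assumes the configuration file is a JSON object and that a missing LTS key means false; the function names are hypothetical and not part of M8M.

```python
import json

MIB = 1024 * 1024  # bytes in one MiB


def storage_quota_bytes(lts):
    """Interpret an LTS value.

    Returns None when storage is disabled, float("inf") when storage is
    enabled without a quota, or the quota in bytes.
    """
    # Booleans are tested first: in Python, True and False are also ints.
    if lts is False:
        return None
    if lts is True:
        return float("inf")
    if isinstance(lts, int) and lts > 0:
        return lts * MIB
    raise ValueError(f"invalid LTS value: {lts!r}")


def quota_from_config(text):
    # Assumption: the configuration is a JSON object and LTS defaults to false.
    config = json.loads(text)
    return storage_quota_bytes(config.get("LTS", False))
```

For instance, a configuration containing "LTS": 64 would yield a quota of 64 MiB expressed in bytes, while an empty configuration yields None.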

New common commands

statStreams

Parameters: none.

Purpose: provide a list of streams available for query and time of first sample available.

Reply:
{ stream: [S₀, S₁, ... , S_N-1], presentation: [P₀, P₁, ... , P_N-1] }
An object containing a set of equally sized arrays, each with exactly N elements. Each entry corresponds to an enabled stream of data. A stream is enumerated only if it is enabled by the user and available for query. When both conditions are met, the stream is enumerated even if no samples are available.

Each element S_i must be a string, conceptually similar to an extension token. Those strings are used to query data from a uniquely identified stream.
Each element P_i must be a string. It is to be considered similar to an extension presentation string and is potentially localized.
Each element C_i must be an integer, the number of available samples for the corresponding i-th stream. Alternatively, it can be the string "unknown" to indicate a stream whose exact sample count cannot be trivially inferred. Note that the sample count can be 0.

Push: not allowed.
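Since the reply uses parallel arrays, a client will usually want to pair each stream name with its presentation string. A minimal sketch, assuming the reply has already been decoded from JSON into a dictionary; the function name is hypothetical.

```python
def pair_streams(reply):
    """Map each stream name S_i to its presentation string P_i."""
    streams = reply["stream"]
    labels = reply["presentation"]
    # The arrays are required to have the same number of elements.
    if len(streams) != len(labels):
        raise ValueError("stream and presentation arrays differ in length")
    return dict(zip(streams, labels))
```

The stream names a real miner reports are defined by other extensions, so any names used to exercise this function are placeholders.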

statSamples

Parameters: an array of strings, each one being an S_i token from a previous statStreams command.
The empty array is considered equivalent to the stream array returned by the previous statStreams command and has the effect of querying all enabled streams.
Not specifying a parameter is considered equivalent to specifying [].

Purpose: retrieve information about data samples in the specified streams.

Reply: an object, containing as many sub-objects as the number of queried streams in no specified order. Each sub-object is identified by the unique stream name, which is also the string used to query it.
Each object is in the following form:
{ first: <preciseTimePoint>, last: <preciseTimePoint>, count: <integer-count> }
Where <preciseTimePoint> is an array containing two entries. The first is the number of seconds elapsed since the epoch when a certain event was recorded, while the second is the number of microseconds since the start of that second.
The fields first and last identify the first and last sample taken for the given stream.
The field count is the number of samples available at the moment the reply was produced.

The average number of samples per second can be computed as count / (last - first), with both time points converted to seconds.
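A minimal sketch of turning one statSamples entry into a sampling rate. It assumes the rate is the sample count divided by the seconds elapsed between first and last, with the mean interval between samples as its reciprocal; the helper names are hypothetical.

```python
from fractions import Fraction


def to_seconds(time_point):
    """Convert a [seconds, microseconds] pair to an exact number of seconds."""
    seconds, micros = time_point
    return Fraction(seconds) + Fraction(micros, 1_000_000)


def sampling_stats(info):
    """Return (samples per second, seconds per sample), or None if undefined."""
    elapsed = to_seconds(info["last"]) - to_seconds(info["first"])
    count = info["count"]
    # No samples, or all samples at the same instant: the rate is undefined.
    if count == 0 or elapsed <= 0:
        return None
    rate = count / elapsed
    return float(rate), float(1 / rate)
```

Fractions are used so that adding microseconds to a large epoch value does not lose precision before the final conversion to float.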

Push: not allowed.

getSamples

Parameters: an object containing one sub-object per queried stream, each identified by the stream name, where each of those sub-objects has the following form
{ first: <bound-identifier>, count: <bound-identifier> }
Where <bound-identifier> is one of the following:

A <preciseTimePoint> value, as previously defined:
When used as the first value, this is the minimum amount of time elapsed since the epoch for a sample to be eligible for reporting. It is not required to match a sample time exactly.
When used as the count value, this effectively defines the maximum time elapsed since first for samples to be eligible for reporting.
If first is not specified as a <preciseTimePoint> but count is, then a <preciseTimePoint> value is derived from the first sample reported.
An unsigned integer value:
For first, this must be the index of a valid sample. The number of valid samples available can be retrieved with the statSamples command. Note that enabling disk quotas might result in those counts becoming stale.
For count, the value will be silently clamped to fit the available range of samples at the time the reply is produced.
"begin" (only for first): the first sample eligible for reporting will be the oldest available.
"all" (only for count): the reply will contain all samples from first onwards that exist at the time the reply is produced. This is a guaranteed-to-work way to query the last few produced samples, as their indices cannot be accurately discovered.
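The bound rules above can be sketched as a function that resolves a first/count pair into a half-open index range over one stream. This is only an illustration under stated assumptions: samples are held in memory as a sorted list of (seconds, microseconds) tuples, a sample falling exactly on the time limit is treated as eligible, and all names are hypothetical.

```python
from bisect import bisect_left, bisect_right

MICROS = 1_000_000


def is_time_point(bound):
    return isinstance(bound, (list, tuple)) and len(bound) == 2


def select_range(times, first, count):
    """Return (start, stop) indices into times for the requested samples."""
    # Resolve first to a starting index.
    if first == "begin":
        start = 0
    elif is_time_point(first):
        # The time point need not match a sample exactly.
        start = bisect_left(times, tuple(first))
    else:
        start = first
        if not 0 <= start < len(times):
            raise IndexError("first must be the index of a valid sample")
    if start >= len(times):
        return start, start  # nothing eligible
    # Resolve count to an end index.
    if count == "all":
        stop = len(times)
    elif is_time_point(count):
        # A time point used as count is a duration measured from first,
        # or from the first sample reported when first is not a time point.
        base = tuple(first) if is_time_point(first) else times[start]
        total = (base[0] + count[0]) * MICROS + base[1] + count[1]
        limit = divmod(total, MICROS)
        stop = bisect_right(times, limit)  # assumption: limit is inclusive
    else:
        stop = min(start + count, len(times))  # silently clamped
    return start, stop
```

With five stored samples, select_range(times, 1, 100) would return (1, 5): the oversized count is clamped rather than rejected.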

Purpose: retrieve the data samples themselves from the specified streams.

Reply: an object containing one sub-object per queried stream, each identified by the stream name, where each of those sub-objects has the following form
{ when: [T₀, T₁, ... , T_N-1], value: [V₀, V₁, ... , V_N-1] }
Each T_i value is a <preciseTimePoint> as previously specified.
Each V_i value must correspond to the sample taken at T_i. The format of each V_i value is defined by each stream as its value type.
If a stream produces samples with no associated values, the value array can be empty.
If a stream conditionally produces values, then samples without a valid value must have the null value.
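A client reading one stream's reply has to cope with both an empty value array and individual null entries. A minimal sketch, assuming the reply has been decoded from JSON so that null becomes None; the function name is hypothetical.

```python
def iter_samples(stream_reply):
    """Yield (time_point, value) pairs from one stream's getSamples reply."""
    times = stream_reply["when"]
    values = stream_reply["value"]
    if not values:
        # Stream without associated values: report None for every sample.
        values = [None] * len(times)
    if len(values) != len(times):
        raise ValueError("when and value arrays differ in length")
    for time_point, value in zip(times, values):
        # A None value marks a sample that produced no valid value.
        yield tuple(time_point), value
```

Treating an empty value array the same way as an array of nulls lets the calling code handle both kinds of stream with a single loop.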

Push: not allowed.

New monitor commands

None.

New admin commands

None.

Issues

Isn't this a bit too easygoing in terms of security?

Solution: yes. However, it seems security should be achieved by other means, such as tunnelling / VPN, filesystem cryptography and firewalls. Those are more effective, flexible and reliable than anything I could hack.

Should statStreams provide more data?

Solution: no, I have decided against this, favouring modular approaches. Using multiple commands is easier.

Are disk quotas really so useful?

Notes: they complicate management significantly and introduce considerable amounts of uncertainty in the protocol:

sample counts are no longer monotonically increasing;
a sample index can quickly become invalid;
two queries with the same sample indices could retrieve different data, as the samples are effectively moved;
the way indices are updated is unspecified;
different streams might produce data at different rates; perhaps they should be tuned individually?
their goal can be easily accomplished by using external tools or scripts.

Solution: ?
