Miscellaneous

asr_eval.ROOT_DIR

The root directory for the asr_eval package, where its __init__.py lives.

asr_eval.CACHE_DIR

The cache directory for asr_eval.

Defaults to ~/.cache/asr_eval/ on Linux. May be overridden by setting the environment variable ASR_EVAL_CACHE.

asr_eval.segments

Audio segmentation utils.

class asr_eval.segments.AudioSegment(start_time, end_time)[source]

An audio segment from .start_time to .end_time.

Is immutable.

Parameters:
  • start_time (float)

  • end_time (float)

start_pos(sampling_rate=16_000)[source]

Get the start array position given a sampling rate.

Return type:

int

Parameters:

sampling_rate (int)

end_pos(sampling_rate=16_000)[source]

Get the end array position given a sampling rate.

Return type:

int

Parameters:

sampling_rate (int)

slice(sampling_rate=16_000)[source]

Get a slice from the start to the end array position given a sampling rate.

Parameters:

sampling_rate (int)

Return type:

slice[int]

property duration: float

The duration in seconds.

overlap_seconds(other)[source]

The overlap with another segment, in seconds.

Return type:

float

Parameters:

other (AudioSegment)

expand(left_indent, right_indent)[source]

Expands the segment by the given left and right indents, avoiding negative time positions. Returns a copy without modifying the original segment.

Return type:

Self

Parameters:
  • left_indent (float)

  • right_indent (float)

clip(max_sound_duration)[source]

Clips the start and end times to at most the given time. Returns a copy without modifying the original segment.

Return type:

Self

Parameters:

max_sound_duration (float)

property center_time

Gets the center time in seconds.
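
The documented semantics above can be illustrated with a small frozen dataclass (a hypothetical re-implementation for illustration only, not the library code; the real class lives in asr_eval.segments):

```python
from dataclasses import dataclass

# A minimal sketch of the documented AudioSegment behaviour
# (hypothetical re-implementation, not the library class itself).
@dataclass(frozen=True)  # "Is immutable"
class AudioSegmentSketch:
    start_time: float
    end_time: float

    def start_pos(self, sampling_rate: int = 16_000) -> int:
        return int(self.start_time * sampling_rate)

    def end_pos(self, sampling_rate: int = 16_000) -> int:
        return int(self.end_time * sampling_rate)

    def slice(self, sampling_rate: int = 16_000) -> slice:
        return slice(self.start_pos(sampling_rate), self.end_pos(sampling_rate))

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

    def overlap_seconds(self, other: "AudioSegmentSketch") -> float:
        # overlap of the [start, end] intervals, clipped at zero
        return max(0.0, min(self.end_time, other.end_time)
                   - max(self.start_time, other.start_time))

seg = AudioSegmentSketch(1.5, 3.0)
print(seg.start_pos(), seg.end_pos(), seg.duration)  # 24000 48000 1.5
```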

class asr_eval.segments.TimedText(start_time, end_time, text)[source]

Bases: AudioSegment

An AudioSegment with the corresponding text.

Parameters:
  • start_time (float)

  • end_time (float)

  • text (str)

class asr_eval.segments.DiarizationSegment(start_time, end_time, speaker)[source]

Bases: AudioSegment

An AudioSegment with the corresponding speaker index or name.

Parameters:
  • start_time (float)

  • end_time (float)

  • speaker (int | str)

class asr_eval.segments.TimedDiarizationText(start_time, end_time, speaker, text)[source]

Bases: TimedText, DiarizationSegment

Parameters:
  • start_time (float)

  • end_time (float)

  • speaker (int | str)

  • text (str)

asr_eval.segments.chunking.chunk_audio(length, segment_length, segment_shift, last_chunk_mode='same_length')[source]

Chunks the audio uniformly.

Parameters:
  • length (float) – A total audio length.

  • segment_length (float) – The desired length of each segment.

  • segment_shift (float) – The desired shift between consecutive segments.

  • last_chunk_mode (Literal['same_length', 'same_shift'])

Return type:

list[AudioSegment]

If length < segment_length, returns a single chunk from 0 to length. Otherwise calculates how many chunks with the given segment_length and segment_shift fit into the length. If the length does not accommodate an integer number of shifts, adds a single additional chunk:

  • If last_chunk_mode='same_length': from length - segment_length to length

  • If last_chunk_mode='same_shift': from <last_chunk_start> + segment_shift to length

<---->  segment_shift
<----------------------->  segment_length
<--------------------------------------->  length
=========================                |
      ==========================         |
            ===========================  |
              ===========================|  # an additional chunk

Example

>>> chunk_audio(length=41, segment_length=30, segment_shift=5)
[AudioSegment(start_time=0.0, end_time=30.0),
AudioSegment(start_time=5.0, end_time=35.0),
AudioSegment(start_time=10.0, end_time=40.0),
AudioSegment(start_time=11, end_time=41)]
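
The rules above can be sketched in a few lines (a hypothetical re-implementation returning plain (start, end) tuples; the real chunk_audio returns AudioSegment objects):

```python
# A minimal sketch of the chunking rules described above (hypothetical
# re-implementation with plain (start, end) tuples; the real chunk_audio
# returns AudioSegment objects).
def chunk_audio_sketch(length, segment_length, segment_shift,
                       last_chunk_mode='same_length'):
    if length < segment_length:
        return [(0.0, float(length))]
    chunks = []
    start = 0.0
    while start + segment_length <= length:
        chunks.append((start, start + segment_length))
        start += segment_shift
    last_start, last_end = chunks[-1]
    if last_end < length:  # the length does not fit an integer number of shifts
        if last_chunk_mode == 'same_length':
            chunks.append((float(length - segment_length), float(length)))
        else:  # 'same_shift'
            chunks.append((last_start + segment_shift, float(length)))
    return chunks

print(chunk_audio_sketch(length=41, segment_length=30, segment_shift=5))
# [(0.0, 30.0), (5.0, 35.0), (10.0, 40.0), (11.0, 41.0)]
```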

asr_eval.segments.chunking.average_segment_features(segments, features, feature_tick_size, averaging_weights='beta')[source]

Given audio features calculated on the given audio chunking, averages them. The chunks (segments) may overlap.

Parameters:
  • segments (list[AudioSegment]) – A list of segments. Typically obtained by a uniform chunking using chunk_audio(), but may also be non-uniform.

  • features (list[ndarray[tuple[int, ...], dtype[floating[Any]]]] | list[ndarray[tuple[int, ...], dtype[integer[Any]]]]) – 2D feature array for each segment.

  • feature_tick_size (float) – A time interval between consecutive positions in features.

  • averaging_weights (Literal['beta', 'uniform']) – May be "uniform" (flat) or "beta" (decaying at the time edges of each feature array in features). Used to weight the features when averaging.

Return type:

ndarray[tuple[int, ...], dtype[floating[Any]]]
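
The averaging idea can be illustrated with a simplified sketch using uniform weights only (hypothetical; the real function also supports "beta" weights that decay at segment edges):

```python
import numpy as np

# Simplified sketch: average overlapping per-segment features onto one
# timeline, with uniform weights (hypothetical re-implementation).
def average_features_sketch(segments, features, feature_tick_size):
    # segments: list of (start_time, end_time); features[i]: 2D array for segment i
    total_ticks = max(int(round(end / feature_tick_size)) for _, end in segments)
    dim = features[0].shape[1]
    acc = np.zeros((total_ticks, dim))
    weight = np.zeros((total_ticks, 1))
    for (start, _), feat in zip(segments, features):
        s = int(round(start / feature_tick_size))
        acc[s:s + len(feat)] += feat
        weight[s:s + len(feat)] += 1
    return acc / np.maximum(weight, 1)  # positions never covered stay zero

segs = [(0.0, 2.0), (1.0, 3.0)]
feats = [np.array([[1.0], [1.0]]), np.array([[3.0], [3.0]])]
print(average_features_sketch(segs, feats, feature_tick_size=1.0).ravel())
# [1. 2. 3.]
```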

asr_eval.tts

Utils for text-to-speech.

asr_eval.tts.yandex_speechkit.yandex_text_to_speech(text, api_key, voice='random', role='random', speed=1, language='russian')[source]

A wrapper for speech synthesis with Yandex API v3. Will also work for long texts, by joining synthesized parts with pauses.

Return type:

tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], str, str]

Returns:

Audio, voice and role.

Raises:

May raise a grpc._channel._Rendezvous exception, as stated in the Yandex docs.

Parameters:
  • text (str)

  • api_key (str)

  • voice (str | Literal['random'])

  • role (str | Literal['random'])

  • speed (float)

  • language (Literal['russian', 'english'])

Installation: pip install yandex-speechkit.

To obtain an API key, create a service account and an API key as described here: https://yandex.cloud/ru/docs/speechkit/quickstart/stt-quickstart-v2

asr_eval.utils

Various utilities for asr_eval.

class asr_eval.utils.storage.BaseStorage[source]

Bases: ABC

A persistent key-value storage.

Represents a table, where rows are key-value pairs, the “value” column stores any picklable objects, and a variable number of columns act as a joint key, with values of type string, int, float, bool or None (not set).

To add a new row (key-value pair), you don't need to specify values for all the key columns added earlier; the omitted columns will be filled with None. If you add a new key-value pair with a new key column not present earlier, this column is added with a value of None for all other rows.

Note that since we do not differentiate between an explicit "null" and "not set", storing explicit nulls is not possible.

Example

>>> from asr_eval.utils.storage import BaseStorage, ShelfStorage
>>> st: BaseStorage = ShelfStorage('tmp/storage.db')
>>> st.add_row(value='Hi', dataset='fleurs', sample=0, what='ground_truth')
>>> st.add_row(value='Hi', dataset='fleurs', model='whisper', sample=0, what='pred')
>>> st.add_row(value='Ho', dataset='fleurs', model='tuned', steps=100, sample=0, what='pred')
>>> st.list_all(load_values=True)

The result will be a dataframe with 3 rows and columns 'value', 'dataset', 'sample', 'what', 'model', 'steps'. Cell values for the omitted keys will be filled with None.
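
The row-key semantics ("omitted keys are not set") can be sketched with a dict keyed by the set of non-None key items (a hypothetical minimal version for illustration, not the library's DictStorage):

```python
# A hypothetical minimal in-memory version of the row-key semantics
# described above (not the library's DictStorage implementation).
class MiniStorage:
    def __init__(self):
        self._rows = {}

    @staticmethod
    def _canon(keys):
        # an omitted key behaves as "not set", so dropping None-valued keys
        # makes add_row(a=1) and add_row(a=1, b=None) address the same row
        return frozenset((k, v) for k, v in keys.items() if v is not None)

    def has_row(self, **keys):
        return self._canon(keys) in self._rows

    def add_row(self, value, overwrite=True, **keys):
        key = self._canon(keys)
        if key in self._rows and not overwrite:
            raise ValueError(f'row already exists: {keys}')
        self._rows[key] = value

    def get_row(self, **keys):
        return self._rows[self._canon(keys)]  # KeyError if the row is missing

st = MiniStorage()
st.add_row(value='Hi', dataset='fleurs', sample=0, what='ground_truth')
print(st.get_row(dataset='fleurs', sample=0, what='ground_truth'))  # Hi
print(st.has_row(dataset='fleurs', sample=0, what='pred'))  # False
```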

abstractmethod has_row(**keys)[source]

Checks if we have a row (key-value pair) with the specified keys, and omitted keys being “not set”.

Return type:

bool

Parameters:

keys (str | int | float | bool)

abstractmethod add_row(value, overwrite=True, **keys)[source]

Adds a row (key-value pair) with the specified keys, and omitted keys being "not set". If such a row already exists, i.e. has_row(**keys) is True, it will be overwritten if overwrite=True; otherwise a ValueError is raised.

Parameters:
  • value (Any)

  • overwrite (bool)

  • keys (str | int | float | bool)

abstractmethod get_row(**keys)[source]

Gets a row (key-value pair) with the specified keys, and omitted keys being "not set". If such a row does not exist, i.e. has_row(**keys) is False, raises KeyError.

Return type:

Any

Parameters:

keys (str | int | float | bool)

abstractmethod delete_row(missing_ok=False, **keys)[source]

Removes a row (key-value pair) with the specified keys, and omitted keys being "not set". If missing_ok is False and such a row does not exist, i.e. has_row(**keys) is False, raises KeyError.

Parameters:
  • missing_ok (bool)

  • keys (str | int | float | bool)

abstractmethod list_all(load_values=False, **keys)[source]

Gets a list of rows (key-value pairs) with the specified keys, and any values for the omitted keys. Fills the “not set” values with None. Drops full-None columns.

Return type:

DataFrame

Parameters:
  • load_values (bool)

  • keys (str | int | float | bool)

abstractmethod iter_rows(load_values=False, **keys)[source]

Same as .list_all(), but yields the rows one by one instead of collecting them all into a dataframe.

Return type:

Iterator[dict[str, Any]]

Parameters:
  • load_values (bool)

  • keys (str | int | float | bool)

abstractmethod delete_all(**keys)[source]

Removes all rows (key-value pairs) with the specified keys, and any values for the omitted keys.

Parameters:

keys (str | int | float | bool)

abstractmethod close()[source]

Closes the storage.

class asr_eval.utils.storage.DictStorage[source]

Bases: BaseStorage

A dict-based in-memory BaseStorage implementation.

has_row(**keys)[source]

Checks if we have a row (key-value pair) with the specified keys, and omitted keys being “not set”.

Return type:

bool

Parameters:

keys (str | int | float | bool)

add_row(value, overwrite=True, **keys)[source]

Adds a row (key-value pair) with the specified keys, and omitted keys being "not set". If such a row already exists, i.e. has_row(**keys) is True, it will be overwritten if overwrite=True; otherwise a ValueError is raised.

Parameters:
  • value (Any)

  • overwrite (bool)

  • keys (str | int | float | bool)

get_row(**keys)[source]

Gets a row (key-value pair) with the specified keys, and omitted keys being "not set". If such a row does not exist, i.e. has_row(**keys) is False, raises KeyError.

Return type:

Any

Parameters:

keys (str | int | float | bool)

delete_row(missing_ok=False, **keys)[source]

Removes a row (key-value pair) with the specified keys, and omitted keys being "not set". If missing_ok is False and such a row does not exist, i.e. has_row(**keys) is False, raises KeyError.

Parameters:
  • missing_ok (bool)

  • keys (str | int | float | bool)

list_all(load_values=False, **keys)[source]

Gets a list of rows (key-value pairs) with the specified keys, and any values for the omitted keys. Fills the “not set” values with None. Drops full-None columns.

Return type:

DataFrame

Parameters:
  • load_values (bool)

  • keys (str | int | float | bool)

iter_rows(load_values=False, **keys)[source]

Same as .list_all(), but yields the rows one by one instead of collecting them all into a dataframe.

Return type:

Iterator[dict[str, Any]]

Parameters:
  • load_values (bool)

  • keys (str | int | float | bool)

delete_all(**keys)[source]

Removes all rows (key-value pairs) with the specified keys, and any values for the omitted keys.

Parameters:

keys (str | int | float | bool)

close()[source]

Closes the storage.

class asr_eval.utils.storage.ShelfStorage(path, read_only=False)[source]

Bases: DictStorage

An implementation of BaseStorage based on Python's shelve module.

With read_only=True you can open the same file multiple times simultaneously.

Note

The list_all() and delete_all() methods iterate over all the rows, which may be slow in this implementation.

Parameters:
  • path (str | Path)

  • read_only (bool)

close()[source]

Closes the storage.

class asr_eval.utils.storage.CSVStorage(path)[source]

Bases: BaseStorage

A csv-based BaseStorage implementation.

Note that BaseStorage can use int/float/str/bool as key types.

Warning

Gemini 3.0 LLM code!

Note

While the BaseStorage interface is flexible and values can be of any picklable type, the CSV format is very limited and untyped. In this implementation, we try to serialize objects such as timed text segments into JSON and back, but this may cause unexpected behaviour or simply not work in some cases. Also note that row deletion/modification is extremely inefficient, since it requires rewriting the whole file. Finally, note that simultaneous modifications to the same file must be avoided, as they may cause errors.

Parameters:

path (str | Path)

has_row(**keys)[source]

Checks if we have a row (key-value pair) with the specified keys, and omitted keys being “not set”.

Return type:

bool

Parameters:

keys (str | int | float | bool)

add_row(value, overwrite=True, **keys)[source]

Adds a row (key-value pair) with the specified keys, and omitted keys being "not set". If such a row already exists, i.e. has_row(**keys) is True, it will be overwritten if overwrite=True; otherwise a ValueError is raised.

Parameters:
  • value (Any)

  • overwrite (bool)

  • keys (str | int | float | bool)

get_row(**keys)[source]

Gets a row (key-value pair) with the specified keys, and omitted keys being "not set". If such a row does not exist, i.e. has_row(**keys) is False, raises KeyError.

Return type:

Any

Parameters:

keys (str | int | float | bool)

delete_row(missing_ok=False, **keys)[source]

Removes a row (key-value pair) with the specified keys, and omitted keys being "not set". If missing_ok is False and such a row does not exist, i.e. has_row(**keys) is False, raises KeyError.

Parameters:
  • missing_ok (bool)

  • keys (str | int | float | bool)

list_all(load_values=False, **keys)[source]

Gets a list of rows (key-value pairs) with the specified keys, and any values for the omitted keys. Fills the “not set” values with None. Drops full-None columns.

Return type:

DataFrame

Parameters:
  • load_values (bool)

  • keys (str | int | float | bool)

iter_rows(load_values=False, **keys)[source]

Same as .list_all(), but yields the rows one by one instead of collecting them all into a dataframe.

Return type:

Iterator[dict[str, Any]]

Parameters:
  • load_values (bool)

  • keys (str | int | float | bool)

delete_all(**keys)[source]

Removes all rows (key-value pairs) with the specified keys, and any values for the omitted keys.

Parameters:

keys (str | int | float | bool)

close()[source]

Closes the storage.

class asr_eval.utils.storage.DiskcacheStorage(dir)[source]

Bases: DictStorage

An implementation of BaseStorage based on diskcache.

Note

The list_all() and delete_all() methods iterate over all the rows, which may be slow in this implementation.

Parameters:

dir (str | Path)

close()[source]

Closes the storage.

asr_eval.utils.audio_ops.waveform_to_bytes(waveform, sampling_rate=16_000, format='wav')[source]

Converts a waveform into WAV bytes (or another format passed as the format argument).

Return type:

bytes

Parameters:
  • waveform (ndarray[tuple[int, ...], dtype[floating[Any]]])

  • sampling_rate (int)

  • format (str)

asr_eval.utils.audio_ops.merge_synthetic_speech(waveforms, sampling_rate=16_000, pause_range=(0.2, 1.2), random_seed=None)[source]

Merges speech segments, inserting silent pauses of random length drawn from pause_range.

Suitable for constructing long-form synthetic speech.

Return type:

ndarray[tuple[int, ...], dtype[floating[Any]]]

Parameters:
  • waveforms (list[ndarray[tuple[int, ...], dtype[floating[Any]]]])

  • sampling_rate (int)

  • pause_range (tuple[float, float])

  • random_seed (int | None)
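
The merging behaviour can be sketched as follows (a hypothetical re-implementation; the real function lives in asr_eval.utils.audio_ops):

```python
import numpy as np

# A sketch of the merging behaviour described above: concatenate segments,
# inserting a zero-filled pause of random length between them
# (hypothetical re-implementation).
def merge_with_pauses(waveforms, sampling_rate=16_000,
                      pause_range=(0.2, 1.2), random_seed=None):
    rng = np.random.default_rng(random_seed)
    parts = []
    for i, waveform in enumerate(waveforms):
        if i > 0:  # insert a silent pause of random length between segments
            pause_seconds = rng.uniform(*pause_range)
            parts.append(np.zeros(int(pause_seconds * sampling_rate)))
        parts.append(waveform)
    return np.concatenate(parts)

merged = merge_with_pauses([np.ones(100), np.ones(100)],
                           sampling_rate=1000, random_seed=0)
```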

asr_eval.utils.audio_ops.waveform_as_file(waveform)[source]

Turns a waveform into a file. The file is deleted on exit from the context.

Return type:

Iterator[Path]

Parameters:

waveform (ndarray[tuple[int, ...], dtype[floating[Any]]])

Example

>>> with waveform_as_file(waveform) as audio_path:
...     recognize_speech(path=audio_path)

asr_eval.utils.audio_ops.convert_audio_format(waveform, to_audio_type='float')[source]

Converts a waveform with sampling rate 16000 into one of the pre-defined formats:

  • ‘float’: float values, preferably from -1 to 1. Does nothing, because this is the same as the input format.

  • ‘int’: np.int16 values.

  • ‘bytes’: 2 bytes per frame.

  • ‘wav’: 2 bytes per frame plus WAV header.

TODO find some python library that already supports these formats and conversions, or design this better.

Return type:

ndarray[tuple[int, ...], dtype[floating[Any]]] | ndarray[tuple[int, ...], dtype[integer[Any]]] | bytes

Parameters:
  • waveform (ndarray[tuple[int, ...], dtype[floating[Any]]])

  • to_audio_type (Literal['float', 'int', 'bytes', 'wav'])
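
The 'float' to 'int' conversion presumably amounts to the usual int16 rescaling; a sketch under that assumption (the library's exact clipping/rounding may differ):

```python
import numpy as np

# Sketch of a float-in-[-1, 1] to np.int16 conversion (an assumption about
# the 'int' format; the library's exact scaling may differ).
def float_to_int16(waveform):
    return np.clip(waveform * 32767, -32768, 32767).astype(np.int16)

print(float_to_int16(np.array([0.0, 0.5, -1.0])).dtype)  # int16
```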

class asr_eval.utils.cacheable.DiskCacheable(fn, cache_path)[source]

A wrapper for a callable that maps a string to a string. Caches the inputs and outputs in a file using Python's shelve module.

Parameters:
  • fn (Callable[[str], str])

  • cache_path (str | Path)

class asr_eval.utils.dataframe.DataclassDataFrame(data=None)[source]

Bases: Generic

A pandas-like table backed by a list of rows as dataclass objects. That is, DataclassDataFrame(lst: list[MyDataclass]) behaves similarly to pd.DataFrame([vars(obj) for obj in lst]).

There is no “index” in the DataclassDataFrame, just as in Polars.

Parameters:

data (list[T] | None)

asr_eval.utils.deduplicate.find_audio_duplicates(dataset, window_size=16_000, num_proc=32)[source]

Finds duplicates even under a different normalization constant or different slicing. For example, if audio B is a copy of A, but sliced from 1 to 5 seconds and multiplied by 2, it will still be detected as a duplicate.

Return type:

set[Duplicate]

Parameters:
  • dataset (Dataset)

  • window_size (int)

  • num_proc (int)

It does the following:

  1. applies np.sign(np.diff(waveform)).astype(np.int8) to each waveform

  2. in each waveform, finds all positions where ANCHOR is found (usually every ~0.1 sec)

  3. for each position P, extracts an integer hash of waveform[P:P+window_size]

  4. also extracts integer hashes for the whole waveforms

  5. if an equal hash is found for two different samples, adds them to the duplicates set

  6. if this is the whole-audio hash, sets mode='whole', otherwise mode='partial'
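
The core sign-of-diff fingerprinting idea from the steps above can be sketched like this (hypothetical simplification: the ANCHOR selection and the hashing scheme are stand-ins for the real implementation's details):

```python
import numpy as np

# A rough sketch of the sign-of-diff fingerprinting idea described above.
# Hashing every window position stands in for the real implementation's
# ANCHOR-based position selection (a simplification).
def fingerprint(waveform):
    # invariant to positive rescaling: sign(diff(2 * x)) == sign(diff(x))
    return np.sign(np.diff(waveform)).astype(np.int8)

def window_hashes(waveform, window_size=4):
    fp = fingerprint(waveform)
    return {hash(fp[p:p + window_size].tobytes())
            for p in range(len(fp) - window_size + 1)}

a = np.array([0.0, 1.0, 0.5, 0.7, 0.2, 0.9, 0.1])
b = 2 * a[1:6]  # a slice of `a`, rescaled by 2
shared = window_hashes(a) & window_hashes(b)
print(len(shared) > 0)  # True: slices with different scaling still collide
```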

asr_eval.utils.deduplicate.find_audio_duplicates_for_multiple_splits(splits, splits_order, window_size=16_000, num_proc=32)[source]

A generalization of find_audio_duplicates() that is applicable to a dataset with multiple splits.

Forms a dataframe with the columns:

  • dup_split – the split of the duplicated sample

  • dup_idx – the positional index of the duplicated sample

  • orig_split – the split of the original sample

  • orig_idx – the positional index of the original sample

  • mode – whether the duplicate is "whole" or "partial"

If two duplicated samples are found in different splits, their split indices in splits_order are compared: the smaller split index is considered original, and the larger is considered duplicated. So, if your dataset has "train", "val" and "test" splits, specify splits_order=['train', 'val', 'test']. This ensures that if a sample is found in both the train and test splits, it will be considered a duplicate (to be removed later) in the test split.

Return type:

DataFrame

Parameters:
  • splits (dict[str, Dataset] | DatasetDict)

  • splits_order (Sequence[str])

  • window_size (int)

  • num_proc (int)

class asr_eval.utils.deduplicate.Duplicate(mode, sample_idxs)[source]

Information about a found duplicate.

Parameters:
  • mode (Literal['whole', 'partial'])

  • sample_idxs (list[int])

mode: Literal['whole', 'partial']

If "partial", this is a duplicate with different slicing. For example, if sample #1 has a length of 10 seconds, and sample #0 is a slice of sample #1 from 3 to 7 seconds, then together they form a Duplicate(mode='partial', sample_idxs=[0, 1]).

sample_idxs: list[int]

A list of sample indices that are considered duplicates.

asr_eval.utils.deduplicate.visualize_speaker_embeddings(splits, split_colors=None, max_samples_per_split=None, save_path=None, show=True)[source]

Performs speaker embedding analysis via a UMAP projection into a 2D plot. Draws the plot and saves it to save_path. Returns the speaker embeddings, both original and after UMAP.

Requires pip install torch umap-learn pyannote.audio

Return type:

tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], ndarray[tuple[int, ...], dtype[floating[Any]]]]

Parameters:
  • splits (dict[str, Dataset] | DatasetDict)

  • split_colors (dict[str, str] | None)

  • max_samples_per_split (int | None)

  • save_path (str | Path | None)

  • show (bool)

class asr_eval.utils.formatting.Formatting(color=None, on_color=None, attrs=<factory>)[source]

ANSI text formatting attributes, such as "bold", "red", etc.

Example

>>> from asr_eval.utils.formatting import Formatting
>>> Formatting(color='red', attrs={'strike'})
...

Parameters:
  • color (str | None)

  • on_color (str | None)

  • attrs (set[str])

class asr_eval.utils.formatting.FormattingSpan(fmt, start, end)[source]

A Formatting with the corresponding start and end positions in the text.

Note that the positions are specified for the text before adding ANSI color codes.

Parameters:
  • fmt (Formatting)

  • start (int)

  • end (int)

asr_eval.utils.formatting.apply_formatting(text, spans, color_mode='ansi')[source]

Applies ANSI formatting to the specified spans in the text.

Return type:

str

Parameters:
  • text (str)

  • spans (list[FormattingSpan])

  • color_mode (Literal['ansi', 'html'])

Example

>>> from asr_eval.utils.formatting import apply_formatting, Formatting, FormattingSpan
>>> apply_formatting('ABCDEFXXXYYY', [
...     FormattingSpan(Formatting(color='red'), 0, 5),
...     FormattingSpan(Formatting(on_color='on_black'), 0, 3),
...     FormattingSpan(Formatting(attrs={'strike'}), 0, 9),
... ])
(the returned string contains ANSI escape codes and renders as colored text in a Jupyter notebook or console)

If color_mode='html', converts the ANSI codes into HTML. If overlaps occur, the shorter spans are prioritized.

asr_eval.utils.misc.new_uid()[source]

Generates a unique ID.

Return type:

str

asr_eval.utils.misc.groupby_into_spans(iterable)[source]

Finds spans of equal consecutive values in a sequence. Yields (value, start_index, end_index) tuples.

Return type:

Iterable[tuple[TypeVar(T), int, int]]

Parameters:

iterable (Iterable)

Example

>>> list(groupby_into_spans(['x', 'x', 'b', 'a', 'a', 'a']))
[('x', 0, 2), ('b', 2, 3), ('a', 3, 6)]
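
One possible implementation of the behaviour shown above (a hedged sketch, not necessarily the library's code):

```python
from itertools import groupby

# A sketch: group equal consecutive values and track the running index.
def groupby_into_spans_sketch(iterable):
    index = 0
    for value, group in groupby(iterable):
        run_length = sum(1 for _ in group)
        yield value, index, index + run_length
        index += run_length

print(list(groupby_into_spans_sketch(['x', 'x', 'b', 'a', 'a', 'a'])))
# [('x', 0, 2), ('b', 2, 3), ('a', 3, 6)]
```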

asr_eval.utils.misc.list_join(sep, iterable)[source]

Concatenates the given iterables, inserting the separator between them. Acts like str.join, but for lists.

Return type:

list[TypeVar(T)]

Parameters:
  • sep (T)

  • iterable (Iterable)
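
Reading the str.join analogy literally, the behaviour can be sketched as (a hypothetical implementation; the library's may differ in details):

```python
# A sketch of the str.join analogy: concatenate the sub-iterables,
# inserting the separator element between them (hypothetical implementation).
def list_join_sketch(sep, iterable):
    result = []
    for i, part in enumerate(iterable):
        if i > 0:
            result.append(sep)
        result.extend(part)
    return result

print(list_join_sketch(0, [[1, 2], [3], [4, 5]]))  # [1, 2, 0, 3, 0, 4, 5]
```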

asr_eval.utils.misc.rolling_window(arr, size)[source]

Returns all subarrays of length size, stacked together along a new axis.

Return type:

TypeVar(T, ndarray[tuple[int, ...], dtype[integer[Any]]], ndarray[tuple[int, ...], dtype[floating[Any]]])

Parameters:
  • arr (T)

  • size (int)

Example

>>> rolling_window(np.array([1, 0, 2, 1, 3, 5]), 3)
array([[1, 0, 2],
       [0, 2, 1],
       [2, 1, 3],
       [1, 3, 5]])

Taken from: https://stackoverflow.com/a/7100681
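
The linked answer builds the windows with numpy stride tricks; a self-contained version of that construction:

```python
import numpy as np

# The stride-tricks construction from the linked StackOverflow answer:
# each window is a view into the original array, so no data is copied.
def rolling_window_sketch(arr, size):
    shape = arr.shape[:-1] + (arr.shape[-1] - size + 1, size)
    strides = arr.strides + (arr.strides[-1],)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

print(rolling_window_sketch(np.array([1, 0, 2, 1, 3, 5]), 3))
```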

asr_eval.utils.misc.locate_subarray_in_array(arr, subarr)[source]

Finds all positions X where arr[X:X+len(subarr)] equals subarr, in an efficient way.

Return type:

list[int]

Parameters:
  • arr (T)

  • subarr (T)
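
One efficient vectorized approach can be sketched as follows (a hypothetical implementation; the library's actual method may differ):

```python
import numpy as np

# A sketch of one efficient approach: compare all rolling windows against
# subarr at once (hypothetical; not necessarily the library's method).
def locate_subarray_sketch(arr, subarr):
    n = len(subarr)
    if n == 0 or n > len(arr):
        return []
    windows = np.lib.stride_tricks.sliding_window_view(arr, n)
    return np.nonzero((windows == subarr).all(axis=1))[0].tolist()

print(locate_subarray_sketch(np.array([1, 2, 1, 2, 3]), np.array([1, 2])))
# [0, 2]
```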

asr_eval.utils.plots.draw_line_with_ticks(x1, x2, y, y_tick_width, ax, **kwargs)[source]

Draws a horizontal line with ticks at the ends.

Parameters:
  • x1 (float)

  • x2 (float)

  • y (float)

  • y_tick_width (float)

  • ax (Axes)

  • kwargs (Any)

asr_eval.utils.plots.draw_bezier(xy_points, ax, indent=0.1, zorder=0, lw=1, color='darkgray')[source]

Draws a Bezier curve.

Parameters:
  • xy_points (list[tuple[float, float]])

  • ax (Axes)

  • indent (float)

  • zorder (int)

  • lw (float)

  • color (str)

class asr_eval.utils.serializing.SerializableToDict[source]

Bases: ABC

An interface for serializing an object into a json-compatible dict with serialize_object() and loading it back with deserialize_object().

Is not needed for dataclasses; only for objects with custom (de)serialization logic.

abstractmethod serialize_to_dict()[source]

Returns a dict to write into json. The resulting dict can be passed back into the class constructor to restore an equal object.

Return type:

dict[str, Any]

asr_eval.utils.serializing.save_to_json(obj, path, indent=4)[source]

Serializes a hierarchical structure of dataclasses/lists/dicts to a json-compatible dict and then saves it to a .json file. Can be loaded back with load_from_json().

If an exception or keyboard interrupt happens during saving, the file will not be created.

Parameters:
  • obj (Any)

  • path (str | Path)

  • indent (int)
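
One common way to get the "no partial file on failure" guarantee described above is to write to a temporary file and atomically rename it into place (a sketch of the pattern; the real save_to_json may be implemented differently):

```python
import json
import os
import tempfile
from pathlib import Path

# Sketch of atomic JSON saving: write to a temporary file first, then
# rename it into place, so a crash never leaves a partial target file
# (hypothetical; not necessarily how save_to_json is implemented).
def save_json_atomically(data, path, indent=4):
    path = Path(path)
    fd, tmp_path = tempfile.mkstemp(dir=path.parent, suffix='.json.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(data, f, indent=indent)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise
```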

asr_eval.utils.serializing.load_from_json(path)[source]

Loads a data structure that was saved with save_to_json(). If the .json file does not contain any _target_ fields, this behaves identically to json.loads(path.read_text()).

Return type:

Any

Parameters:

path (str | Path)

asr_eval.utils.serializing.serialize_object(obj)[source]

Serializes a hierarchical structure of dataclasses, lists, dicts or enums into a json-compatible dict.

This includes converting dataclasses into dicts (omitting fields where the value is None and the default value is also None). The full class name is written to an additional _target_ field, so that the object can be constructed back with deserialize_object().

Besides dataclasses, can serialize SerializableToDict objects. This is useful for custom classes that are not dataclasses but that we want to save (to json or yaml) and load back.

Return type:

Any

Parameters:

obj (Any)
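
A simplified sketch of the scheme described above (hypothetical; the real serialize_object also handles enums and SerializableToDict objects, and the Segment dataclass here is only an example):

```python
import dataclasses

# Simplified sketch of the serialization scheme: dataclasses become dicts
# with a '_target_' field holding the full class name; fields where both
# the value and the default are None are omitted (hypothetical).
def serialize_sketch(obj):
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        out = {'_target_': f'{type(obj).__module__}.{type(obj).__qualname__}'}
        for field in dataclasses.fields(obj):
            value = getattr(obj, field.name)
            if value is None and field.default is None:
                continue  # omit fields where value and default are both None
            out[field.name] = serialize_sketch(value)
        return out
    if isinstance(obj, list):
        return [serialize_sketch(item) for item in obj]
    if isinstance(obj, dict):
        return {key: serialize_sketch(value) for key, value in obj.items()}
    return obj

@dataclasses.dataclass
class Segment:  # a hypothetical example dataclass
    start: float
    end: float
    speaker: object = None

print(serialize_sketch(Segment(0.0, 1.5)))
```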

asr_eval.utils.serializing.deserialize_object(serialized, ignore_errors=False)[source]

Deserializes an object serialized with serialize_object().

If no _target_ fields are found, returns the input data without changes.

Return type:

Any

Parameters:
  • serialized (Any)

  • ignore_errors (bool)

class asr_eval.utils.server.ServerAsSubprocess(cmd, ready_message='Application startup complete', verbose=True)[source]

The class constructor runs a given command as a subprocess and waits until ready_message appears in the stdout output. After this, you can use .stop() to send SIGINT to the process.

Example

>>> vllm_proc = ServerAsSubprocess([
...     'vllm', 'serve', 'mistralai/Voxtral-Mini-3B-2507', '--port', '8001', ...
... ], ready_message='Application startup complete', verbose=False)
>>> # here you can make API calls to the VLLM server http://localhost:8001/v1
>>> vllm_proc.stop()

Parameters:
  • cmd (list[str])

  • ready_message (str | None)

  • verbose (bool)

class asr_eval.utils.shelves.TupleKeyShelf(path)[source]

Bases: MutableMapping[tuple[str, …], Any]

A wrapper around a shelve.Shelf that uses tuples of strings as keys. Internally, keys are stored as a single string joined by the NUL character ('\x00') to avoid collisions.

Parameters:

path (str | Path)

asr_eval.utils.srt_wrapper.utterances_to_srt(utterances)[source]

Composes SRT file contents from texts with start and end times.

Return type:

str

Parameters:

utterances (list[tuple[str, float, float]])
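
The composition can be sketched as follows (a hypothetical re-implementation; timestamps use the standard SRT HH:MM:SS,mmm format, and the exact output of the library function may differ):

```python
# A sketch of composing SRT contents from (text, start_time, end_time)
# triples (hypothetical re-implementation of utterances_to_srt).
def utterances_to_srt_sketch(utterances):
    def format_timestamp(seconds):
        ms = int(round(seconds * 1000))
        hours, ms = divmod(ms, 3_600_000)
        minutes, ms = divmod(ms, 60_000)
        secs, ms = divmod(ms, 1000)
        return f'{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}'
    blocks = [
        f'{i}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n'
        for i, (text, start, end) in enumerate(utterances, start=1)
    ]
    return '\n'.join(blocks)

print(utterances_to_srt_sketch([('Hello there.', 0.0, 1.5)]))
```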

asr_eval.utils.srt_wrapper.read_srt(path)[source]

Reads .srt transcription file into a list of TimedText.

Return type:

list[TimedText]

Parameters:

path (str | Path)

class asr_eval.utils.table.Table2D(data)[source]

Bases: Generic[T]

A type-safe 2D table with cells of type T and default cell values.

Supports:

  • slicing [:, :] – returns a new Table2D[T]

  • slicing [:, i] or [i, :] – returns a list[T]

  • slicing [i, j] – returns T

  • mapping with a function T -> T2 – returns a new Table2D[T2]

  • appending and prepending rows and columns of type list[T]

  • converting to a DataFrame (without col/row names) with .to_pandas()

  • getting .shape

Parameters:

data (np.ndarray[tuple[int, int], Any])

class asr_eval.utils.timer.Timer(timeout=0, verbose=None)[source]

A timer that can be used as a context manager.

Can be used to track how much time was spent and/or is left. Example:
>>> with Timer(timeout=2) as timer:
...     print(timer.get_remaining_time())
...     time.sleep(1)
...     print(timer.get_remaining_time())
...     time.sleep(1)
...     print(timer.get_remaining_time())
...     time.sleep(1)
...     print(timer.get_remaining_time())
2.0
1.0
0.001
TimeoutError: negative time left in Timer

If timeout=0 in constructor, .get_remaining_time() will always be zero. This is useful when we want to treat 0 as “no timeout”.

If verbose is a string, it will be printed together with the elapsed time on exit from the context.

Parameters:
  • timeout (float)

  • verbose (str | None)

asr_eval.utils.types.FLOATS

A type annotation for a numpy array of floats.

alias of ndarray[tuple[int, …], dtype[floating[Any]]]

asr_eval.utils.types.INTS

A type annotation for a numpy array of integers.

alias of ndarray[tuple[int, …], dtype[integer[Any]]]