Miscellaneous¶
- asr_eval.ROOT_DIR¶
The root directory for the asr_eval package, where its __init__.py lives.
- asr_eval.CACHE_DIR¶
A cache dir for asr_eval.
Defaults to ~/.cache/asr_eval/ on Linux. May be overridden by setting the environment variable
ASR_EVAL_CACHE.
asr_eval.segments¶
Audio segmentation utils.
- class asr_eval.segments.AudioSegment(start_time, end_time)[source]¶
An audio segment from .start_time to .end_time. Immutable.
- Parameters:
start_time (float)
end_time (float)
- start_pos(sampling_rate=16_000)[source]¶
Get the start array position given a sampling rate.
- Return type:
int
- Parameters:
sampling_rate (int)
- end_pos(sampling_rate=16_000)[source]¶
Get the end array position given a sampling rate.
- Return type:
int
- Parameters:
sampling_rate (int)
- slice(sampling_rate=16_000)[source]¶
Get a slice from the start to the end array position given a sampling rate.
- Parameters:
sampling_rate (int)
- Return type:
slice[int]
- property duration: float¶
The duration in seconds.
- overlap_seconds(other)[source]¶
The overlap with another segment, in seconds.
- Return type:
float
- Parameters:
other (AudioSegment)
- expand(left_indent, right_indent)[source]¶
Expands the segment by the given left and right indents, without going into negative time positions. Returns a copy; the original segment is not modified.
- Return type:
Self
- Parameters:
left_indent (float)
right_indent (float)
- clip(max_sound_duration)[source]¶
Clips the start and end times to at most the given time. Returns a copy; the original segment is not modified.
- Return type:
Self
- Parameters:
max_sound_duration (float)
- property center_time¶
The center time in seconds.
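The documented position and overlap arithmetic can be sketched with plain floats (illustrative only; the exact rounding the library uses and the overlap formula are assumptions):

```python
def time_to_pos(t: float, sampling_rate: int = 16_000) -> int:
    # assumed rounding; the library may round differently
    return round(t * sampling_rate)

def overlap_seconds(a: tuple[float, float], b: tuple[float, float]) -> float:
    # overlap of two (start_time, end_time) intervals, clamped at zero
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
```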
- class asr_eval.segments.TimedText(start_time, end_time, text)[source]¶
Bases: AudioSegment
An AudioSegment with the corresponding text.
- Parameters:
start_time (float)
end_time (float)
text (str)
- class asr_eval.segments.DiarizationSegment(start_time, end_time, speaker)[source]¶
Bases: AudioSegment
An AudioSegment with the corresponding speaker index or name.
- Parameters:
start_time (float)
end_time (float)
speaker (int | str)
- class asr_eval.segments.TimedDiarizationText(start_time, end_time, speaker, text)[source]¶
Bases: TimedText, DiarizationSegment
- Parameters:
start_time (float)
end_time (float)
speaker (int | str)
text (str)
- asr_eval.segments.chunking.chunk_audio(length, segment_length, segment_shift, last_chunk_mode='same_length')[source]¶
Chunks the audio uniformly.
- Parameters:
length (float) – The total audio length.
segment_length (float) – The desired length of each segment.
segment_shift (float) – The desired shift between consecutive segments.
last_chunk_mode (Literal['same_length', 'same_shift'])
- Return type:
list[AudioSegment]
If length < segment_length, returns a single chunk from 0 to length. Otherwise calculates how many chunks with the given segment_length and segment_shift fit into the length. If the length does not accommodate an integer number of shifts, adds a single additional chunk:
If last_chunk_mode='same_length': from length - segment_length to length
If last_chunk_mode='same_shift': from <last_chunk_end> + segment_shift to length
<---->                     segment_shift
<----------------------->  segment_length
<--------------------------------------->  length
=========================
     ==========================
          ===========================
               ===========================   # an additional chunk
Example
>>> chunk_audio(length=41, segment_length=30, segment_shift=5)
[AudioSegment(start_time=0.0, end_time=30.0),
 AudioSegment(start_time=5.0, end_time=35.0),
 AudioSegment(start_time=10.0, end_time=40.0),
 AudioSegment(start_time=11, end_time=41)]
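The rule above can be sketched as a plain-Python reimplementation (illustrative only; the docstring's wording for last_chunk_mode='same_shift' is ambiguous, so the interpretation below, shifting once more from the last regular chunk's start, is an assumption):

```python
def chunk_audio_sketch(length, segment_length, segment_shift,
                       last_chunk_mode='same_length'):
    """Illustrative reimplementation of the chunking rule described above."""
    if length < segment_length:
        return [(0.0, float(length))]
    chunks = []
    start = 0.0
    while start + segment_length <= length:
        chunks.append((start, start + segment_length))
        start += segment_shift
    if chunks[-1][1] < length:  # shifts do not fit evenly: add one more chunk
        if last_chunk_mode == 'same_length':
            chunks.append((float(length - segment_length), float(length)))
        else:  # 'same_shift' (assumed): shift once more from the last chunk start
            chunks.append((chunks[-1][0] + segment_shift, float(length)))
    return chunks
```

The sketch reproduces the documented example above for last_chunk_mode='same_length'.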
- asr_eval.segments.chunking.average_segment_features(segments, features, feature_tick_size, averaging_weights='beta')[source]¶
Given audio features calculated on the given audio chunking, averages them. The chunks (segments) may overlap.
- Parameters:
segments (list[AudioSegment]) – A list of segments. Typically obtained by a uniform chunking using chunk_audio(), but may also be non-uniform.
features (list[ndarray[tuple[int, ...], dtype[floating[Any]]]] | list[ndarray[tuple[int, ...], dtype[integer[Any]]]]) – A 2D feature array for each segment.
feature_tick_size (float) – The time interval between consecutive positions in features.
averaging_weights (Literal['beta', 'uniform']) – Either “uniform” (flat) or “beta” (decaying at the time edges of each feature array in features). Used to weight the features.
- Return type:
ndarray[tuple[int, ...], dtype[floating[Any]]]
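The averaging idea can be sketched for scalar per-tick features with uniform weights (a hypothetical simplified helper; the real function operates on 2D numpy arrays and also supports the 'beta' weighting):

```python
def average_overlapping_features(segments, features, feature_tick_size):
    # segments: list of (start_time, end_time); features: one scalar per tick
    total_ticks = max(round(end / feature_tick_size) for _, end in segments)
    acc = [0.0] * total_ticks
    weight = [0.0] * total_ticks
    for (start, _end), feats in zip(segments, features):
        offset = round(start / feature_tick_size)
        for i, value in enumerate(feats):
            acc[offset + i] += value      # uniform weighting
            weight[offset + i] += 1.0
    return [a / w if w else 0.0 for a, w in zip(acc, weight)]
```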
asr_eval.tts¶
Utils for text-to-speech.
- asr_eval.tts.yandex_speechkit.yandex_text_to_speech(text, api_key, voice='random', role='random', speed=1, language='russian')[source]¶
A wrapper for speech synthesis with the Yandex API v3. Also works for long texts, by joining synthesized parts with pauses.
- Return type:
tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], str, str]
- Returns:
Audio, voice and role.
- Raises:
May raise a grpc._channel._Rendezvous exception, as stated in the API docs.
- Parameters:
text (str)
api_key (str)
voice (str | Literal['random'])
role (str | Literal['random'])
speed (float)
language (Literal['russian', 'english'])
Installation:
pip install yandex-speechkit
To obtain an API key, create a service account and an API key, as described at: https://yandex.cloud/ru/docs/speechkit/quickstart/stt-quickstart-v2
asr_eval.utils¶
Various utilities for asr_eval.
- class asr_eval.utils.storage.BaseStorage[source]¶
Bases:
ABC
A persistent key-value storage.
Represents a table, where rows are key-value pairs, the “value” column stores any picklable objects, and a variable number of columns act as a joint key, with values of type string, int, float, bool or None (not set).
To add a new row (key-value pair), you don’t need to specify values for all the key columns added earlier; the omitted columns will be filled with None. If you add a new key-value pair with a new key column not present earlier, this column is added with the value None for all other rows.
Note that since we do not differentiate between the explicit “null” and “not set”, storing explicit nulls is not possible.
Example
>>> from asr_eval.utils.storage import BaseStorage, ShelfStorage
>>> st: BaseStorage = ShelfStorage('tmp/storage.db')
>>> st.add_row(value='Hi', dataset='fleurs', sample=0, what='ground_truth')
>>> st.add_row(value='Hi', dataset='fleurs', model='whisper', sample=0, what='pred')
>>> st.add_row(value='Ho', dataset='fleurs', model='tuned', steps=100, sample=0, what='pred')
>>> st.list_all(load_values=True)
The result will be a dataframe with 3 rows and columns ‘value’, ‘dataset’, ‘sample’, ‘model’, ‘what’, ‘steps’. Cell values for the omitted keys will be filled with None.
- abstractmethod has_row(**keys)[source]¶
Checks if we have a row (key-value pair) with the specified keys, and omitted keys being “not set”.
- Return type:
bool
- Parameters:
keys (str | int | float | bool)
- abstractmethod add_row(value, overwrite=True, **keys)[source]¶
Adds a row (key-value pair) with the specified keys, and omitted keys being “not set”. If such a row exists, i.e. contains(**keys) is True, will overwrite if overwrite=True, otherwise raises ValueError.
- Parameters:
value (Any)
overwrite (bool)
keys (str | int | float | bool)
- abstractmethod get_row(**keys)[source]¶
Gets a row (key-value pair) with the specified keys, and omitted keys being “not set”. If such a row does not exist, i.e. contains(**keys) is False, raises KeyError.
- Return type:
Any
- Parameters:
keys (str | int | float | bool)
- abstractmethod delete_row(missing_ok=False, **keys)[source]¶
Removes a row (key-value pair) with the specified keys, and omitted keys being “not set”. If missing_ok is False and such a row does not exist, i.e. contains(**keys) is False, raises KeyError.
- Parameters:
missing_ok (bool)
keys (str | int | float | bool)
- abstractmethod list_all(load_values=False, **keys)[source]¶
Gets a list of rows (key-value pairs) with the specified keys, and any values for the omitted keys. Fills the “not set” values with None. Drops full-None columns.
- Return type:
DataFrame
- Parameters:
load_values (bool)
keys (str | int | float | bool)
- abstractmethod iter_rows(load_values=False, **keys)[source]¶
Same as .list_all(), but returns rows one by one instead of converting all the rows into a dataframe.
- Return type:
Iterator[dict[str, Any]]
- Parameters:
load_values (bool)
keys (str | int | float | bool)
- class asr_eval.utils.storage.DictStorage[source]¶
Bases:
BaseStorage
A dict-based in-memory BaseStorage implementation.
- has_row(**keys)[source]¶
Checks if we have a row (key-value pair) with the specified keys, and omitted keys being “not set”.
- Return type:
bool
- Parameters:
keys (str | int | float | bool)
- add_row(value, overwrite=True, **keys)[source]¶
Adds a row (key-value pair) with the specified keys, and omitted keys being “not set”. If such a row exists, i.e. contains(**keys) is True, will overwrite if overwrite=True, otherwise raises ValueError.
- Parameters:
value (Any)
overwrite (bool)
keys (str | int | float | bool)
- get_row(**keys)[source]¶
Gets a row (key-value pair) with the specified keys, and omitted keys being “not set”. If such a row does not exist, i.e. contains(**keys) is False, raises KeyError.
- Return type:
Any
- Parameters:
keys (str | int | float | bool)
- delete_row(missing_ok=False, **keys)[source]¶
Removes a row (key-value pair) with the specified keys, and omitted keys being “not set”. If missing_ok is False and such a row does not exist, i.e. contains(**keys) is False, raises KeyError.
- Parameters:
missing_ok (bool)
keys (str | int | float | bool)
- list_all(load_values=False, **keys)[source]¶
Gets a list of rows (key-value pairs) with the specified keys, and any values for the omitted keys. Fills the “not set” values with None. Drops full-None columns.
- Return type:
DataFrame
- Parameters:
load_values (bool)
keys (str | int | float | bool)
- iter_rows(load_values=False, **keys)[source]¶
Same as .list_all(), but returns rows one by one instead of converting all the rows into a dataframe.
- Return type:
Iterator[dict[str, Any]]
- Parameters:
load_values (bool)
keys (str | int | float | bool)
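One plausible way to model the “omitted keys are not set” contract in memory is to key rows on a frozenset of the provided key items (a hypothetical mini-implementation sketching the contract, not the library's actual code):

```python
class TinyDictStorage:
    """Minimal in-memory sketch of the BaseStorage row contract."""

    def __init__(self):
        self._rows = {}

    @staticmethod
    def _key(keys):
        # omitted keys are simply absent from the frozenset: this models "not set"
        return frozenset(keys.items())

    def has_row(self, **keys):
        return self._key(keys) in self._rows

    def add_row(self, value, overwrite=True, **keys):
        k = self._key(keys)
        if k in self._rows and not overwrite:
            raise ValueError(f'row already exists: {keys}')
        self._rows[k] = value

    def get_row(self, **keys):
        return self._rows[self._key(keys)]  # raises KeyError if missing

    def delete_row(self, missing_ok=False, **keys):
        k = self._key(keys)
        if k not in self._rows and not missing_ok:
            raise KeyError(keys)
        self._rows.pop(k, None)
```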
- class asr_eval.utils.storage.ShelfStorage(path, read_only=False)[source]¶
Bases:
DictStorage
An implementation of BaseStorage based on Python’s shelve. With read_only=True you can open the same file multiple times simultaneously.
Note
The methods list_all() and delete_all() iterate over all the rows, which may be slow in this implementation.
- Parameters:
path (str | Path)
read_only (bool)
- class asr_eval.utils.storage.CSVStorage(path)[source]¶
Bases:
BaseStorage
A csv-based BaseStorage implementation. Note that BaseStorage can use int/float/str/bool as key types.
Warning
Gemini 3.0 LLM code!
Note
While the BaseStorage interface is flexible and values can be of any pickleable type, the CSV format is very limited and untyped. In this implementation, we try to serialize objects such as timed text segments to JSON and back, but this may cause unexpected behaviour or simply not work in some cases. Also note that row deletion/modification is extremely inefficient, since it requires rewriting the whole file. Finally, note that simultaneous modifications to the same file should not be made, as this may cause errors.
- Parameters:
path (str | Path)
- has_row(**keys)[source]¶
Checks if we have a row (key-value pair) with the specified keys, and omitted keys being “not set”.
- Return type:
bool
- Parameters:
keys (str | int | float | bool)
- add_row(value, overwrite=True, **keys)[source]¶
Adds a row (key-value pair) with the specified keys, and omitted keys being “not set”. If such a row exists, i.e. contains(**keys) is True, will overwrite if overwrite=True, otherwise raises ValueError.
- Parameters:
value (Any)
overwrite (bool)
keys (str | int | float | bool)
- get_row(**keys)[source]¶
Gets a row (key-value pair) with the specified keys, and omitted keys being “not set”. If such a row does not exist, i.e. contains(**keys) is False, raises KeyError.
- Return type:
Any
- Parameters:
keys (str | int | float | bool)
- delete_row(missing_ok=False, **keys)[source]¶
Removes a row (key-value pair) with the specified keys, and omitted keys being “not set”. If missing_ok is False and such a row does not exist, i.e. contains(**keys) is False, raises KeyError.
- Parameters:
missing_ok (bool)
keys (str | int | float | bool)
- list_all(load_values=False, **keys)[source]¶
Gets a list of rows (key-value pairs) with the specified keys, and any values for the omitted keys. Fills the “not set” values with None. Drops full-None columns.
- Return type:
DataFrame
- Parameters:
load_values (bool)
keys (str | int | float | bool)
- iter_rows(load_values=False, **keys)[source]¶
Same as .list_all(), but returns rows one by one instead of converting all the rows into a dataframe.
- Return type:
Iterator[dict[str, Any]]
- Parameters:
load_values (bool)
keys (str | int | float | bool)
- class asr_eval.utils.storage.DiskcacheStorage(dir)[source]¶
Bases:
DictStorage
An implementation of BaseStorage based on diskcache.
Note
The methods list_all() and delete_all() iterate over all the rows, which may be slow in this implementation.
- Parameters:
dir (str | Path)
- asr_eval.utils.audio_ops.waveform_to_bytes(waveform, sampling_rate=16_000, format='wav')[source]¶
Converts a waveform into WAV bytes (or another format passed as the format argument).
- Return type:
bytes
- Parameters:
waveform (ndarray[tuple[int, ...], dtype[floating[Any]]])
sampling_rate (int)
format (str)
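The float-to-WAV conversion can be sketched with the standard-library wave module (the library's actual backend and dtype handling are assumptions):

```python
import array
import io
import wave

def waveform_to_wav_bytes(waveform, sampling_rate=16_000):
    # scale floats in [-1, 1] to 16-bit PCM samples
    pcm = array.array('h', (int(max(-1.0, min(1.0, x)) * 32767) for x in waveform))
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 2 bytes per frame
        f.setframerate(sampling_rate)
        f.writeframes(pcm.tobytes())
    return buf.getvalue()
```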
- asr_eval.utils.audio_ops.merge_synthetic_speech(waveforms, sampling_rate=16_000, pause_range=(0.2, 1.2), random_seed=None)[source]¶
Merges speech segments using silent pauses of random length in pause_range. Suitable for constructing long-form synthetic speech.
- Return type:
ndarray[tuple[int, ...], dtype[floating[Any]]]
- Parameters:
waveforms (list[ndarray[tuple[int, ...], dtype[floating[Any]]]])
sampling_rate (int)
pause_range (tuple[float, float])
random_seed (int | None)
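The merging step can be sketched with plain lists and the stdlib random module (the library returns a numpy array; this is an illustrative reimplementation):

```python
import random

def merge_with_pauses(waveforms, sampling_rate=16_000,
                      pause_range=(0.2, 1.2), random_seed=None):
    rng = random.Random(random_seed)
    merged = []
    for i, waveform in enumerate(waveforms):
        if i > 0:
            # insert a silent pause of random duration between segments
            pause_sec = rng.uniform(*pause_range)
            merged.extend([0.0] * int(pause_sec * sampling_rate))
        merged.extend(waveform)
    return merged
```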
- asr_eval.utils.audio_ops.waveform_as_file(waveform)[source]¶
Turns a waveform into a file. The file is deleted on exit from the context.
- Return type:
Iterator[Path]
- Parameters:
waveform (ndarray[tuple[int, ...], dtype[floating[Any]]])
Example
>>> with waveform_as_file(waveform) as audio_path:
...     recognize_speech(path=audio_path)
- asr_eval.utils.audio_ops.convert_audio_format(waveform, to_audio_type='float')[source]¶
Converts a waveform with sampling rate 16000 into one of the pre-defined formats:
- ‘float’: float values, preferably from -1 to 1. Does nothing because this is the same as the input format.
- ‘int’: np.int16 values.
- ‘bytes’: 2 bytes per frame.
- ‘wav’: 2 bytes per frame plus a WAV header.
TODO find some python library that already supports these formats and conversions, or design this better.
- Return type:
ndarray[tuple[int, ...], dtype[floating[Any]]] | ndarray[tuple[int, ...], dtype[integer[Any]]] | bytes
- Parameters:
waveform (ndarray[tuple[int, ...], dtype[floating[Any]]])
to_audio_type (Literal['float', 'int', 'bytes', 'wav'])
- class asr_eval.utils.cacheable.DiskCacheable(fn, cache_path)[source]¶
A wrapper for a callable that converts a string to a string. Caches the inputs and outputs into a file using Python’s shelve.
- Parameters:
fn (Callable[[str], str])
cache_path (str | Path)
- class asr_eval.utils.dataframe.DataclassDataFrame(data=None)[source]¶
Bases:
Generic
A pandas-like table backed by a list of rows as dataclass objects. That is, DataclassDataFrame(lst: list[MyDataclass]) behaves similarly to pd.DataFrame([vars(obj) for obj in lst]). There is no “index” in the DataclassDataFrame, just as in Polars.
- Parameters:
data (list[T] | None)
- asr_eval.utils.deduplicate.find_audio_duplicates(dataset, window_size=16_000, num_proc=32)[source]¶
Finds duplicates even with a different normalization constant or different slicing. For example, if audio B is a copy of A but sliced from 1 to 5 seconds and multiplied by 2, it will still be detected as a duplicate.
- Return type:
set[Duplicate]
- Parameters:
dataset (Dataset)
window_size (int)
num_proc (int)
It does the following:
1. Applies np.sign(np.diff(waveform)).astype(np.int8) to each waveform.
2. In each waveform, finds all positions where ANCHOR is found (usually every ~0.1 sec).
3. For each position P, extracts an integer hash of waveform[P:P+window_size].
4. Also extracts integer hashes for the whole waveforms.
5. If an equal hash is found for two different samples, adds them to the duplicates set.
6. If this is the whole-audio hash, sets mode='whole', otherwise mode='partial'.
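The reason step 1 makes the fingerprint robust to rescaling: multiplying a waveform by a positive constant does not change the signs of its first differences. A minimal sketch of that invariant (hypothetical helper, not the library's code):

```python
def sign_signature(waveform):
    # sign of consecutive differences: invariant to positive scaling
    return tuple((b > a) - (b < a) for a, b in zip(waveform, waveform[1:]))
```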
- asr_eval.utils.deduplicate.find_audio_duplicates_for_multiple_splits(splits, splits_order, window_size=16_000, num_proc=32)[source]¶
A generalization of find_audio_duplicates() that is applicable to a dataset with multiple splits.
Forms a dataframe with the columns:
- dup_split – the split of a duplicated sample
- dup_idx – the positional index of a duplicated sample
- orig_split – the split of the original sample
- orig_idx – the positional index of the original sample
- mode – whether the duplicate is “whole” or “partial”
If two duplicated samples are found in different splits, their split indices in splits_order are compared: the smaller split index is considered original, and the larger is considered duplicated. So, if your dataset has “train”, “val” and “test” splits, specify splits_order=['train', 'val', 'test']. This ensures that if a sample is found in the train and test splits, it will be considered a duplicate (to remove later) in the test split.
- Return type:
DataFrame
- Parameters:
splits (dict[str, Dataset] | DatasetDict)
splits_order (Sequence[str])
window_size (int)
num_proc (int)
- class asr_eval.utils.deduplicate.Duplicate(mode, sample_idxs)[source]¶
Information about a found duplicate.
- Parameters:
mode (Literal['whole', 'partial'])
sample_idxs (list[int])
- mode: Literal['whole', 'partial']¶
If “partial”, this is a duplicate with different slicing. For example, if sample #1 has a length of 10 seconds, and sample #0 is a slice of sample #1 from 3 to 7 seconds, then together they form a Duplicate(mode='partial', sample_idxs=[0, 1]).
- sample_idxs: list[int]¶
A list of sample indices that are considered duplicates.
- asr_eval.utils.deduplicate.visualize_speaker_embeddings(splits, split_colors=None, max_samples_per_split=None, save_path=None, show=True)[source]¶
Performs speaker embedding analysis via a UMAP projection into a 2D plot. Draws the plot and saves it to save_path. Returns the speaker embeddings, both original and after UMAP.
Requires pip install torch umap-learn pyannote.audio
- Return type:
tuple[ndarray[tuple[int, ...], dtype[floating[Any]]], ndarray[tuple[int, ...], dtype[floating[Any]]]]
- Parameters:
splits (dict[str, Dataset] | DatasetDict)
split_colors (dict[str, str] | None)
max_samples_per_split (int | None)
save_path (str | Path | None)
show (bool)
- class asr_eval.utils.formatting.Formatting(color=None, on_color=None, attrs=<factory>)[source]¶
ANSI text formatting attributes, such as “bold”, “red”, etc.
Example
>>> from asr_eval.utils.formatting import Formatting
>>> Formatting(color='red', attrs={'strike'})
...
- Parameters:
color (str | None)
on_color (str | None)
attrs (set[str])
- class asr_eval.utils.formatting.FormattingSpan(fmt, start, end)[source]¶
A Formatting with the corresponding start and end positions in the text.
Note that the positions are specified for the text before adding ANSI color codes.
- Parameters:
fmt (Formatting)
start (int)
end (int)
- asr_eval.utils.formatting.apply_formatting(text, spans, color_mode='ansi')[source]¶
Applies ANSI formatting to the specified spans in the text.
- Return type:
str
- Parameters:
text (str)
spans (list[FormattingSpan])
color_mode (Literal['ansi', 'html'])
Example
>>> from asr_eval.utils.formatting import apply_formatting, Formatting, FormattingSpan
>>> apply_formatting('ABCDEFXXXYYY', [
...     FormattingSpan(Formatting(color='red'), 0, 5),
...     FormattingSpan(Formatting(on_color='on_black'), 0, 3),
...     FormattingSpan(Formatting(attrs={'strike'}), 0, 9),
... ])
[9m[40m[31mABC[0m[9m [31mDE[0m[9mFXXX[0mYYY[0m
(this can be rendered in Jupyter notebook or console)
If color_mode='html', converts the ANSI codes into HTML. If overlaps occur, the shorter spans are prioritized.
- asr_eval.utils.misc.groupby_into_spans(iterable)[source]¶
Find spans of the same value in a sequence. Returns (value, start_index, end_index).
- Return type:
Iterable[tuple[TypeVar(T), int, int]]
- Parameters:
iterable (Iterable)
Example
>>> list(groupby_into_spans(['x', 'x', 'b', 'a', 'a', 'a']))
[('x', 0, 2), ('b', 2, 3), ('a', 3, 6)]
- asr_eval.utils.misc.list_join(sep, iterable)[source]¶
Combines iterables via a given separator. Acts like str.join, but for lists.
- Return type:
list[TypeVar(T)]
- Parameters:
sep (T)
iterable (Iterable)
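A plausible reading of the str.join analogy (the exact semantics are an assumption): flatten the given iterables, inserting sep between consecutive parts.

```python
def list_join(sep, iterable):
    out = []
    for i, part in enumerate(iterable):
        if i:
            out.append(sep)   # separator between consecutive parts
        out.extend(part)
    return out
```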
- asr_eval.utils.misc.rolling_window(arr, size)[source]¶
Returns all subarrays of length size, stacked together along a new axis.
- Return type:
TypeVar(T, ndarray[tuple[int, ...], dtype[integer[Any]]], ndarray[tuple[int, ...], dtype[floating[Any]]])
- Parameters:
arr (T)
size (int)
Example
>>> rolling_window(np.array([1, 0, 2, 1, 3, 5]), 3)
array([[1, 0, 2],
       [0, 2, 1],
       [2, 1, 3],
       [1, 3, 5]])
Taken from: https://stackoverflow.com/a/7100681
- asr_eval.utils.misc.locate_subarray_in_array(arr, subarr)[source]¶
Finds all positions X where arr[X:X+len(subarr)] equals subarr, in an efficient way.
- Return type:
list[int]
- Parameters:
arr (T)
subarr (T)
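A naive reference version of the lookup (the library presumably vectorizes this, e.g. via rolling_window(); this sketch only fixes the semantics):

```python
def locate_subarray_naive(arr, subarr):
    n = len(subarr)
    # compare every length-n window against subarr
    return [i for i in range(len(arr) - n + 1)
            if list(arr[i:i + n]) == list(subarr)]
```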
- asr_eval.utils.plots.draw_line_with_ticks(x1, x2, y, y_tick_width, ax, **kwargs)[source]¶
Draws a horizontal line with ticks at the ends.
- Parameters:
x1 (float)
x2 (float)
y (float)
y_tick_width (float)
ax (Axes)
kwargs (Any)
- asr_eval.utils.plots.draw_bezier(xy_points, ax, indent=0.1, zorder=0, lw=1, color='darkgray')[source]¶
Draws a Bezier curve.
- Parameters:
xy_points (list[tuple[float, float]])
ax (Axes)
indent (float)
zorder (int)
lw (float)
color (str)
- class asr_eval.utils.serializing.SerializableToDict[source]¶
Bases:
ABC
An interface to serialize an object into a json-compatible dict with serialize_object() and load it back with deserialize_object(). Not needed for dataclasses, only for objects with custom (de)serialization logic.
- asr_eval.utils.serializing.save_to_json(obj, path, indent=4)[source]¶
Serializes a hierarchical structure of dataclasses/lists/dicts to a json-compatible dict and then saves it to a .json file. Can be loaded back with load_from_json(). If an exception or keyboard interrupt happens during saving, the file will not be created.
- Parameters:
obj (Any)
path (str | Path)
indent (int)
- asr_eval.utils.serializing.load_from_json(path)[source]¶
Loads a data structure that was saved with save_to_json(). If the .json file does not contain any _target_ fields, it will act identically to json.loads(path.read_text()).
- Return type:
Any
- Parameters:
path (str | Path)
- asr_eval.utils.serializing.serialize_object(obj)[source]¶
Serializes a hierarchical structure of dataclasses, lists, dicts or enums into a json-compatible dict.
This includes converting dataclasses into dicts (omitting fields where the value is None and the default value is also None). The full class name is written to the additional _target_ field to construct the object back with deserialize_object().
Besides dataclasses, can serialize SerializableToDict() objects. This is useful for custom classes that are not dataclasses but that we want to be able to save (to json or yaml) and load.
- Return type:
Any
- Parameters:
obj (Any)
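The dataclass-to-dict convention can be sketched as follows (the field-omission rule and _target_ naming follow the description above; the real implementation details, and the Utterance example dataclass, are assumptions):

```python
import dataclasses
from typing import Any, Optional

def serialize_sketch(obj: Any) -> Any:
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # record the class path so the object can be reconstructed later
        out = {'_target_': f'{type(obj).__module__}.{type(obj).__qualname__}'}
        for f in dataclasses.fields(obj):
            value = getattr(obj, f.name)
            if value is None and f.default is None:
                continue  # omit fields that are None by default
            out[f.name] = serialize_sketch(value)
        return out
    if isinstance(obj, list):
        return [serialize_sketch(x) for x in obj]
    if isinstance(obj, dict):
        return {k: serialize_sketch(v) for k, v in obj.items()}
    return obj

@dataclasses.dataclass
class Utterance:  # hypothetical example dataclass
    text: str
    speaker: Optional[str] = None
```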
- asr_eval.utils.serializing.deserialize_object(serialized, ignore_errors=False)[source]¶
Deserializes an object serialized with serialize_object(). If no _target_ fields are found, returns the input data without changes.
- Return type:
Any
- Parameters:
Any- Parameters:
serialized (Any)
ignore_errors (bool)
- class asr_eval.utils.server.ServerAsSubprocess(cmd, ready_message='Application startup complete', verbose=True)[source]¶
The class constructor runs a given command as a subprocess and waits until a ready_message appears in the stdout output. After this, you can use .stop() to send SIGINT to the process.
Example
>>> vllm_proc = ServerAsSubprocess([
...     'vllm', 'serve', 'mistralai/Voxtral-Mini-3B-2507', '--port', '8001',
...     ...
... ], ready_message='Application startup complete', verbose=False)
>>> # here you can make API calls to the VLLM server http://localhost:8001/v1
>>> vllm_proc.stop()
- Parameters:
cmd (list[str])
ready_message (str | None)
verbose (bool)
- class asr_eval.utils.shelves.TupleKeyShelf(path)[source]¶
Bases:
MutableMapping[tuple[str, …], Any]
A wrapper around a shelve.Shelf that uses tuples of strings as keys. Internally, keys are stored as a single string joined by the NUL character to avoid collisions.
- Parameters:
path (str | Path)
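The key encoding can be sketched directly; NUL works as a separator because it cannot appear in ordinary string keys (the guard against it is an assumption about the implementation):

```python
SEP = '\x00'  # the NUL character

def encode_key(parts):
    # assumed guard: a NUL inside a part would break the encoding
    assert all(SEP not in p for p in parts), 'key parts must not contain NUL'
    return SEP.join(parts)

def decode_key(joined):
    return tuple(joined.split(SEP))
```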
- asr_eval.utils.srt_wrapper.utterances_to_srt(utterances)[source]¶
Composes SRT file contents from texts, start times and end times.
- Return type:
str
- Parameters:
utterances (list[tuple[str, float, float]])
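The SRT composition can be sketched with standard formatting (the block layout follows the SRT convention; the library's exact output, e.g. trailing newlines, is an assumption):

```python
def utterances_to_srt_sketch(utterances):
    def fmt(t):
        # SRT timestamps look like HH:MM:SS,mmm
        ms = round(t * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f'{h:02d}:{m:02d}:{s:02d},{ms:03d}'
    blocks = []
    for i, (text, start, end) in enumerate(utterances, start=1):
        blocks.append(f'{i}\n{fmt(start)} --> {fmt(end)}\n{text}')
    return '\n\n'.join(blocks) + '\n'
```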
- asr_eval.utils.srt_wrapper.read_srt(path)[source]¶
Reads an .srt transcription file into a list of TimedText.
- Return type:
list[TimedText]
- Parameters:
path (str | Path)
- class asr_eval.utils.table.Table2D(data)[source]¶
Bases:
Generic[T]
A type-safe 2D table with cells of type T and default cell values.
Supports:
- slicing [:, :] – returns a new Table2D[T]
- slicing [:, i] or [i, :] – returns a list[T]
- slicing [i, j] – returns T
- mapping with a function T -> T2 – returns a new Table2D[T2]
- appending and prepending rows and columns of type list[T]
- converting to a DataFrame (without col/row names) with .to_pandas()
- getting .shape
- Parameters:
data (np.ndarray[tuple[int, int], Any])
- class asr_eval.utils.timer.Timer(timeout=0, verbose=None)[source]¶
A timer that can be used as a context manager.
- Can be used to know how much time was spent and/or is left, for example:
>>> with Timer(timeout=2) as timer:
...     print(timer.get_remaining_time())
...     time.sleep(1)
...     print(timer.get_remaining_time())
...     time.sleep(1)
...     print(timer.get_remaining_time())
...     time.sleep(1)
...     print(timer.get_remaining_time())
2.0
1.0
0.001
TimeoutError: negative time left in Timer
If timeout=0 in the constructor, .get_remaining_time() will always be zero. This is useful when we want to treat 0 as “no timeout”.
If verbose is a string, it will be printed together with the elapsed time on exit from the context.
- Parameters:
timeout (float)
verbose (str | None)
- asr_eval.utils.types.FLOATS¶
A type alias for a numpy array of floats.
alias of ndarray[tuple[int, …], dtype[floating[Any]]]
- asr_eval.utils.types.INTS¶
A type alias for a numpy array of integers.
alias of ndarray[tuple[int, …], dtype[integer[Any]]]