asr_eval.bench

Tools for reproducible ASR evaluation and dashboard visualizations.

The package includes a registry for several types of components (pipelines, datasets, augmentors and parsers) and offers several command line tools.

Internally these tools use an abstract class BaseStorage to store the results. It has a concrete implementation, ShelfStorage, but can be quickly adapted to a new storage type such as SQLite or wandb. The PredictionLoader loads the saved predictions, performs string alignment, and caches the alignments. The helper functions get_dataset_data() and compare_pipelines() calculate metrics with bootstrap confidence intervals and run fine-grained comparisons.
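For illustration, the bootstrap confidence calculation mentioned above can be sketched as follows. This is a toy stand-in, not the package's implementation; `bootstrap_wer_ci` is a hypothetical name, and the pooled-WER statistic is an assumption:

```python
import random

def bootstrap_wer_ci(errors, ref_lengths, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for pooled WER:
    resample whole samples, recompute total errors / total words."""
    rng = random.Random(seed)
    n = len(errors)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(sum(errors[i] for i in idx) / sum(ref_lengths[i] for i in idx))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# per-sample word error counts and reference word counts (toy data)
low, high = bootstrap_wer_ci([1, 0, 2, 1], [10, 8, 12, 9])
print(low <= 4 / 39 <= high)  # the point estimate falls inside the interval
```

Resampling whole samples (rather than words) keeps within-sample error correlations intact, which is the usual choice for ASR confidence intervals.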

A command line utility to check that a pipeline works.

usage: sphinx-build [-h] [--audio PATH] [--trim N] pipeline

positional arguments:
pipeline A pipeline name registered in asr_eval.

options:
-h, --help show this help message and exit
--audio PATH Audio file path, or one of the pre-defined audios downloaded from Hugging Face: EN (default) - 10 sec English audio, EN_LONG - 47 sec English audio, RU - 20 sec Russian audio, RU_LONG - 77 sec Russian audio.
--trim N If a float, take the first N seconds of the audio.

A command line wrapper around run_pipeline() to run pipeline(s) on dataset(s) and save the results.

Note that run_pipeline() as a Python function accepts a single pipeline name, while the command line tool allows specifying multiple names or patterns.

See more details and examples in the user guide Evaluation and dashboard.

usage: python -m asr_eval.bench.run [-h] -p PATTERN [PATTERN ...] -d PATTERN [PATTERN ...]
[-s PATH] [--print] [--overwrite] [--suffix SUFFIX]
[--keep KEEP [KEEP ...]] [--import IMPORT_ [IMPORT_ ...]]

options:
-h, --help show this help message and exit
-p PATTERN [PATTERN ...], --pipeline PATTERN [PATTERN ...]
A list of pipeline names or patterns. Searches for all pipelines in the registry that match the patterns, then calls run_pipeline() for each of the found pipelines. For example, "--pipeline gigaam-* whisper-tiny" will run all pipelines whose names start with "gigaam-", plus the "whisper-tiny" pipeline.
-d PATTERN [PATTERN ...], --dataset PATTERN [PATTERN ...]
A list of dataset names, patterns or specs.
-s PATH, --storage PATH
Path of the storage file to save the results (creates if not exists). Use .csv or .dbm file extension (the latter is binary and more efficient).
--print Print transcriptions to stdout.
--overwrite Overwrite existing results, instead of skipping them.
--suffix SUFFIX Add suffix to each pipeline name when saving to storage. Useful for versioning.
--keep KEEP [KEEP ...]
Keep only the specified fields in the outputs of TranscriberPipeline. Can be used if the storage (e.g. .csv files) does not support data types for other fields.
--import IMPORT_ [IMPORT_ ...]
Will import this module by name. Useful to register additional components, such as `my_package.asr.models`.
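The pattern expansion described for --pipeline (and --dataset) behaves like shell-style globbing over the registry. A minimal sketch, assuming fnmatch-style semantics; the pipeline names below are hypothetical:

```python
from fnmatch import fnmatch

def expand_patterns(patterns, registered):
    """Expand shell-style patterns against registered names,
    preserving registration order and deduplicating."""
    matched = []
    for name in registered:
        if any(fnmatch(name, p) for p in patterns):
            matched.append(name)
    return matched

registered = ['gigaam-ctc', 'gigaam-rnnt', 'whisper-tiny', 'whisper-base']
print(expand_patterns(['gigaam-*', 'whisper-tiny'], registered))
# ['gigaam-ctc', 'gigaam-rnnt', 'whisper-tiny']
```

A literal name with no wildcard simply matches itself, so exact names and patterns can be mixed freely on the command line.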
asr_eval.bench.run.run_pipeline(storage, pipeline_name, dataset_specs, print_transcriptions=False, overwrite_existing=False, suffix=None, keep=None)[source]

Runs a pipeline on a list of datasets.

Also has a CLI version; see python -m asr_eval.bench.run --help

See also

More details and examples in the user guide Evaluation and dashboard.

Parameters:
  • storage (BaseStorage) – Storage to save the results, such as ShelfStorage.

  • pipeline_name (str) – Pipeline name to run.

  • dataset_specs (Sequence[str | DatasetSpec]) – List of dataset names, patterns or specs.

  • print_transcriptions (bool) – Print transcriptions at runtime.

  • overwrite_existing (bool) – Overwrite existing results, instead of skipping them.

  • suffix (str | None) – If not None, add the suffix to the pipeline name when saving to storage. Useful for versioning.

  • keep (list[str] | None) – If not empty, keeps only the specified fields in the outputs of TranscriberPipeline. Can be used if the storage (e.g. .csv files) does not support data types for other fields.

asr_eval.bench.evaluator.get_dataset_data(multiple_alignments, count_absorbed_insertions=True, max_consecutive_insertions=None, wer_averaging_mode='concat', exclude_samples_with_digits=False, max_samples_to_render=None)[source]

Takes raw multiple alignments (usually from PredictionLoader.get_multiple_alignments()) and 1) renders the multiple alignments in a displayable form, 2) averages metrics across all samples.

Acts as a main utility for the ASR dashboard data model.

See also

More details and examples in the user guide Alignments and WER.

Parameters:
  • multiple_alignments (dict[int, MultipleAlignment]) – multiple alignments for several sample ids in some dataset. The multiple alignments need not all contain the same set of pipelines.

  • count_absorbed_insertions (bool) – a parameter for error_listing() when calculating metrics.

  • max_consecutive_insertions (int | None) – a parameter for error_listing() when calculating metrics.

  • wer_averaging_mode (Literal['plain', 'concat']) – a parameter for from_samples() when averaging metrics.

  • exclude_samples_with_digits (bool) – if True, when averaging metrics, excludes all samples where a digit is found either in the ground truth transcription or in any of the pipeline predictions. This acts as a “poor man’s solution” to avoid issues with normalization of numerals.

  • max_samples_to_render (int | None) – if not None, render multiple alignments only for the specified number of samples.

Return type:

DatasetData

Returns:

See the DatasetData docs.
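Assuming the conventional definitions of the two wer_averaging_mode values ('plain' as the mean of per-sample WERs, 'concat' as pooled errors over pooled reference words; this interpretation is an assumption, not stated on this page), the difference can be illustrated as:

```python
def wer_plain(errors, ref_lengths):
    # mean of per-sample WER values: every sample weighs equally
    return sum(e / r for e, r in zip(errors, ref_lengths)) / len(errors)

def wer_concat(errors, ref_lengths):
    # pooled: total errors over total reference words,
    # as if all samples were concatenated into one utterance
    return sum(errors) / sum(ref_lengths)

errors, refs = [1, 5], [2, 50]
print(wer_plain(errors, refs))   # ~0.3: the 2-word sample dominates
print(wer_concat(errors, refs))  # ~0.115: weighted by reference length
```

The gap matters when sample lengths vary widely: 'plain' lets short utterances dominate the average, while 'concat' weights each word equally.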

class asr_eval.bench.evaluator.DatasetData(samples, full_samples, dataset_metric)[source]

Output format for the get_dataset_data function.

Parameters:
samples: list[SampleData]

A list of SampleData for all the sample ids for which we have at least one prediction.

full_samples: list[int]

A list of sample ids for which all the pipelines have a prediction. These sample ids are used for averaging metrics, to avoid a problem where different pipelines' predictions are averaged across different sample sets and hence are not directly comparable.

dataset_metric: dict[str, DatasetMetric]

Metrics for each pipeline, averaged across full_samples, if the full_samples list is not empty.

get_all_pipelines()[source]

Get all the pipelines for which we have at least one prediction.

Return type:

list[str]
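The full_samples logic above (keeping only ids predicted by every pipeline) can be sketched as follows; the function name and the data layout are illustrative, not the package's API:

```python
def full_sample_ids(predictions):
    """predictions: {pipeline_name: {sample_id: text}}.
    Return sorted ids for which every pipeline has a prediction."""
    id_sets = [set(per_pipe) for per_pipe in predictions.values()]
    common = set.intersection(*id_sets) if id_sets else set()
    return sorted(common)

preds = {
    'a': {0: 'x', 1: 'y', 2: 'z'},
    'b': {1: 'y', 2: 'w'},
}
print(full_sample_ids(preds))  # [1, 2]
```

Averaging only over this intersection keeps the per-pipeline metrics directly comparable, at the cost of discarding samples that some pipelines never saw.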

class asr_eval.bench.evaluator.SampleData(sample_id, baseline_transcription_html, baseline_is_ground_truth, pipelines, baseline_name='')[source]
Parameters:
  • sample_id (int)

  • baseline_transcription_html (str | None)

  • baseline_is_ground_truth (bool)

  • pipelines (dict[str, SamplePipelineData])

  • baseline_name (str)

class asr_eval.bench.evaluator.SamplePipelineData(err_positions, metrics, elapsed_time, transcription_html, alignment)[source]

A field of the DatasetData dataclass; represents the Alignment between ground truth and prediction, as well as other useful information.

Parameters:
err_positions: dict[OuterLoc, ErrorListingElement]

The output of error_listing()

metrics: Metrics

The output of error_listing()

elapsed_time: float

Inference time, may be NaN if not known.

transcription_html: str | None

The aligned transcription in HTML to display.

alignment: Alignment

The alignment between ground truth and prediction

asr_eval.bench.evaluator.compare_pipelines(dataset_data, pipeline_name_1, pipeline_name_2)[source]

A utility for fine-grained comparison of two pipelines on the same dataset.

To be documented.

Return type:

DatasetPipelinePairComparison

Parameters:
  • dataset_data (DatasetData)

  • pipeline_name_1 (str)

  • pipeline_name_2 (str)

class asr_eval.bench.loader.PredictionLoader(storage, cache, pipelines=('*',), dataset_specs=('*',))[source]

Loads and aligns predictions saved with run_pipeline().

See also

More details and examples in the user guide Evaluation and dashboard.

Parameters:
  • storage (BaseStorage) – A storage where the predictions were saved, typically a ShelfStorage.

  • cache (BaseStorage) – A cache to store alignments and other data; may be initially filled or empty. The cache is reusable.

  • pipelines (Sequence[str]) – A list of pipeline names or patterns to load. By default, loads all pipelines.

  • dataset_specs (Sequence[str | DatasetSpec]) – A list of dataset names, patterns or specs to load. By default, loads all datasets. In the simple case, just use the dataset name, e.g. dataset_specs=['fleurs']. For more complex cases, see the example below.

Dataset specs (colon-separated specifications) allow specifying the augmentor, parser or sample count to load; see Evaluation and dashboard for details.

Example

PredictionLoader(dataset_specs=["fleurs:n=100!"]) will search for the fleurs dataset in the storage. For every “key” consisting of (pipeline + augmentor + parser), it will try to load exactly the first 100 samples of the fleurs dataset, dropping keys that do not have all of these samples and dropping all other samples. This ensures that exactly the same sample set is loaded for all the “keys”, which allows comparing them on the same data.
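The “n=100!” semantics can be sketched as follows. This is a simplified illustration (the real loader orders ids via get_ordered_sample_ids(), and `filter_exact` is a hypothetical name):

```python
def filter_exact(per_key_samples, ordered_ids, n):
    """Keep only keys that cover the first n ids of ordered_ids,
    and trim each surviving key to exactly those ids."""
    wanted = set(ordered_ids[:n])
    return {key: {i: s for i, s in samples.items() if i in wanted}
            for key, samples in per_key_samples.items()
            if wanted <= set(samples)}

keys = {
    'whisper': {5: 'a', 3: 'b', 7: 'c'},
    'giga':    {5: 'a', 7: 'c'},   # missing id 3 -> dropped entirely
}
print(filter_exact(keys, ordered_ids=[5, 3, 7], n=2))
# {'whisper': {5: 'a', 3: 'b'}}
```

Dropping incomplete keys rather than padding them is what guarantees that every surviving key is evaluated on an identical sample set.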

grouped_loaded_predictions: dict[GroupKey, dict[int, SamplePrediction]]

A public attribute that exposes a mapping. The keys are combinations of dataset + pipeline + augmentor + parser. The values are mappings from sample id to a prediction that keeps the predicted text and the inference time.

get_multiple_alignments(dataset_name, augmentor_name='none', parser_name='default', pipeline_patterns=('*',))[source]

Compares multiple pipelines on a dataset.

See also

More details and examples in the user guide Evaluation and dashboard.

Given a list of pipeline_patterns, searches for all keys in grouped_loaded_predictions that match the given pipelines, dataset, augmentor and parser. Since we can only compare pipelines with the same augmentor and parser, this provides all the results we have: pipelines and their predictions on sample ids. Importantly, different pipelines may have different sets of sample ids: say, we ran the first pipeline on 100 samples and the second pipeline on only 10 samples. Suppose we have pipelines P_1, …, P_N with sets of sample ids S_1, …, S_N. This function returns a dict whose keys are union(S_1, …, S_N); for each sample id, a MultipleAlignment is provided with all pipelines that have a prediction for this id. In our example, the function returns a dict over all 100 sample ids; for 10 of them the MultipleAlignment has 2 pipelines, while for the remaining 90 ids it has only 1 pipeline. We can then: 1) visualize all the alignments, 2) call the get_dataset_data() function to average metrics.

Return type:

dict[int, MultipleAlignment]

Parameters:
  • dataset_name (str)

  • augmentor_name (str)

  • parser_name (str)

  • pipeline_patterns (Sequence[str])
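The union-over-sample-ids grouping described above can be sketched with plain dicts (an illustration of the shape of the result, not the package's MultipleAlignment type; `group_by_sample` is a hypothetical name):

```python
def group_by_sample(per_pipeline):
    """per_pipeline: {pipeline: {sample_id: prediction}}.
    Return {sample_id: {pipeline: prediction}} over the union of ids."""
    out = {}
    for pipe, samples in per_pipeline.items():
        for sid, pred in samples.items():
            out.setdefault(sid, {})[pipe] = pred
    return out

preds = {'p1': {i: 't' for i in range(5)}, 'p2': {0: 't', 1: 't'}}
grouped = group_by_sample(preds)
print(len(grouped))        # 5: the union of both id sets
print(sorted(grouped[0]))  # ['p1', 'p2']: both pipelines predicted id 0
print(sorted(grouped[4]))  # ['p1']: only p1 predicted id 4
```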

get_ordered_sample_ids(dataset_name)[source]

For a given registered dataset, returns the sequence of sample ids in the standard (shuffled) version, as obtained by get_dataset(dataset_name, shuffle=True).

Return type:

list[int]

Parameters:

dataset_name (str)

get_annotation(dataset_name, parser_name, sample_id)[source]

Get a parsed annotation for the given dataset, parser name and sample id. If not in cache, retrieves the annotation by instantiating this dataset.

Return type:

Transcription

Parameters:
  • dataset_name (str)

  • parser_name (str | Literal['default'])

  • sample_id (int)

class asr_eval.bench.loader.GroupKey(pipeline_name, dataset_name, augmentor, parser)[source]

A key to group predictions in PredictionLoader.

Parameters:
  • pipeline_name (str)

  • dataset_name (str)

  • augmentor (str)

  • parser (str)

class asr_eval.bench.loader.SamplePrediction(text, elapsed_time)[source]

A value to group predictions in PredictionLoader.

Parameters:
  • text (str)

  • elapsed_time (float)

A command line wrapper around run_dashboard() that runs a web dashboard to visualize the predictions of the ASR models and their metrics.

See more details and examples in the user guide Evaluation and dashboard.

usage: python -m asr_eval.bench.dashboard.run [-h] [-s STORAGE] [-c CACHE] [--assets_dir ASSETS_DIR]
[-p [PIPELINES ...]] [-d [DATASETS ...]]
[-a [ANNOTATIONS ...]] [--export-audio] [--host HOST]
[--port PORT] [--import IMPORT_ [IMPORT_ ...]]

options:
-h, --help show this help message and exit
-s STORAGE, --storage STORAGE
Path of the storage file to load the results from. Use .csv or .dbm file extension (the latter is binary and more efficient).
-c CACHE, --cache CACHE
Path of the ShelfStorage to cache alignments during evaluation (creates if not exists). If not specified, disables caching.
--assets_dir ASSETS_DIR
Directory for web assets (creates if not exists)
-p [PIPELINES ...], --pipelines [PIPELINES ...]
Pipelines to load from the storage (load all if not specified)
-d [DATASETS ...], --datasets [DATASETS ...]
Datasets to load from the storage (load all if not specified)
-a [ANNOTATIONS ...], --annotations [ANNOTATIONS ...]
Custom annotations for datasets not registered in asr_eval, in the form of path(s) to CSV files with column names "dataset_name", "sample_id" and "text".
--export-audio Export audio .mp3 files to the assets dir while starting the dashboard. If not set, will export .mp3 on demand, but this may slow down responses to user requests.
--host HOST A dashboard host
--port PORT A dashboard port
--import IMPORT_ [IMPORT_ ...]
Will import this module by name. Useful to register additional components, such as `my_package.asr.models`.
asr_eval.bench.dashboard.run.run_dashboard(loader, assets_dir='tmp/dashboard_assets', pre_export_audio=False, host='0.0.0.0', port=8051)[source]

Runs a web dashboard to visualize the predictions of the ASR models and their metrics.

Also has a CLI version; see python -m asr_eval.bench.dashboard.run --help

See also

More details and examples in the user guide Evaluation and dashboard.

Parameters:
  • loader (PredictionLoader) – Prediction loader that loads and aligns predictions.

  • assets_dir (str | Path) – Directory for web assets (creates if not exists).

  • pre_export_audio (bool) – Export audio .mp3 files to the assets dir while starting the dashboard. If False, will export .mp3 on demand, but this may slow down responses to user requests.

  • host (str) – A dashboard host.

  • port (int) – A dashboard port.

class asr_eval.bench.augmentors.AudioAugmentor[source]

Bases: ABC

Abstract audio preprocessor, primarily for evaluation with artificial noises.

To register an augmentor, one needs to subclass this class and define the __call__ method that processes an audio sample.

Preferably, the augmentor should not modify the input dict but return a copy.

TODO example.
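Until the example above is written, here is a hedged sketch of what an augmentor might look like. It is self-contained: the AudioAugmentor below is a stand-in mimicking the abstract class described here (the real one is imported from asr_eval.bench.augmentors), and WhiteNoiseAugmentor is a hypothetical subclass; the registration mechanism may differ:

```python
import random
from abc import ABC, abstractmethod

# Stand-in for asr_eval.bench.augmentors.AudioAugmentor, so this
# sketch runs on its own; the real class comes from the package.
class AudioAugmentor(ABC):
    @abstractmethod
    def __call__(self, sample: dict) -> dict: ...

class WhiteNoiseAugmentor(AudioAugmentor):
    """Adds small uniform noise to the waveform; returns a copy."""
    def __init__(self, amplitude: float = 0.01, seed: int = 0):
        self.amplitude = amplitude
        self.rng = random.Random(seed)

    def __call__(self, sample: dict) -> dict:
        audio = sample['audio']
        noisy = [x + self.rng.uniform(-self.amplitude, self.amplitude)
                 for x in audio['array']]
        # do not modify the input dict: build shallow-copied dicts
        return {**sample, 'audio': {**audio, 'array': noisy}}

sample = {'audio': {'array': [0.0, 0.5], 'sampling_rate': 16_000},
          'transcription': 'hi', 'sample_id': 0}
out = WhiteNoiseAugmentor()(sample)
print(sample['audio']['array'])  # unchanged: [0.0, 0.5]
```

Returning a copy, as recommended above, keeps the original dataset sample intact when several augmentors or repeated runs touch the same data.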

asr_eval.bench.augmentors.get_augmentor(name)[source]

Retrieve a registered augmentor. Instantiates the augmentor on first use and returns the same instance on all subsequent calls.

Parameters:

name (str)

asr_eval.bench.parsers.register_parsers(name, true_parser, pred_parser)[source]

Register a pair of parsers: one for the annotation and another for the prediction. To specify a custom parser, subclass the Parser class so that the constructor accepts no arguments, and register it here.

Example

>>> # we will register a new char-wise parser
>>> from asr_eval.align.parsing import PUNCT, Parser
>>> from asr_eval.bench.parsers import register_parsers
>>> from asr_eval.bench.parsers._registry import get_parser
>>> class CharWiseParser(Parser):
...     def __init__(self):
...         super().__init__(tokenizing=rf'[^\s{PUNCT}]')
>>> register_parsers('charwise', CharWiseParser, CharWiseParser)
>>> transcription = (
...     get_parser('charwise', 'true')
...     .parse_single_variant_transcription('hello!')
... )
>>> [token.value for token in transcription.blocks]
['h', 'e', 'l', 'l', 'o']
Parameters:
  • name (str)

  • true_parser (type[Parser])

  • pred_parser (type[Parser])

asr_eval.bench.parsers.get_parser(name, type)[source]

Retrieve a registered parser for annotations (type='true') or predictions (type='pred'). Instantiates the parser on first use and returns the same instance on all subsequent calls (useful for parsers containing neural text normalizers).

Parameters:
  • name (str)

  • type (Literal['true', 'pred'])
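The instantiate-once behavior of get_parser() (and get_augmentor()) can be sketched as a lazy, memoizing registry. This is a toy illustration, not the package's actual registry:

```python
class LazyRegistry:
    """Register classes by name; instantiate lazily and cache the
    instance, so later lookups reuse it (useful for parsers that
    hold heavy resources such as neural normalizers)."""
    def __init__(self):
        self._classes = {}
        self._instances = {}

    def register(self, name, cls):
        self._classes[name] = cls

    def get(self, name):
        if name not in self._instances:
            self._instances[name] = self._classes[name]()  # zero-arg ctor
        return self._instances[name]

registry = LazyRegistry()
registry.register('charwise', dict)  # any zero-argument class works
print(registry.get('charwise') is registry.get('charwise'))  # True
```

The zero-argument constructor requirement stated above is exactly what makes this lazy instantiation possible: the registry can build the object on demand without extra configuration.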

class asr_eval.bench.parsers.RuNormParser[source]

Bases: Parser

A parser with Russian text normalization. Includes translit normalization, Silero normalization and filler words removal.

class asr_eval.bench.datasets.AudioSample[source]

Bases: TypedDict

A TypedDict annotation for a Hugging Face audio sample in the standard asr_eval format.

This class is for typing purposes only. A sample in Hugging Face dataset is a plain dict.

In the standard asr_eval workflow, the sampling rate should be 16_000 and all samples should have a unique “sample_id” value. A dataset may include other custom fields as well.

See also

More details and examples in the user guide Evaluation and dashboard.

Example

>>> # instantiation from `get_dataset`:
>>> from asr_eval.bench.datasets import get_dataset
>>> dataset = get_dataset('podlodka')
>>> sample: AudioSample = dataset[0]
>>> # AudioSample inner structure:
>>> audio_data: AudioData = sample['audio']
>>> assert audio_data['sampling_rate'] == 16_000
>>> waveform: FLOATS = audio_data['array']
>>> transcription: str = sample['transcription']
>>> # instantiation from Hugging Face:
>>> from datasets import load_dataset, Audio
>>> from asr_eval.bench.datasets import AudioSample, AudioData
>>> from asr_eval.utils.types import FLOATS
>>> from asr_eval.bench.datasets.mappers import assign_sample_ids
>>> dataset = (
...     load_dataset('PolyAI/minds14', name='en-US', split='train')
...     .cast_column('audio', Audio(sampling_rate=16_000))
...     .map(assign_sample_ids, with_indices=True)
... )
>>> sample: AudioSample = dataset[0]
audio: AudioData

An Audio feature. In the standard asr_eval workflow, it should be obtained with .cast_column('audio', Audio(decode=True, sampling_rate=16_000)).

transcription: str

A transcription as text, possibly with multivariant annotation; may optionally include punctuation or capitalization.

sample_id: int

A sample ID that should be unique within the dataset. Normally it should equal the sample index in the unshuffled and unfiltered version.

class asr_eval.bench.datasets.AudioData[source]

Bases: TypedDict

A TypedDict annotation for the Audio feature in a Hugging Face dataset.

See examples in the docs for AudioSample.

array: ndarray[tuple[int, ...], dtype[floating[Any]]]

1-D audio waveform of floats, normalized roughly from -1 to 1, with sampling rate specified in sampling_rate (normally 16000).

sampling_rate: int

The sampling rate for array, i.e. the number of array elements per second (normally 16000).

asr_eval.bench.datasets.register_dataset(name, splits=('test',), unlabeled=False)[source]

Register a new dataset in asr_eval. The dataset will be available under the registered name in get_dataset().

Parameters:
  • name (str) – A unique name for the dataset.

  • splits (tuple[str, ...]) – A list of available splits. All datasets should have at least a “test” split available, because asr_eval is for testing purposes. If a dataset has only a “train” split, consider registering it as “test” if you want to test on it. Datasets can have other splits registered under any names, primarily to check for train-test overlap.

  • unlabeled (bool) – If the dataset is unlabeled. Experimental feature.

See many examples in asr_eval.bench.datasets._registered package.

Example

>>> from datasets import Audio, load_dataset, Dataset
>>> from asr_eval.bench.datasets import register_dataset, get_dataset
>>> from asr_eval.bench.datasets.mappers import assign_sample_ids
>>> @register_dataset('podlodka-new', splits=('train', 'test'))
... def load_podlodka(split: str = 'test') -> Dataset:
...     return (
...         load_dataset('bond005/podlodka_speech', split=split)
...         .cast_column('audio', Audio(sampling_rate=16_000)) # type: ignore
...         .map(assign_sample_ids, with_indices=True)
...     )
>>> dataset = get_dataset('podlodka-new')
class asr_eval.bench.datasets.DatasetInfo(instantiate_fn, splits, unlabeled, filter=None)[source]

A container for dataset information, stored when a dataset is registered.

Parameters:
  • instantiate_fn (Callable[[str], Dataset])

  • splits (tuple[str, ...])

  • unlabeled (bool)

  • filter (Callable[[str], list[int]] | None)

class asr_eval.bench.datasets.DatasetSpec(name_pattern, augmentor='all', parser='default', n_samples='all', n_samples_mode='up_to')[source]

Represents an extended syntax for specifying datasets when running pipelines and the dashboard. Allows specifying the required sample count, augmentor and parser.

The dataset spec is understood and used by two utilities:

  1. python -m asr_eval.bench.run

  2. python -m asr_eval.bench.dashboard.run

A dataset spec has a string representation as a colon-separated string. The first value is a name pattern; the other values are modifiers in the form <key>=<value>.

The “a” modifier specifies the augmentor to use (see AudioAugmentor). It has a special value “all” (the default): when running pipelines it is treated as “run without an augmentor”, and when running the dashboard it is treated as “load the results with all augmentors available in the storage”.

The “p” modifier specifies the parser to use (see get_parser()). It is ignored when running pipelines; when running the dashboard, the specified parser is used. By default, the “default” parser is used (DEFAULT_PARSER).

The “n” modifier specifies the number of samples. It may be either “all” or an integer, where “all” means all the samples in the dataset. The value may also have an exclamation mark as a suffix (example: “n=20!”): when running pipelines it is ignored, and when running the dashboard it drops all pipelines that do not have enough samples. For example, with “n=all!”, all pipelines with partial results will not be displayed in the dashboard.

See also

See details and examples in the user guide Evaluation and dashboard.

Example

>>> from asr_eval.bench.datasets import DatasetSpec
>>> DatasetSpec.from_string('fleurs-*:p=ru-norm:n=50!')
DatasetSpec(
    name_pattern='fleurs-*',
    augmentor='all',
    parser='ru-norm',
    n_samples=50,
    n_samples_mode='exactly'
)
Parameters:
  • name_pattern (str)

  • augmentor (str | Literal['none', 'all'])

  • parser (str | Literal['default'])

  • n_samples (int | Literal['all'])

  • n_samples_mode (Literal['up_to', 'exactly'])
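The modifier syntax above can be illustrated with a small parser sketch. This is a simplified stand-in for DatasetSpec.from_string, not the actual implementation; SpecSketch and parse_spec are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class SpecSketch:
    name_pattern: str
    augmentor: str = 'all'
    parser: str = 'default'
    n_samples: object = 'all'
    n_samples_mode: str = 'up_to'

def parse_spec(text: str) -> SpecSketch:
    """Parse 'name:key=value:...' dataset-spec strings."""
    name, *modifiers = text.split(':')
    spec = SpecSketch(name)
    for mod in modifiers:
        key, value = mod.split('=', 1)
        if key == 'a':
            spec.augmentor = value
        elif key == 'p':
            spec.parser = value
        elif key == 'n':
            if value.endswith('!'):      # '!' switches to 'exactly' mode
                spec.n_samples_mode = 'exactly'
                value = value[:-1]
            spec.n_samples = value if value == 'all' else int(value)
    return spec

print(parse_spec('fleurs-*:p=ru-norm:n=50!'))
```

The parsed fields match the DatasetSpec.from_string example shown above for the same input string.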

asr_eval.bench.datasets.get_dataset(name, augmentor_name=None, split='test', shuffle=True, filter=True)[source]

Instantiates a registered dataset.

Parameters:
  • name (str) – A dataset name under which it was registered.

  • augmentor_name (Union[str, None, Literal['none']]) – An augmentor name to apply, None by default (see AudioAugmentor).

  • split (str) – A split name, “test” by default.

  • shuffle (bool) – Whether to perform shuffle(seed=0), True by default. The shuffling ensures that the first N samples form a representative set. The sample IDs help track the original indices before shuffling or filtering.

  • filter (bool) – Whether to filter out duplicate and malformed samples, if set_filter was called for this dataset. True by default. This ensures that datasets in asr_eval by default do not contain duplicate or malformed samples.

Return type:

Dataset

asr_eval.bench.datasets.get_dataset_info(name)[source]

Get info for a registered dataset.

Return type:

DatasetInfo

Parameters:

name (str)

asr_eval.bench.datasets.get_dataset_sample_by_id(dataset_name, split, sample_id, augmentor_name=None)[source]

A utility to retrieve the sample with the given sample ID from the given dataset. Internally instantiates the dataset if not instantiated yet.

Return type:

AudioSample

Parameters:
  • dataset_name (str)

  • split (str)

  • sample_id (int)

  • augmentor_name (str | None)

asr_eval.bench.datasets.set_filter(dataset_name)[source]

Register a sample filter for the given registered dataset.

The filter should accept a split name and return a list of sample IDs to filter out. It is primarily used for deduplication. The get_dataset() function by default returns a filtered dataset if a filter was set.

Parameters:

dataset_name (str)

A registry for pipelines.

class asr_eval.bench.pipelines.TranscriberPipeline(warmup=False)[source]

Bases: ABC

An abstract class for pipelines.

A pipeline is any speech recognition algorithm that processes audio into text or timed text. Each pipeline is stored under a unique name.

See also

More details and examples in the user guide Evaluation and dashboard.

See many examples in asr_eval.bench.pipelines._registered package.

To register a pipeline, you need to subclass as follows:

Example

>>> from datasets import load_dataset, Audio
>>> from asr_eval.bench.pipelines import TranscriberPipeline, get_pipeline
>>> from asr_eval.models.base.longform import LongformCTC
>>> from asr_eval.models.wav2vec2_wrapper import Wav2vec2Wrapper
>>> class _(TranscriberPipeline, register_as='example-wav2vec2'):
...     def init(self):
...         # override init to return a pipeline instance
...         return LongformCTC(
...             Wav2vec2Wrapper('facebook/wav2vec2-base-960h')
...         )
>>> # now you can load the registered pipeline:
>>> pipeline_instance = get_pipeline('example-wav2vec2')()
>>> dataset = (
...     load_dataset('PolyAI/minds14', name='en-US', split='train')
...     .cast_column('audio', Audio(sampling_rate=16_000))
... )
>>> sample = dataset[4]
>>> pipeline_instance.run(sample)
{'text': 'CAN NOW YOU HELP ME SET UP AN JOINT LEAKACCOUNT ',
    'elapsed_time': 0.23598575592041016}
Parameters:

warmup (bool)

asr_eval.bench.pipelines.get_pipeline(name)[source]

Get a registered pipeline class.

Return type:

type[TranscriberPipeline]

Parameters:

name (str)

asr_eval.bench.pipelines.get_pipeline_index(name)[source]

Get an index (in registration order) for a registered pipeline, or -1 if not registered.

Return type:

int

Parameters:

name (str)

A command line utility for streaming evaluation.

Reads the storage file obtained by run, finds the results of the streaming pipelines, analyzes the histories of input and output chunks, and makes various diagrams.

See also

More details and examples in the user guide Evaluation and dashboard.

usage: make_plots [-h] [-s STORAGE] [-o OUTPUT] [-a [ANNOTATIONS ...]]
[--import IMPORT_ [IMPORT_ ...]]

options:
-h, --help show this help message and exit
-s STORAGE, --storage STORAGE
Path of the storage file to load the results from. Use .csv or .dbm file extension (the latter is binary and more efficient).
-o OUTPUT, --output OUTPUT
Directory to save the results
-a [ANNOTATIONS ...], --annotations [ANNOTATIONS ...]
Custom annotations for datasets not registered in asr_eval, in the form of path(s) to CSV files with column names "dataset_name", "sample_id" and "text".
--import IMPORT_ [IMPORT_ ...]
Will import this module by name. Useful to register additional components, such as `my_package.asr.models`.

A command line utility for streaming evaluation.

Scans the directory created by make_plots tool and runs a web interface to visualize the results.

See more details and examples in the user guide Evaluation and dashboard.

usage: sphinx-build [-h] [-d DIR] [--host HOST] [--port PORT]

options:
-h, --help show this help message and exit
-d DIR, --dir DIR Directory with plots, created by `make_plots` tool.
--host HOST A dashboard host
--port PORT A dashboard port