asr_eval.bench¶
Tools for reproducible ASR evaluation and dashboard visualizations.
The package includes a registry for several types of components:

Datasets for ASR evaluation, in Hugging Face Dataset format:

- Storage: datasets_registry
- Registering: @register_dataset
- Retrieving: get_dataset()

Pipelines for offline and streaming ASR in a unified format:

- Storage: pipelines_registry
- Registering: TranscriberPipeline
- Retrieving: get_pipeline()

Parsers to define the text tokenization and normalization scheme:

- Storage: parsers_registry
- Registering: register_parsers()
- Retrieving: get_parser()

Augmentors to define audio processing methods:

- Storage: augmentors_registry
- Registering: AudioAugmentor
- Retrieving: get_augmentor()
The package offers several command line tools:
- python -m asr_eval.bench.check runs a quick check that a pipeline works.
- python -m asr_eval.bench.run runs pipelines on datasets; supports incremental runs, allows specifying augmentors and sample counts, and stores predictions, as well as input and output chunk history for streaming pipelines.
- python -m asr_eval.bench.dashboard.run loads the results and runs an interactive dashboard that displays metrics and results on individual samples, and enables fine-grained comparison.
- python -m asr_eval.bench.streaming.make_plots analyzes input and output chunk history, makes various diagrams for streaming pipelines, and saves them into a folder.
- python -m asr_eval.bench.streaming.show_plots shows the obtained streaming diagrams via a web interface.
Internally these tools use an abstract class
BaseStorage to store the results. It
has a concrete
implementation ShelfStorage (but it is
possible to quickly adapt to a new storage type such as SQLite or
wandb). The PredictionLoader loads the
saved predictions and performs string alignment, saving the alignments
in a cache. The helper functions
get_dataset_data() and
compare_pipelines() calculate metrics
with bootstrap confidence and run a fine-grained comparison.
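The bootstrap confidence calculation can be illustrated with a short sketch. This is not asr_eval's implementation; the function name is illustrative, and the "concat"-style averaging (total errors divided by total reference words) is an assumption based on the wer_averaging_mode option documented below:

```python
import random

def bootstrap_wer_ci(errors, ref_words, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for corpus-level WER.

    errors[i] is the edit-error count and ref_words[i] the reference word
    count for sample i. Each resample draws samples with replacement and
    recomputes WER as total errors / total reference words.
    """
    rng = random.Random(seed)
    n = len(errors)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        total_errors = sum(errors[i] for i in idx)
        total_words = sum(ref_words[i] for i in idx)
        stats.append(total_errors / total_words)
    stats.sort()
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high
```

The percentile interval is the simplest bootstrap variant; per-sample resampling keeps each sample's errors and word count paired, which is what makes the corpus-level ratio meaningful.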
A command line utility to check that a pipeline works.
usage: python -m asr_eval.bench.check [-h] [--audio PATH] [--trim N] pipeline
positional arguments:
pipeline A pipeline name registered in asr_eval.
options:
-h, --help show this help message and exit
--audio PATH Audio file path or pre-defined audios downloaded from Hugging Face: EN (default) - 10 sec English audio, EN_LONG - 47 sec English audio, RU - 20 sec Russian audio, RU_LONG - 77 sec Russian audio.
--trim N If float, take the first N seconds of the audio.
A command line wrapper around
run_pipeline() to run pipeline(s) on
dataset(s) and save the results.
Note that run_pipeline() as a Python function
accepts a single pipeline name, while the command line tool allows
specifying multiple names or patterns.
See more details and examples in the user guide Evaluation and dashboard.
usage: python -m asr_eval.bench.run [-h] -p PATTERN [PATTERN ...] -d PATTERN [PATTERN ...]
[-s PATH] [--print] [--overwrite] [--suffix SUFFIX]
[--keep KEEP [KEEP ...]] [--import IMPORT_ [IMPORT_ ...]]
options:
-h, --help show this help message and exit
-p PATTERN [PATTERN ...], --pipeline PATTERN [PATTERN ...]
A list of pipeline names or patterns. Searches for all pipelines in the registry that match the patterns, then calls run_pipeline() for each of the found pipelines. For example, "--pipeline gigaam-* whisper-tiny" will run all pipelines starting with "gigaam-", and the "whisper-tiny" pipeline.
-d PATTERN [PATTERN ...], --dataset PATTERN [PATTERN ...]
A list of dataset names, patterns or specs.
-s PATH, --storage PATH
Path of the storage file to save the results (creates if not exists). Use .csv or .dbm file extension (the latter is binary and more efficient).
--print Print transcriptions to stdout.
--overwrite Overwrite existing results, instead of skipping them.
--suffix SUFFIX Add suffix to each pipeline name when saving to storage. Useful for versioning.
--keep KEEP [KEEP ...]
Keep only the specified fields in the outputs of TranscriberPipeline. Can be used if the storage (e.g. .csv files) does not support data types for other fields.
--import IMPORT_ [IMPORT_ ...]
Will import this module by name. Useful to register additional components, such as `my_package.asr.models`.
- asr_eval.bench.run.run_pipeline(storage, pipeline_name, dataset_specs, print_transcriptions=False, overwrite_existing=False, suffix=None, keep=None)[source]¶
Runs a pipeline on a list of datasets.
Also has a CLI version, see python -m asr_eval.bench.run --help

See also
More details and examples in the user guide Evaluation and dashboard.
- Parameters:
  - storage (BaseStorage) – Storage to save the results, such as ShelfStorage.
  - pipeline_name (str) – Pipeline name to run.
  - dataset_specs (Sequence[str | DatasetSpec]) – List of dataset names, patterns or specs.
  - print_transcriptions (bool) – Print transcriptions at runtime.
  - overwrite_existing (bool) – Overwrite existing results, instead of skipping them.
  - suffix (str | None) – If not None, add the suffix to the pipeline name when saving to storage. Useful for versioning.
  - keep (list[str] | None) – If not empty, keeps only the specified fields in the outputs of TranscriberPipeline. Can be used if the storage (e.g. .csv files) does not support data types for other fields.
- asr_eval.bench.evaluator.get_dataset_data(multiple_alignments, count_absorbed_insertions=True, max_consecutive_insertions=None, wer_averaging_mode='concat', exclude_samples_with_digits=False, max_samples_to_render=None)[source]¶
Takes raw multiple alignments (usually from PredictionLoader.get_multiple_alignments()) and 1) renders multiple alignments in a displayable form, 2) averages metrics across all samples.
Acts as a main utility for the ASR dashboard data model.
See also
More details and examples in the user guide Alignments and WER.
- Parameters:
  - multiple_alignments (dict[int, MultipleAlignment]) – multiple alignments for several sample ids in some dataset. The multiple alignments need not all contain the same set of pipelines.
  - count_absorbed_insertions (bool) – a parameter for error_listing() when calculating metrics.
  - max_consecutive_insertions (int | None) – a parameter for error_listing() when calculating metrics.
  - wer_averaging_mode (Literal['plain', 'concat']) – a parameter for from_samples() when averaging metrics.
  - exclude_samples_with_digits (bool) – if True, when averaging metrics, excludes all samples where a digit is found either in the ground truth transcription or in some of the pipeline predictions. This acts as a "poor man's solution" to avoid issues with normalization of numerals.
  - max_samples_to_render (int | None) – if not None, render multiple alignments only for the specified number of samples.
- Return type:
  DatasetData
- Returns:
  See the DatasetData docs.
- class asr_eval.bench.evaluator.DatasetData(samples, full_samples, dataset_metric)[source]¶
Output format for the get_dataset_data function.

- Parameters:
samples (list[SampleData])
full_samples (list[int])
dataset_metric (dict[str, DatasetMetric])
- samples: list[SampleData]¶
A list of SampleData for all the sample ids for which we have at least one prediction.
- full_samples: list[int]¶
A list of sample ids for which all the pipelines have a prediction. These sample ids are used for averaging metrics, to avoid a problem where different pipeline predictions are averaged across different sample sets, and hence are not directly comparable.
- dataset_metric: dict[str, DatasetMetric]¶
Metrics for each pipeline, averaged across full_samples, if the full_samples list is not empty.
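The full_samples selection can be sketched as a set intersection over the per-pipeline sample id sets. The helper below is illustrative, not the library's API:

```python
def common_sample_ids(predictions_by_pipeline):
    """Return the sorted list of sample ids present for *every* pipeline,
    so that averaged metrics are computed over the same sample set."""
    sets = [set(ids) for ids in predictions_by_pipeline.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```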
- class asr_eval.bench.evaluator.SampleData(sample_id, baseline_transcription_html, baseline_is_ground_truth, pipelines, baseline_name='')[source]¶
- Parameters:
sample_id (int)
baseline_transcription_html (str | None)
baseline_is_ground_truth (bool)
pipelines (dict[str, SamplePipelineData])
baseline_name (str)
- class asr_eval.bench.evaluator.SamplePipelineData(err_positions, metrics, elapsed_time, transcription_html, alignment)[source]¶
A field of the DatasetData dataclass; represents the Alignment between ground truth and prediction, as well as other useful information.

- Parameters:
err_positions (dict[OuterLoc, ErrorListingElement])
metrics (Metrics)
elapsed_time (float)
transcription_html (str | None)
alignment (Alignment)
- err_positions: dict[OuterLoc, ErrorListingElement]¶
The output of
error_listing()
- metrics: Metrics¶
The output of
error_listing()
- elapsed_time: float¶
Inference time, may be NaN if not known.
- transcription_html: str | None¶
The aligned transcription in HTML to display.
- asr_eval.bench.evaluator.compare_pipelines(dataset_data, pipeline_name_1, pipeline_name_2)[source]¶
A utility for fine-grained comparison of two pipelines on the same dataset.
To be documented.
- Return type:
  DatasetPipelinePairComparison
- Parameters:
dataset_data (DatasetData)
pipeline_name_1 (str)
pipeline_name_2 (str)
- class asr_eval.bench.loader.PredictionLoader(storage, cache, pipelines=('*',), dataset_specs=('*',))[source]¶
Loads and aligns predictions saved with run_pipeline().

See also
More details and examples in the user guide Evaluation and dashboard.
- Parameters:
  - storage (BaseStorage) – A storage where the predictions were saved, typically a ShelfStorage.
  - cache (BaseStorage) – A cache to store alignments and other data; may be initially filled or empty. The cache is reusable.
  - pipelines (Sequence[str]) – A list of pipeline names or patterns to load. By default loads all pipelines.
  - dataset_specs (Sequence[str | DatasetSpec]) – A list of dataset names, patterns or specs to load. By default loads all datasets. In the simple case just use a dataset name, such as dataset_specs=['fleurs']. For a more complex case, see the example below.
Dataset specs (specifiers with colons) allow you to specify augmentors, parsers or sample counts to load; see Evaluation and dashboard for details.

Example
PredictionLoader(dataset_specs=["fleurs:n=100!"]) will search for the fleurs dataset in the storage. For every "key" consisting of (pipeline + augmentor + parser) it will try to load exactly the first 100 samples of the fleurs dataset. Keys that do not have all of these samples are dropped, as are all other samples. This ensures that exactly the same sample set is loaded for all the "keys", which allows comparing them on the same data.

- grouped_loaded_predictions: dict[GroupKey, dict[int, SamplePrediction]]¶

A public attribute that exposes a mapping. The keys are combinations of dataset + pipeline + augmentor + parser. The values are mappings from sample id to a prediction that keeps the predicted text and the inference time.
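The exact-sample-set filtering described in the example above (e.g. "n=100!") can be sketched with plain dicts. The helper below is illustrative, not the library's implementation:

```python
def filter_exact_sample_set(grouped, required_ids):
    """Keep only groups ("keys") that contain *all* required sample ids,
    and within each kept group keep only those ids, so every remaining
    group covers exactly the same sample set."""
    required = set(required_ids)
    out = {}
    for key, predictions_by_id in grouped.items():
        if required <= set(predictions_by_id):
            out[key] = {i: predictions_by_id[i] for i in required_ids}
    return out
```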
- get_multiple_alignments(dataset_name, augmentor_name='none', parser_name='default', pipeline_patterns=('*',))[source]¶
Compares multiple pipelines on a dataset.
See also
More details and examples in the user guide Evaluation and dashboard.
Given a list of pipeline_patterns, searches for all keys in the grouped_loaded_predictions that match the given pipeline, dataset, augmentor and parser. Since we can only compare pipelines with the same augmentor and parser, this provides all the results we have: pipelines, and their predictions on sample ids. Importantly, different pipelines may have different sets of sample ids: say, we run the first pipeline on 100 samples and the second pipeline only on 10 samples. Suppose we have pipelines P_1, …, P_N and their sets of sample ids S_1, …, S_N. The current function returns a dict whose keys are union(S_1, …, S_N), and for each sample id a MultipleAlignment is provided, with all pipelines that have a prediction for this id. In our example, the function returns a dict of all sample ids; for 10 of them the MultipleAlignment has 2 pipelines, while for the remaining 90 ids the MultipleAlignment has only 1 pipeline. Further we can: 1) visualize all the alignments, 2) call the get_dataset_data() function that averages metrics.

- Return type:
  dict[int, MultipleAlignment]
- Parameters:
dataset_name (str)
augmentor_name (str)
parser_name (str)
pipeline_patterns (Sequence[str])
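The union behaviour described above can be sketched with plain dicts (illustrative only; strings stand in for alignments):

```python
def group_predictions_by_sample(predictions_by_pipeline):
    """For each sample id in the union of all pipelines' id sets, collect
    the prediction of every pipeline that covers that id. Sample ids
    covered by only some pipelines still appear, with fewer entries."""
    by_sample = {}
    for pipeline, predictions_by_id in predictions_by_pipeline.items():
        for sample_id, text in predictions_by_id.items():
            by_sample.setdefault(sample_id, {})[pipeline] = text
    return by_sample
```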
- get_ordered_sample_ids(dataset_name)[source]¶
For a given registered dataset, returns the sequence of sample ids in the standard (shuffled) version, as obtained by get_dataset(dataset_name, shuffle=True).

- Return type:
  list[int]
- Parameters:
dataset_name (str)
- get_annotation(dataset_name, parser_name, sample_id)[source]¶
Get a parsed annotation for the given dataset, parser name and sample id. If not in cache, retrieves the annotation by instantiating this dataset.
- Return type:
- Parameters:
dataset_name (str)
parser_name (str | Literal['default'])
sample_id (int)
- class asr_eval.bench.loader.GroupKey(pipeline_name, dataset_name, augmentor, parser)[source]¶
A key to group predictions in PredictionLoader.

- Parameters:
pipeline_name (str)
dataset_name (str)
augmentor (str)
parser (str)
- class asr_eval.bench.loader.SamplePrediction(text, elapsed_time)[source]¶
A value to group predictions in PredictionLoader.

- Parameters:
text (str)
elapsed_time (float)
A command line wrapper around
run_dashboard() to run a web dashboard
that visualizes the predictions of the ASR models and their metrics.
See more details and examples in the user guide Evaluation and dashboard.
usage: python -m asr_eval.bench.dashboard.run [-h] [-s STORAGE] [-c CACHE] [--assets_dir ASSETS_DIR]
[-p [PIPELINES ...]] [-d [DATASETS ...]]
[-a [ANNOTATIONS ...]] [--export-audio] [--host HOST]
[--port PORT] [--import IMPORT_ [IMPORT_ ...]]
options:
-h, --help show this help message and exit
-s STORAGE, --storage STORAGE
Path of the storage file to load the results from. Use .csv or .dbm file extension (the latter is binary and more efficient).
-c CACHE, --cache CACHE
Path of the ShelfStorage to cache alignments during evaluation (creates if not exists). If not specified, disables caching.
--assets_dir ASSETS_DIR
Directory for web assets (creates if not exists)
-p [PIPELINES ...], --pipelines [PIPELINES ...]
Pipelines to load from the storage (load all if not specified)
-d [DATASETS ...], --datasets [DATASETS ...]
Datasets to load from the storage (load all if not specified)
-a [ANNOTATIONS ...], --annotations [ANNOTATIONS ...]
Custom annotations for datasets not registered in asr_eval, in the form of path(s) to CSV files with column names "dataset_name", "sample_id" and "text".
--export-audio Export audio .mp3 to the assets dir while starting the dashboard. If not set, will export .mp3 on demand, but this may slow down the response to the user requests.
--host HOST A dashboard host
--port PORT A dashboard port
--import IMPORT_ [IMPORT_ ...]
Will import this module by name. Useful to register additional components, such as `my_package.asr.models`.
- asr_eval.bench.dashboard.run.run_dashboard(loader, assets_dir='tmp/dashboard_assets', pre_export_audio=False, host='0.0.0.0', port=8051)[source]¶
Runs a web dashboard to visualize the predictions of the ASR models and their metrics.
Also has a CLI version, see python -m asr_eval.bench.dashboard.run --help

See also
More details and examples in the user guide Evaluation and dashboard.
- Parameters:
  - loader (PredictionLoader) – Prediction loader that loads and aligns predictions.
  - assets_dir (str | Path) – Directory for web assets (creates if not exists).
  - pre_export_audio (bool) – Export audio .mp3 to the assets dir while starting the dashboard. If False, will export .mp3 on demand, but this may slow down the response to user requests.
  - host (str) – A dashboard host.
  - port (int) – A dashboard port.
- class asr_eval.bench.augmentors.AudioAugmentor[source]¶
Bases: ABC

Abstract audio preprocessor, primarily for evaluation with artificial noises.

To register an augmentor, one needs to subclass this class and define the __call__ method that processes an audio sample. Preferably it should not modify the input dict and should return a copy.
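A minimal sketch of such an augmentor, assuming a Hugging Face-style sample dict with an 'audio' field. The class below is illustrative and standalone; in asr_eval it would subclass AudioAugmentor:

```python
import numpy as np

class GaussianNoiseAugmentor:
    """Illustrative augmentor: adds white Gaussian noise at a fixed SNR.
    Follows the __call__ contract described above: it does not modify the
    input dict and returns a copy."""

    def __init__(self, snr_db: float = 10.0, seed: int = 0):
        self.snr_db = snr_db
        self.rng = np.random.default_rng(seed)

    def __call__(self, sample: dict) -> dict:
        waveform = sample['audio']['array']
        signal_power = float(np.mean(waveform ** 2)) + 1e-12
        noise_power = signal_power / (10 ** (self.snr_db / 10))
        noise = self.rng.normal(0.0, np.sqrt(noise_power), waveform.shape)
        # Build a copy; leave the input dict untouched.
        out = dict(sample)
        out['audio'] = dict(sample['audio'])
        out['audio']['array'] = waveform + noise
        return out
```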
- asr_eval.bench.augmentors.get_augmentor(name)[source]¶
Retrieve a registered augmentor. Instantiates the augmentor once and returns the same instance on all subsequent calls.
- Parameters:
name (str)
- asr_eval.bench.parsers.register_parsers(name, true_parser, pred_parser)[source]¶
Register a pair of parsers: one for the annotation and another for the prediction. To specify a custom parser, you need to subclass the Parser class so that the constructor does not accept arguments, and register it here.

Example
>>> # we will register a new char-wise parser
>>> from asr_eval.align.parsing import PUNCT, Parser
>>> from asr_eval.bench.parsers import register_parsers
>>> from asr_eval.bench.parsers._registry import get_parser
>>> class CharWiseParser(Parser):
...     def __init__(self):
...         super().__init__(tokenizing=rf'[^\s{PUNCT}]')
>>> register_parsers('charwise', CharWiseParser, CharWiseParser)
>>> transcription = (
...     get_parser('charwise', 'true')
...     .parse_single_variant_transcription('hello!')
... )
>>> [token.value for token in transcription.blocks]
['h', 'e', 'l', 'l', 'o']
- asr_eval.bench.parsers.get_parser(name, type)[source]¶
Retrieve a registered parser for annotation (type='true') or prediction (type='pred'). Instantiates the parser once and returns the same instance on all subsequent calls (useful for parsers containing neural text normalizers).

- Parameters:
name (str)
type (Literal['true', 'pred'])
- class asr_eval.bench.parsers.RuNormParser[source]¶
Bases: Parser

A parser with Russian text normalization. Includes translit normalization, Silero normalization and filler word removal.
- class asr_eval.bench.datasets.AudioSample[source]¶
Bases: TypedDict

A TypedDict annotation for a Hugging Face audio sample in the standard asr_eval format.

This class is for typing purposes only. A sample in a Hugging Face dataset is a plain dict.

In the asr_eval standard workflow, the sampling rate should be 16_000 and all samples should have a unique "sample_id" value. A dataset may include other custom fields as well.
See also
More details and examples in the user guide Evaluation and dashboard.
Example
>>> # instantiation from `get_dataset`:
>>> from asr_eval.bench.datasets import get_dataset
>>> dataset = get_dataset('podlodka')
>>> sample: AudioSample = dataset[0]

>>> # AudioSample inner structure:
>>> audio_data: AudioData = sample['audio']
>>> assert audio_data['sampling_rate'] == 16_000
>>> waveform: FLOATS = audio_data['array']
>>> transcription: str = sample['transcription']

>>> # instantiation from Hugging Face:
>>> from datasets import load_dataset, Audio
>>> from asr_eval.bench.datasets import AudioSample, AudioData
>>> from asr_eval.utils.types import FLOATS
>>> from asr_eval.bench.datasets.mappers import assign_sample_ids
>>> dataset = (
...     load_dataset('PolyAI/minds14', name='en-US', split='train')
...     .cast_column('audio', Audio(sampling_rate=16_000))
...     .map(assign_sample_ids, with_indices=True)
... )
>>> sample: AudioSample = dataset[0]
- audio: AudioData¶
An Audio feature. In the asr_eval standard workflow, should be obtained with .cast_column('audio', Audio(decode=True, sampling_rate=16_000)).
- transcription: str¶
A transcription as text, possibly with multivariant annotation, may optionally include punctuation or capitalization.
- sample_id: int¶
A sample ID that should be unique in the dataset. Normally should equal the sample index in the unshuffled and unfiltered version.
- class asr_eval.bench.datasets.AudioData[source]¶
Bases: TypedDict

A TypedDict annotation for the Audio feature in a Hugging Face dataset.
See examples in the docs for AudioSample.

- array: ndarray[tuple[int, ...], dtype[floating[Any]]]¶
1-D audio waveform of floats, normalized roughly from -1 to 1, with the sampling rate specified in sampling_rate (normally 16000).
- asr_eval.bench.datasets.register_dataset(name, splits=('test',), unlabeled=False)[source]¶
Register a new dataset in asr_eval. The dataset will be available under the registered name in get_dataset().

- Parameters:
  - name (str) – A unique name for the dataset.
  - splits (tuple[str, ...]) – A list of available splits. All datasets should have at least a "test" split available, because asr_eval is for testing purposes. If a dataset has a "train" split only, consider registering it as "test" if you want to test on it. Datasets can have other splits registered under any names, primarily to check for train-test overlap.
  - unlabeled (bool) – If the dataset is unlabeled. Experimental feature.
See many examples in the asr_eval.bench.datasets._registered package.
Example
>>> from datasets import Audio, load_dataset, Dataset
>>> from asr_eval.bench.datasets import register_dataset, get_dataset
>>> from asr_eval.bench.datasets.mappers import assign_sample_ids
>>> @register_dataset('podlodka-new', splits=('train', 'test'))
... def load_podlodka(split: str = 'test') -> Dataset:
...     return (
...         load_dataset('bond005/podlodka_speech', split=split)
...         .cast_column('audio', Audio(sampling_rate=16_000))  # type: ignore
...         .map(assign_sample_ids, with_indices=True)
...     )
>>> dataset = get_dataset('podlodka-new')
- class asr_eval.bench.datasets.DatasetInfo(instantiate_fn, splits, unlabeled, filter=None)[source]¶
A container for dataset information that is stored when a dataset gets registered.
- Parameters:
instantiate_fn (Callable[[str], Dataset])
splits (tuple[str, ...])
unlabeled (bool)
filter (Callable[[str], list[int]] | None)
- class asr_eval.bench.datasets.DatasetSpec(name_pattern, augmentor='all', parser='default', n_samples='all', n_samples_mode='up_to')[source]¶
Represents an extended syntax for specifying datasets when running pipelines and the dashboard. Allows specifying the required sample count, augmentor and parser.

The dataset spec is understood and used by both of these utilities.

A dataset spec has a string representation as a colon-separated string. The first value is a name pattern; other values are modifiers in the form <key>=<value>.

The "a" modifier specifies the augmentor to use (see AudioAugmentor). It has a special value "all" (the default): when running pipelines it is treated as "run without augmentor", and when running the dashboard it is treated as "load the results with all augmentors available in storage".

The "p" modifier specifies the parser to use (see get_parser()). It is ignored when running pipelines; when running the dashboard the specified parser will be used. By default uses the "default" parser (DEFAULT_PARSER).

The "n" modifier specifies the number of samples. It may be either "all" or an integer, where "all" means all the samples in the dataset. The value may also have an exclamation mark as suffix (example: "n=20!"): it is ignored when running pipelines, and when running the dashboard it will drop all pipelines with not enough samples. For example, if "n=all!", then all pipelines with partial results will not be displayed in the dashboard.
See also
See details and examples in the user guide Evaluation and dashboard.
Example
>>> from asr_eval.bench.datasets import DatasetSpec
>>> DatasetSpec.from_string('fleurs-*:p=ru-norm:n=50!')
DatasetSpec(
    name_pattern='fleurs-*',
    augmentor='all',
    parser='ru-norm',
    n_samples=50,
    n_samples_mode='exactly'
)
- Parameters:
name_pattern (str)
augmentor (str | Literal['none', 'all'])
parser (str | Literal['default'])
n_samples (int | Literal['all'])
n_samples_mode (Literal['up_to', 'exactly'])
- asr_eval.bench.datasets.get_dataset(name, augmentor_name=None, split='test', shuffle=True, filter=True)[source]¶
Instantiates a registered dataset.
- Parameters:
  - name (str) – A dataset name under which it was registered.
  - augmentor_name (Union[str, None, Literal['none']]) – An augmentor name to apply, None by default (see AudioAugmentor).
  - split (str) – A split name, "test" by default.
  - shuffle (bool) – Whether to perform shuffle(seed=0), True by default. The shuffling is used to ensure that the first N samples form a representative set. The sample IDs help to track the original indices, before shuffling or filtering.
  - filter (bool) – Whether to filter out duplicate and malformed samples, if set_filter was done for this dataset. True by default. This ensures that the datasets in asr_eval by default do not contain duplicate or malformed samples.
- Return type:
Dataset
- asr_eval.bench.datasets.get_dataset_info(name)[source]¶
Get info for a registered dataset.
- Return type:
  DatasetInfo
- Parameters:
name (str)
- asr_eval.bench.datasets.get_dataset_sample_by_id(dataset_name, split, sample_id, augmentor_name=None)[source]¶
A utility to simply retrieve the sample with the required ID from the given dataset. Internally instantiates the dataset if not instantiated yet.
- Return type:
  AudioSample
- Parameters:
dataset_name (str)
split (str)
sample_id (int)
augmentor_name (str | None)
- asr_eval.bench.datasets.set_filter(dataset_name)[source]¶
Register a sample filter for the given registered dataset.

The filter should accept a split name and return a list of sample IDs to filter out. It is primarily used for deduplication. The get_dataset() function by default returns a filtered dataset if a filter was set.

- Parameters:
dataset_name (str)
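As a sketch of the filter contract described above (accepts a split name, returns sample IDs to drop), a hypothetical deduplication filter might look like this. The helper and its data source are illustrative, not part of asr_eval:

```python
def duplicate_transcription_filter(transcriptions_by_split):
    """Build a filter function: given a split name, return the sample ids
    whose transcription already appeared earlier in that split."""
    def filter_fn(split: str) -> list[int]:
        seen = set()
        drop = []
        for sample_id, text in enumerate(transcriptions_by_split[split]):
            key = text.strip().lower()
            if key in seen:
                drop.append(sample_id)
            seen.add(key)
        return drop
    return filter_fn
```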
A registry for pipelines.
- class asr_eval.bench.pipelines.TranscriberPipeline(warmup=False)[source]¶
Bases: ABC

An abstract class for pipelines.

A pipeline is any speech recognition algorithm that processes audio into text or timed text. Each pipeline is stored under a unique name.
See also
More details and examples in the user guide Evaluation and dashboard.

See many examples in the asr_eval.bench.pipelines._registered package.
To register a pipeline, you need to subclass as follows:
Example
>>> from datasets import load_dataset, Audio
>>> from asr_eval.bench.pipelines import TranscriberPipeline, get_pipeline
>>> from asr_eval.models.base.longform import LongformCTC
>>> from asr_eval.models.wav2vec2_wrapper import Wav2vec2Wrapper
>>> class _(TranscriberPipeline, register_as='example-wav2vec2'):
...     def init(self):
...         # override init to return a pipeline instance
...         return LongformCTC(
...             Wav2vec2Wrapper('facebook/wav2vec2-base-960h')
...         )

>>> # now you can load the registered pipeline:
>>> pipeline_instance = get_pipeline('example-wav2vec2')()
>>> dataset = (
...     load_dataset('PolyAI/minds14', name='en-US', split='train')
...     .cast_column('audio', Audio(sampling_rate=16_000))
... )
>>> sample = dataset[4]
>>> pipeline_instance.run(sample)
{'text': 'CAN NOW YOU HELP ME SET UP AN JOINT LEAKACCOUNT ', 'elapsed_time': 0.23598575592041016}
- Parameters:
warmup (bool)
- asr_eval.bench.pipelines.get_pipeline(name)[source]¶
Get a registered pipeline class.
- Return type:
  type[TranscriberPipeline]
- Parameters:
  name (str)
- asr_eval.bench.pipelines.get_pipeline_index(name)[source]¶
Get an index (in registration order) for a registered pipeline, or -1 if not registered.
- Return type:
  int
- Parameters:
  name (str)
A command line utility for streaming evaluation.
Reads from the storage file obtained by run,
finds results of the streaming pipelines, analyzes histories of input
and output chunks, and makes various diagrams.
See also
More details and examples in the user guide Evaluation and dashboard.
usage: python -m asr_eval.bench.streaming.make_plots [-h] [-s STORAGE] [-o OUTPUT] [-a [ANNOTATIONS ...]]
[--import IMPORT_ [IMPORT_ ...]]
options:
-h, --help show this help message and exit
-s STORAGE, --storage STORAGE
Path of the storage file to load the results from. Use .csv or .dbm file extension (the latter is binary and more efficient).
-o OUTPUT, --output OUTPUT
Directory to save the results
-a [ANNOTATIONS ...], --annotations [ANNOTATIONS ...]
Custom annotations for datasets not registered in asr_eval, in the form of path(s) to CSV files with column names "dataset_name", "sample_id" and "text".
--import IMPORT_ [IMPORT_ ...]
Will import this module by name. Useful to register additional components, such as `my_package.asr.models`.
A command line utility for streaming evaluation.
Scans the directory created by the make_plots
tool and runs a web interface to visualize the results.
See more details and examples in the user guide Evaluation and dashboard.
usage: python -m asr_eval.bench.streaming.show_plots [-h] [-d DIR] [--host HOST] [--port PORT]
options:
-h, --help show this help message and exit
-d DIR, --dir DIR Directory with plots, created by `make_plots` tool.
--host HOST A dashboard host
--port PORT A dashboard port