Models list

asr_eval provides more than 50 built-in model configurations for speech recognition, and you can add your own.

At this early stage of development we have focused mainly on Russian models, but multilingual ones are included as well.

Adding models

You can add new models locally, as described in Framework quickstart. You can also suggest new models by opening an issue or a pull request.

To be able to wrap or combine components, we suggest implementing:

CTC interface for CTC models.
TimedTranscriber interface for ASR models that return text with timings (models with diarization should return TimedDiarizationText).
Transcriber interface for ASR models that return text without timings.
StreamingASR interface for streaming ASR models.
Segmenter interface for voice activity detection models.
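
As a rough illustration of why shared interfaces make components easy to wrap and combine, here is a minimal sketch. The `Transcriber` and `Segmenter` protocols below are hypothetical stand-ins written for this example. They borrow the names from the list above, but the method names, signatures, and audio representation are assumptions and may differ from the actual asr_eval classes.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

# NOTE: illustrative stand-ins only. The real asr_eval interfaces may use
# different method names, signatures, and audio types.
Waveform = Sequence[float]  # mono audio samples (assumed representation)


class Transcriber(Protocol):
    """Hypothetical interface: audio in, plain text out (no timings)."""

    def transcribe(self, waveform: Waveform) -> str: ...


@dataclass
class Segment:
    start: int  # first sample index (inclusive)
    end: int    # last sample index (exclusive)


class Segmenter(Protocol):
    """Hypothetical interface: splits audio into segments to transcribe."""

    def segment(self, waveform: Waveform) -> list[Segment]: ...


class DummyTranscriber:
    """Toy model that only reports how many samples it received."""

    def transcribe(self, waveform: Waveform) -> str:
        return f"{len(waveform)} samples"


class FixedWindowSegmenter:
    """Toy segmenter that cuts audio into fixed-size windows."""

    def __init__(self, window: int) -> None:
        self.window = window

    def segment(self, waveform: Waveform) -> list[Segment]:
        n = len(waveform)
        return [Segment(s, min(s + self.window, n)) for s in range(0, n, self.window)]


class SegmentedTranscriber:
    """Combines any Segmenter with any Transcriber.

    The wrapper itself satisfies the Transcriber protocol, so it can be
    used anywhere a plain Transcriber is expected.
    """

    def __init__(self, segmenter: Segmenter, transcriber: Transcriber) -> None:
        self.segmenter = segmenter
        self.transcriber = transcriber

    def transcribe(self, waveform: Waveform) -> str:
        parts = [
            self.transcriber.transcribe(waveform[seg.start:seg.end])
            for seg in self.segmenter.segment(waveform)
        ]
        return " ".join(parts)


# Usage: ten samples cut into windows of four.
model = SegmentedTranscriber(FixedWindowSegmenter(4), DummyTranscriber())
print(model.transcribe([0.0] * 10))  # prints "4 samples 4 samples 2 samples"
```

Because the wrapper depends only on the two protocols, the toy segmenter or the toy model can be replaced independently without touching the wrapper.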

Whisper pipelines

Base and fine-tuned checkpoints, Faster-Whisper.

📄 See source code for definitions.

Wav2vec2 pipelines

Various checkpoints from Hugging Face.

📄 See source code for definitions.

NVIDIA NeMo pipelines

Canary, Parakeet, Conformer, FastConformer.

📄 See source code for definitions.

Audio-LLM pipelines

Voxtral, Vikhr Borealis, Flamingo, Gemma3n, Qwen2-Audio.

📄 See source code for definitions.

SpeechBrain pipelines

Streaming GigaSpeech Conformer.

📄 See source code for definitions.

GigaAM pipelines

GigaAM v2 and v3, CTC and RNNT, with end-to-end punctuation.

📄 See source code for definitions.

Vosk pipelines

Vosk 0.42 streaming and 0.54 non-streaming.

📄 See source code for definitions.

T-One pipelines

T-One streaming model.

📄 See source code for definitions.

API pipelines

Yandex SpeechKit, Salute API.

📄 See source code for definitions.

Pisets pipelines

A wav2vec + Whisper pipeline.

📄 See source code for definitions.

Composite and experimental pipelines

See source code for definitions.