Models list

asr_eval provides more than 50 built-in model configurations for speech recognition, and you can add your own.

At this early stage of development we have focused mainly on Russian models, but multilingual ones are also included.

Adding models

You can add new models locally, as described in Framework quickstart. You can also suggest new models by opening an issue or pull request.

To be able to wrap or combine components, we suggest implementing:

Whisper pipelines

Base and fine-tuned checkpoints, Faster-Whisper.

📄 See source code for definitions.

Wav2vec2 pipelines

Various checkpoints from Hugging Face.

📄 See source code for definitions.

NVIDIA NeMo pipelines

Canary, Parakeet, Conformer, FastConformer.

📄 See source code for definitions.

Audio-LLM pipelines

Voxtral, Vikhr Borealis, Flamingo, Gemma3n, Qwen2-audio.

📄 See source code for definitions.

SpeechBrain pipelines

Streaming GigaSpeech conformer.

📄 See source code for definitions.

GigaAM pipelines

GigaAM v2 and v3 versions, CTC and RNNT, with end-to-end punctuation.

📄 See source code for definitions.

Vosk pipelines

Vosk 0.42 streaming and 0.54 non-streaming.

📄 See source code for definitions.

T-One pipelines

T-One streaming model.

📄 See source code for definitions.

API pipelines

Yandex SpeechKit, Salute API.

📄 See source code for definitions.

Pisets pipelines

A wav2vec + Whisper pipeline.

📄 See source code for definitions.

Composite and experimental pipelines

📄 See source code for definitions.