asr_eval.normalizing
Utils for text normalization.
- class asr_eval.normalizing.silero.RuSileroNormalizer(model_path=CACHE_DIR / 'silero_normalizer/jit_s2s.pt', device='auto')
Converts numbers into words and applies other normalization steps so that WER can be evaluated on Russian text.
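To illustrate why this kind of normalization matters for WER, here is a minimal sketch. The `word_error_rate` helper is hypothetical and not part of asr_eval, and the "normalized" reference is written by hand as a stand-in, not produced by the model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


# An ASR system typically outputs spelled-out words.
hypothesis = "площадь пять гектаров"

reference_raw = "площадь 5 га"
# Hand-written stand-in for normalizer output (assumption, not real model output).
reference_normalized = "площадь пять гектаров"

wer_raw = word_error_rate(reference_raw, hypothesis)
wer_normalized = word_error_rate(reference_normalized, hypothesis)
print(f"raw: {wer_raw:.2f}, normalized: {wer_normalized:.2f}")
```

Without normalization, the digit and the abbreviation each count as a substitution even though the transcription is correct.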
Compared with the original version, this one handles a rare case that would otherwise cause an infinite loop. The normalizer is based on a neural network, so caching its results is recommended.
TODO: release the model required for RuSileroNormalizer.
Example
>>> from asr_eval.normalizing.silero import RuSileroNormalizer
>>> from asr_eval.utils.cacheable import DiskCacheable
>>> normalizer = RuSileroNormalizer()
>>> normalizer = DiskCacheable(normalizer, cache_path='silero_normalizer_cache.db')
>>> print(normalizer('С 12.01.1943 г. площадь сельсовета — 1785,5 га.'))
С двенадцатого января тысяча девятьсот сорок третьего года площадь сельсовета — тысяча семьсот восемьдесят пять целых и пять десятых гектара
The code is adapted from https://github.com/snakers4/russian_stt_text_normalization
The model is taken from https://t.me/silero_speech/6056 (TODO: upload the model to HF).
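The caching recommendation can be illustrated with a small sketch. It does not show how `DiskCacheable` works internally; `SqliteMemo` and `slow_normalizer` are hypothetical names used only here, and the sketch assumes the wrapped callable is deterministic and maps a string to a string.

```python
import sqlite3
from typing import Callable


class SqliteMemo:
    """Hypothetical helper: memoizes a str -> str callable in a SQLite table."""

    def __init__(self, func: Callable[[str], str], cache_path: str = ":memory:"):
        self.func = func
        # ":memory:" keeps the sketch self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(cache_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(input TEXT PRIMARY KEY, output TEXT NOT NULL)"
        )
        self.conn.commit()

    def __call__(self, text: str) -> str:
        row = self.conn.execute(
            "SELECT output FROM cache WHERE input = ?", (text,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: the expensive call is skipped
        result = self.func(text)
        self.conn.execute(
            "INSERT INTO cache (input, output) VALUES (?, ?)", (text, result)
        )
        self.conn.commit()
        return result


calls = []


def slow_normalizer(text: str) -> str:
    # Stand-in for an expensive neural normalizer (assumption for illustration).
    calls.append(text)
    return text.lower()


cached = SqliteMemo(slow_normalizer)
first = cached("Площадь 5 ГА")
second = cached("Площадь 5 ГА")  # served from the cache
print(first, len(calls))
```

Caching of this kind pays off in evaluation runs where the same reference texts are normalized repeatedly across experiments.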
- Parameters:
model_path (str | Path)
device (str | torch.device | Literal['auto'])