Weighted Automata for Speech and Text Processing in NLP
DOI: https://doi.org/10.55927/modern.v5i2.41
Keywords: Natural Language Processing (NLP), Recurrent Neural Networks (RNN), Weighted Automata, Automatic Speech Recognition (ASR)
Abstract
Weighted automata offer an interpretable structure for modeling sequential behavior, which makes them useful in the speech and text processing stages of natural language processing (NLP). This study analyses the integration of weighted finite-state automata (WFA) and weighted finite-state transducers (WFST) into NLP pipelines, with a specific focus on interpretability and on the post-processing refinement of automatic speech recognition (ASR) output. A central contribution is the translation of the latent decision patterns of recurrent neural networks (RNNs) into transparent weighted automata. By examining state paths, activation regions, and recurring transitions, the proposed method converts the inner logic of an RNN into an understandable set of automata, thereby explaining how the model handles sequences. This enriches interpretability and allows the inferred rules to be reused in downstream symbolic integrations. To ensure that the extracted automata accurately model the behavior of the RNN, the study uses decision-guided extraction methods, including selective sampling of informative inputs, clustering algorithms that match automaton states to the network's hidden-state dynamics, and transition weights estimated to reflect the RNN's confidence levels.
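As a rough illustration of the extraction idea described above (not the paper's actual implementation), the sketch below uses a hand-written toy "RNN" update function, buckets its hidden states as a crude stand-in for clustering, counts symbol-labelled transitions between buckets over sampled inputs, and normalises the counts into transition weights that act as empirical confidences. All function names and the quantization scheme are hypothetical choices for this sketch.

```python
from collections import defaultdict
import itertools

# Toy "RNN": a hand-written state-update function standing in for a trained
# network. The hidden state is a single float; the update depends on the symbol.
def rnn_step(h, sym):
    return 0.5 * h + (1.0 if sym == "a" else -1.0)

def quantize(h):
    # Crude clustering stand-in: bucket hidden states by rounding.
    return round(h, 1)

def extract_wfa(alphabet, max_len=4):
    """Sample all short input strings, bucket the hidden states they visit,
    and count symbol-labelled transitions between buckets."""
    counts = defaultdict(lambda: defaultdict(int))
    for n in range(1, max_len + 1):
        for seq in itertools.product(alphabet, repeat=n):
            h = 0.0
            for sym in seq:
                src = quantize(h)
                h = rnn_step(h, sym)
                counts[src][(sym, quantize(h))] += 1
    # Normalise counts into transition weights: each weight is the empirical
    # frequency of taking that symbol out of that state.
    wfa = {}
    for src, outs in counts.items():
        total = sum(outs.values())
        for (sym, dst), c in outs.items():
            wfa[(src, sym)] = (dst, c / total)
    return wfa

def score(wfa, seq, start=0.0):
    """Multiply transition weights along the path followed for `seq`."""
    state, w = quantize(start), 1.0
    for sym in seq:
        if (state, sym) not in wfa:
            return 0.0  # no observed transition: the automaton rejects
        state, p = wfa[(state, sym)]
        w *= p
    return w
```

In this toy setting the extracted weights are symmetric by construction: from the start state, "a" and "b" each carry weight 0.5, so a string's score halves with each symbol. A real extraction pipeline would replace the rounding bucket with a learned clustering of RNN hidden vectors and the exhaustive enumeration with selective sampling of informative inputs.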