AASCIT Communications | Volume 3, Issue 1 | Jan. 27, 2016 online | Page:32-42
Similarities and Differences of Speech Recognition Accuracy and Speech Quality Measures Behavior
Abstract
Noise and late reverberation reduction algorithms were compared using objective speech quality measures and speech recognition accuracy (Acc%). Excessive noise reduction was shown to degrade automatic speech recognition (ASR). The quality of noise suppression algorithms, in terms of Acc%, can be improved by a proper choice of the a priori signal-to-noise ratio (SNR) estimation technique. The decision-directed technique proved best for speech quality, whereas the “rough” estimation technique proved best for ASR, with the maximum likelihood technique occupying an intermediate position. For late reverberation suppression algorithms, parameter values that are optimal in terms of Acc% were found to exist, and these values differ between ASR and speech enhancement. In this respect, late reverberation suppression algorithms behave similarly to noise suppression algorithms. Only a few of the speech quality measures studied were in good agreement with Acc%. The existence of such measures is nevertheless important, because they can be used in place of Acc% and thus substantially simplify the assessment of noise and reverberation robustness in ASR.
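As an illustration of two of the a priori SNR estimation techniques named in the abstract, the following is a minimal sketch of the textbook maximum likelihood and decision-directed estimators. It is not the author's implementation. The function names, the smoothing factor `alpha=0.98`, the floor value, and the use of a Wiener gain to form the previous clean-speech estimate are all assumptions made for this example. The “rough” technique is not defined in the abstract and is therefore not sketched.

```python
import numpy as np


def ml_a_priori_snr(noisy_power, noise_power, floor=1e-3):
    """Instantaneous maximum-likelihood estimate: max(gamma - 1, floor).

    noisy_power: array (frames, bins) of |Y|^2
    noise_power: array (bins,) of the noise power estimate
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    gamma = noisy_power / noise_power  # a posteriori SNR
    return np.maximum(gamma - 1.0, floor)


def dd_a_priori_snr(noisy_power, noise_power, alpha=0.98, floor=1e-3):
    """Decision-directed estimate, computed frame by frame.

    Assumption: the clean-speech power of the previous frame is obtained
    with a Wiener gain xi / (1 + xi); other gain functions could be used.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    gamma = noisy_power / noise_power
    xi = np.empty_like(gamma)
    prev_clean_power = np.zeros(gamma.shape[1])  # no speech before frame 0
    for t in range(gamma.shape[0]):
        ml_term = np.maximum(gamma[t] - 1.0, 0.0)
        xi[t] = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi[t] = np.maximum(xi[t], floor)
        gain = xi[t] / (1.0 + xi[t])  # Wiener gain (assumed)
        prev_clean_power = gain ** 2 * noisy_power[t]
    return xi


# Toy input (made up for illustration): 5 frames, 3 frequency bins.
noisy = np.full((5, 3), 4.0)
noise = np.ones(3)
xi_ml = ml_a_priori_snr(noisy, noise)
xi_dd = dd_a_priori_snr(noisy, noise)
```

With a zero initial speech estimate, the first decision-directed frame equals `(1 - alpha)` times the maximum likelihood term, which shows how strongly the recursion smooths the estimate compared with the instantaneous one.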
Authors
[1]
Arkadiy Prodeus, Acoustic and Electroacoustic Department, Faculty of Electronics, National Technical University of Ukraine, Kyiv, Ukraine.
Keywords
Noise, Late Reverberation, Reduction, Algorithm, Speech Quality Measure, Automatic Speech Recognition Accuracy
Article History
Submitted: Dec. 21, 2015
Accepted: Dec. 28, 2015
Published: Jan. 27, 2016