An Approach Towards Generating Subtitles Automatically from Videos by Extracting Audio

Rizwan Sheikh, P.R. Pote College of Engineering, Amravati; Swapnil Suryajoshi, P.R. Pote College of Engineering, Amravati; Shivam Gupta, P.R. Pote College of Engineering, Amravati; Sushant Tayde,

sphinx, ffmpeg, videos

Videos play an important role in daily life as a way to understand and comprehend information. Through videos, users can gain knowledge and share it with others. It is therefore important to make videos accessible to people with hearing impairments and, even more so, to viewers who face a gap between the language of the video and their native language. Subtitles address both needs. Several websites provide subtitle files, but not for all videos, and downloading subtitles from the internet is a tedious process. Generating subtitles automatically in software is therefore a valid subject of research, and this paper addresses the problems mentioned above using speech recognition technology. Three modules help to generate subtitles for videos. Audio extraction extracts the audio from the video and converts it into .wav format for the speech recognition process; a 24% reduction in the size of the video was achieved after the extraction. The PocketSphinx speech recognition engine then processes the .wav file, and the speech is converted into text and stored in an .srt file for future use.
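The first and last stages of this pipeline (audio extraction to .wav and storage in .srt) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 16 kHz mono 16-bit PCM settings, and the `(start, end, text)` segment format are assumptions made for illustration; the recognition step itself is left out, and the segments stand in for whatever timed text the recognizer returns.

```python
import subprocess


def ffmpeg_extract_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command line that writes the audio track as a WAV file.

    Assumption: 16 kHz, mono, 16-bit PCM is used as the recognizer input format.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", str(sample_rate), # audio sample rate in Hz
        "-ac", "1",              # one audio channel (mono)
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg (must be installed and on PATH) to produce the WAV file."""
    subprocess.run(ffmpeg_extract_command(video_path, wav_path), check=True)


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Turn (start_seconds, end_seconds, text) tuples into SRT file content.

    The segment tuples are a hypothetical stand-in for recognizer output.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

In use, `extract_audio("talk.mp4", "talk.wav")` would produce the recognizer input (the file names are placeholders), and the string returned by `to_srt(...)` would be written to a file with the `.srt` extension next to the video.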
Paper ID: GRDJEV03I050098
Published in: Volume : 3, Issue : 5
Publication Date: 2018-05-01
Page(s): 63 - 67