International Journal of Innovative Research in Computer Science and Technology
Year: 2025, Volume: 13, Issue: 3
First page: 55, Last page: 61
Online ISSN: 2350-0557
DOI: 10.55524/ijircst.2025.13.3.9
DOI URL: https://doi.org/10.55524/ijircst.2025.13.3.9
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)
Maruti Maurya, Mohd Zaheer, Nawab Mohammad, Sadaf Siddiqui, Mohd Zeeshan Khan, Mohd Ayan Akram
This paper presents an automatic speech recognition (ASR) system that transcribes audio from YouTube videos into text using OpenAI's Whisper model. The speech-to-text pipeline is built with yt_dlp, FFmpeg, and PyTorch. Given a video URL, the system extracts and preprocesses the audio, transcribes it with Whisper, and evaluates transcription quality using Word Error Rate (WER), Character Error Rate (CER), and Match Error Rate (MER). The pipeline supports offline use, making it suitable for accessible, cost-effective deployment in educational, research, and assistive applications.
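To illustrate two of the evaluation metrics named in the abstract, the following is a minimal sketch of WER and CER under their common definitions: the edit distance between reference and hypothesis, divided by the reference length in words or characters. The function names `edit_distance`, `wer`, and `cer` are illustrative assumptions and are not taken from the paper, whose evaluation code is not reproduced here. MER is left out because it also requires the number of matched tokens from an explicit alignment.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of tokens (words or characters)."""
    # prev[j] is the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcript pair, used only for illustration.
    reference = "the quick brown fox"
    hypothesis = "the quick brown box"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this hypothetical pair one of four words is wrong, so the WER is 0.25, while only one of 19 characters differs, so the CER is much lower. In practice, both strings are usually normalized first (case, punctuation) so that formatting differences are not counted as errors.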
Assistant Professor, Department of Computer Science and Engineering, Integral University, Lucknow, India