<?xml version="1.0" encoding="utf-8"?>
<ArticleSet>
  <Article>
    <Journal>
      <PublisherName>IJIRCSTJournal</PublisherName>
      <JournalTitle>International Journal of Innovative Research in Computer Science and Technology</JournalTitle>
      <PISSN>I</PISSN>
      <EISSN>S</EISSN>
      <Volume-Issue>Volume 13 Issue 3</Volume-Issue>
      <PartNumber/>
      <IssueTopic>Computer Science and Engineering</IssueTopic>
      <IssueLanguage>English</IssueLanguage>
      <Season>May - June 2025</Season>
      <SpecialIssue>N</SpecialIssue>
      <SupplementaryIssue>N</SupplementaryIssue>
      <IssueOA>Y</IssueOA>
      <PubDate>
        <Year>2025</Year>
        <Month>05</Month>
        <Day>15</Day>
      </PubDate>
      <ArticleType>Computer Sciences</ArticleType>
      <ArticleTitle>Speech Recognition Technologies: Design, Challenges, and Real-World Applications</ArticleTitle>
      <SubTitle/>
      <ArticleLanguage>English</ArticleLanguage>
      <ArticleOA>Y</ArticleOA>
      <FirstPage>55</FirstPage>
      <LastPage>61</LastPage>
      <AuthorList>
        <Author>
          <FirstName>Maruti Maurya</FirstName>
          <AuthorLanguage>English</AuthorLanguage>
          <Affiliation/>
          <CorrespondingAuthor>Y</CorrespondingAuthor>
          <ORCID/>
        </Author>
        <Author>
          <FirstName>Mohd Zaheer</FirstName>
          <AuthorLanguage>English</AuthorLanguage>
          <Affiliation/>
          <CorrespondingAuthor>N</CorrespondingAuthor>
          <ORCID/>
        </Author>
        <Author>
          <FirstName>Nawab Mohammad</FirstName>
          <AuthorLanguage>English</AuthorLanguage>
          <Affiliation/>
          <CorrespondingAuthor>N</CorrespondingAuthor>
          <ORCID/>
        </Author>
        <Author>
          <FirstName>Sadaf Siddiqui</FirstName>
          <AuthorLanguage>English</AuthorLanguage>
          <Affiliation/>
          <CorrespondingAuthor>N</CorrespondingAuthor>
          <ORCID/>
        </Author>
        <Author>
          <FirstName>Mohd Zeeshan Khan</FirstName>
          <AuthorLanguage>English</AuthorLanguage>
          <Affiliation/>
          <CorrespondingAuthor>N</CorrespondingAuthor>
          <ORCID/>
        </Author>
        <Author>
          <FirstName>Mohd Ayan Akram</FirstName>
          <AuthorLanguage>English</AuthorLanguage>
          <Affiliation/>
          <CorrespondingAuthor>N</CorrespondingAuthor>
          <ORCID/>
        </Author>
      </AuthorList>
      <DOI>https://doi.org/10.55524/ijircst.2025.13.3.9</DOI>
      <Abstract>This paper presents an automated speech recognition (ASR) system that transcribes audio from YouTube videos into accurate text using OpenAI's Whisper model. Leveraging tools such as yt_dlp, FFmpeg, and PyTorch, the system creates a robust speech-to-text pipeline. Given a video URL, the system extracts and preprocesses audio, transcribes it using Whisper, and evaluates transcription quality through metrics such as Word Error Rate (WER), Character Error Rate (CER), and Match Error Rate (MER). The pipeline supports offline use, making it suitable for accessible, cost-effective deployment in educational, research, and assistive applications.</Abstract>
      <AbstractLanguage>English</AbstractLanguage>
      <Keywords>OpenAI Whisper Model, YouTube Audio Transcription, Word Error Rate (WER), Character Error Rate (CER), Multilingual Speech Recognition, Audio Preprocessing</Keywords>
      <URLs>
        <Abstract>https://ijircst.org/abstract.php?article_id=1372</Abstract>
      </URLs>      
    </Journal>
  </Article>
</ArticleSet>