The quantity of educational video available on the web is growing rapidly, and with it the need to make video data usable. Video transcription is the conversion of a video lecture into text, which provides a way to produce a document or notes from the video. This paper presents an Automatic Speech Recognition (ASR) technique based on the Hidden Markov Model. First, the audio is extracted from the video and the speech waveform is divided into multiple frames for recognition; ASR is then applied to the audio track to extract the raw data. Next, the data are analyzed against a phonetic dictionary, in which the pronunciation of every word must be represented phonetically. Finally, a text document is produced as the output for the video file.
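
The framing and phonetic dictionary steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the 25 ms frame and 10 ms hop at a 16 kHz sampling rate, and the two dictionary entries are assumptions chosen for illustration only.

```python
from typing import Dict, List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int, hop_len: int) -> List[List[float]]:
    """Split a waveform into overlapping fixed-length frames.

    The final frame is zero-padded so that every frame has frame_len samples.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames: List[List[float]] = []
    start = 0
    while start < len(samples):
        frame = list(samples[start:start + frame_len])
        frame.extend([0.0] * (frame_len - len(frame)))  # pad a short last frame
        frames.append(frame)
        if start + frame_len >= len(samples):
            break  # this frame already reached the end of the signal
        start += hop_len
    return frames


# Hypothetical phonetic dictionary: each word maps to a phoneme sequence.
# The entries are illustrative and not taken from the paper.
PHONETIC_DICT: Dict[str, List[str]] = {
    "video": ["V", "IH", "D", "IY", "OW"],
    "lecture": ["L", "EH", "K", "CH", "ER"],
}


def pronounce(word: str) -> List[str]:
    """Return the phoneme sequence for a word, or an empty list if unknown."""
    return PHONETIC_DICT.get(word.lower(), [])


if __name__ == "__main__":
    sample_rate = 16000                      # assumed sampling rate in Hz
    frame_len = int(0.025 * sample_rate)     # assumed 25 ms frame
    hop_len = int(0.010 * sample_rate)       # assumed 10 ms hop
    one_second = [0.0] * sample_rate         # silent placeholder signal
    frames = split_into_frames(one_second, frame_len, hop_len)
    print(len(frames), len(frames[0]))
    print(pronounce("Video"))
```

In an HMM-based recognizer, each frame would typically be converted into a feature vector before decoding, and the dictionary would link the recognized phoneme sequences back to words; both of those stages are outside the scope of this sketch.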