Time Reduction Mechanism in Information Extraction Using Parse Tree Query Language
K. Venkatesh, Mr. B. Vijaya Bhaskar Reddy
Information extraction (IE) is the task of automatically extracting structured information from unstructured and semi-structured machine-readable documents. In this paper, we propose a new paradigm for information extraction. In this extraction framework, the intermediate output of each text processing component is stored, so that only an improved component has to be redeployed over the entire corpus. Extraction is then performed on both the previously processed data from the unchanged components and the updated data generated by the improved component. Such incremental extraction can greatly reduce processing time. To realize this framework, we propose using database management systems rather than file-based storage systems to address dynamic extraction needs. To demonstrate the feasibility of the incremental extraction approach, experiments are performed to highlight two important aspects of an information extraction system: efficiency and quality of extraction results.
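The incremental idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the component functions (`tokenize`, `tag_entities_v1`, `tag_entities_v2`), the `IncrementalStore` class, and the table layout are hypothetical, and an in-memory SQLite table stands in for the database of intermediate outputs. It also does not use the parse tree query language. The sketch only shows that when one component is improved, its stored output is replaced while the output of the unchanged component is read back from the database rather than recomputed.

```python
import sqlite3

# Hypothetical stand-ins for text processing components (not from the paper).
def tokenize(text):
    return text.split()

def tag_entities_v1(tokens):
    # Toy recognizer: every capitalized token counts as an entity.
    return [t for t in tokens if t[0].isupper()]

def tag_entities_v2(tokens):
    # "Improved" recognizer: additionally drops very short tokens.
    return [t for t in tokens if t[0].isupper() and len(t) > 2]

class IncrementalStore:
    """Keeps each component's intermediate output in a database table."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE output (doc_id INTEGER, component TEXT, value TEXT)"
        )
        self.runs = {}  # component name -> number of documents it has processed

    def run(self, component, func, inputs):
        # Deploy one component over the corpus, replacing only its own rows.
        self.db.execute("DELETE FROM output WHERE component = ?", (component,))
        for doc_id, data in inputs.items():
            for value in func(data):
                self.db.execute(
                    "INSERT INTO output VALUES (?, ?, ?)",
                    (doc_id, component, value),
                )
            self.runs[component] = self.runs.get(component, 0) + 1

    def load(self, component):
        # Read a component's stored output back, grouped by document.
        rows = self.db.execute(
            "SELECT doc_id, value FROM output WHERE component = ? ORDER BY rowid",
            (component,),
        )
        result = {}
        for doc_id, value in rows:
            result.setdefault(doc_id, []).append(value)
        return result

corpus = {1: "Aspirin inhibits COX enzymes", 2: "An enzyme binds Ibuprofen"}
store = IncrementalStore()

# Initial deployment: both components run over the whole corpus.
store.run("tokens", tokenize, corpus)
store.run("entities", tag_entities_v1, store.load("tokens"))
before = store.load("entities")

# The entity tagger is improved: only it is rerun, on stored tokenizer output.
store.run("entities", tag_entities_v2, store.load("tokens"))
after = store.load("entities")

print(before)
print(after)
print(store.runs)
```

In this toy setting the tokenizer processes each document once, while the entity tagger processes each document twice. The saving grows with the cost of the components that do not have to be rerun.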
Text mining, query languages, information storage and retrieval
K. Venkatesh, Mr. B. Vijaya Bhaskar Reddy (2014), Time Reduction Mechanism in Information Extraction Using Parse Tree Query Language, International Journal of Innovative Research in Computer Science & Technology (IJIRCST), Vol-2, Issue-5, Page No-17-21 (ISSN 2347-5552). www.ijircst.org
The author received the B.Tech degree in Computer Science and Engineering from Sree Vidyanikethan Engineering College, A. Rangampet, Tirupathi (affiliated to JNTU Ananthapuramu). He is pursuing the M.Tech degree in the Department of Computer Science and Engineering at Shri Shirdi Sai Institute of Science and Engineering, Vadiyampeta, Ananthapuramu (affiliated to JNTU Ananthapuramu). His current research interests include time reduction mechanisms in information extraction using PTQL.