A Survey of Information Extraction Using Different Databases

Miss. Aparna M. Bagde, KJCOE, Pisoli, Pune; Prof. D. C. Mehetre, KJCOE, Pisoli, Pune

Keywords: Information extraction, DBMS, file-based storage systems, extraction framework, efficiency of extraction.

Information extraction is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. It is usually performed as a one-time process that extracts a particular kind of relationship of interest from a document collection, and information extraction systems are implemented as a pipeline of special-purpose processing modules targeting that particular kind of information. This approach has a major drawback: whenever the extraction goal changes or a module is improved, extraction has to be reapplied to the entire corpus, even though only a small part of the corpus might be affected.

In this seminar, we propose a new paradigm for information extraction in which extraction needs are expressed as database queries that are evaluated and optimized by the database system. Our approach also provides automated query generation components, so that casual users do not have to learn the query language in order to perform extraction. We focus on two aspects of an information extraction system: efficiency and quality of extraction. In this extraction framework, the intermediate output of each text processing component is stored, so that when a component is improved, only that component has to be redeployed over the entire corpus. Extraction is then performed on both the previously processed data from the unchanged components and the updated data generated by the improved component. Such incremental extraction can greatly reduce processing time. To realize this framework, we choose database management systems over file-based storage systems to address dynamic extraction needs. The proposed approach consists of two phases: an initial phase and an extraction phase.
We also compare different types of DBMS and conclude which database is the most efficient for incremental information extraction.
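To make the incremental idea concrete, here is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as the DBMS and a toy two-stage pipeline. The corpus, the whitespace tokenizer, the dictionary taggers `tag_v1`/`tag_v2`, the table names `tokens`/`entities`, and the "PROTEIN binds PROTEIN" relation are all hypothetical illustrations, not the authors' schema or components. We read the initial phase as running each component once and storing its output in a table, and the extraction phase as a SQL query over those tables; that mapping is our assumption. When the tagger is improved, only the tagger is re-run, and the stored tokenizer output is reused.

```python
import sqlite3

# Hypothetical toy corpus (doc_id -> text); not data from the paper.
CORPUS = {
    1: "RAD51 binds BRCA2 in vivo",
    2: "TP53 binds MDM2",
    3: "Cells were grown overnight",
}


def tag_v1(token):
    """Original tagger: a tiny hypothetical protein dictionary."""
    return "PROTEIN" if token in {"RAD51", "BRCA2"} else None


def tag_v2(token):
    """Improved tagger: the same dictionary extended with two more proteins."""
    return "PROTEIN" if token in {"RAD51", "BRCA2", "TP53", "MDM2"} else None


def create_schema(conn):
    # One table per processing component holds that component's stored output.
    conn.executescript("""
        CREATE TABLE tokens(doc_id INTEGER, pos INTEGER, token TEXT,
                            PRIMARY KEY (doc_id, pos));
        CREATE TABLE entities(doc_id INTEGER, pos INTEGER, label TEXT,
                              PRIMARY KEY (doc_id, pos));
    """)


def run_tokenizer(conn):
    """Component 1: whitespace tokenizer over the raw text; run only once."""
    rows = [(doc_id, pos, tok)
            for doc_id, text in CORPUS.items()
            for pos, tok in enumerate(text.split())]
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)", rows)


def run_tagger(conn, tagger):
    """Component 2: replaces only its own table, reading the stored tokens."""
    conn.execute("DELETE FROM entities")
    tagged = [(doc_id, pos, tagger(tok))
              for doc_id, pos, tok in conn.execute(
                  "SELECT doc_id, pos, token FROM tokens")]
    conn.executemany("INSERT INTO entities VALUES (?, ?, ?)",
                     [row for row in tagged if row[2] is not None])


# The extraction need "PROTEIN binds PROTEIN" expressed as a SQL query
# over the stored component outputs.
BINDS_QUERY = """
    SELECT a.doc_id, ta.token, tb.token
    FROM entities AS a
    JOIN tokens AS v ON v.doc_id = a.doc_id AND v.pos = a.pos + 1
    JOIN entities AS b ON b.doc_id = a.doc_id AND b.pos = a.pos + 2
    JOIN tokens AS ta ON ta.doc_id = a.doc_id AND ta.pos = a.pos
    JOIN tokens AS tb ON tb.doc_id = b.doc_id AND tb.pos = b.pos
    WHERE a.label = 'PROTEIN' AND b.label = 'PROTEIN' AND v.token = 'binds'
    ORDER BY a.doc_id
"""


def extract_binds(conn):
    return conn.execute(BINDS_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Initial phase: run every component once and store its output.
run_tokenizer(conn)
run_tagger(conn, tag_v1)
before = extract_binds(conn)

# The tagger is improved: re-run only that component; stored tokens are reused.
run_tagger(conn, tag_v2)
after = extract_binds(conn)

print(before)  # [(1, 'RAD51', 'BRCA2')]
print(after)   # [(1, 'RAD51', 'BRCA2'), (2, 'TP53', 'MDM2')]
```

Only `run_tagger` runs a second time; the `tokens` table is left untouched. On a large corpus with expensive upstream components such as parsers, skipping that recomputation is where incremental extraction would save processing time.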
Paper ID: GRDJEV02I010057
Published in: Volume: 2, Issue: 1
Publication Date: 2017-01-01
Page(s): 35 - 43