Volume- 2
Issue- 5
Year- 2014
B. Sindhuja, Mrs. Veena Trivedi
This paper presents textual document clustering using two approaches: cosine similarity, and term frequency and inverse document frequency (TF-IDF). Combining these approaches yields similarity values between keywords within the documents and between the documents themselves. From these values, the most closely related documents can be identified using a clustering method called correlation preserving index, in which related documents are stored in an index format.
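The two measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the `tf * log(N / df)` weighting variant, and the function names `tfidf_vectors` and `cosine_similarity` are assumptions made for illustration, and the correlation preserving index step is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: (term count / document length) * log(N / df),
    where N is the number of documents and df is the number of
    documents containing the term.
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)
```

Calling `tfidf_vectors` on a list of document strings and then `cosine_similarity` on each pair of resulting vectors gives a document-to-document similarity matrix that a clustering step could consume. A term that occurs in every document receives weight zero under this variant, so it does not contribute to the similarity.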
Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India, 9032663923