Usage of Cosine Similarity and term Frequency count for Textual document Clustering
B. Sindhuja, Mrs. Veena Trivedi
This paper presents textual document clustering using two approaches: cosine similarity, and term frequency and inverse document frequency (TF-IDF). Combining these approaches yields similarity values between keywords within the documents and between the documents themselves. From these values, the most closely related documents can be identified using a clustering method called correlation preserving index, in which related documents are stored in an index format.
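The two measures named in the abstract can be illustrated with a minimal sketch. The exact weighting scheme and tokenization used in the paper are not given here, so the code below assumes a common variant (raw term count multiplied by log(N/df)) and whitespace tokenization; the function names and sample documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count times log(N / df), where N is the
    number of documents and df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, not taken from the paper.
docs = [
    "clustering groups similar documents",
    "document clustering uses similarity of documents",
    "cosine measures the angle between vectors",
]
vectors = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share two terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no terms
```

In this sketch the first two documents share the terms "clustering" and "documents", so their similarity is positive, while the first and third share no terms and score zero. A clustering step would then group documents whose pairwise similarity is high.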
B. Sindhuja, Mrs. Veena Trivedi (2014), "Usage of Cosine Similarity and term Frequency count for Textual document Clustering," International Journal of Innovative Research in Computer Science & Technology (IJIRCST), Vol. 2, Issue 5, pp. 9-12, ISSN 2347-5552. www.ijircst.org
Information Technology, Gokaraju Rangaraju Institute of Engineering and Technology, Hyderabad, India, 9032663923