
Efficient bottom-up hybrid hierarchical clustering techniques for protein sequence classification

Vijaya, PA and Murty, Narasimha M and Subramanian, DK (2006) Efficient bottom-up hybrid hierarchical clustering techniques for protein sequence classification. In: Pattern Recognition, 39 (12). pp. 2344-2355.


Abstract

Hybrid hierarchical clustering techniques, which combine the characteristics of different partitional clustering techniques or of partitional and hierarchical clustering techniques, are of interest. This paper proposes efficient bottom-up hybrid hierarchical clustering (BHHC) techniques for prototype selection in protein sequence classification. In the first stage, an incremental partitional clustering technique such as the leader algorithm (the ordered leader no update (OLNU) method), which requires only one database (db) scan, is used to find a set of subcluster representatives. In the second stage, either a hierarchical agglomerative clustering (HAC) scheme or a partitional clustering algorithm, K-medians, is applied to these subcluster representatives to obtain the required number of clusters. The hybrid scheme is therefore scalable and suitable for clustering large data sets. It also yields a hierarchical structure of clusters and subclusters, whose representatives are used for pattern classification. Even if a larger number of prototypes is generated, classification time does not increase much, because only a part of the hierarchical structure is searched. The experimental results (classification accuracy (CA) using the prototypes obtained, and computation time) of the proposed algorithms are compared with those of the hierarchical agglomerative schemes, K-medians and the nearest neighbour classifier (NNC). The proposed methods are found to be computationally efficient with reasonably good CA.
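
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the OLNU variant or the distance measure, so this sketch assumes a plain leader algorithm, Levenshtein edit distance between sequences, and single-link merging as one possible HAC scheme for the second stage. The function names `edit_distance`, `leader_clustering` and `merge_leaders`, and the `threshold` and `k` parameters, are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed measure), computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def leader_clustering(seqs: List[str], threshold: int) -> Dict[str, List[str]]:
    """Stage one: a single scan of the data. Each sequence joins the first
    leader within `threshold`; otherwise it becomes a new leader.
    Leaders are never updated after creation."""
    clusters: Dict[str, List[str]] = {}
    for s in seqs:
        for leader in clusters:
            if edit_distance(s, leader) <= threshold:
                clusters[leader].append(s)
                break
        else:
            # No existing leader is close enough.
            clusters[s] = [s]
    return clusters


def merge_leaders(leaders: List[str], k: int) -> List[List[str]]:
    """Stage two (one option): single-link agglomerative merging of the
    leaders until k groups remain."""
    groups = [[leader] for leader in leaders]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(edit_distance(x, y) for x in groups[i] for y in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))
    return groups


if __name__ == "__main__":
    toy = ["ACDE", "ACDF", "WWYY", "WWYH", "ACDE"]  # made-up toy sequences
    subclusters = leader_clustering(toy, threshold=1)
    print(subclusters)
    print(merge_leaders(list(subclusters), k=2))
```

Because the second stage operates only on the leaders rather than on every sequence, its cost depends on the number of subclusters, which is the source of the scalability the abstract claims. The sketch stops at clustering; the classification step that searches only part of the hierarchy is not shown.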

Item Type: Journal Article
Additional Information: Copyright of this article belongs to Elsevier.
Keywords: Hybrid clustering; Hierarchical structure; Protein sequences; Median strings/sequences; Prototypes; Feature selection; Classification accuracy
Department/Centre: Division of Electrical Sciences > Computer Science & Automation (Formerly, School of Automation)
Date Deposited: 28 May 2008
Last Modified: 19 Sep 2010 04:45
URI: http://eprints.iisc.ernet.in/id/eprint/14063
