
Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins

Sandhya, S and Mudgal, R and Jayadev, C and Abhinandan, KR and Sowdhamini, R and Srinivasan, N (2012) Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins. In: MOLECULAR BIOSYSTEMS, 8 (8). pp. 2076-2084.

mol_bio_sys_8-8_2076-2084_2012.pdf - Published Version
Restricted to Registered users only

mol_bio_sys_8-8_2078-2084-supp_2012.pdf - Supplemental Material
Restricted to Registered users only

Official URL: http://dx.doi.org/10.1039/c2mb25113b

Abstract

Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that the success of including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST and is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
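
The roulette-wheel step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`roulette_pick`, `generate_sequence`), the representation of the profile as a list of residue-to-propensity dictionaries, and the toy propensity values are all assumptions made for illustration. In practice the propensities would be derived from the family's position-specific scoring matrix, which the abstract does not specify in detail.

```python
import random
from itertools import accumulate


def roulette_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity.

    `propensities` maps one-letter residue codes to non-negative weights
    for a single alignment position (hypothetical input format).
    """
    residues = list(propensities)
    cumulative = list(accumulate(propensities[r] for r in residues))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("propensities must have a positive sum")
    spin = rng.random() * total  # random number in [0, total)
    for residue, upper in zip(residues, cumulative):
        if spin < upper:  # the wheel stops in this residue's slice
            return residue
    return residues[-1]  # guard against floating-point edge cases


def generate_sequence(profile, rng):
    """Build one artificial sequence by spinning the wheel at each position."""
    return "".join(roulette_pick(column, rng) for column in profile)


# Toy four-position profile; the values are invented for illustration only.
toy_profile = [
    {"M": 1.0},
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.3, "V": 0.2},
    {"K": 0.7, "R": 0.3},
]

rng = random.Random(0)  # fixed seed so repeated runs give the same sequences
designed = [generate_sequence(toy_profile, rng) for _ in range(5)]
print(designed)
```

Each call to `generate_sequence` yields one family-like sequence; repeating the call many times would populate the sequence space around the family in the sense the abstract describes, with frequently observed residues appearing more often at each position.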

Item Type: Journal Article
Additional Information: Copyright for this article belongs to The Royal Society of Chemistry, 2012.
Department/Centre: Division of Biological Sciences > Molecular Biophysics Unit
Division of Information Sciences > Supercomputer Education & Research Centre
Others
Date Deposited: 06 Aug 2012 07:22
Last Modified: 06 Aug 2012 07:28
URI: http://eprints.iisc.ernet.in/id/eprint/44932
