IBM Systems Journal

Fastfinger: A study into the use of compressed residue pair separation matrices for protein sequence comparison

Abstract

Protein sequences vary widely in size and in the content that is meaningful to researchers. They are rich in what seems to be "noise," that is, aspects of lesser interest that obscure the clearer core features needed to establish true relatedness and function. This paper is part of a larger study exploring the efficient use and storage of "fingers" for protein sequence analysis: matrices of uniform size and shape that can "stand for" protein sequences by making the essential aspects of their pattern information more explicit. The essence of the study relates to data compression, which invokes an alternative idea of pattern. The number-theoretic concept of "primeness" is used to create the notion of an irreducible and potentially recurrent pattern element, and this notion is then mapped onto number theory through the unique factorization theorem to define a novel measure of pattern difference. Other possible approaches are also discussed. Because compression and other approximations involve information loss, this is also a study of performance in the face of such loss. Given the effects of this loss, no claim is made that fingers should replace established sequence comparison methods, but the concept may have value in a number of applications both within and outside molecular biology.
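The abstract does not specify how a finger is built or exactly how unique factorization yields the difference measure. What follows is a minimal Python sketch of one possible reading, not the paper's method. It rests on several hypothetical assumptions. First, a finger is a 20 × 20 matrix indexed by ordered residue pairs (a, b). Second, each cell records the separations at which residue b follows residue a, up to a cap `MAX_SEP`, and discarding longer separations stands in for lossy compression. Third, separation d is encoded by the d-th prime, so that by unique factorization each cell's product identifies its multiset of separations. The names `finger`, `omega`, `cell_difference`, and `finger_distance` are illustrative only.

```python
from math import gcd

# Hypothetical parameters (not from the paper): the 20 standard amino acids
# and a cap on pair separation; longer separations are discarded (lossy).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_SEP = 5


def first_primes(n):
    """Return the first n primes by simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


# Hypothetical encoding: separation d (1..MAX_SEP) is represented by the d-th prime.
PRIMES = first_primes(MAX_SEP)


def finger(seq):
    """Build a hypothetical 20x20 'finger' of uniform shape for any sequence.

    Cell (a, b) is the product of PRIMES[d - 1] over every occurrence of
    residue a followed by residue b at separation d <= MAX_SEP. By unique
    factorization, each cell encodes the multiset of observed separations.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    n = len(AMINO_ACIDS)
    matrix = [[1] * n for _ in range(n)]
    for i, a in enumerate(seq):
        for d in range(1, MAX_SEP + 1):
            j = i + d
            if j >= len(seq):
                break
            b = seq[j]
            if a in index and b in index:  # skip non-standard residue codes
                matrix[index[a]][index[b]] *= PRIMES[d - 1]
    return matrix


def omega(x):
    """Count prime factors of x with multiplicity (only PRIMES can occur)."""
    count = 0
    for p in PRIMES:
        while x % p == 0:
            x //= p
            count += 1
    return count


def cell_difference(x, y):
    """Size of the symmetric difference of the two encoded separation multisets."""
    g = gcd(x, y)  # product of the shared prime factors, i.e. shared separations
    return omega(x // g) + omega(y // g)


def finger_distance(f1, f2):
    """Sum the per-cell differences over all residue pairs."""
    return sum(
        cell_difference(x, y)
        for row1, row2 in zip(f1, f2)
        for x, y in zip(row1, row2)
    )


if __name__ == "__main__":
    a = finger("MKTAYIAKQR")
    b = finger("MKTAYIAKQK")
    print(finger_distance(a, a))  # 0
    print(finger_distance(a, b))  # 10: five pairs lost and five gained at the last residue
```

The gcd of two cell values is the product of their shared prime factors, so the per-cell difference counts the separations present in one cell but not the other. Summing over the matrix gives a distance that is zero for identical fingers. Capping separations at `MAX_SEP` keeps every finger the same shape regardless of sequence length, but it loses long-range pair information. This is the kind of information loss the abstract says the study must contend with.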
