Joel L. Wolf, Mark S. Squillante, et al.
IEEE Transactions on Knowledge and Data Engineering
In this paper, we focus on mining periodic patterns that tolerate some degree of imperfection, in the form of random replacement of events in an otherwise perfect periodic pattern. Information gain was proposed to identify patterns whose events have vastly different occurrence frequencies and to adjust for deviations from a pattern. However, it imposes no penalty for gaps between pattern occurrences. In many applications, e.g., bioinformatics, it is important to identify subsequences in which a pattern repeats perfectly (or nearly perfectly). As a solution, we extend the information gain measure to include a penalty for gaps between pattern occurrences. We call this measure generalized information gain. Furthermore, we want to find a subsequence S′ such that, for a pattern P, the generalized information gain of P in S′ is high. This is particularly useful for locating repeats in DNA sequences. We develop an effective mining algorithm, InfoMiner+, to simultaneously mine significant patterns and their associated subsequences. © 2002 IEEE.
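As an illustration only, the idea of rewarding pattern occurrences, penalizing gaps, and locating a high-scoring subsequence S′ can be sketched as below. This is not the InfoMiner+ algorithm, and the scoring is an assumed simplification rather than the paper's definition of generalized information gain. The assumptions are: an event's information is the negative base-2 logarithm of its empirical frequency; the sequence is cut into consecutive period-length segments starting at offset 0; each matching segment adds the pattern's information and each non-matching segment (a gap) subtracts the same amount. All function names are hypothetical.

```python
import math
from collections import Counter

WILDCARD = None  # hypothetical marker for a "don't care" position in a pattern


def event_information(sequence):
    """Assumed measure: information of an event = -log2(empirical frequency)."""
    counts = Counter(sequence)
    total = len(sequence)
    return {event: -math.log2(count / total) for event, count in counts.items()}


def pattern_information(pattern, info):
    """Sum the information of the non-wildcard events in the pattern."""
    return sum(info[event] for event in pattern if event is not WILDCARD)


def segment_matches(segment, pattern):
    """A segment matches when every non-wildcard position agrees."""
    return all(p is WILDCARD or p == s for p, s in zip(pattern, segment))


def best_subsequence(sequence, pattern):
    """Return (score, (start, end)) of the best run of whole segments.

    Assumed scoring: +gain per matching segment, -gain per gap segment.
    The best contiguous run is found with a maximum-sum scan (Kadane's
    algorithm), which stands in for the paper's subsequence search.
    """
    period = len(pattern)
    gain = pattern_information(pattern, event_information(sequence))
    best_score, best_range = 0.0, None
    run_score, run_start = 0.0, 0
    for k in range(len(sequence) // period):
        segment = sequence[k * period:(k + 1) * period]
        step = gain if segment_matches(segment, pattern) else -gain
        if run_score <= 0:
            # A non-positive run cannot help; start a new run at this segment.
            run_score, run_start = 0.0, k
        run_score += step
        if run_score > best_score:
            best_score = run_score
            best_range = (run_start * period, (k + 1) * period)
    return best_score, best_range


# Usage: three perfect repeats of "ab", a gap of two segments, one more repeat.
score, span = best_subsequence("abababxxxxab", ["a", "b"])
```

In this toy input the two gap segments cost more than the trailing repeat recovers, so the selected span covers only the first three repeats. A plain occurrence count over the whole sequence would not make that distinction, which is the motivation the abstract gives for penalizing gaps.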
Junyi Xie, Jun Yang, et al.
ICDE 2008
Douglas W. Cornell, Daniel M. Dias, et al.
IEEE Transactions on Software Engineering
Bruno Ciciani, Daniel M. Dias, et al.
IEEE Transactions on Software Engineering