Abstract: |
Frequent patterns that occur in a database can serve as informative explanatory variables. For instance, in market basket analysis, frequent patterns are used to form association rules over historical purchasing data. Moreover, specific kinds of frequent patterns, such as emerging patterns and contrast patterns, are a promising way to estimate classes in a classification problem. A classification model using emerging patterns, Classification by Aggregating Emerging Patterns (CAEP), has been proposed (Dong et al., 1999), and several applications have been reported. It is a simple and effective method, but for some practical data, enumerating large emerging patterns can be computationally costly, and some cases may be left unpredicted. We think that there are two major reasons for this. One is the reliance on emerging patterns: they are powerful when constructing a predictive model, but they are not able to cover frequent transactions. Because of this, some transactions are not estimated, and the accuracy of the estimation becomes poor. The other reason is the normalization method. In CAEP, the scores for each class are normalized by dividing by the median. This is simple, but the score distribution is sometimes biased. Instead, we propose the use of the z-score for normalization. In this paper, we propose a new CAEP-based classification model, Classification by Aggregating Contrast Patterns (CACP). The main idea is to use contrast patterns instead of emerging patterns and to improve the normalization method. Our computational experiments show that our method, CACP, performs better than the existing CAEP method on real data. |
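
To make the difference between the two normalization schemes concrete, the following is a minimal sketch, not the authors' implementation. It assumes each class has a list of aggregate scores observed on training data, and that the z-score is computed per class from the mean and standard deviation of those scores; the abstract does not specify these details. All function names and numbers are hypothetical and chosen only to show that the two schemes can disagree when one class has a skewed score distribution.

```python
from statistics import mean, median, pstdev

def median_normalize(score, class_training_scores):
    """Median-based normalization: divide the raw score by the class median."""
    base = median(class_training_scores)
    return score / base if base else 0.0

def z_normalize(score, class_training_scores):
    """z-score normalization: (score - mean) / standard deviation (assumed per class)."""
    mu = mean(class_training_scores)
    sigma = pstdev(class_training_scores)
    return (score - mu) / sigma if sigma else 0.0

def predict(raw_scores, training_scores, normalize):
    """Return the class whose normalized score is largest."""
    return max(raw_scores, key=lambda c: normalize(raw_scores[c], training_scores[c]))

# Hypothetical training scores: class "B" is skewed by a single large value.
training_scores = {"A": [1, 2, 3, 4, 5], "B": [2, 2, 2, 2, 22]}
# Hypothetical raw aggregate scores of one test transaction.
raw_scores = {"A": 5, "B": 4}

print(predict(raw_scores, training_scores, median_normalize))  # B
print(predict(raw_scores, training_scores, z_normalize))       # A
```

In this toy example, the median of class "B" ignores the outlier, so dividing by it inflates the normalized score of "B", whereas the z-score accounts for the spread of the distribution. Which prediction is preferable depends on the data; the sketch only shows that the choice of normalization can change the outcome.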