Propositionalization has proven to be a very effective solution for multi-relational data mining problems. These approaches usually follow a two-step principle: first transform the relational data into a single, flat table, then apply a propositional learning algorithm. During the transformation, the target table is expanded with many new features that summarize the information in the non-target tables. Depending on the feature construction strategy used, this can produce a table of very high dimensionality containing many irrelevant and/or redundant features, which can degrade predictive performance. In this paper, we propose a modification of the traditional two-step framework to overcome these problems. The proposed approach evaluates the features during the construction phase and passes only a subset of highly predictive features to the propositional learner. We present an implementation of this approach that uses a genetic algorithm to search for an optimal feature subset. Our experiments on a number of benchmark datasets suggest that the modified framework can help propositionalization methods significantly improve their predictive performance.
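To make the idea concrete, the following is a minimal sketch of aggregation-based propositionalization followed by a genetic search over feature subsets. It is not the authors' implementation, which the abstract does not detail. The toy tables, the aggregate set, the leave-one-out 1-nearest-neighbour fitness, and all names (`propositionalize`, `fitness`, `select_features`) are assumptions chosen for illustration. For brevity the sketch builds all features first and then searches over subsets, whereas the proposed framework evaluates features during construction.

```python
import random
from statistics import mean

# Hypothetical toy data: a target table (customer id -> class label)
# and a non-target table of orders stored as (customer id, amount) rows.
customers = {1: 1, 2: 0, 3: 1, 4: 0, 5: 1, 6: 0}
orders = [(1, 120.0), (1, 80.0), (1, 95.0), (2, 15.0), (3, 200.0), (3, 150.0),
          (4, 10.0), (4, 20.0), (5, 90.0), (5, 110.0), (6, 30.0)]

# Assumed aggregate functions; each one yields one new column in the flat table.
AGGREGATES = {"count": len, "sum": sum, "mean": mean, "min": min, "max": max}

def propositionalize(customers, orders):
    """Flatten the two tables: one row of aggregate features per target row.

    Assumes every customer has at least one order (mean/min/max would
    otherwise fail on an empty list).
    """
    rows, labels = [], []
    for cid, label in sorted(customers.items()):
        amounts = [amount for c, amount in orders if c == cid]
        rows.append([float(agg(amounts)) for agg in AGGREGATES.values()])
        labels.append(label)
    return rows, labels

def fitness(mask, rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule on the kept columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    hits = 0
    for i, row in enumerate(rows):
        nearest = min((k for k in range(len(rows)) if k != i),
                      key=lambda k: sum((row[j] - rows[k][j]) ** 2 for j in cols))
        hits += labels[nearest] == labels[i]
    return hits / len(rows)

def select_features(rows, labels, generations=20, pop_size=8, seed=0):
    """Elitist genetic algorithm over bit masks (1 = keep the feature)."""
    rng = random.Random(seed)
    n = len(rows[0])
    # Seed the population with the all-features mask plus random masks.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]

    def score(mask):
        return fitness(mask, rows, labels)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; survivors are kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1       # flip one random bit as mutation
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return best, score(best)

rows, labels = propositionalize(customers, orders)
best_mask, best_score = select_features(rows, labels)
selected = [name for name, keep in zip(AGGREGATES, best_mask) if keep]
print(selected, best_score)
```

Because the surviving parents are carried over unchanged and the initial population contains the all-features mask, the returned subset never scores worse than using every feature under this fitness. A realistic setup would score subsets on held-out data rather than on the rows used for the search.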
|Journal||International Journal of Software Engineering and Knowledge Engineering|
|Publication status||Published - 2018|