The naïve Bayes classifier is one of the most commonly used data mining methods for classification. Despite its simplicity, naïve Bayes is effective and computationally efficient. Although the strong attribute independence assumption makes naïve Bayes a tractable method for learning, this assumption may not hold in real-world applications. Many enhancements to the basic algorithm have been proposed to alleviate violations of the attribute independence assumption. While these methods improve classification performance, they do not necessarily retain the mathematical structure of the naïve Bayes model, and some do so at the expense of computational time. One approach to reducing the naïveté of the classifier is to incorporate attribute weights into the conditional probabilities. In this paper, we propose a method to incorporate attribute weights into naïve Bayes. To evaluate its performance, we used public benchmark datasets and compared our method with standard naïve Bayes and baseline attribute weighting methods. Experimental results show that our method improves classification performance over both standard naïve Bayes and the baseline attribute weighting methods in terms of accuracy and F1, especially when the independence assumption is strongly violated, as validated using the chi-square test of independence.

The naïve Bayes classifier is one of the most widely used algorithms in data mining applications. The naïveté in the classifier is that all attributes are assumed to be independent given the class. Such an assumption simplifies the computation needed to infer the probability of a class given the data. Although the attribute independence assumption makes naïve Bayes a tractable method for learning, this assumption may not hold in real-world applications.

Various approaches have been proposed to relax the attribute independence assumption. One of the approaches is to combine the naïve Bayes with a pre-processing step. In this approach, an attribute selection is first performed to identify the set of informative attributes before training the naïve Bayes [

Another approach to mitigate the independence assumption is the structure extension [

Since some attributes have more influence in discriminating the classes, an alternative approach is to apply an attribute weighting method. In this approach, different weights are assigned to different attributes, with higher weights for more influential attributes. Although many methods have been proposed to calculate attribute weights for naïve Bayes learning, these methods incorporate the weights by raising the conditional probabilities to the power of the weights. However, incorporating weights in this manner may cause the weighted conditional probabilities to behave inversely to the intended attribute importance. In this paper, we propose a method to address this issue. The proposed method is evaluated on public benchmark datasets and compared to other baseline attribute weighting methods.

The remainder of this paper is structured as follows. Section 2 reviews the related work. Section 3 presents our proposed method. Section 4 describes the experimental setup. Section 5 discusses the experimental results. Section 6 draws the conclusions of the research.

Attribute weighting methods in naïve Bayesian learning can be divided into two categories: wrapper-based and filter-based. Wrapper-based methods use a classifier as an evaluation function to score attributes by their classification performance, whereas filter-based methods apply heuristics to evaluate the characteristics of the attributes.

Among the earlier works that used a filter-based method to derive attribute weights is that of Lee et al. [

Yu et al. [

There are works [

In this section, we describe our proposed method and its rationale.

The naïve Bayes classifier is a probabilistic model that applies Bayes' theorem to classification [. Given an instance described by the attribute values (a_{1}, a_{2}, …, a_{n}), we can calculate the conditional probability that the class of this instance is

In practice, we are only interested in the numerator in

The naïve Bayes classifier classifies the instance by maximizing the probability on the right-hand side of , assigning the instance to class c_{j} if
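The classification rule above can be sketched in plain Python; the priors and conditional probabilities below are hypothetical toy estimates for illustration, not values from the paper:

```python
# Minimal sketch of standard naive Bayes classification: the predicted
# class maximizes the prior times the product of per-attribute
# conditional probabilities P(a_i | c_j).

def nb_predict(instance, priors, cond_probs):
    """Return the class c_j maximizing P(c_j) * prod_i P(a_i | c_j)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, a in enumerate(instance):
            score *= cond_probs[c][i][a]   # P(a_i | c_j)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical estimates for two binary attributes.
priors = {"yes": 0.6, "no": 0.4}
cond_probs = {
    "yes": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "no":  [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}
print(nb_predict((1, 0), priors, cond_probs))  # prints "yes"
```

Here 0.6 × 0.8 × 0.7 = 0.336 for "yes" beats 0.4 × 0.1 × 0.4 = 0.016 for "no".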

In real-world classification applications, the assumption of conditional independence between attributes does not always hold. One approach to alleviate the independence assumption is to incorporate attribute weights into the naïve Bayes model. We propose to incorporate the weight w_{i} of attribute A_{i} into the naïve Bayes classifier as in

The conditional probability P(a_{i} | c_{j}) is the probability of observing the attribute value a_{i} given that the class is c_{j}. This probability should be larger if the chance of observing a_{i} is highly dependent on class c_{j}. The weight w_{i} of attribute A_{i} should also be larger if this attribute has a higher influence than other attributes. When we incorporate the weights into the naïve Bayes model, we are looking for a relationship as in

Since the conditional probability is always bounded between 0 and 1, raising it to the power w_{i} makes it smaller as w_{i} grows, which is the inverse of the intended importance. To correctly reflect the importance of the attributes, the exponent should instead decrease with the attribute weight, i.e., the power should be the negative of the attribute weight. The exponential function is included in our weighted naïve Bayes formula to ensure that the conditional probability remains unchanged when the weight is zero. Incorporating attribute weights this way not only helps to relax the conditional independence assumption, but also preserves the original mathematical structure of naïve Bayes.
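The weighting behavior described above, as we read it (an exponent of exp(-w_{i}) on each conditional probability; the exact formula is given in the paper's equations), can be checked numerically; `weighted_prob` is an illustrative helper, not code from the paper:

```python
import math

# Sketch of the weighting rule: raise the conditional probability to
# exp(-w_i) rather than w_i, so a zero weight leaves the probability
# unchanged and a larger weight increases, never decreases, the term.

def weighted_prob(p, w):
    """P(a_i | c_j) raised to the power exp(-w_i)."""
    return p ** math.exp(-w)

p = 0.5
assert weighted_prob(p, 0.0) == p                          # weight 0: unchanged
assert weighted_prob(p, 2.0) > weighted_prob(p, 1.0) > p   # monotone in w
assert p ** 2.0 < p   # the naive exponent w_i moves the opposite way
```

This makes the inversion concrete: with the plain exponent, 0.5² = 0.25 shrinks an important attribute's term, while 0.5^exp(-2) ≈ 0.91 enlarges it.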

To further illustrate our proposed method, let us assume that the conditional probability of the attribute value a_{i} given the class c_{j},
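A hypothetical end-to-end sketch of the weighted classifier follows: the same argmax as standard naïve Bayes, but with each conditional probability raised to exp(-w_{i}), following our reading of the formula above. All probabilities and weights below are invented for illustration:

```python
import math

# Weighted naive Bayes prediction sketch (toy numbers, not the
# paper's example): score(c) = P(c) * prod_i P(a_i | c) ** exp(-w_i).

def weighted_nb_predict(instance, priors, cond_probs, weights):
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, a in enumerate(instance):
            score *= cond_probs[c][i][a] ** math.exp(-weights[i])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

priors = {"yes": 0.6, "no": 0.4}
cond_probs = {
    "yes": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "no":  [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}
weights = [1.5, 0.2]   # attribute 1 judged more informative (hypothetical)
print(weighted_nb_predict((1, 0), priors, cond_probs, weights))
```

With zero weights this reduces exactly to the standard classifier, which is the structure-preserving property claimed above.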

The performance of our method is evaluated on a collection of 10 public benchmark datasets obtained from the UCI repository [

All instances with missing values were removed from the datasets, and numerical attributes were discretized using the supervised discretization method of Fayyad et al. [

To evaluate the performance of our method, the data were cross-validated using 5-fold cross-validation. Stratified sampling was used to sample the training data, with a train:test ratio of 70:30 for each class. The classification performance was obtained by averaging the results of the 5 runs. Two measures were used to evaluate classification performance: accuracy and F1. Accuracy measures the proportion of correct predictions, while F1 is the harmonic mean of precision and recall.
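The two evaluation measures can be sketched in pure Python (the label vectors below are toy data, not results from the experiments):

```python
# Accuracy: proportion of correct predictions.
# F1: harmonic mean of precision and recall for the positive class.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
# 4 of 6 correct; tp=2, fp=1, fn=1, so precision = recall = 2/3.
print(accuracy(y_true, y_pred), f1(y_true, y_pred))
```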

Dataset | No. of instances | No. of attributes | % of missing values
---|---|---|---
Abalone (AB) | 4177 | 8 | 0
Default of credit card (CC) | 30000 | 24 | 0.18
Indian liver patient (LP) | 583 | 10 | 0.69
Mushroom (MR) | 8124 | 22 | 30.52
Adult (AD) | 48842 | 13 | 37.11
Adverse drug event (AE) | 18424 | 7 | 0
Heart disease (HD) | 298 | 13 | 0
Breast Cancer Wisconsin (BC) | 699 | 32 | 2.29
Credit approval (CA) | 690 | 15 | 5.36
Tic-Tac-Toe endgame (TTT) | 958 | 9 | 0

We conducted two sets of experiments. The first experiment compared our method with standard naïve Bayes,

The classification performance of our method using the KL measure as attribute weights, compared with standard naïve Bayes, is presented in

When we used IG as the attribute weights, we obtained results similar to those with KL when compared to standard naïve Bayes in terms of accuracy. As shown in
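For reference, information gain as an attribute weight can be sketched as IG(A) = H(C) − H(C|A); this is a generic estimator and may differ in detail from the implementation used in the experiments:

```python
import math
from collections import Counter

# Entropy of a label sequence, H(C), in bits.
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Information gain of an attribute: H(C) - H(C | A).
def info_gain(attr_values, labels):
    n = len(labels)
    cond = 0.0
    for v in set(attr_values):
        subset = [c for a, c in zip(attr_values, labels) if a == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

labels = ["yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b"]   # attribute that determines the class
useless = ["a", "b", "a", "b"]   # attribute independent of the class
print(info_gain(perfect, labels))  # 1.0
print(info_gain(useless, labels))  # 0.0
```

A perfectly predictive attribute gets the maximum weight and an uninformative one gets zero, which under the proposed scheme leaves the latter's conditional probability unchanged.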

The majority of the existing work for weighted naïve Bayes [

This experiment compares the classification performance of our method to incorporate attribute weights (following

When the IG measure was used to compute the attribute weights, we observed the same pattern of results. As shown in

In order to better understand the experimental results, we conducted a test to evaluate the conditional independence assumption in naïve Bayes. This assumption is valid only if all the attributes are pairwise independent given the class. The chi-square test of independence verifies whether a pair of attributes is independent, making it a suitable test for this assumption. Since we are evaluating independence between attributes conditioned on the class, each dataset is divided into subsets according to class, and the chi-square test is applied to each subset to evaluate pairwise independence between attributes.

The null hypothesis (H_{0}) of the chi-square test is that a pair of attributes is independent, and H_{0} is rejected if the
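The pairwise chi-square statistic can be sketched on a hypothetical 2×2 contingency table; for 1 degree of freedom and α = 0.05, the critical value is 3.841, and a larger statistic rejects H_{0} (the pair is labeled "NI", not independent, in the tables below):

```python
# Chi-square statistic of independence for a contingency table of
# observed counts: sum over cells of (observed - expected)^2 / expected,
# where expected = row_total * col_total / n.

def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts concentrated on the diagonal: clearly dependent.
dependent = [[40, 10], [10, 40]]
stat = chi_square(dependent)
print(stat > 3.841)  # True: reject H_0, the attributes are dependent
```

Every expected cell here is 25, so the statistic is 4 × (15² / 25) = 36, far above the critical value.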

 | A1 | A2 | A3
---|---|---|---
A1 | – | I | NI
A2 | | – | NI
A3 | | | –

(a) Class = "Larger"

 | Sex | Length | Diameter | Height | Whole | Shucked | Viscera | Shell
---|---|---|---|---|---|---|---|---
Sex | – | NI | NI | NI | NI | NI | NI | NI
Length | | – | NI | NI | NI | NI | NI | NI
Diameter | | | – | NI | NI | NI | NI | NI
Height | | | | – | NI | NI | NI | NI
Whole | | | | | – | NI | NI | NI
Shucked | | | | | | – | NI | NI
Viscera | | | | | | | – | NI
Shell | | | | | | | | –

(b) Class = "Smaller"

 | Sex | Length | Diameter | Height | Whole | Shucked | Viscera | Shell
---|---|---|---|---|---|---|---|---
Sex | – | NI | NI | NI | NI | NI | NI | NI
Length | | – | NI | NI | NI | NI | NI | NI
Diameter | | | – | NI | NI | NI | NI | NI
Height | | | | – | NI | NI | NI | NI
Whole | | | | | – | NI | NI | NI
Shucked | | | | | | – | NI | NI
Viscera | | | | | | | – | NI
Shell | | | | | | | | –

(a) Class = "positive"

 | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9
---|---|---|---|---|---|---|---|---|---
A1 | – | I | NI | I | NI | NI | NI | NI | I
A2 | | – | I | NI | I | NI | NI | NI | NI
A3 | | | – | NI | NI | I | I | NI | NI
A4 | | | | – | I | NI | I | NI | NI
A5 | | | | | – | I | NI | I | NI
A6 | | | | | | – | NI | NI | I
A7 | | | | | | | – | I | NI
A8 | | | | | | | | – | I
A9 | | | | | | | | | –

(b) Class = "negative"

 | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9
---|---|---|---|---|---|---|---|---|---
A1 | – | I | I | I | NI | NI | I | NI | I
A2 | | – | I | NI | I | NI | NI | NI | NI
A3 | | | – | NI | NI | I | I | NI | I
A4 | | | | – | I | NI | I | NI | NI
A5 | | | | | – | I | NI | I | NI
A6 | | | | | | – | NI | NI | I
A7 | | | | | | | – | I | I
A8 | | | | | | | | – | I
A9 | | | | | | | | | –

For the other datasets where our method performed better, the chi-square results are similar to those for the AB dataset. For example, out of the 55 pairwise chi-square tests in the MR dataset with 11 attributes, only 2 pairs of attributes are independent in one class and 7 pairs in the other. This again indicates a serious violation of the independence assumption. Our method of incorporating attribute weights is still able to achieve higher performance even when the independence assumption is seriously violated.

We have also compared the performance of the baseline method to the standard naïve Bayes and the results are presented in

(a) Accuracy (%)

Dataset | Standard naïve Bayes | Baseline (KL) | Baseline (IG)
---|---|---|---
AB | 61 | 62 | 62
CC | 73 | 70 | 71
LP | 67 | 69 | 65
MR | 85 | 85 | 85
AD | 83 | 77 | 77
AE | 61 | 61 | 61
HD | 83 | 80 | 80
BC | 97 | 97 | 97
CA | 86 | 86 | 85
TTT | 72 | 71 | 71

(b) F1-measure (%)

Dataset | Standard naïve Bayes | Baseline (KL) | Baseline (IG)
---|---|---|---
AB | 69 | 69 | 70
CC | 83 | 80 | 81
LP | 74 | 76 | 71
MR | 78 | 76 | 76
AD | 88 | 83 | 83
AE | 66 | 66 | 66
HD | 81 | 75 | 78
BC | 98 | 98 | 98
CA | 84 | 86 | 85
TTT | 80 | 77 | 77

This paper proposed a new method to incorporate attribute weights into the computation of conditional probabilities for the naïve Bayes classifier. We explored two attribute weights, based on the Kullback–Leibler divergence and information gain measures, and incorporated these weights using our method. We evaluated the method on public benchmark datasets obtained from the UCI and FDA repositories. Our attribute weighting method outperformed both standard naïve Bayes and the baseline weighting methods in terms of accuracy and F1. It achieved better performance even when the conditional independence assumption is seriously violated, as validated with the chi-square test of independence.