machine learning - Generating a Decision Tree that Perfectly Models the Training Set?
I have a data set of rules, and I want to generate a decision tree that at least has 100% accuracy at classifying those rules, but I can never get 100%. I set minnumobjs to 1 and made the tree unpruned, but I only get 84% correctly classified instances.
My attributes are:
@attribute users numeric
@attribute bandwidth numeric
@attribute latency numeric
@attribute mode {c,h,dcf,mp,dc,ind}

Example data:

2,200000,0,c
2,200000,1000,c
2,200000,2000,mp
2,200000,5000,c
2,400000,0,c
2,400000,1000,dcf

Can someone help me understand why I can never get 100% of my instances classified, and how I can get 100% of them classified (while still allowing my attributes to be numeric)?
Thanks.
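It is impossible to get 100% accuracy when identical feature vectors have different labels. I am guessing that in your case users, bandwidth, and latency are the features, while mode is the label you are trying to predict. If so, there may be identical values of {users, bandwidth, latency} that happen to have different mode labels. A decision tree can only assign one label to a given feature vector, so at least one instance in each such group will always be misclassified.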
In general, having different labels for the same features may occur in one of several ways:
- There is noise in the data due to a bad reading of the data.
- There is a source of randomness that is not captured.
- There are more possible features that could distinguish between the different labels, but those features are not in your data set.
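To make the effect of such conflicts concrete, here is a minimal Python sketch that computes the best training accuracy any deterministic classifier could reach on a data set. The function name `max_training_accuracy` is made up for illustration, and the fourth row is a hypothetical conflicting instance that I added; it is not from the question's data.

```python
from collections import Counter, defaultdict

def max_training_accuracy(rows):
    """Upper bound on training accuracy for any deterministic classifier.

    Each row is (users, bandwidth, latency, mode); the last element is the label.
    """
    labels_by_features = defaultdict(Counter)
    for *features, label in rows:
        labels_by_features[tuple(features)][label] += 1
    # Among instances with identical features, only the majority label
    # can be predicted correctly; the rest are unavoidable errors.
    best_correct = sum(max(counts.values()) for counts in labels_by_features.values())
    return best_correct / len(rows)

rows = [
    (2, 200000, 0, "c"),
    (2, 200000, 1000, "c"),
    (2, 200000, 2000, "mp"),
    (2, 200000, 2000, "c"),  # hypothetical: same features as the row above, different label
    (2, 400000, 0, "c"),
]
print(max_training_accuracy(rows))  # 0.8
```

If this number comes out below 1.0 on your real data, no amount of tuning the tree will get you to 100%.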
One thing you can do is run the training set through the decision tree and find the items that were misclassified. Try to determine why they are wrong, and see whether those data instances exhibit what I wrote above (namely, that there are data instances with the same features but different labels).
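As a rough sketch of that check, you could group the instances by their feature values and list every group that carries more than one label. This is plain Python rather than anything built into your tool; `find_conflicts` is a name I made up, and the fourth sample row is a hypothetical conflict added for illustration.

```python
from collections import defaultdict

def find_conflicts(rows):
    """Return {feature_tuple: sorted labels} for feature vectors with more than one label.

    Each row is (users, bandwidth, latency, mode); the last element is the label.
    """
    labels_by_features = defaultdict(set)
    for *features, label in rows:
        labels_by_features[tuple(features)].add(label)
    return {
        features: sorted(labels)
        for features, labels in labels_by_features.items()
        if len(labels) > 1
    }

sample = [
    (2, 200000, 0, "c"),
    (2, 200000, 1000, "c"),
    (2, 200000, 2000, "mp"),
    (2, 200000, 2000, "c"),  # hypothetical: conflicts with the row above
    (2, 400000, 0, "c"),
]
for features, labels in find_conflicts(sample).items():
    print(features, labels)
```

An empty result would mean the misclassifications come from somewhere else, such as the tree settings, and not from contradictory instances.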