machine learning - Generating a Decision Tree that Perfectly Models the Training Set?
I have a data set of rules, and I want to generate a decision tree that at least has 100% accuracy at classifying my rules, but I can never get 100%. I set minNumObj to 1 and made the tree unpruned, but I only get 84% correctly classified instances.
My attributes are:
@attribute users numeric
@attribute bandwidth numeric
@attribute latency numeric
@attribute mode {c,h,dcf,mp,dc,ind}
Example data:
2,200000,0,c
2,200000,1000,c
2,200000,2000,mp
2,200000,5000,c
2,400000,0,c
2,400000,1000,dcf
Can someone help me understand why I can never get 100% of my instances classified, and how I can get 100% of them classified (while still allowing my attributes to be numeric)?
Thanks
It is impossible to get 100% accuracy when identical feature vectors have different labels. I am guessing that in your case users, bandwidth, and latency are the features, while mode is the label you are trying to predict. If so, there may be identical values of {users, bandwidth, latency} that happen to have different mode labels.
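A quick way to test this guess is to group the rows by their feature values and look for groups that carry more than one label. The following is a minimal sketch in plain Python, not the asker's actual setup: the function name find_conflicts is made up for illustration, the first six rows are the example data from the question, and the last row is a hypothetical conflicting instance added only to show what a hit looks like.

```python
from collections import defaultdict

def find_conflicts(rows):
    """Return feature tuples that appear with more than one distinct label."""
    labels_by_features = defaultdict(set)
    for *features, label in rows:
        labels_by_features[tuple(features)].add(label)
    return {
        features: sorted(labels)
        for features, labels in labels_by_features.items()
        if len(labels) > 1
    }

# Columns: users, bandwidth, latency, mode
rows = [
    (2, 200000, 0, "c"),
    (2, 200000, 1000, "c"),
    (2, 200000, 2000, "mp"),
    (2, 200000, 5000, "c"),
    (2, 400000, 0, "c"),
    (2, 400000, 1000, "dcf"),
    (2, 200000, 0, "h"),  # HYPOTHETICAL row, not from the question
]

print(find_conflicts(rows))
# {(2, 200000, 0): ['c', 'h']}
```

If this returns an empty dictionary on the full data set, conflicting labels are not the cause and the tree settings are the next thing to check.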
In general, having different labels for the same features may occur in one of several ways:
- There is noise in the data due to bad readings.
- There is a source of randomness that is not captured.
- There are more possible features that could distinguish between the different labels, but those features are not in your data set.
One thing you can do is run the training set through the decision tree and find the items that were misclassified. Try to determine why they are wrong, and see whether those instances exhibit what I described above (namely, that there are data instances with the same features but different labels).
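It can also help to know the best training accuracy any classifier could reach on the data. If every group of identical feature vectors is assigned its most common label, the fraction of rows that match is an upper bound for a deterministic model. The sketch below assumes rows are (users, bandwidth, latency, mode) tuples; the function name max_training_accuracy is made up, and the last row is a hypothetical conflict added for illustration, not data from the question.

```python
from collections import Counter, defaultdict

def max_training_accuracy(rows):
    """Upper bound on training accuracy: predict the majority label
    for each distinct feature tuple and count the matching rows."""
    label_counts = defaultdict(Counter)
    for *features, label in rows:
        label_counts[tuple(features)][label] += 1
    correct = sum(max(counts.values()) for counts in label_counts.values())
    return correct / len(rows)

rows = [
    (2, 200000, 0, "c"),
    (2, 200000, 1000, "c"),
    (2, 200000, 2000, "mp"),
    (2, 200000, 5000, "c"),
    (2, 400000, 0, "c"),
    (2, 400000, 1000, "dcf"),
    (2, 200000, 0, "h"),  # HYPOTHETICAL conflicting row, not from the question
]

bound = max_training_accuracy(rows)
print(f"{bound:.3f}")  # 6 of 7 rows can be right at best
```

If the bound on the real data comes out near the 84% already observed, the tree is doing as well as the data allows; if it is 1.0, the remaining errors come from the tree settings instead.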