machine learning - Generating a Decision Tree that Perfectly Models the Training Set?


I have a data set of rules, and I want to generate a decision tree that has at least 100% accuracy at classifying those rules, but I can never get 100%. I set minNumObjs to 1 and made the tree unpruned, but I only get 84% correctly classified instances.

My attributes are:

@attribute users numeric
@attribute bandwidth numeric
@attribute latency numeric
@attribute mode {c,h,dcf,mp,dc,ind}

Example data:

2,200000,0,c
2,200000,1000,c
2,200000,2000,mp
2,200000,5000,c
2,400000,0,c
2,400000,1000,dcf

Can someone help me understand why I can never get 100% of my instances classified, and how I can get 100% of them classified (while still allowing my attributes to be numeric)?

Thanks

Getting 100% accuracy is impossible when identical feature vectors have different labels. I am guessing that in your case users, bandwidth, and latency are the features, while mode is the label you are trying to predict. If so, there may be identical values of {users, bandwidth, latency} that happen to have different mode labels.

In general, different labels for the same features can arise in one of several ways:

  1. There is noise in the data due to bad readings.
  2. There is a source of randomness that is not captured.
  3. There are additional features that could distinguish between the labels, but those features are not in your data set.

One thing you can do is run the training set through the decision tree and find the items that were misclassified. Try to determine why they are wrong, and check whether those instances exhibit what I described above (namely, that there are data instances with the same features but different labels).
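You can also check for conflicting labels directly, without a tree. Here is a minimal Python sketch, assuming the rows are loaded as tuples with the label last. The function names (group_labels, find_conflicts, max_training_accuracy) are made up for illustration, and the last row is a hypothetical conflict added to the six example rows from the question, not real data.

```python
from collections import Counter, defaultdict


def group_labels(rows):
    """Map each feature vector to a Counter of the labels seen with it."""
    groups = defaultdict(Counter)
    for *features, label in rows:
        groups[tuple(features)][label] += 1
    return groups


def find_conflicts(rows):
    """Return the feature vectors that appear with more than one label."""
    return {f: dict(c) for f, c in group_labels(rows).items() if len(c) > 1}


def max_training_accuracy(rows):
    """Best accuracy any deterministic classifier can reach on these rows.

    For each distinct feature vector, at most the majority label
    can be predicted correctly.
    """
    groups = group_labels(rows)
    best = sum(max(c.values()) for c in groups.values())
    return best / len(rows)


# (users, bandwidth, latency, mode)
rows = [
    (2, 200000, 0, "c"),
    (2, 200000, 1000, "c"),
    (2, 200000, 2000, "mp"),
    (2, 200000, 5000, "c"),
    (2, 400000, 0, "c"),
    (2, 400000, 1000, "dcf"),
    (2, 400000, 1000, "c"),  # hypothetical row: same features, different label
]

conflicts = find_conflicts(rows)
print(conflicts)
print(max_training_accuracy(rows))
```

If find_conflicts returns anything for your real data, no tree can separate those rows, however it is grown, and max_training_accuracy gives the ceiling on training accuracy. If it returns nothing, the features are sufficient in principle and the cause lies elsewhere.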

