python, scikit-learn - Weird behaviour using LabelShuffleSplit


Following the scikit-learn documentation for LabelShuffleSplit, I wish to randomise my train/validation batches to ensure I'm training on all possible data (e.g. for an ensemble).

According to the doc, I should see the following (and indeed, notice that the train/validation sets are evenly split via test_size=0.5):

>>> from sklearn.cross_validation import LabelShuffleSplit
>>> labels = [1, 1, 2, 2, 3, 3, 4, 4]
>>> slo = LabelShuffleSplit(labels, n_iter=4, test_size=0.5, random_state=0)
>>> for train, test in slo:
...     print("%s %s" % (train, test))
...
[0 1 2 3] [4 5 6 7]
[2 3 6 7] [0 1 4 5]
[2 3 4 5] [0 1 6 7]
[4 5 6 7] [0 1 2 3]

But when I tried using labels = [0, 0, 0, 0, 0, 0, 0, 0], it returned:

...
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]
[] [0 1 2 3 4 5 6 7]

(i.e. the split is no longer even - all of the data has been put in the validation set?) I understand that in this case it doesn't matter which indices end up in the train/validation sets, but I was still hoping for a 50%:50% split.
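The behaviour follows from how LabelShuffleSplit works: it shuffles and splits the *unique labels*, not the individual samples, so whole label groups go to either side. With a single unique label there is only one group to assign, and ceil(1 * test_size) = 1 group lands in the test set, leaving the train set empty. A minimal re-implementation sketch (not sklearn's exact code) reproduces both outputs above:

```python
import numpy as np

def label_shuffle_split(labels, n_iter=4, test_size=0.5, seed=0):
    """Illustrative sketch of label-level splitting: shuffle the
    unique labels, then send whole label groups to train or test.
    Not sklearn's exact algorithm, just the same group-wise idea."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Number of *labels* (not samples) assigned to the test side.
    n_test = int(np.ceil(len(classes) * test_size))
    rng = np.random.RandomState(seed)
    for _ in range(n_iter):
        perm = rng.permutation(len(classes))
        test_classes = classes[perm[:n_test]]
        mask = np.in1d(labels, test_classes)
        yield np.where(~mask)[0], np.where(mask)[0]

for train, test in label_shuffle_split([0] * 8):
    # With one unique label, ceil(1 * 0.5) = 1 group goes to test,
    # i.e. every sample, and the train set is empty.
    print(train, test)
```

If a plain 50:50 split of samples is what is wanted when every label is identical, a sample-level splitter such as sklearn's ShuffleSplit (which permutes sample indices rather than labels) gives that behaviour.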

