python - Doc2Vec: TaggedLineDocument()
So, I'm trying to learn and understand Doc2Vec, and I'm following a tutorial. My input is a list of documents, i.e. a list of lists of words. My code looks like this:
input = [["word1","word2",..."wordn"],["word1","word2",..."wordn"],...]
documents = TaggedLineDocument(input)
model = doc2vec.Doc2Vec(documents, size=50, window=10, min_count=2, workers=2)
But I'm getting an error that looks unicode-related (I tried googling the error, with no luck):
TypeError('don\'t know how to handle uri %s' % repr(uri))
Can someone please help me understand what is going wrong? Thank you!
TaggedLineDocument should be instantiated with a file path, not with a list. Make sure the file is set up in the format where 1 document equals 1 line.
documents = TaggedLineDocument('myfile.txt')
documents = TaggedLineDocument('compressed_text.txt.gz')
From the source code:
The uri (the thing you are instantiating TaggedLineDocument with) can be either:
1. a URI for the local filesystem (compressed ``.gz`` or ``.bz2`` files handled automatically): `./lines.txt`, `/home/joe/lines.txt.gz`, `file:///home/joe/lines.txt.bz2`
2. a URI for HDFS: `hdfs:///some/path/lines.txt`
3. a URI for Amazon's S3 (can supply credentials inside the URI): `s3://my_bucket/lines.txt`, `s3://my_aws_key_id:key_secret@my_bucket/lines.txt`
4. an instance of the boto.s3.key.Key class.
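If your documents are already in memory as a list of lists of words, as in the question, one way to get them into the "1 document equals 1 line" format is to write them out to a text file first. The following is only a minimal sketch: the helper names `write_line_corpus` and `load_tagged_lines`, the toy documents, and the temporary file location are assumptions made for illustration, and it assumes no token contains whitespace.

```python
import os
import tempfile


def write_line_corpus(docs, path):
    """Write tokenized documents to `path`, one space-separated document per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for words in docs:
            # Tokens are joined with single spaces, so a token that itself
            # contains whitespace would be split again when the file is read.
            fh.write(" ".join(words) + "\n")
    return path


def load_tagged_lines(path):
    """Wrap the file in TaggedLineDocument (hypothetical helper; needs gensim installed)."""
    # Imported here so the file-writing part above works without gensim.
    from gensim.models.doc2vec import TaggedLineDocument
    return TaggedLineDocument(path)


# Toy stand-in for the question's list of lists of words.
docs = [["human", "machine", "interface"], ["graph", "of", "trees"]]
corpus_path = write_line_corpus(docs, os.path.join(tempfile.mkdtemp(), "myfile.txt"))

# With gensim installed, the file path (not the list) is what gets passed on:
# documents = load_tagged_lines(corpus_path)
```

The resulting `documents` object can then be passed to Doc2Vec the same way as in the question. TaggedLineDocument splits each line on whitespace and tags it with its line number, which is why the list of words has to be flattened into one line per document first.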