python - Doc2Vec: TaggedLineDocument()


I'm trying to learn and understand Doc2Vec, and I'm following a tutorial. My input is a list of documents, i.e. a list of lists of words. My code looks like this:

    input = [["word1","word2",..."wordn"],["word1","word2",..."wordn"],...]
    documents = TaggedLineDocument(input)
    model = doc2vec.Doc2Vec(documents, size = 50, window = 10, min_count = 2, workers=2)

But I am getting an error (I tried googling it, with no luck):

    TypeError('don\'t know how to handle uri %s' % repr(uri))

Can anyone please help me understand where I am going wrong? Thank you!

TaggedLineDocument should be instantiated with a file path, not with a list of lists. Make sure the file is set up so that one document equals one line.

    documents = TaggedLineDocument('myfile.txt')
    documents = TaggedLineDocument('compressed_text.txt.gz')
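
If the documents already exist as a list of token lists, as in the question, one way to get the "one document per line" file is to join each document's tokens with spaces and write it out. The following is only a minimal sketch: the helper name `write_line_corpus`, the toy corpus, and the file name `corpus.txt` are hypothetical and not part of gensim or the original question, and the gensim import is guarded in case it is not installed.

```python
import os
import tempfile


def write_line_corpus(docs, path):
    """Write each tokenized document as one whitespace-separated line."""
    with open(path, "w", encoding="utf-8") as fh:
        for tokens in docs:
            fh.write(" ".join(tokens) + "\n")
    return path


# Hypothetical toy corpus standing in for the question's list of lists.
docs = [
    ["human", "machine", "interface"],
    ["graph", "minors", "survey"],
]
corpus_path = write_line_corpus(
    docs, os.path.join(tempfile.mkdtemp(), "corpus.txt")
)

try:
    from gensim.models.doc2vec import TaggedLineDocument
except ImportError:
    TaggedLineDocument = None  # gensim missing: only the file gets written

if TaggedLineDocument is not None:
    # Pass the file path, not the list of lists.
    documents = TaggedLineDocument(corpus_path)
    # Each line is yielded as one TaggedDocument whose tag is its
    # zero-based line number.
    for doc in documents:
        print(doc.words, doc.tags)
```

The resulting `documents` object can then be passed to `Doc2Vec` in place of the raw list used in the question.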

From the source code:

The uri (the thing you are instantiating TaggedLineDocument with) can be either:

1. a URI for the local filesystem (compressed ``.gz`` or ``.bz2`` files handled automatically):
   `./lines.txt`, `/home/joe/lines.txt.gz`, `file:///home/joe/lines.txt.bz2`
2. a URI for HDFS: `hdfs:///some/path/lines.txt`
3. a URI for Amazon's S3 (can also supply credentials inside the URI):
   `s3://my_bucket/lines.txt`, `s3://my_aws_key_id:key_secret@my_bucket/lines.txt`
4. an instance of the boto.s3.key.Key class.
