python - How can I divide the frequency of a bigram pair by the frequency of its first unigram word?


Below is my code.

    from __future__ import division
    import nltk
    import re
    from nltk.tokenize import sent_tokenize

    f = open('c:/python27/brown_a1_half.txt', 'rU')
    w = open('c:/python27/brown_a1_half_out.txt', 'w')

    # read the whole file using read()
    filecontents = f.read()
    sent_tokenize_list = sent_tokenize(filecontents)

    for sentence in sent_tokenize_list:
        # add sentence boundary markers before splitting into tokens
        sentence = "start " + sentence + " end"
        tokens = sentence.split()
        bigrams = tuple(nltk.bigrams(tokens))
        # count how often each bigram occurs in this sentence
        bigrams_frequency = nltk.FreqDist(bigrams)
        for k, v in bigrams_frequency.items():
            print k, v

The printed result is "(bigram), frequency". What I want is, for each bigram pair, to divide the bigram frequency by the frequency of its first unigram word. (For example, if there is a bigram ('red', 'apple') and its frequency is 3, I want to divide 3 by the frequency of 'red'.) This is for obtaining the MLE probability, that is, "MLE prob = count of (w1, w2) / count of (w1)". Please help me.

You can add the following inside the loop (after print k, v):

    number_unigrams = tokens.count(k[0])
    prob = v / number_unigrams

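That should give you the MLE probability for each bigram. Note that tokens is rebuilt for every sentence, so both counts are per sentence rather than over the whole file.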
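To see the whole computation in one place, here is a minimal sketch. It is not the asker's code: it swaps nltk.bigrams and nltk.FreqDist for collections.Counter from the standard library so that it runs without NLTK, and it pools counts over all sentences instead of counting per sentence. The function name mle_bigram_probs and the toy sentences are made up for illustration. It also assumes that the denominator should count w1 only where a word follows it, so the final "end" marker is left out of the unigram counts.

```python
from __future__ import division, print_function  # only matters on Python 2
from collections import Counter


def mle_bigram_probs(sentences):
    """Return {(w1, w2): count(w1, w2) / count(w1)} pooled over all sentences.

    Hypothetical helper for illustration; not part of NLTK.
    """
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Same boundary markers as in the question.
        tokens = ["start"] + sentence.split() + ["end"]
        # Assumption: count a word as w1 only when another word follows it,
        # so the last token is skipped in the unigram counts.
        unigram_counts.update(tokens[:-1])
        # zip pairs each token with its successor, giving the bigrams.
        bigram_counts.update(zip(tokens, tokens[1:]))
    return {
        (w1, w2): count / unigram_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Toy data, not taken from the Brown corpus file in the question.
sentences = ["red apple red apple red car", "red apple"]
probs = mle_bigram_probs(sentences)
for bigram, prob in sorted(probs.items()):
    print(bigram, prob)
```

In this toy data 'red' is followed by another word four times and by 'apple' three of those times, so the entry for ('red', 'apple') is 3 / 4. Because the denominator only counts positions that start a bigram, the probabilities for a fixed first word add up to 1.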

