python - Searching unique web links -

- April 15, 2015

I wrote a program to extract the web links from http://www.stevens.edu/. I am facing the following problems with the program.

1 - I want only the links starting with http or https.

2 - I am getting a parser warning from bs4 about not specifying a parser (solved).

How can I fix these problems? I am not getting a proper direction to solve this.

My code -

import urllib2
from bs4 import BeautifulSoup as bs

url = raw_input('Please enter the URL whose unique web links you want to see - ')
print "\n"

# urllib2 opens URLs (mostly HTTP) in a complex world
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib2.urlopen(req).read()
soup = bs(html)
tags = soup('a')

count = 0
web_link = []
for tag in tags:
    count = count + 1
    store = tag.get('href', None)
    web_link.append(store)

print "Total no. of extracted web links are", count, "\n"
print web_link
print "\n"

unique_list = set(web_link)
unique_list = list(unique_list)
print "No. of unique web links after using set method", len(unique_list), "\n"

For the second problem, you need to specify the parser when creating the bs object for the page.
soup = bs(html, "html.parser")

This should remove the warning.
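For the first problem (keeping only the links that start with http or https), one possible approach is to filter the collected href values before de-duplicating them. This is only a sketch: the function name unique_http_links is made up for illustration, and it assumes the input is a list like web_link from the question, where entries can be None when an a tag has no href.

```python
def unique_http_links(hrefs):
    """Return the unique http/https links from hrefs, in first-seen order.

    hrefs is assumed to be a list like web_link in the question:
    strings, with None for <a> tags that have no href attribute.
    """
    seen = set()
    result = []
    for href in hrefs:
        if href is None:
            continue  # tag had no href attribute
        link = href.strip()
        # Compare the scheme case-insensitively; startswith accepts a tuple.
        if not link.lower().startswith(("http://", "https://")):
            continue  # skips relative paths, mailto:, #anchors, etc.
        if link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

With the question's code, this would be called as unique_list = unique_http_links(web_link). Note that relative links such as /about are dropped rather than resolved against the page URL; if those should count too, they would need to be converted to absolute URLs first.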
