Python lxml XPath: get a node's attribute that matches a specific string pattern
I'm learning XPath and trying to get the value of a specific node attribute from an example page (Google Play Store) using Python lxml/html. In the code below, I want the developer email, which is the value of the "href" attribute of the "a" node starting with "mailto:". My Python code snippet returns the app name but an empty developer email. Thank you.
<html>
<div class="id-app-title" tabindex="0">candy crush saga</div>
<div class="meta-info meta-info-wide">
  <div class="title"> developer </div>
  <a class="dev-link" href="https://www.google.com/url?q=http://candycrush.com" rel="nofollow" target="_blank"> visit website </a>
  <a class="dev-link" href="mailto:candycrush@kingping.com" rel="nofollow" target="_blank">candycrush@kingping.com </a> ##interesting part here
</div>
</html>
Python code (2.7):
import requests
from lxml import html

def get_app_from_link(self, link):
    start_page = requests.get(link)
    # print start_page.text
    tree = html.fromstring(start_page.text)
    name = tree.xpath('//div[@class="id-app-title"]/text()')[0]
    # developer = tree.xpath('//div[@class="dev-link"]//*/div/@href')
    developer = tree.xpath('//div[contains(@href,"mailto") and @class="dev-link"]/text()')
    print name, developer
    return
You are using the tag div in your expression, not a. The href attribute is on the a element, so the expression should be:
'//a[contains(@href,"mailto") and @class="dev-link"]/text()'
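To check the corrected expression against the sample markup from the question, here is a minimal sketch. It assumes lxml is installed; the variable names (SAMPLE, emails_text, emails_href) are only illustrative and not part of the original code. Since the question asks for href values starting with "mailto:", the sketch also tries starts-with() as a stricter alternative to contains(), and selects the attribute itself with /@href.

```python
from lxml import html

# Sample markup copied from the question (names below are illustrative).
SAMPLE = """
<html>
<div class="id-app-title" tabindex="0">candy crush saga</div>
<div class="meta-info meta-info-wide">
  <div class="title"> developer </div>
  <a class="dev-link" href="https://www.google.com/url?q=http://candycrush.com" rel="nofollow" target="_blank"> visit website </a>
  <a class="dev-link" href="mailto:candycrush@kingping.com" rel="nofollow" target="_blank">candycrush@kingping.com </a>
</div>
</html>
"""

tree = html.fromstring(SAMPLE)

# Text of the <a> whose href contains "mailto" (the expression from the answer).
emails_text = tree.xpath('//a[contains(@href, "mailto") and @class="dev-link"]/text()')

# Stricter variant: href must start with "mailto:"; select the attribute value.
emails_href = tree.xpath('//a[starts-with(@href, "mailto:") and @class="dev-link"]/@href')

# The link text carries trailing whitespace, so strip it.
print([t.strip() for t in emails_text])
# Drop the "mailto:" prefix to keep only the address.
print([h[len("mailto:"):] for h in emails_href])
```

Reading the address from @href rather than from the link text may be more robust if the visible text ever differs from the address, though that depends on the page.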
Also, your function doesn't return the items. Use return, like:
def get_app_from_link(self, link):
    # code
    return name, developer
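As a rough illustration of returning both values, the sketch below separates parsing from fetching so it can run without a network request. The function name parse_app and the shortened test page are hypothetical, not from the original post. Indexing [0] on an empty XPath result raises IndexError, so this version returns None for a missing node instead.

```python
from lxml import html


def parse_app(page_text):
    """Return (name, developer_email) parsed from the page HTML.

    Either value is None when the corresponding node is not found.
    parse_app is a hypothetical helper used only for this sketch.
    """
    tree = html.fromstring(page_text)
    names = tree.xpath('//div[@class="id-app-title"]/text()')
    emails = tree.xpath('//a[starts-with(@href, "mailto:") and @class="dev-link"]/@href')
    # XPath returns lists; guard against empty results before indexing.
    name = names[0].strip() if names else None
    email = emails[0][len("mailto:"):] if emails else None
    return name, email


# Shortened stand-in for the Play Store page from the question.
page = ('<html><body><div class="id-app-title">candy crush saga</div>'
        '<a class="dev-link" href="mailto:candycrush@kingping.com">mail</a>'
        '</body></html>')

name, developer = parse_app(page)
print(name)
print(developer)
```

Inside get_app_from_link, one could then call something like parse_app(requests.get(link).text) and return its result.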