scrapy - content exists, but xpath could not find it, why?


I am using "scrapy shell" to test an XPath. The command looked like this:

scrapy shell "https://item.taobao.com/item.htm?spm=a219e.1191392.1111.1.fglwuh&id=40978681727&scm=1029.newlist-0.1.50002766&ppath=&sku=&ug=#detail"

The XPath looked like this:

response.xpath("//a[@class='shop-name-link']")  

The result is empty, but the page content contains:

<a class="shop-name-link" href="//shop103857282.taobao.com" target="_blank"      data-goldlog-id="/tbwmdd.1.044">长岛小两口创业</a> 

Why?

If you have problems finding results with your XPaths, use FirePath or the Chrome dev tools to investigate the page source. Remember that a Scrapy spider sees the unrendered page source, without any JavaScript having been executed. To view the source the way the spider sees it, use FirePath in a browser with JavaScript disabled.
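One quick check inside that same shell session (a sketch; whether these markers appear at all depends on how Taobao serves the page) is to search the raw body directly:

# Inside the scrapy shell started above:
'shop-name-link' in response.text                     # likely False if the element is built by JavaScript
response.xpath("//a[@class='shop-name-link']")        # [] for the same reason
# The data often still exists in an inline script block as JSON:
response.xpath("//script[contains(., 'shop')]").extract()

If the class string is absent from response.text, the element is added by JavaScript, and the data has to be taken from a script block or from a separate request instead.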

I cannot see a link with the class shop-name-link on the page linked in the question. Either you are not giving the proper link, or the element is displayed only after a user action, or the page is shown in different ways to different users in different countries. It is also possible that the page relies on the presence of cookies that you have and I don't have.

There is a nice shortcut:

from scrapy.utils.response import open_in_browser
open_in_browser(response)

This opens the response your spider received in a browser. Use it whenever you need to check what the spider actually sees. In many (if not most) cases it differs from what you see in the browser.
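The same helper also works from inside a spider callback. A minimal sketch, assuming a throwaway spider (the name and URL below are placeholders), for inspecting what a running spider actually downloaded:

import scrapy
from scrapy.utils.response import open_in_browser

class DebugSpider(scrapy.Spider):
    name = "debug_item"
    start_urls = ["https://item.taobao.com/item.htm?id=40978681727"]

    def parse(self, response):
        # Opens the HTML exactly as this spider received it; compare it
        # with what your browser shows for the same URL.
        open_in_browser(response)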

If you want to share a reproducible example of how you see the page, Chrome dev tools have a useful feature, "Copy as cURL", which copies the request with its headers and cookies to the clipboard. If you paste it into your question, people will be able to see the page as you see it (provided, of course, that there are no geolocation restrictions on IPs).
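Once you have the copied request, you can replay it from the Scrapy shell with the same headers and cookies. A sketch, assuming a reasonably recent Scrapy where fetch() accepts a Request object; every header and cookie value below is a placeholder for what Chrome actually copied:

# Inside the scrapy shell:
from scrapy import Request
req = Request(
    "https://item.taobao.com/item.htm?id=40978681727",
    headers={
        "User-Agent": "Mozilla/5.0 ...",        # paste the value Chrome copied
        "Accept-Language": "zh-CN,zh;q=0.9",    # paste the value Chrome copied
    },
    cookies={"cookie_name": "cookie_value"},    # paste the cookies Chrome copied
)
fetch(req)
response.xpath("//a[@class='shop-name-link']/text()").extract()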

