Scrapy: content exists, but XPath could not find it. Why?


I am using "scrapy shell" to test an XPath, like this:

scrapy shell "https://item.taobao.com/item.htm?spm=a219e.1191392.1111.1.fglwuh&id=40978681727&scm=1029.newlist-0.1.50002766&ppath=&sku=&ug=#detail"

The XPath looked like this:

response.xpath("//a[@class='shop-name-link']")  

The result is empty, even though the page content contains:

<a class="shop-name-link" href="//shop103857282.taobao.com" target="_blank"      data-goldlog-id="/tbwmdd.1.044">长岛小两口创业</a> 

Why?

If you have problems finding results with your XPaths, use FirePath or Chrome's dev tools to investigate the page source. Remember that a Scrapy spider sees the unrendered page source, not the page after JavaScript has run. To view the source the spider sees, use FirePath in a browser with JavaScript disabled.

I cannot see a link with the class shop-name-link in the page linked in the question. Either you're not giving the proper link, or the element is displayed only after some user action, or the page is shown differently to different users in different countries. It is also possible that the page relies on the presence of cookies that you have and I don't.

There is a nice shortcut:

from scrapy.utils.response import open_in_browser
open_in_browser(response)

This opens the response your spider received in a browser. Use it whenever you need to check what the spider sees; in many (if not most) cases it differs from what you see in your browser.

If you want to share a reproducible example of how you see the page, Chrome dev tools have a useful feature, "Copy as cURL", which copies the request with its headers and cookies to the clipboard. If you paste that into the question, people will be able to see the page as you see it (provided, of course, that there are no geolocation restrictions on IPs).

