javascript - Scraping dynamic content through Selenium?


I'm trying to scrape dynamic content from a blog through Selenium, but it always returns unrendered JavaScript.

To test this behavior, I waited until an iframe loaded completely and printed its content, which prints fine. But when I move back to the parent frame, it displays unrendered JavaScript again.

I'm looking for a way to print the completely rendered HTML content.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome("path to chrome driver")
    driver.get('http://justgivemechocolateandnobodygetshurt.blogspot.com/')

    # Wait for the iframe to become available and switch into it.
    WebDriverWait(driver, 40).until(
        expected_conditions.frame_to_be_available_and_switch_to_it((By.ID, "navbar-iframe")))

    # The rendered iframe HTML is printed.
    content = driver.page_source
    print(content)

    # After switching back to the parent frame, the non-rendered JavaScript is printed again.
    driver.switch_to.parent_frame()
    content = driver.page_source
    print(content)

The problem is that .page_source works only in the current context. There is this "current top-level browsing context" notation, meaning that if you call it on the default content, you will not get the inner HTML of the child iframe elements. You have to switch into the context of a frame and call .page_source there.

In other words, to get the complete HTML of the page including the page source of the iframes, you have to switch into the iframe contexts one by one and get the sources separately.
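The "one by one" approach could be sketched roughly as below. This is only an illustration, not tested against the blog in question: the helper name collect_page_sources and the dictionary keys are made up for this example, it only visits the direct child iframes of the top-level document (nested iframes would need the same treatment recursively), and it assumes the driver has already loaded the page and that the frames have finished rendering.

```python
# Same string as selenium.webdriver.common.by.By.TAG_NAME; spelled out here
# so the helper itself does not need selenium imported.
TAG_NAME = "tag name"


def collect_page_sources(driver):
    """Return the top-level page source plus one source per direct child iframe.

    `driver` is assumed to be a Selenium WebDriver with the page already loaded.
    The function name and the result keys are hypothetical, chosen for this sketch.
    """
    # Start from the top-level document regardless of the current context.
    driver.switch_to.default_content()
    sources = {"top": driver.page_source}

    # Locate the iframes of the top-level document once.
    frames = driver.find_elements(TAG_NAME, "iframe")
    for index, frame in enumerate(frames):
        # page_source only reflects the current context, so switch in first.
        driver.switch_to.frame(frame)
        sources["iframe[%d]" % index] = driver.page_source
        # Go back to the top-level document before moving to the next frame.
        driver.switch_to.default_content()

    return sources
```

With a live session you would call collect_page_sources(driver) after driver.get(...) and any waits, then print or save each entry of the returned dictionary separately.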

Old answer:

I would wait for at least one blog entry to be loaded before getting the page_source:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Block for up to 40 seconds until a blog entry is visible.
    wait = WebDriverWait(driver, 40)
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".entry-content")))

    print(driver.page_source)
