javascript - Scraping dynamic content through Selenium?
I'm trying to scrape dynamic content from a blog through Selenium, but it returns unrendered JavaScript.
To test this behavior, I waited until the iframe loaded and printed its content, which prints fine. But when I switch back to the parent frame, it displays unrendered JavaScript again.
I'm looking for a way to print the rendered HTML content.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

# Point this at your local chromedriver binary.
driver = webdriver.Chrome(service=Service("path/to/chromedriver"))
driver.get('http://justgivemechocolateandnobodygetshurt.blogspot.com/')

# Wait up to 40 seconds for the navbar iframe, then switch into it.
WebDriverWait(driver, 40).until(
    expected_conditions.frame_to_be_available_and_switch_to_it((By.ID, "navbar-iframe")))

# The rendered iframe HTML is printed.
content = driver.page_source
print(content)

# When I switch back to the parent frame, it again prints non-rendered JavaScript.
driver.switch_to.parent_frame()
content = driver.page_source
print(content)
The problem is that .page_source works only in the current context. There is the notion of a "current top-level browsing context", meaning that if you call it on the default content, you will not get the inner HTML of the child iframe elements. For that, you have to switch into the context of the frame and then call .page_source.
In other words, to get the complete HTML of the page, including the page source of the iframes, you have to switch into the iframe contexts one by one and get their sources separately.
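A minimal sketch of that frame-by-frame approach, assuming a Selenium WebDriver instance named driver that is currently on the top-level document. The helper name collect_frame_sources and its (frame_path, html) return format are hypothetical, chosen for illustration; the driver calls it relies on (find_elements, switch_to.frame, switch_to.parent_frame, page_source) are standard Selenium API. It walks nested iframes depth-first and switches back to the parent after each one, so the driver ends in the context it started in.

```python
def collect_frame_sources(driver, path=()):
    """Hypothetical helper: return (frame_path, html) pairs, depth-first.

    frame_path is a tuple of iframe indexes counted from the starting
    document; () means the starting document itself.
    """
    # page_source only reflects the frame the driver is currently in.
    sources = [(path, driver.page_source)]
    # "tag name" is the locator string behind Selenium's By.TAG_NAME.
    frame_count = len(driver.find_elements("tag name", "iframe"))
    for index in range(frame_count):
        # Re-query the iframe list each time instead of reusing old references.
        frame = driver.find_elements("tag name", "iframe")[index]
        driver.switch_to.frame(frame)
        # Recurse so iframes nested inside this iframe are collected too.
        sources.extend(collect_frame_sources(driver, path + (index,)))
        # Return to the frame we came from before moving to the next sibling.
        driver.switch_to.parent_frame()
    return sources
```

Typical use would be to call driver.switch_to.default_content() first and then loop over collect_frame_sources(driver), printing or saving each html string. Re-querying the iframe list on every iteration is a defensive choice: it costs one extra lookup per frame but avoids holding element references across context switches.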
Old answer:
I would wait for at least one blog entry to be loaded before getting page_source:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 40 seconds for a blog post body to become visible.
wait = WebDriverWait(driver, 40)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".entry-content")))
print(driver.page_source)
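If one entry is not enough (for example, when further entries are added after the first one renders), WebDriverWait.until also accepts any callable that takes the driver and returns a truthy value once the condition holds. The sketch below builds such a callable; entries_loaded and min_count are hypothetical names, and unlike visibility_of_element_located it only checks that the elements are present in the DOM, not that they are visible.

```python
def entries_loaded(min_count=1, selector=".entry-content"):
    """Hypothetical wait condition: truthy once min_count entries exist."""
    def condition(driver):
        # "css selector" is the locator string behind Selenium's By.CSS_SELECTOR.
        entries = driver.find_elements("css selector", selector)
        # WebDriverWait.until keeps polling while this returns a falsy value,
        # and returns the first truthy result (here, the list of entries).
        return entries if len(entries) >= min_count else False
    return condition
```

With a real driver this would be used as WebDriverWait(driver, 40).until(entries_loaded(min_count=3)) followed by print(driver.page_source); if the count is never reached, WebDriverWait raises a TimeoutException after 40 seconds.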