Ruby - Searching a webpage with regex
I'd like to search through a webpage for sentences including 'small business', and then do the same for every link on the page, 3 or 4 layers deep.
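For the matching step on its own, here is a minimal stdlib-only sketch that applies the regex used in the attempt to plain text with `String#scan`. The helper name `sentences_with_phrase` and the sample text are illustrative assumptions. The regex treats a "sentence" as the run of non-period characters around the phrase up to the next period, so abbreviations such as "e.g." or a final sentence with no period will be split or dropped.

```ruby
# A "sentence" here is any run of non-period characters containing the
# phrase, up to and including the next period (case-insensitive).
SENTENCE_RE = /[^.]*small business[^.]*\./i

# Hypothetical helper: return every such sentence in a plain-text string.
def sentences_with_phrase(text)
  text.scan(SENTENCE_RE).map(&:strip)
end

sample = "We help startups. Small business owners can apply here. " \
         "Grants are open to any small business."
p sentences_with_phrase(sample)
# => ["Small business owners can apply here.", "Grants are open to any small business."]
```

Ruby regexes have no `/g` flag; `scan` is what returns every match rather than the first.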
My attempt is this:
```ruby
require 'nokogiri'
require 'open-uri'

# Collect sentences containing "small business" from a start page and the
# .html pages it links to, following links `depth` layers deep.
def get_sentences(start_url, depth = 3)
  regex = /[^.]*small business[^.]*\./i # Ruby has no /g flag; #scan finds every match
  sentences = []
  urls = [start_url]

  (0..depth).each do |layer|
    next_urls = []
    urls.each do |url|
      doc = Nokogiri::HTML(URI.open(url)) # Kernel#open no longer opens URLs in Ruby 3
      # Nokogiri's #search takes CSS/XPath, not a Regexp, so scan the page text.
      sentences.concat(doc.text.scan(regex))
      next if layer == depth

      doc.search('a[href]').each do |a|
        next unless a['href'] =~ /\.html$/
        next_urls << URI.join(url, a['href']).to_s # resolve relative links
      end
    rescue OpenURI::HTTPError, URI::InvalidURIError, SocketError
      next # skip pages that fail to load
    end
    urls = next_urls.uniq
  end

  sentences
end
```

Edit: I narrowed it down to a single page:

```ruby
@sentences = []
doc = Nokogiri::HTML(URI.open("https://en.wikipedia.org/wiki/small_business"))
regex = /[^.]*small business[^.]*\./i
# Visit only text nodes; matching every element would also match <body>,
# <html>, etc., and store the same text several times.
doc.traverse do |node|
  @sentences.concat(node.text.scan(regex)) if node.text?
end
```
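The layer-by-layer idea in the attempt can also be separated from the fetching. A minimal sketch, assuming a caller-supplied `fetch` callable (hypothetical) that returns `[page_text, links]` for a URL: it walks the link graph breadth-first, keeps a `Set` of visited URLs so a page linked from several layers is read only once, and stops following links at `max_depth`. Because fetching is injected, the sketch runs with only the standard library; the in-memory `site` hash stands in for real pages.

```ruby
require 'set'

# Breadth-first crawl: scan each page for `pattern`, then queue its
# unvisited links, for up to `max_depth` layers below start_url.
# `fetch` is a hypothetical callable: url -> [page_text, links].
def crawl_sentences(start_url, max_depth, fetch, pattern)
  visited = Set[start_url]
  frontier = [start_url]
  found = []

  (0..max_depth).each do |depth|
    next_frontier = []
    frontier.each do |url|
      text, links = fetch.call(url)
      found.concat(text.scan(pattern).map(&:strip))
      next if depth == max_depth # deepest layer: don't queue more links

      links.each { |link| next_frontier << link if visited.add?(link) }
    end
    frontier = next_frontier
  end

  found
end

# Usage with an in-memory stand-in for real pages.
site = {
  "a" => ["Small business news. Weather.", %w[b c]],
  "b" => ["Nothing here.", %w[c d]],
  "c" => ["A small business grant.", %w[a]],
  "d" => ["Deep small business page.", []]
}
fetch = ->(url) { site.fetch(url) }
p crawl_sentences("a", 1, fetch, /[^.]*small business[^.]*\./i)
# => ["Small business news.", "A small business grant."]
```

In real use, `fetch` could load the page with `Nokogiri::HTML(URI.open(url))`, return `doc.text`, and resolve each `a[href]` against the page URL with `URI.join(url, href).to_s`. The visited set is what keeps a 3 or 4 layer crawl from re-reading pages that link back to each other.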
But I'm out of my league after a month.
..........worked!