Ruby - Regex to remove p tags within li tags and td tags
I have this HTML content:
    <p>this paragraph:</p>
    <ul>
    <li>
    <p>point 1</p>
    </li>
    <li>
    <p>point 2</p>
    <ul>
    <li>
    <p>point 3</p>
    </li>
    <li>
    <p>point 4</p>
    </li>
    </ul>
    </li>
    <li>
    <p>point 5</p>
    </li>
    </ul>
    <ul>
    <li>
    <p><strong>sub-head : </strong>this para followed heading, para followed heading, para followed heading, para followed heading</p>
    </li>
    <li>
    <p><strong>sub-head 2: </strong></p>
    <p>this para followed heading, para followed heading, para followed heading, para followed heading</p>
    </li>
    </ul>
I want to remove the <p> and </p> tags between <li> and </li>, irrespective of their position between <li> and </li>. I also need to remove the p tags between td tags inside a table.
This is my controller code so far:
    nogo = {
      "<li>\n<p>"   => '<li>',
      "</p>\n</li>" => '</li>',
      "<td>\n<p>"   => '<td>',
      "</p>\n</td>" => '</td>',
      '<p> </p>'    => '',
      '<ul>'        => "\n<ul>",
      '</ul>'       => "</ul>\n",
      '</ol>'       => "</ol>\n",
      '<table>'     => "\n<table width='100%' border='0' cellspacing='0' cellpadding='0' class='table table-curved'>",
      '&lt;'        => '<',
      '&gt;'        => '>',
      '<br>'        => '',
      '<p></p>'     => '',
      ' rel="nofollow"' => ''
    }

    c = params[:content]
    bundle_out = Sanitize.fragment(c,
      Sanitize::Config.merge(Sanitize::Config::BASIC,
        :elements   => Sanitize::Config::BASIC[:elements] + ['table', 'tbody', 'tr', 'td', 'h1', 'h2', 'h3'],
        :attributes => { 'a' => ['href'] }
      )
    ) #.split(" ").join(" ")
    re = Regexp.new(nogo.keys.map { |x| Regexp.escape(x) }.join('|'))
    @bundle_out = bundle_out.gsub(re, nogo)
I'm passing the above HTML content through params[:content], which I've assigned to the variable c.
I get the following output, which is not what I expected: some closing p tags and opening p tags are still left between the li and closing li tags.
    <p>this paragraph:</p>
    <ul>
    <li>point 1</li>
    <li>point 2</p>
    <ul>
    <li>point 3</li>
    <li>point 4</li>
    </ul>
    </li>
    <li>point 5</li>
    </ul>
    <ul>
    <li><strong>sub-head : </strong>this para followed heading, para followed heading, para followed heading, para followed heading</li>
    <li><strong>sub-head 2: </strong></p>
    <p>this para followed heading, para followed heading, para followed heading, para followed heading</li>
    </ul>
My aim is simple: I want to remove the p tags inside li and td tags, but I'm not able to do it correctly. Any help would be appreciated.
I'm using a regex for this, and I know that regex is not the correct way to parse HTML content.
I won't recommend using regex, because they're a dead end unless the HTML is trivial and you create it. And, if you're the one creating it, modifying it after generating it is the wrong way to go about generating content.
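To make the dead end concrete, here is a minimal sketch that reuses only the two `<li>` entries of the question's `nogo` table, on an input fragment assumed to have one tag per line (as the `\n` in those keys suggests). Every key is a literal string, so `"</p>\n</li>"` can never match the `"</p>\n<ul>"` that precedes a nested list, and any extra spaces or `"\r\n"` between tags would defeat the other keys as well.

```ruby
# Only the two <li> keys from the question's table are kept for this demo.
nogo = {
  "<li>\n<p>"   => '<li>',
  "</p>\n</li>" => '</li>'
}
# Same construction as in the question: escape each literal key, join with |.
re = Regexp.new(nogo.keys.map { |x| Regexp.escape(x) }.join('|'))

# "point 2" is followed by a nested <ul>, not by </li>.
html = "<li>\n<p>point 2</p>\n<ul>\n<li>\n<p>point 3</p>\n</li>\n</ul>\n</li>"

# With a Hash as the replacement, each match is swapped for its value.
out = html.gsub(re, nogo)
p out
# => "<li>point 2</p>\n<ul>\n<li>point 3</li>\n</ul>\n</li>"
```

The stray `</p>` after `point 2` is exactly what shows up in the unexpected output in the question.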
Use a parser. Nokogiri is the de-facto standard for Ruby, and, with some knowledge of CSS or XPath, you can learn to search, or modify, HTML and XML:
    require 'nokogiri'

    doc = Nokogiri::HTML(<<EOT)
    <html>
      <body>
        <ul>
          <li>
            <p>foo</p>
          </li>
          <li>
            <span>
              <p>bar</p>
            </span>
          </li>
        </ul>
      </body>
    </html>
    EOT

    doc.search('li p').each do |p_tag|
      p_tag.remove
    end

    puts doc.to_html
Running that results in:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
      <body>
        <ul>
          <li>
          </li>
          <li>
            <span>
            </span>
          </li>
        </ul>
      </body>
    </html>
The tutorials on the Nokogiri site are a good starting point. Stack Overflow is also a useful resource, as there are many different, easily searchable questions about aspects of using the gem.