ruby - Regex to remove p tags within li tags and td tags


I have this HTML content:

<p>this paragraph:</p>
<ul>
  <li>
    <p>point 1</p>
  </li>
  <li>
    <p>point 2</p>
    <ul>
      <li>
        <p>point 3</p>
      </li>
      <li>
        <p>point 4</p>
      </li>
    </ul>
  </li>
  <li>
    <p>point 5</p>
  </li>
</ul>
<ul>
  <li>
    <p><strong>sub-head : </strong>this para followed heading, para followed heading, para followed heading, para followed heading</p>
  </li>
  <li>
    <p><strong>sub-head 2: </strong></p>
    <p>this para followed heading, para followed heading, para followed heading, para followed heading</p>
  </li>
</ul>

I want to remove the <p> and </p> tags between <li> and </li>, irrespective of where they appear between <li> and </li>. I also need to remove p tags between td tags inside a table.

This is my controller code so far:

# Literal strings to find, mapped to their replacements
nogo = {
  "<li>\n<p>" => '<li>',
  "</p>\n</li>" => '</li>',
  "<td>\n<p>" => '<td>',
  "</p>\n</td>" => '</td>',
  '<p> </p>' => '',
  '<ul>' => "\n<ul>",
  '</ul>' => "</ul>\n",
  '</ol>' => "</ol>\n",
  '<table>' => "\n<table width='100%' border='0' cellspacing='0' cellpadding='0' class='table table-curved'>",
  '&lt;' => '<',
  '&gt;' => '>',
  '<br>' => '',
  '<p></p>' => '',
  ' rel="nofollow"' => ''
}
c = params[:content]
bundle_out = Sanitize.fragment(c, Sanitize::Config.merge(Sanitize::Config::BASIC,
  :elements => Sanitize::Config::BASIC[:elements] + ['table', 'tbody', 'tr', 'td', 'h1', 'h2', 'h3'],
  :attributes => {'a' => ['href']}
)) # .split(" ").join(" ")
# One alternation regex built from the escaped keys; each match is replaced via the hash
re = Regexp.new(nogo.keys.map { |x| Regexp.escape(x) }.join('|'))
@bundle_out = bundle_out.gsub(re, nogo)

I'm passing the above HTML content through params[:content], which I've assigned to the variable c.

The following output is not what I expected: some closing p tags and opening p tags are still left between the li and closing li tags.

<p>this paragraph:</p>

<ul>
  <li>point 1</li>
  <li>point 2</p>
    <ul>
      <li>point 3</li>
      <li>point 4</li>
    </ul>
  </li>
  <li>point 5</li>
</ul>

<ul>
  <li><strong>sub-head : </strong>this para followed heading, para followed heading, para followed heading, para followed heading</li>
  <li><strong>sub-head 2: </strong></p>
    <p>this para followed heading, para followed heading, para followed heading, para followed heading</li>
</ul>
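Why some tags survive can be reproduced with plain Ruby. The lookup table only contains exact pairs such as "<li>\n<p>" and "</p>\n</li>", so a </p> followed by a nested <ul>, or a second <p> inside the same li, never matches any key. The sketch below assumes, as the partially successful output suggests, that the sanitized HTML has a newline between adjacent tags; the two input strings are simplified, hypothetical fragments of that output, and the table is trimmed to the two li entries.

```ruby
# Fixed-string lookup table, trimmed to the two <li> entries from the question.
nogo = { "<li>\n<p>" => '<li>', "</p>\n</li>" => '</li>' }
re = Regexp.new(nogo.keys.map { |x| Regexp.escape(x) }.join('|'))

# Assumed input: simplified fragments of the sanitized HTML, one tag per line.
simple = "<li>\n<p>point 1</p>\n</li>"
nested = "<li>\n<p>point 2</p>\n<ul>\n<li>\n<p>point 3</p>\n</li>\n</ul>\n</li>"

simple_out = simple.gsub(re, nogo)
nested_out = nested.gsub(re, nogo)

puts simple_out # prints: <li>point 1</li>
# In nested_out the "</p>\n<ul>" after "point 2" matches no key,
# so the stray </p> is left behind, just like in the output above.
puts nested_out
```

Every new placement of a p tag would need its own key, which is why a fixed list of literal strings cannot cover "irrespective of position".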

My aim is simple: I want to remove p tags inside li and td tags, but I'm not able to do it correctly. Any help is appreciated.

I used a regex for this, and I know that using regex is not the correct way to parse HTML content.

I won't recommend using regex, because it's a dead end unless the HTML is trivial and you create it. And, if one is creating it, modifying it after generating it is the wrong way to go about generating content.

Use a parser. Nokogiri is the de-facto standard for Ruby, and, with some knowledge of CSS or XPath, you can learn to search or modify HTML and XML:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <ul>
      <li>
        <p>foo</p>
      </li>
      <li>
        <span>
          <p>bar</p>
        </span>
      </li>
    </ul>
  </body>
</html>
EOT

# Find every <p> that is a descendant of an <li> and delete it
doc.search('li p').each do |p_tag|
  p_tag.remove
end

puts doc.to_html

Running that results in:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body>
    <ul>
      <li>
       </li>
      <li>
        <span>
         </span>
      </li>
    </ul>
  </body>
</html>

The tutorials on the Nokogiri site are a good starting point. Stack Overflow is also a resource, as there are many easily-searchable questions about different aspects of using the gem.
