Wrong regex tag matching
Hello,
I ran into an issue when pasting text from a website. The HTML was:
<HTML><HEAD><TITLE> LogFile </TITLE>
<SCRIPT id="LogScript" language="JavaScript1.2" src="/somepath"></SCRIPT>
<SCRIPT language="JavaScript1.2">
<!--
[Some script]
//-->
</SCRIPT>
<STYLE type="text/css">
/* reset */
html, body, div, span, applet, object, iframe,
h1, h2, h3, h4, h5, h6, p, blockquote, pre,
a, abbr, acronym, address, big, cite, code,
del, dfn, em, font, img, ins, kbd, q, s, samp,
small, strike, strong, sub, sup, tt, var,
b, u, i, center,
dl, dt, dd, ol, ul, li,
fieldset, form, label, legend,
table, caption, tbody, tfoot, thead, tr, th, td {
margin: 0;
padding: 0;
outline: 0;
font-size: 12px;
/* vertical-align: baseline; */
background: transparent;
}
</STYLE><!-- includes/footer.ihtml --> </HEAD><BODY onload="Loading();"><DIV id="Errors"><TABLE class="Error" id="Ob2197" border="2"><TBODY><TR class="Member" id="Mb$Program"><TD nowrap=""><TABLE class="Error" id="Ob2201" border="2"><TBODY><TR class="Member" id="Mb$Context"><TD class="Data" nowrap=""><!--StartFragment-->-ERR Logon failure: unknown user name or bad password.
<BR><!--EndFragment--></TD></TR></TBODY></TABLE></TD></TR></TBODY></TABLE></DIV></BODY></HTML>
With “head” as the badTag, the regex did not match correctly.
It paired <HEAD> with the “thead” inside the <STYLE> block, so the match was:
<Head> [...] thead [...]</STYLE>
instead of <HEAD> [...] </HEAD>.
I hope the explanation is clear 😃
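A minimal reproduction sketch of the mismatch. The input is a hypothetical, abbreviated single-line version of the pasted HTML. The single-line assumption matters: "." does not match line breaks in JavaScript unless the "s" flag is set, so the reported match presumably happened after the pasted text had its newlines removed (an assumption about how the editor processes the paste). The name bT mirrors the issue; its contents here are hypothetical.

```javascript
// Reproduction sketch (hypothetical, abbreviated input).
// Assumption: the pasted HTML reaches the regex as one line; "." does not
// match line breaks in JavaScript unless the "s" flag is set.
const html =
  '<HTML><HEAD><TITLE> LogFile </TITLE>' +
  '<STYLE type="text/css">table, caption, tbody, tfoot, thead, tr, th, td { margin: 0; }</STYLE>' +
  '</HEAD><BODY>text</BODY></HTML>';

// Hypothetical list of tags to strip; only "head" matters here.
const bT = ['head'];

// Original pattern from the issue: "<head", then lazily anything up to the
// next "head" (case-insensitive, so the "head" inside "thead" qualifies),
// then lazily anything up to the next ">".
const tS = new RegExp('<' + bT[0] + '.*?' + bT[0] + '(.*?)>', 'gi');

const matches = html.match(tS);
// The single match starts at "<HEAD>" but ends at "</STYLE>", not "</HEAD>",
// because "thead" in the CSS satisfies the second "head" in the pattern.
console.log(matches[0]);
```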
I suggest replacing
tS=new RegExp('<'+bT[i]+'.*?'+bT[i]+'(.*?)>','gi');
with
tS=new RegExp('<'+bT[i]+'\\b.*>.*</'+bT[i]+'>','gi');
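This matches the exact tag name (\b is a word boundary, so "<head" no longer matches longer names such as "<header") and requires the closing tag, so the "thead" in the stylesheet can no longer end the match.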
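A sketch that runs the suggested pattern on a hypothetical, abbreviated single-line version of the same HTML (single-line because "." does not match line breaks without the "s" flag). It also illustrates one caveat visible in the pattern itself: both ".*" are greedy, so when a tag appears more than once the match runs from the first opening tag to the last closing tag. The lazy variant at the end is one possible adjustment, not part of the original suggestion; bT and the sample strings are hypothetical.

```javascript
// Sketch checking the suggested pattern (hypothetical, abbreviated input).
// Assumption: single-line input, since "." does not match line breaks
// without the "s" flag.
const html =
  '<HTML><HEAD><TITLE> LogFile </TITLE>' +
  '<STYLE type="text/css">table, caption, tbody, tfoot, thead, tr, th, td { margin: 0; }</STYLE>' +
  '</HEAD><BODY>text</BODY></HTML>';

const bT = ['head']; // hypothetical tag list

// Suggested pattern: "\\b" in the string literal becomes \b (a word boundary),
// and the match must end with the closing tag "</head>".
const tS = new RegExp('<' + bT[0] + '\\b.*>.*</' + bT[0] + '>', 'gi');
const headMatch = html.match(tS)[0];
console.log(headMatch); // the whole <HEAD> ... </HEAD> element

// Caveat: both ".*" are greedy, so a repeated tag is matched from its first
// opening tag to its last closing tag, swallowing the text in between.
const twice = '<b>one</b> keep <b>two</b>';
const greedy = new RegExp('<b\\b.*>.*</b>', 'gi');
console.log(twice.match(greedy)); // one match: '<b>one</b> keep <b>two</b>'

// One possible adjustment (not part of the original suggestion): end the
// opening tag at its first ">" and make the element content lazy.
const lazy = new RegExp('<b\\b[^>]*>.*?</b>', 'gi');
console.log(twice.match(lazy)); // two matches: '<b>one</b>' and '<b>two</b>'
```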
Issue Analytics
- Created 6 years ago
- Comments:10 (8 by maintainers)
Top GitHub Comments
@DiemenDesign done; @LoloDf I wrote your username there, alright?
@DiemenDesign don’t worry, hehe.