C/C++ header tags in HTML get removed
Package’s name
string-strip-html
Describe the bug
Not sure if this is by design or a bug, but C/C++ #include <header.h>
code in HTML gets removed.
To Reproduce
Run:
import { stripHtml } from 'string-strip-html';
console.log(stripHtml('<code>#include <stdio.h>;<code>').result);
console.log(stripHtml('<code>#include <stdio.h></code>').result);
Output:
#include;
#include
Expected behavior
Non-HTML tags shouldn’t be removed, so the output should be:
#include <stdio.h>;
#include <stdio.h>
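The expected behavior can be illustrated with a minimal sketch. It is not how string-strip-html works internally; `stripKnownTagsOnly` and the short `KNOWN_HTML_TAGS` list are hypothetical names made up for this example. The idea is to remove a tag-shaped token only when its name is a recognised HTML element, which leaves `<stdio.h>` alone:

```javascript
// Hypothetical helper, not part of string-strip-html.
// Deliberately short list; a real one would cover every HTML element name.
const KNOWN_HTML_TAGS = new Set([
  'a', 'b', 'br', 'code', 'div', 'em', 'i', 'li', 'p', 'pre', 'span', 'strong', 'ul',
]);

function stripKnownTagsOnly(input) {
  // Match anything tag-shaped (<name ...> or </name ...>) and capture the name.
  return input.replace(/<\/?([A-Za-z][^\s/>]*)[^>]*>/g, (match, name) =>
    // Remove the token only if its name is a recognised HTML element.
    KNOWN_HTML_TAGS.has(name.toLowerCase()) ? '' : match
  );
}

console.log(stripKnownTagsOnly('<code>#include <stdio.h>;<code>'));
console.log(stripKnownTagsOnly('<code>#include <stdio.h></code>'));
```

A name-based check like this still removes angle-bracketed text that happens to match an element name, so it is a heuristic rather than a full fix.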
Live Demo
https://stackblitz.com/edit/node-jmbga3?file=index.js
Additional context
I ran into this problem while trying to parse question content from StackOverflow.
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
The option opts.stripRecognisedHTMLOnly is good.
In addition, consider the case of a <code> block with nested HTML tags. There are use cases where it is desirable to keep the elements nested under the <code> block completely intact.
One such use case is using an API to get content from sites like StackOverflow (similar to my use case above), where questions contain code blocks that should be kept intact.
So maybe a more sophisticated option, opts.ignoreUnderQuerySelector, could be provided, where the user supplies a query selector (e.g., code, .code, or #code) and the engine then ignores nested content under matched elements.
Released in v.9.1.0:
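Separately from the release note above, and not part of any release, the idea of leaving <code> contents untouched can be approximated in user code. The sketch below is an assumption-laden illustration: `stripOutsideCode` and `naiveStrip` are hypothetical names, only plain `<code>` elements are handled rather than arbitrary query selectors, and the stripper is passed in as a function so any implementation can be used (for example `text => stripHtml(text).result`).

```javascript
// Hypothetical helper: applies `stripFn` only to text outside <code>...</code>.
function stripOutsideCode(input, stripFn) {
  const codeBlock = /<code\b[^>]*>([\s\S]*?)<\/code>/gi;
  let out = '';
  let last = 0;
  for (const m of input.matchAll(codeBlock)) {
    out += stripFn(input.slice(last, m.index)); // strip the text before the block
    out += m[1];                                // keep the block contents verbatim
    last = m.index + m[0].length;
  }
  return out + stripFn(input.slice(last));      // strip whatever follows the last block
}

// Naive stand-in stripper for the demo: removes anything tag-shaped.
const naiveStrip = (text) => text.replace(/<[^>]*>/g, '');

console.log(
  stripOutsideCode('<p>Try <b>this</b>:</p><code>#include <stdio.h></code>', naiveStrip)
);
```

A regex split like this does not handle nested <code> elements or an unclosed <code> tag; a selector-based option would need a real HTML parser.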