C/C++ header tags in HTML get removed

Package name: string-strip-html

Describe the bug: Not sure whether this is by design or a bug, but C/C++ #include <header.h> code inside HTML gets removed.

To Reproduce

Run:

import { stripHtml } from 'string-strip-html';

console.log(stripHtml('<code>#include <stdio.h>;</code>').result);
console.log(stripHtml('<code>#include &lt;stdio.h&gt;</code>').result);

Output:

#include;
#include

Expected behavior: Non-HTML tags shouldn't be removed, so the output should be:

#include <stdio.h>;
#include <stdio.h>

Live demo: https://stackblitz.com/edit/node-jmbga3?file=index.js

Additional context: I ran into this problem while trying to parse question content from StackOverflow.
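The reason <stdio.h> disappears is that, to a tag stripper, it is shaped exactly like an opening tag: a "<", a name, and a ">". One way around this is to strip only tags whose names are recognised HTML elements, which is the idea behind the stripRecognisedHTMLOnly option mentioned in the comments below. The following is a minimal, hypothetical sketch of that idea, not the library's implementation: the names stripKnownTagsOnly, decodeBasicEntities and KNOWN_TAGS are invented for illustration, the tag list is a small subset, and only a handful of entities are decoded.

```javascript
// Hypothetical sketch: remove only tags whose names are known HTML elements.
// KNOWN_TAGS is a deliberately small, illustrative subset, not a complete list.
const KNOWN_TAGS = new Set([
  "a", "b", "br", "code", "div", "em", "i", "li", "ol", "p", "pre",
  "span", "strong", "ul",
]);

// Decode only the few entities this example needs.
function decodeBasicEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
}

function stripKnownTagsOnly(html) {
  // Match anything shaped like an opening or closing tag and capture its name.
  // \b stops "<codex>" from being read as the known tag "code".
  const stripped = html.replace(
    /<\/?([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*>/g,
    (match, name) => (KNOWN_TAGS.has(name.toLowerCase()) ? "" : match),
  );
  // Decode after stripping so that "&lt;b&gt;" in text stays literal text.
  return decodeBasicEntities(stripped);
}

console.log(stripKnownTagsOnly("<code>#include <stdio.h>;</code>"));
// prints: #include <stdio.h>;
console.log(stripKnownTagsOnly("<code>#include &lt;stdio.h&gt;</code>"));
// prints: #include <stdio.h>
```

The trade-off is that a genuine but unlisted tag would survive, so a real implementation needs the full set of HTML element names.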

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

OfirD1 commented on Sep 23, 2021 (1 reaction)

The option opts.stripRecognisedHTMLOnly is good.

In addition, consider the case of a <code> block that contains nested HTML tags. In some use cases it is desirable to leave everything nested under the <code> block completely untouched.
One such use case is scraping content from sites like StackOverflow through an API (similar to my use case above), where questions contain code blocks that should be kept intact.

So perhaps a more sophisticated option, opts.ignoreUnderQuerySelector, could be provided: the user supplies a query selector (e.g., code, .code, or #code), and the engine leaves the content nested under matched elements alone.
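The opts.ignoreUnderQuerySelector option is only a proposal in this thread. A rough, hypothetical sketch of the idea, restricted to bare tag-name selectors such as code, might look like the following. The function name stripHtmlExceptInside is invented for illustration and is not part of string-strip-html. Class and id selectors would need a real HTML parser. Nested elements with the same tag name are not handled, and the tag name is assumed to contain no regex metacharacters. Matched elements are kept verbatim, tags included, while tags outside them are removed.

```javascript
// Hypothetical sketch of the proposed "ignore content under a selector" idea,
// limited to plain tag-name selectors such as "code" (no classes or ids).
function stripHtmlExceptInside(html, tagName) {
  // Lazily match <tagName ...>...</tagName>; same-name nesting is not supported.
  const keepRe = new RegExp(`<${tagName}\\b[^>]*>[\\s\\S]*?</${tagName}>`, "gi");
  // Naive stripper applied only to the text outside the protected elements.
  const stripTags = (text) => text.replace(/<\/?[a-zA-Z][^>]*>/g, "");

  let result = "";
  let lastIndex = 0;
  for (const match of html.matchAll(keepRe)) {
    result += stripTags(html.slice(lastIndex, match.index)); // outside: strip tags
    result += match[0]; // inside: keep the whole element verbatim
    lastIndex = match.index + match[0].length;
  }
  result += stripTags(html.slice(lastIndex)); // trailing text after the last match
  return result;
}

console.log(
  stripHtmlExceptInside("<p>Try <b>this</b>:</p><code>#include <stdio.h></code>", "code"),
);
// prints: Try this:<code>#include <stdio.h></code>
```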

revelt commented on Nov 22, 2021

Released in v9.1.0:

// Ignores code tags and their contents

import { strict as assert } from "assert";
import { stripHtml } from "string-strip-html";

const someHtml = `<code>#include <stdio.h>;</code> and <code>#include &lt;stdio.h&gt;</code>`;

// default behaviour:
assert.equal(stripHtml(someHtml).result, `#include; and #include`);

// ignore <code> tag pairs
assert.equal(
  stripHtml(someHtml, {
    ignoreTagsWithTheirContents: ["code"],
    skipHtmlDecoding: true,
  }).result,
  someHtml
);