
Strings do not conform to the RSS spec for valid characters

See original GitHub issue

RSS documents must be well-formed XML 1.0, and the XML 1.0 spec defines exactly which characters are valid (the Char production):

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
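The production is just a list of code-point ranges, so it can be checked with a plain predicate. A minimal JavaScript sketch, assuming we work on whole Unicode code points rather than UTF-16 code units (isXmlChar is an illustrative name, not part of node-rss):

```javascript
// Returns true if the Unicode code point `cp` is allowed by the XML 1.0
// Char production. Surrogates (U+D800-U+DFFF), U+FFFE and U+FFFF are excluded.
function isXmlChar(cp) {
  return (
    cp === 0x9 ||                      // tab
    cp === 0xa ||                      // line feed
    cp === 0xd ||                      // carriage return
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff)
  );
}

console.log(isXmlChar(0x0c));                 // false (form feed)
console.log(isXmlChar(0x41));                 // true ("A")
console.log(isXmlChar(0xd800));               // false (lone surrogate)
console.log(isXmlChar("😀".codePointAt(0)));  // true (U+1F600, outside the BMP)
```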

At present, this library doesn’t ensure that the strings it outputs conform to the spec, so the generated RSS feeds can easily break. We’re hitting this in Ghost when users copy and paste data from elsewhere: form feeds and other control characters are completely invisible, but they make the RSS feed invalid and unusable.
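Because these characters are invisible, it can help to locate them before deciding how to handle them. A hypothetical debugging helper (findInvalidXmlChars is an illustrative name, not from Ghost or node-rss) that reports each offending code point and its UTF-16 index:

```javascript
// Lists every code point in `text` that the XML 1.0 Char production does not
// allow, with its UTF-16 index, so invisible control characters can be found.
function findInvalidXmlChars(text) {
  const found = [];
  let index = 0;
  for (const ch of text) {            // for...of iterates by code point
    const cp = ch.codePointAt(0);
    const valid =
      cp === 0x9 || cp === 0xa || cp === 0xd ||
      (cp >= 0x20 && cp <= 0xd7ff) ||
      (cp >= 0xe000 && cp <= 0xfffd) ||
      (cp >= 0x10000 && cp <= 0x10ffff);
    if (!valid) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      found.push({ index, codePoint: "U+" + hex });
    }
    index += ch.length;               // 1 for BMP chars, 2 for surrogate pairs
  }
  return found;
}

// A pasted form feed shows up as one entry: index 5, U+000C.
console.log(findInvalidXmlChars("Hello\fworld"));
```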

There is some useful information out there about fixing this sort of problem:

http://stackoverflow.com/questions/397250/unicode-regex-invalid-xml-characters
http://stackoverflow.com/questions/2670037/how-to-remove-invalid-utf-8-characters-from-a-javascript-string

And here’s an example regex that I have been trying out for fixing the issue:

/(?![\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD])./g

Here it is in action:

https://regex101.com/r/pQ7aB6/1
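One caveat with the regex above: without the u flag, JavaScript regexes match UTF-16 code units, so each half of a surrogate pair falls outside the listed ranges and valid characters in [#x10000-#x10FFFF] (emoji, for instance) are stripped too. A possible variant, assuming an ES2015+ runtime for the u flag and \u{...} escapes, uses a negated character class in Unicode mode that also lists the astral range:

```javascript
// Pattern from the issue: works one UTF-16 code unit at a time.
const original = /(?![\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD])./g;

// Variant (an assumption, not the code in the Ghost branch): Unicode mode
// plus the astral range, so valid surrogate pairs survive and lone
// surrogates are still removed.
const invalidXmlChars =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Contains a form feed, an emoji (U+1F600) and a lone high surrogate.
const input = "a\fb \u{1F600} c\uD800d";

console.log(JSON.stringify(input.replace(original, "")));        // "ab  cd" (emoji lost)
console.log(JSON.stringify(input.replace(invalidXmlChars, ""))); // "ab 😀 cd"
```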

I have a branch in Ghost that implements this regex, and it seems to work well: https://github.com/ErisDS/Ghost/commit/7acb3f9df3e7f2cec54eae8173de6a3947bfaaf8

The only open question is whether the regex is too naive, slow, or memory-intensive for use in a library like node-rss.
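On the performance side, the lookahead-plus-dot pattern runs a lookahead at every position, whereas a single negated character class expresses the same set directly. A further option, sketched here under the assumption that most strings are already clean, is to test first and only call replace when something actually needs removing. This has not been benchmarked, and stripInvalidXmlChars is a hypothetical name:

```javascript
// Non-global regex for .test(): a global regex's .test() advances lastIndex
// between calls, which would make repeated checks unreliable.
const INVALID_XML_CHAR =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/u;
// Global copy of the same pattern for .replace().
const INVALID_XML_CHARS = new RegExp(INVALID_XML_CHAR.source, "gu");

// Returns the input as-is when it is not a string or already contains only
// valid XML characters; otherwise returns a copy with them removed.
function stripInvalidXmlChars(value) {
  if (typeof value !== "string" || !INVALID_XML_CHAR.test(value)) {
    return value;
  }
  return value.replace(INVALID_XML_CHARS, "");
}

console.log(stripInvalidXmlChars("plain text")); // plain text
console.log(stripInvalidXmlChars("form\ffeed")); // formfeed
console.log(stripInvalidXmlChars(42));           // 42 (non-strings pass through)
```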

I’d be happy to PR a fix to node-rss, but I’d like some feedback first on the regex and on whether a different approach might be better.
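Until such a fix lands in the library, the same cleanup could be done in calling code. A minimal sketch, assuming the feed data is built from plain objects (such as the item options passed to node-rss’s feed.item()); sanitizeForXml is a hypothetical helper and does not modify node-rss itself:

```javascript
const INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Returns a copy of `value` in which every string, including strings nested
// in arrays and plain objects, has invalid XML characters removed.
// Dates, numbers and other non-plain values are returned unchanged.
function sanitizeForXml(value) {
  if (typeof value === "string") {
    return value.replace(INVALID_XML_CHARS, "");
  }
  if (Array.isArray(value)) {
    return value.map(sanitizeForXml);
  }
  if (typeof value === "object" && value !== null &&
      Object.getPrototypeOf(value) === Object.prototype) {
    const out = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = sanitizeForXml(v);
    }
    return out;
  }
  return value;
}

const item = sanitizeForXml({
  title: "Pasted\ftitle",
  categories: ["news\u0008"],   // contains a backspace control character
  date: new Date(0),
});
console.log(item.title);         // Pastedtitle
console.log(item.categories[0]); // news
```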

Issue Analytics

  • State: open
  • Created 8 years ago
  • Comments: 6

Top GitHub Comments

4 reactions
ErisDS commented, Oct 15, 2015

Would be great to get some feedback on this, and see if we could move it forward.

0 reactions
ErisDS commented, May 9, 2017

That’s all well and good, but a unit test is a self-fulfilling prophecy: it only makes sense when you’re certain the concept is correct, which I am not 😉
