Strings are not conforming to the RSS spec for valid chars.

RSS is XML, so the XML 1.0 spec's `Char` production defines exactly which characters are considered valid in RSS:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
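
For illustration, the production above can be read as a per-code-point check. This is only a minimal sketch: the names `isValidXmlCodePoint` and `findInvalidCodePoints` are hypothetical and not part of node-rss or Ghost.

```javascript
// Hypothetical helper: true if the code point is in the allowed set quoted above.
function isValidXmlCodePoint(cp) {
  return (
    cp === 0x9 || cp === 0xA || cp === 0xD ||
    (cp >= 0x20 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF)
  );
}

// Hypothetical helper: list the code points in a string that fall outside the set.
// for...of iterates by code point, so a valid surrogate pair is seen as one character.
function findInvalidCodePoints(str) {
  const invalid = [];
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (!isValidXmlCodePoint(cp)) invalid.push(cp);
  }
  return invalid;
}

// A form feed (U+000C) pasted into a title is reported as invalid.
console.log(findInvalidCodePoints("a\u000Cb")); // [ 12 ]
```

A check like this could be used to report offending characters before deciding whether to strip or replace them.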

At present, this library doesn’t ensure that the strings it outputs conform to the spec, so the RSS feeds it generates can easily become broken. We’re having this problem in Ghost when users copy & paste data from elsewhere: things like form feed and other control characters are completely invisible, but cause the RSS feed to become invalid & unusable.

There is some interesting information around about fixing this sort of problem:

http://stackoverflow.com/questions/397250/unicode-regex-invalid-xml-characters

http://stackoverflow.com/questions/2670037/how-to-remove-invalid-utf-8-characters-from-a-javascript-string

And here’s an example regex that I have been trying out for fixing the issue:

/(?![\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD])./g

Here it is in action:

https://regex101.com/r/pQ7aB6/1

I have a branch with this implemented in Ghost, and it seems to work ok: https://github.com/ErisDS/Ghost/commit/7acb3f9df3e7f2cec54eae8173de6a3947bfaaf8

This seems to work well. The only question is whether the regex is a bit too naive / slow / memory intensive for use in a library like node-rss.
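
One point that may bear on the "too naive" question: the regex above leaves out the `#x10000-#x10FFFF` range. Without the `u` flag, JavaScript matches UTF-16 code units, so each half of a surrogate pair falls outside the allowed class and characters such as emoji get stripped. The following is a sketch of one possible alternative, assuming an ES2015+ runtime that supports the `u` flag; `stripInvalidXmlChars` is a hypothetical name, not an existing node-rss or Ghost function.

```javascript
// The regex from the issue, kept for comparison.
const ORIGINAL_REGEX = /(?![\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD])./g;

// Assumed alternative: with the `u` flag the class is matched per code point,
// so the astral range can be listed and valid surrogate pairs are preserved.
const INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Hypothetical helper: remove every character outside the allowed set.
function stripInvalidXmlChars(str) {
  return str.replace(INVALID_XML_CHARS, "");
}

// Control characters are removed by both versions.
console.log(stripInvalidXmlChars("a\u000Cb")); // "ab"

// An emoji (U+1F600) survives the `u` version but is removed by the original.
console.log(stripInvalidXmlChars("hi \u{1F600}") === "hi \u{1F600}"); // true
console.log("hi \u{1F600}".replace(ORIGINAL_REGEX, "") === "hi "); // true
```

This says nothing about speed or memory use, which would need measuring on real feed content.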

I’d be happy to PR a fix to node-rss, but I’m interested in getting some feedback on the regex and on whether a different approach might be better.

Issue Analytics

Created 8 years ago
Comments: 6

Top GitHub Comments

4 reactions

ErisDS commented, Oct 15, 2015

Would be great to get some feedback on this, and see if we could move it forward.

0 reactions

ErisDS commented, May 9, 2017

That’s all well and good - but a unit test is a self-fulfilling prophecy. It only makes sense when you’re certain the concept is correct, which I am not 😉