Unable to remove attributes from element whose name has accents from different culture


The following code snippet results in an infinite loop:

HtmlParser parser = new HtmlParser();
var document = parser.Parse("<TAG MÍ=\"\" />");
var element = (IElement)document.Body.FirstChild;
while (element.Attributes.Length > 0)
{
    element.RemoveAttribute(element.Attributes[0].Name);
}

I didn’t have time to investigate deeply, but I noticed that Element.cs contains two calls to ToLower and two to ToLowerInvariant. I don’t know any Spanish, but I think the Í character is a capital letter, so my guess is that some parts of the code convert it to lowercase while others do not. What particularly surprises me is that no exception is thrown, as I see what appears to be some error checking in NamedNodeMap.

Finally, thanks for a really great library. It’s been really helpful.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

FlorianRappl commented, Sep 30, 2016

I now know the source of the error. It is tricky …

I tried running this in the browser:

<!DOCTYPE html>
<html>

<body>
<TAG MÍ="" />
<script>
var element = document.body.firstElementChild;
var name = element.attributes[0].name;
alert(name);
alert(element.attributes.length);
element.removeAttribute(name);
alert(element.attributes.length);
</script>
</body>

</html>

and apparently the result is / 1 / 1 with Chromium (e.g., Chrome, Opera). However, IE and Firefox seem to handle this well (we get … / 0).

HTML defines what is considered uppercase for the tokenizer (on a char-by-char basis), but does not really refer to this range for the DOM model. It is now clear, however, that the same rules need to be applied (this should have been obvious, but I seem to have missed that connection).

In my naive way of thinking I used the ToLower (or ToLowerInvariant) methods from .NET here, but I need to provide my own extension method following the W3C specification for the definition of “what is an upper char”.
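To illustrate the kind of extension method meant here, below is a minimal sketch. It assumes the rule is the HTML notion of "ASCII lowercase", i.e., only the letters A to Z are mapped to a to z and every other character is left untouched. The names HtmlStringExtensions and ToAsciiLower are hypothetical and are not taken from the actual AngleSharp source.

```csharp
using System;

// Hypothetical helper for illustration only; not AngleSharp's actual API.
public static class HtmlStringExtensions
{
    // Lowercases only the ASCII letters 'A'..'Z'. Characters outside that
    // range (such as 'Í', U+00CD) are returned unchanged, unlike with the
    // culture-aware ToLower or the Unicode-aware ToLowerInvariant.
    public static string ToAsciiLower(this string value)
    {
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }

        var chars = value.ToCharArray();

        for (var i = 0; i < chars.Length; i++)
        {
            var c = chars[i];

            if (c >= 'A' && c <= 'Z')
            {
                // 'a' - 'A' == 0x20 in ASCII
                chars[i] = (char)(c + 0x20);
            }
        }

        return new string(chars);
    }
}
```

With such a helper, "MÍ".ToAsciiLower() keeps the Í as it is and only lowercases the M, whereas "MÍ".ToLowerInvariant() also turns Í into í. Using one rule consistently for both storing and looking up attribute names is what should avoid the mismatch described in this issue.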

I also filed a bug for Chromium (see https://bugs.chromium.org/p/chromium/issues/detail?id=651946). I don’t know if this will be fixed there.

FlorianRappl commented, Sep 30, 2016

Yes, method calls are fast, but C# is an indirect language. Hence you’ll pay for every reference by a potential cache miss. Of course, the CLR does a pretty good job on hiding / minimizing this, but in hot paths (and the attribute one is certainly such a hot path, e.g., for QuerySelector) this may add up. I am not sure and the architectural elegance may in the end just outweigh the performance savings (for most code this is certainly the case).

That’s why I always run the performance benchmarks before and after such a change (i.e., I am not sure about the impact; all I did was explain my fears, which are another argument against that change, even though it would indeed be more elegant).

Long story short: Thanks for your input; even though it potentially is not applicable in the described scenario it is certainly helpful in others.

