Unable to remove attributes from element whose name has accents from different culture
The following code snippet results in an infinite loop:
HtmlParser parser = new HtmlParser();
var document = parser.Parse("<TAG MÍ=\"\" />");
var element = (IElement)document.Body.FirstChild;
while (element.Attributes.Length > 0)
{
    element.RemoveAttribute(element.Attributes[0].Name);
}
I didn’t have time to investigate deeply, but I noticed that Element.cs contains two calls to ToLower and two to ToLowerInvariant. I don’t know any Spanish, but I think the Í character is a capital letter, so my guess is that some parts of the code are converting it to lowercase while others are not. What particularly surprises me is that no exception is thrown, as I see what appears to be some error checking in NamedNodeMap.
Finally, thanks for a really great library. It’s been really helpful.
Issue Analytics
- State:
- Created 7 years ago
- Comments:17 (10 by maintainers)
Top GitHub Comments
I now know the source of the error. It is tricky …
I tried running this in the browser, and apparently the result is mÍ /1 /1 with Chromium (e.g., Chrome, Opera). However, IE and Firefox seem to handle this well (we get … /0).
HTML defines what is considered uppercase for the tokenizer (on a char-by-char basis), but does not really refer to this range for the DOM model. It is now clear, however, that the same rules need to be applied (should have been obvious, but I seem to have missed that connection).
In my naive way of thinking I used the ToLower (or ToLowerInvariant) methods from .NET here, but I need to provide my own extension method following the W3C specification for the definition of “what is an upper char”.
I also filed a bug for Chromium (see https://bugs.chromium.org/p/chromium/issues/detail?id=651946). I don’t know if this will be fixed there.
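The extension method mentioned above could look roughly like the following. This is only a minimal sketch, assuming that "upper char" means the ASCII range A to Z as in the HTML tokenizer rules; the class name HtmlCaseExtensions and the method name ToAsciiLower are hypothetical and are not taken from the library's source. It also shows why the attribute from the original snippet could never be matched: an ASCII-only lowering and ToLowerInvariant produce different names for "MÍ".

```csharp
using System;

// Hypothetical helper, not the library's actual API.
public static class HtmlCaseExtensions
{
    // Lowercases only the ASCII letters 'A'..'Z'; every other character,
    // including accented capitals such as 'Í' (U+00CD), passes through unchanged.
    public static string ToAsciiLower(this string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        var chars = value.ToCharArray();
        for (var i = 0; i < chars.Length; i++)
        {
            if (chars[i] >= 'A' && chars[i] <= 'Z')
            {
                chars[i] = (char)(chars[i] + 0x20); // 'A' (0x41) -> 'a' (0x61)
            }
        }
        return new string(chars);
    }
}

public static class Program
{
    public static void Main()
    {
        var name = "M\u00CD"; // "MÍ", the attribute name from the issue

        var asciiLowered = name.ToAsciiLower();       // "mÍ": only the ASCII 'M' changes
        var cultureLowered = name.ToLowerInvariant(); // "mí": 'Í' is lowered as well

        // The two results differ, so a lookup that lowers the name one way
        // cannot find an attribute that was stored the other way.
        Console.WriteLine(asciiLowered == cultureLowered); // False
    }
}
```

If the stored name and the lookup name are both produced by the same ASCII-only routine, the removal loop in the original report would be expected to terminate; whether that is how the fix was eventually implemented is not stated in this thread.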
Yes, method calls are fast, but C# is an indirect language. Hence you’ll pay for every reference with a potential cache miss. Of course, the CLR does a pretty good job of hiding / minimizing this, but in hot paths (and the attribute one is certainly such a hot path, e.g., for QuerySelector) this may add up. I am not sure, and the architectural elegance may in the end just outweigh the performance savings (for most code this is certainly the case).
That’s why I always run the performance benchmarks before/after such a change (i.e., I am not sure about the impact; all I did was explain my fears / other arguments against that change, which would indeed be more elegant).
Long story short: Thanks for your input; even though it potentially is not applicable in the described scenario it is certainly helpful in others.