Tags are case-sensitive, and it's very hard to store and query them case-insensitively
Tags are provided by django-taggit. It has a setting, `TAGGIT_CASE_INSENSITIVE`, which is supposed to make storing and retrieving tags ignore case. This setting has multiple issues:
- It doesn’t do anything about tags currently in the database.
  - Setting this option to `True` while you have tags like `economics` and `Economics` won’t make the uppercase version go away. When you try to save a page that has either version of `economics`, you’ll get hit with a `MultipleObjectsReturned` error.
  - It’s also very hard to query all the objects that reference a tag and make them all point to lowercase versions… so while a management command is possible, I couldn’t figure out how to write one.
- It takes the first version of the given tag.
  - The case-insensitive flag does not coerce all tags into lowercase. Instead, it compares all new tags by a case-insensitive match against the tag name.
  - If you store `eCoNomiCs` the first time, every time you try to write `economics` it will be coerced into the horrible, disfigured version. Content editors can’t change this in Wagtail once it’s been done. This would be less bad if all tags were at least rendered in lowercase on the page editor.
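The core of the management command mentioned above is deciding which tag in each case-variant group survives. Here is a minimal sketch of that planning step in plain Python, with no database access. The function name `plan_tag_merge` and the rule for choosing the canonical tag (prefer an all-lowercase spelling, otherwise the lowest id) are assumptions for illustration, not anything django-taggit provides.

```python
from collections import defaultdict


def plan_tag_merge(tags):
    """Map each duplicate tag id to the id of its canonical tag.

    `tags` is an iterable of (tag_id, name) pairs. Tags whose names differ
    only by case are grouped; the canonical tag is the all-lowercase
    spelling if one exists, otherwise the tag with the lowest id.
    (Hypothetical helper; the selection rule is an assumption.)
    """
    groups = defaultdict(list)
    for tag_id, name in tags:
        groups[name.lower()].append((tag_id, name))

    merge_map = {}
    for lowered, members in groups.items():
        if len(members) < 2:
            continue  # no case variants, nothing to merge
        members.sort()  # lowest id first, so the fallback is deterministic
        canonical_id = next(
            (tag_id for tag_id, name in members if name == lowered),
            members[0][0],
        )
        for tag_id, _ in members:
            if tag_id != canonical_id:
                merge_map[tag_id] = canonical_id
    return merge_map


tags = [(1, "Economics"), (2, "economics"), (3, "history"), (4, "eCoNomiCs")]
print(plan_tag_merge(tags))  # {1: 2, 4: 2}
```

In a real command, each entry in the returned mapping would presumably translate into repointing the tagged-item rows from the duplicate to the canonical tag and then deleting the duplicate. Objects that already carry both spellings would need care, since repointing could create duplicate rows.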
This effectively means that tags cannot be used in the way that tags are usually intended to be used. From my standpoint, django-taggit is a completely broken library. These issues have been raised multiple times upstream, but with no fix.
To demonstrate why this is so bad: imagine you’re running an online shop with tens of thousands of t-shirts. You have the tags `black` and `Black`, each matching 50% of black t-shirts. You cannot simply have a filter called “black” which shows all the black shirts. You have to run some post-processing after querying the tags to try to “merge” the tags. Yet, this is what tags are typically used for. I cannot even think of a case where case-sensitive tags are relevant (maybe for some strange reason “guy”, i.e. a human male, is distinct from “Guy”, i.e. the given name, but this seems much more like the edge case than the rule).
I know this was a lot of information, and it may be hard to see my point unless you’ve experienced exactly what I’m talking about, but there is seriously no straightforward way to create a sidebar filter where `economics` and `Economics` are viewed as the same tag.
The closest I’ve gotten is by writing some code like this:
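from django.db.models import Count
from django.db.models.functions import Lower
from taggit.models import Tag

# Attempt: one row per lowercased tag name, most-used first
discipline_tags = Tag.objects.filter(
    course_materials_disciplinetag_items__isnull=False
).annotate(
    num_results=Count('course_materials_disciplinetag_items'),
    lower_name=Lower('name'),
).order_by('-num_results', 'lower_name').distinct('lower_name')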
But this code actually errors out, because Django doesn’t support using `annotate` and then using the result in `distinct`. Also, this would only solve half my problem: it would filter out the duplicate tags, but it wouldn’t show the correct results count, and I couldn’t use it to get the combined results for rendering the filtered page.
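For comparison, the “merge” post-processing can be done in Python once the raw tag assignments have been fetched. This is a rough sketch, not a fix for the query above: the function name `merge_tag_facets` is hypothetical, and it assumes you can obtain (tag name, object id) pairs from the tag's through model by some means. It yields both the combined count and the combined result set per lowercased tag.

```python
from collections import defaultdict


def merge_tag_facets(tagged_items):
    """Group object ids under the lowercased tag name.

    `tagged_items` is an iterable of (tag_name, object_id) pairs, one per
    tag assignment. Returns a list of (lower_name, sorted_object_ids)
    ordered by number of distinct objects, largest first.
    (Hypothetical helper; the input shape is an assumption.)
    """
    objects_by_tag = defaultdict(set)
    for tag_name, object_id in tagged_items:
        # A set keeps an object tagged with both spellings from counting twice.
        objects_by_tag[tag_name.lower()].add(object_id)

    facets = [(name, sorted(ids)) for name, ids in objects_by_tag.items()]
    # Most-used tag first; ties broken alphabetically for stable output.
    facets.sort(key=lambda facet: (-len(facet[1]), facet[0]))
    return facets


rows = [
    ("economics", 10),
    ("Economics", 11),
    ("Economics", 10),  # same object tagged with both spellings
    ("history", 12),
]
for name, object_ids in merge_tag_facets(rows):
    print(name, len(object_ids), object_ids)
# economics 2 [10, 11]
# history 1 [12]
```

The obvious cost is that every tag assignment is pulled into memory, which may be acceptable for a sidebar with a modest number of tags but is not a substitute for doing the grouping in the database.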
Issue Analytics
- Created 5 years ago
- Reactions:2
- Comments:6 (4 by maintainers)
Opened issue #4798 for @harrislapiroff to add wagtail-autocomplete to core as discussed in the core team meeting yesterday. Whether or not that also closes this issue is a decision I will leave to someone else.
If we consider the route of replacement, it could be worth replacing tags with m2ms and wagtail-autocomplete, which would require some work to integrate into core but has autocompletion, on-the-fly creation, and the flexibility of being able to make your own tag model.
cc @harrislapiroff @emilyhorsman