Stop removing stopwords from auto-generated page slugs
See original GitHub issueIssue Summary
When a new Wagtail page is saved without the user entering a slug, its slug is auto-generated from the page’s title. Wagtail’s slug generation uses a copy of the logic used by Django’s SlugField
(the Wagtail codebase contains a copy of the “bit of JavaScript” mentioned there). This logic includes the removal of certain stopwords if the page title contains only ASCII characters. For example, a page with title “To be or not to be, that is the question” will be given a slug “be-or-not-be-question”, not “to-be-or-not-to-be-that-is-the-question” as one might expect (this can be entered manually if desired).
This is longstanding behavior (since #8) but I’m opening this issue to solicit feedback on whether Wagtail should stop removing these stopwords, motivated in particular by #4881 and related discussion in #4884.
Here is the logic that performs stopword stripping, which includes the list of words that are removed:
This behavior comes from Django’s urlify.js, which is used in the Django admin in conjunction with ModelAdmin.prepopulated_fields
on SlugField
fields, of which Page.slug
is one. There’s no backend code related to removal of these words.
As @jjanssen mentions on #4884, the behavior of having a title like “Before the Sunrise” converted to a slug of just plain “sunrise” feels quite unexpected. Another example would be three pages named “Before Wagtail Space”, “At Wagtail Space”, and “Since Wagtail Space”. All three of these would be given the same auto-generated slug: “wagtail-space”.
Another downside of the existing logic is the unequal treatment given to English over other languages. Only English stopwords are removed, and if a title contains even a single Unicode character, no stopwords are. Worse, stopwords can be removed even though they may be valid words in other languages (see django#12905).
Granted, the current behavior does given users the ability to override the removal of stopwords if desired – by retyping the slug – but this is an extra step they’d have to remember to do. Especially given the complexity introduced by something like #4881, might it be better to default to a slightly simpler behavior?
The most straightforward change would be to remove the lines excerpted above, but keep the rest of the slug generation logic, which would maintain similarity with Django.
One consideration is around deviating from Django’s logic. Because Page.slug
is a SlugField
, it might be best to stay in step with Django slug creation, perhaps proposing any change in functionality there. On the other hand, Wagtail should be first and foremost focused on the needs of users, not on maintaining compatibility with Django admin behavior. Additionally, Wagtail is already using a copy of the logic instead of directly pulling in the Django JS file (and has to work to keep it up-to-date with Django changes, see for example #4897). The fact that Django’s stopword list hasn’t changed since 2005 (!) might be a sign that the Django community is happy with the existing logic – or the use of Django SlugField
s with auto-generation might be an edge case that doesn’t get much attention.
Wagtail could instead consider making this logic customizable, so that users can define their own slugging behavior if it differs from the Wagtail default. This is at least theoretically possible now by including an alternate wagtailadmin/js/vendor/urlify.js
file in a project’s templates, although maybe not as simple or well-contained as one might like; it’d be nicer if you could override just the stopwords piece while using the default unicode character handling, for example. Customization of urlify.js logic in Django seems to have been discussed a fair bit in Django tickets and on the django-developers mailing list, but the decision was made not to implement.)
Other potential questions to consider:
- Do other CMSes remove these words from URLs? Wordpress doesn’t seem to. Django CMS does seem to, referencing Django’s JS file directly.
- Are there modern SEO concerns around stopword removal? Here’s an arbitrary SEO page that seems to think stopwords are okay. Here’s another that advises against – and look at the bottom of that page for a longer list of what they consider stopwords in 2018. The Django list that Wagtail copies is unchanged since 2005 – is it still reasonable and effective to filter out this specific list? Here’s an arbitrary SO answer that comes down somewhere in the middle.
- How frequently in practice do Wagtail users manually edit the auto-generated slug when creating a new page? It’d be interesting to create a script that one could run against a Wagtail DB to compute this. One example would be a post like What’s It Like For A Developer Moving To Wagtail From Drupal? on the Wagtail blog. This currently uses the slug “whats-it-like-for-a-developer-moving-to-wagtail-from-drupal”, instead of what would be the current default “whats-like-developer-moving-wagtail-drupal”.
Steps to Reproduce
- Start a new project with
wagtail start myproject
./manage.py migrate && ./manage.py createsuperuser
./manage.py runserver
, visit http://localhost:8000/admin, and login with the superuser you just created.- Add a new child page under Home and give it a title like “To be or not to be, that is the question”. Note in the Promote tab that the autogenerated slug is “be-or-not-be-question”.
- I have confirmed that this issue can be reproduced as described on a fresh Wagtail project: yes
Technical details
- Python version: 3.6.3
- Django version: 2.1.3
- Wagtail version: 2.3
Issue Analytics
- State:
- Created 5 years ago
- Reactions:4
- Comments:5 (2 by maintainers)
Top GitHub Comments
Just want to throw my +1 here - I support this change. It should at least be an option, so it won’t modify the default behavior others are used to. It’s annoying having to manually modify every slug, but it’s so much cleaner in my opinion if the URL matches the title as closely as possible.
I’ve posted to django-developers to solicit feedback on making this change upstream in Django, which feels like the right place for this to happen.