Stop removing stopwords from auto-generated page slugs

Issue Summary

When a new Wagtail page is saved without the user entering a slug, its slug is auto-generated from the page’s title. Wagtail’s slug generation uses a copy of the logic used by Django’s SlugField (the Wagtail codebase contains a copy of the “bit of JavaScript” mentioned there). This logic includes the removal of certain stopwords if the page title contains only ASCII characters. For example, a page with title “To be or not to be, that is the question” will be given a slug “be-or-not-be-question”, not “to-be-or-not-to-be-that-is-the-question” as one might expect (this can be entered manually if desired).

This is longstanding behavior (since #8) but I’m opening this issue to solicit feedback on whether Wagtail should stop removing these stopwords, motivated in particular by #4881 and related discussion in #4884.


Here is the logic that performs stopword stripping, which includes the list of words that are removed:

https://github.com/wagtail/wagtail/blob/882f8f3cf8ddd79c30e611a48882b309e90dad0c/wagtail/admin/static_src/wagtailadmin/js/vendor/urlify.js#L158-L170

This behavior comes from Django’s urlify.js, which is used in the Django admin in conjunction with ModelAdmin.prepopulated_fields on SlugField fields, of which Page.slug is one. There’s no backend code related to removal of these words.
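For readers who would rather not follow the link, here is a rough, ASCII-only sketch of what the stripping step does. It is a simplification, not the vendored file: the stopword list is reproduced from memory of Django's urlify.js and should be checked against the linked lines, the name `sketchSlug` is made up for illustration, and the real function's transliteration and Unicode-aware mode are left out.

```javascript
// Stopword list as recalled from Django's urlify.js (assumption: verify
// against the linked source lines before relying on it).
const REMOVE_LIST = [
  "a", "an", "as", "at", "before", "but", "by", "for", "from", "is",
  "in", "into", "like", "of", "off", "on", "onto", "per", "since",
  "than", "the", "this", "that", "to", "up", "via", "with",
];

// Hypothetical, simplified stand-in for URLify(); ASCII titles only.
function sketchSlug(title) {
  let s = title;
  // Stopwords are stripped only when every character is ASCII.
  const hasNonAscii = /[^\u0000-\u007f]/.test(s);
  if (!hasNonAscii) {
    const stopwordPattern = new RegExp(
      "\\b(" + REMOVE_LIST.join("|") + ")\\b",
      "gi"
    );
    s = s.replace(stopwordPattern, "");
  }
  s = s.replace(/[^-\w\s]/g, ""); // drop punctuation
  s = s.trim(); // drop leading/trailing whitespace
  s = s.replace(/[-\s]+/g, "-"); // collapse whitespace runs into hyphens
  return s.toLowerCase();
}

console.log(sketchSlug("To be or not to be, that is the question"));
// be-or-not-be-question
```

Note that the stopwords are removed before punctuation and whitespace are normalized, which is why the gaps they leave behind collapse cleanly into single hyphens.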

As @jjanssen mentions on #4884, the behavior of having a title like “Before the Sunrise” converted to a slug of just plain “sunrise” feels quite unexpected. Another example would be three pages named “Before Wagtail Space”, “At Wagtail Space”, and “Since Wagtail Space”. All three of these would be given the same auto-generated slug: “wagtail-space”.
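The collision is easy to demonstrate with a few lines of JavaScript. This is only a sketch: `autoSlug` and `groupBySlug` are hypothetical helpers, and the stopword list is trimmed down to the handful of words these titles need.

```javascript
// Subset of the stopword list, enough for the titles below (assumption:
// the full list in urlify.js contains these words among others).
const STOPWORDS = ["before", "at", "since", "the"];

// Hypothetical minimal slug function: strip stopwords, hyphenate, lowercase.
function autoSlug(title) {
  const pattern = new RegExp("\\b(" + STOPWORDS.join("|") + ")\\b", "gi");
  return title
    .replace(pattern, "")
    .trim()
    .replace(/[-\s]+/g, "-")
    .toLowerCase();
}

// Group titles by the slug they would be given.
function groupBySlug(titles) {
  const groups = new Map();
  for (const title of titles) {
    const slug = autoSlug(title);
    groups.set(slug, [...(groups.get(slug) || []), title]);
  }
  return groups;
}

const groups = groupBySlug([
  "Before Wagtail Space",
  "At Wagtail Space",
  "Since Wagtail Space",
]);
console.log(groups.get("wagtail-space").length); // 3
```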

Another downside of the existing logic is the unequal treatment given to English over other languages. Only English stopwords are removed, and if a title contains even a single non-ASCII character, no stopwords are removed at all. Worse, English stopwords are removed even when they are valid, meaningful words in other languages (see django#12905).

Granted, the current behavior does give users the ability to override the removal of stopwords if desired – by retyping the slug – but this is an extra step they’d have to remember to do. Especially given the complexity introduced by something like #4881, might it be better to default to a slightly simpler behavior?


The most straightforward change would be to remove the lines excerpted above, but keep the rest of the slug generation logic, which would maintain similarity with Django.

One consideration is around deviating from Django’s logic. Because Page.slug is a SlugField, it might be best to stay in step with Django slug creation, perhaps proposing any change in functionality there. On the other hand, Wagtail should be first and foremost focused on the needs of users, not on maintaining compatibility with Django admin behavior. Additionally, Wagtail is already using a copy of the logic instead of directly pulling in the Django JS file (and has to work to keep it up-to-date with Django changes, see for example #4897). The fact that Django’s stopword list hasn’t changed since 2005 (!) might be a sign that the Django community is happy with the existing logic – or the use of Django SlugFields with auto-generation might be an edge case that doesn’t get much attention.

Wagtail could instead consider making this logic customizable, so that users can define their own slugging behavior if it differs from the Wagtail default. This is at least theoretically possible now by including an alternate wagtailadmin/js/vendor/urlify.js file in a project’s templates, although maybe not as simple or well-contained as one might like; it’d be nicer if you could override just the stopwords piece while using the default Unicode character handling, for example. (Customization of urlify.js logic in Django seems to have been discussed a fair bit in Django tickets and on the django-developers mailing list, but the decision was made not to implement it.)
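To make the "override just the stopwords piece" idea concrete, here is one possible shape for it. Everything in this sketch is hypothetical: `makeSlugifier` is not an existing Wagtail or Django API, and Unicode handling is omitted, so it only illustrates how the stopword list could become a parameter while the rest of the pipeline stays fixed.

```javascript
// Hypothetical factory, not an existing Wagtail or Django API.
// Assumes the stopwords are plain words with no regex metacharacters.
function makeSlugifier(stopwords) {
  const pattern =
    stopwords.length > 0
      ? new RegExp("\\b(" + stopwords.join("|") + ")\\b", "gi")
      : null;

  return function slugify(title) {
    // Only the stopword step is configurable; the rest is fixed.
    let s = pattern ? title.replace(pattern, "") : title;
    s = s.replace(/[^-\w\s]/g, ""); // drop punctuation
    return s.trim().replace(/[-\s]+/g, "-").toLowerCase();
  };
}

// A project that wants no stopword removal at all:
const keepEverything = makeSlugifier([]);
console.log(keepEverything("To be or not to be, that is the question"));
// to-be-or-not-to-be-that-is-the-question

// A project that only wants to drop articles:
const dropArticles = makeSlugifier(["a", "an", "the"]);
console.log(dropArticles("Before the Sunrise"));
// before-sunrise
```

Passing an empty list gives the "stop removing stopwords" behavior proposed above, so a default of `[]` would cover that case while leaving room for projects that want the old list.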

Other potential questions to consider:

  • Do other CMSes remove these words from URLs? WordPress doesn’t seem to. Django CMS does seem to, referencing Django’s JS file directly.
  • Are there modern SEO concerns around stopword removal? Here’s an arbitrary SEO page that seems to think stopwords are okay. Here’s another that advises against – and look at the bottom of that page for a longer list of what they consider stopwords in 2018. The Django list that Wagtail copies is unchanged since 2005 – is it still reasonable and effective to filter out this specific list? Here’s an arbitrary SO answer that comes down somewhere in the middle.
  • How frequently in practice do Wagtail users manually edit the auto-generated slug when creating a new page? It’d be interesting to create a script that one could run against a Wagtail DB to compute this. One example would be a post like What’s It Like For A Developer Moving To Wagtail From Drupal? on the Wagtail blog. This currently uses the slug “whats-it-like-for-a-developer-moving-to-wagtail-from-drupal”, instead of what would be the current default “whats-like-developer-moving-wagtail-drupal”.
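As a starting point for the last question, the comparison itself is simple once titles and slugs have been exported. The sketch below assumes a hypothetical export of `{ title, slug }` pairs (how to get them out of the database is left open) and uses an approximate, ASCII-only port of the slug logic with the stopword list recalled from Django's urlify.js. It would miscount pages whose title changed after creation, so treat the result as a rough estimate.

```javascript
// Stopword list as recalled from Django's urlify.js (assumption).
const REMOVE_LIST = [
  "a", "an", "as", "at", "before", "but", "by", "for", "from", "is",
  "in", "into", "like", "of", "off", "on", "onto", "per", "since",
  "than", "the", "this", "that", "to", "up", "via", "with",
];

// Approximate, ASCII-only version of the default slug generation.
function defaultSlug(title) {
  const pattern = new RegExp("\\b(" + REMOVE_LIST.join("|") + ")\\b", "gi");
  return title
    .replace(pattern, "")
    .replace(/[^-\w\s]/g, "")
    .trim()
    .replace(/[-\s]+/g, "-")
    .toLowerCase();
}

// Fraction of pages whose stored slug differs from the default one.
function editedFraction(pages) {
  if (pages.length === 0) return 0;
  const edited = pages.filter((p) => defaultSlug(p.title) !== p.slug);
  return edited.length / pages.length;
}

// Hypothetical export; only the blog post slug comes from a real page.
const pages = [
  { title: "Before the Sunrise", slug: "sunrise" },
  { title: "Before the Sunrise", slug: "before-the-sunrise" },
  { title: "At Wagtail Space", slug: "wagtail-space" },
  {
    title: "What's It Like For A Developer Moving To Wagtail From Drupal?",
    slug: "whats-it-like-for-a-developer-moving-to-wagtail-from-drupal",
  },
];
console.log(editedFraction(pages)); // 0.5
```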

Steps to Reproduce

  1. Start a new project with wagtail start myproject
  2. ./manage.py migrate && ./manage.py createsuperuser
  3. ./manage.py runserver, visit http://localhost:8000/admin, and login with the superuser you just created.
  4. Add a new child page under Home and give it a title like “To be or not to be, that is the question”. Note in the Promote tab that the autogenerated slug is “be-or-not-be-question”.
  • I have confirmed that this issue can be reproduced as described on a fresh Wagtail project: yes

Technical details

  • Python version: 3.6.3
  • Django version: 2.1.3
  • Wagtail version: 2.3

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 4
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

jrdemasi commented, Apr 2, 2020 (3 reactions)

Just want to throw my +1 here - I support this change. It should at least be an option, so it won’t modify the default behavior others are used to. It’s annoying having to manually modify every slug, but it’s so much cleaner in my opinion if the URL matches the title as closely as possible.

chosak commented, Apr 9, 2020 (2 reactions)

I’ve posted to django-developers to solicit feedback on making this change upstream in Django, which feels like the right place for this to happen.
