[Doc][improve] Auto-detect broken links and integrate into CI process
Search before asking
- I searched in the issues and found nothing similar.
What issue do you find in Pulsar docs?
After #17495 and #17599, we can see many broken links in the Pulsar documentation, and more links may break as the docs evolve.
What is your suggestion?
In #17599 I used a script, but it was not reliable enough, so I had to check the list of links manually. I suggest developing a reliable script to auto-detect incorrect links in the Pulsar documentation. We could then integrate this script into the CI process that runs on documentation changes.
Before development, I’d like to enumerate all kinds of broken links.
1. wrong markdown file reference
For example the link of this page:
The markdown content is `[configuration](reference-configuration.md)`, but the `reference-configuration.md` file does not exist.
2. 404 URL path
For example the link of this page:
The markdown content is `[type](/api/client/index.html?org/apache/pulsar/client/api/CompressionType.html)`, but the Pulsar site doesn’t have this path.
3. confusing URL path
For example the link of this page:
The markdown content is `[Pulsar Functions CLI](/tools/pulsar-admin/)`, but this refers to a confusing page:
4. invalid title anchor
We can use `#` to refer to a specific block of HTML this way: `[dataDir](reference-configuration.md#zookeeper-dataDir)`. It would be better if our script could also detect invalid anchors.
Our script should be able to detect these broken links and print warning messages to users.
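As a starting point, here is a minimal sketch of such a check. It only covers kinds 1 and 4 (missing markdown files and invalid title anchors); site-absolute paths such as `/api/...` and external URLs are skipped because verifying them needs the built site. The names `check_file`, `heading_anchors`, and `slugify` are hypothetical, and the slug rule is only an approximation of how the site generator derives heading anchors, so it would need to be aligned with the real one before use in CI.

```python
import re
from pathlib import Path

# Inline markdown links: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
# ATX headings: one to six '#' characters followed by the heading text.
HEADING_RE = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")
# Targets that start with a URL scheme, e.g. https: or mailto:.
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def slugify(heading: str) -> str:
    """Approximate a heading anchor (assumption: lowercase, punctuation
    dropped, whitespace replaced by hyphens)."""
    text = re.sub(r"[^\w\s-]", "", heading.strip().lower())
    return re.sub(r"\s+", "-", text)


def heading_anchors(markdown: str) -> set[str]:
    """Collect the approximate anchors of all headings in a markdown text."""
    anchors = set()
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            anchors.add(slugify(match.group(1)))
    return anchors


def check_file(md_file: Path) -> list[str]:
    """Return warnings for relative links in md_file that point to a
    missing file or to an anchor that no heading produces."""
    warnings = []
    text = md_file.read_text(encoding="utf-8")
    for _, target in LINK_RE.findall(text):
        # Skip external URLs and site-absolute paths; they need the built site.
        if SCHEME_RE.match(target) or target.startswith("/"):
            continue
        path_part, _, anchor = target.partition("#")
        # A bare "#anchor" refers to the current file.
        dest = md_file if not path_part else md_file.parent / path_part
        if not dest.is_file():
            warnings.append(f"{md_file}: file not found for link '{target}'")
            continue
        if anchor and dest.suffix == ".md":
            if anchor not in heading_anchors(dest.read_text(encoding="utf-8")):
                warnings.append(f"{md_file}: anchor not found for link '{target}'")
    return warnings
```

To scan a whole docs tree, one could loop over `Path("docs").rglob("*.md")` (the `docs` directory name is an assumption) and print every warning returned by `check_file`; in CI, a non-empty result could fail the job.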
cc @tisonkun @Anonymitaet @momo-jun @michaeljmarshall
Any reference?
No response
Are you willing to submit a PR?
- I’m willing to submit a PR!
Top GitHub Comments
FYI: https://github.com/apache/pulsar/pull/18132/files#diff-2f6afd23b9f4fddf8ea934eb85c62599f7d05e01bb457cbabc2a178cd92ddab9 is adding `reference-configuration.md` back.

CC @SignorMercurio
@Anonymitaet yes. I think that we should retain the URL `/docs/<path>` for convenience and current usage, but it helps to serve it as an alias.

That is, the source of truth is `/docs/<latest-version>/<path>` and we set up `/docs/<content>` as an alias to the latest stable versioned one. In this way, users can use `/docs/<path>` to access the latest stable version (which can change), while anything that needs an immutable link uses `/docs/<latest-version>/<path>`.

I think this should be done in the building stage. And let’s move the discussion to #17438 instead of polluting this thread 😃