Add source suggestions for Brave News
URL format: https://[hostname]/source-suggestions/source_similarity_t10.[region].json
where region is, e.g., en_US.
The format is:
{
  [key: PublisherID]: {
    source: PublisherID
    score: number
  }[]
}
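As a rough illustration, the file format above could be modelled in TypeScript along these lines. Only the URL pattern and the JSON shape come from this issue; the names `SimilarSource`, `SimilarityMap`, `similarityUrl` and `isSimilarityMap` are hypothetical, and the validation is a minimal sketch rather than what the browser actually does.

```typescript
// Hypothetical type names; only the JSON shape is taken from the issue.
type PublisherID = string;

interface SimilarSource {
  source: PublisherID;
  score: number;
}

type SimilarityMap = Record<PublisherID, SimilarSource[]>;

// Builds the per-region file URL from the pattern given in the issue.
// The hostname is left as a parameter because the issue does not specify it.
function similarityUrl(hostname: string, region: string): string {
  return `https://${hostname}/source-suggestions/source_similarity_t10.${region}.json`;
}

// Structural check for parsed JSON before treating it as a SimilarityMap.
function isSimilarityMap(value: unknown): value is SimilarityMap {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return Object.keys(record).every((key) => {
    const entries = record[key];
    return (
      Array.isArray(entries) &&
      entries.every(
        (e: unknown) =>
          typeof e === "object" &&
          e !== null &&
          typeof (e as Record<string, unknown>).source === "string" &&
          typeof (e as Record<string, unknown>).score === "number"
      )
    );
  });
}
```

The guard is deliberately strict: an entry missing either `source` or `score` causes the whole file to be rejected.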
There is also a human-readable file at https://[hostname]/source-suggestions/source_similarity_t10_hr.[region].json, whose only purpose is to make it easier to check expected results. Its format is:
{
  [key: PublisherName]: {
    source: PublisherName
    score: number
  }[]
}
Each file maps a given PublisherID to a list of similar PublisherIDs, each with a score ranking how similar the two are (a higher score means more similar).
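A minimal sketch of that lookup, assuming the file has already been parsed into memory. The function name `mostSimilar` and the `limit` parameter are hypothetical, and the sort is a precaution because the issue does not say whether entries arrive pre-sorted.

```typescript
type PublisherID = string;

interface SimilarSource {
  source: PublisherID;
  score: number;
}

type SimilarityMap = Record<PublisherID, SimilarSource[]>;

// Returns up to `limit` entries for `id`, most similar first.
// Unknown publishers yield an empty list rather than an error.
function mostSimilar(
  map: SimilarityMap,
  id: PublisherID,
  limit: number
): SimilarSource[] {
  const entries = map[id] || [];
  // Copy before sorting so the parsed file is not mutated.
  return entries
    .slice()
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```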
Sources we should compare from, in priority order:
- Sources the user has directly subscribed to
- Sources the user has indirectly subscribed to (i.e. as part of a channel) and the user has visited the site recently
- Sources the user has indirectly subscribed to (i.e. as part of a channel) and we have no interest signal
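The priority order above could be expressed as a small tiering function. This is a sketch under assumptions: the `SourceSignals` shape and its field names are hypothetical, and it presumes the subscription and history signals have already been computed elsewhere.

```typescript
type PublisherID = string;

// Hypothetical per-source signals; field names are illustrative only.
interface SourceSignals {
  id: PublisherID;
  directlySubscribed: boolean; // any direct subscription
  viaChannel: boolean; // indirectly subscribed as part of a channel
  recentlyVisited: boolean; // interest signal from history
}

// Lower tier means higher priority; null means the source is not compared from.
function seedTier(s: SourceSignals): number | null {
  if (s.directlySubscribed) return 0;
  if (s.viaChannel && s.recentlyVisited) return 1;
  if (s.viaChannel) return 2;
  return null;
}

// Returns the sources to compare from, highest priority first.
function orderSeeds(sources: SourceSignals[]): PublisherID[] {
  const tiered: { id: PublisherID; tier: number }[] = [];
  for (const s of sources) {
    const tier = seedTier(s);
    if (tier !== null) tiered.push({ id: s.id, tier });
  }
  tiered.sort((a, b) => a.tier - b.tier);
  return tiered.map((t) => t.id);
}
```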
We will take that source list and use the similarity matrix map to produce a list of “suggested sources”.
List we should show, in priority order:
- Sources that the user is not directly or indirectly subscribed to
- Sources that the user is indirectly subscribed to (i.e. as part of a channel) (We should not show sources that the user is already directly subscribed to)
Note: when talking about “direct” subscriptions above, we refer to any mode of subscription: combined sources or rss feed.
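Putting the two priority lists together, one possible way to produce the suggested list is sketched below. The function name and parameters are hypothetical. The issue does not say how to combine scores when several compared sources point at the same candidate, so taking the highest score is an assumption made purely for illustration.

```typescript
type PublisherID = string;

interface SimilarSource {
  source: PublisherID;
  score: number;
}

type SimilarityMap = Record<PublisherID, SimilarSource[]>;

// seeds:    sources to compare from, already in priority order
// direct:   sources the user is directly subscribed to (never suggested)
// indirect: sources the user is subscribed to via a channel (shown last)
function suggestSources(
  seeds: PublisherID[],
  similarity: SimilarityMap,
  direct: PublisherID[],
  indirect: PublisherID[]
): PublisherID[] {
  // Prototype-free object so publisher IDs cannot collide with built-in keys.
  const best: Record<PublisherID, number> = Object.create(null);
  for (const seed of seeds) {
    for (const { source, score } of similarity[seed] || []) {
      if (direct.indexOf(source) !== -1) continue;
      // Assumption: keep the highest score seen for each candidate.
      if (best[source] === undefined || score > best[source]) {
        best[source] = score;
      }
    }
  }
  const tier = (id: PublisherID): number =>
    indirect.indexOf(id) !== -1 ? 1 : 0;
  // Unsubscribed candidates first, then indirect ones; higher score first within each.
  return Object.keys(best).sort(
    (a, b) => tier(a) - tier(b) || best[b] - best[a]
  );
}
```

Because the unsubscribed tier always sorts ahead of the indirect tier, an indirectly subscribed source can have the highest score and still appear last.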
Which similarity region files to download? Any regions in which the user has channel or feed subscriptions, i.e. the same regions we download feed.json files for.
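In other words, the set of regions is the union of the regions behind the user's channel and feed subscriptions. A minimal sketch, assuming both region lists are already available as arrays (the function and parameter names are hypothetical):

```typescript
// Returns each region once, in first-seen order; one similarity file
// would then be downloaded per returned region.
function regionsToDownload(
  channelRegions: string[],
  feedRegions: string[]
): string[] {
  const seen: Record<string, boolean> = Object.create(null);
  const regions: string[] = [];
  for (const region of channelRegions.concat(feedRegions)) {
    if (!seen[region]) {
      seen[region] = true;
      regions.push(region);
    }
  }
  return regions;
}
```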
When should we download the similarity files? An appropriate time seems to be when downloading feed subscriptions, since that occurs when the user modifies their feed subscriptions and is also when we calculate which regions to download from. There may be a couple of benefits to doing it when downloading sources instead, since that is when we search through history. However, we can search history for publisher matches again at this new “source similarity comparison” time.
Top GitHub Comments
Absolutely fine to at least start with that then build incrementally if needed, since that’s contained within the suggestion above.
I was thinking something like this. For the comparing priority, I would treat indirectly subscribed sources (via Channels) as simple unsubscribed sources, and only consider the following signals:
As for showing:
I wouldn’t consider the indirect subscription signal unless it’s supported by a stronger interest signal (history), because some categories/channels may contain sources that the user entirely ignores, and we should not prioritise those (e.g. a user subscribed to Entertainment who is not interested in Music [Pitchfork, NME] at all).