Discussion: extending base HPI/overlays/overrides
Related issues: https://github.com/karlicoss/HPI/issues/12, https://github.com/karlicoss/HPI/issues/46; but I think this is worth a separate discussion.
From my experience, it’s pretty hard to predict how other people want to use their data:
- you might miss some attributes they care about
- some people want to be more paranoid or more defensive (e.g. timezone handling/None safety/etc)
- they might want to do some extra filtering
- they might want to merge in extra data sources or suppress existing ones
The list is endless! So it would be nice if it was possible to easily override small bits of HPI modules/mix in other data, etc.
The main goals are:
- low effort: ideally it should be a matter of a few lines of code to override something.
- good interop: e.g. the ability to keep up with the upstream, use modules coming from separate repositories, etc.
- ideally mypy friendly. This kind of means ‘not too dynamic and magical’, which is ultimately a good thing even if you don’t care about mypy.
Once again, I see Emacs as a good role model. Everything is really decentralized, you have some core library, you have certain patterns that everyone follows… but apart from that the modules are mostly independent. Many people still use ‘monolith’ base configurations (e.g. Doom/Spacemacs), because it’s kinda convenient, as long as you have a maintainer. Arguably this is what this repository is at the moment, although it’s obviously not as popular as Emacs distributions.
Emacs fits these goals well:
- low effort: the simplest way to configure something is to override a variable in your config (thanks to dynamic scope, it ‘just works’). You can even literally override whole functions as a means of quickly getting the behaviour you want.
- good interop: yes, unless the developer broke some APIs, usually you can safely update the upstream module.
How to achieve this within HPI:
For combining independent modules together (say, something like my/youtube.py and my/vimeo.py coming from different repositories), the easiest is to use:
- symlinks (at least if you have just a few files/directories to mixin)
- namespace packages (more on them later)
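To make the namespace-package route concrete, here is a self-contained sketch: two made-up ‘repositories’ in a temporary directory each ship one module under a shared top-level package (called demo_my here rather than my, to avoid clashing with a real install). Since neither has an __init__.py, Python should merge them into a single namespace package whose __path__ follows sys.path order.

```python
import importlib
import os
import sys
import tempfile

# Two made-up "repositories", each contributing one module to the `demo_my`
# namespace. Neither has demo_my/__init__.py: adding one would turn demo_my
# into a regular package, and the second directory would be ignored.
root = tempfile.mkdtemp()
for repo, name in [("hpi-youtube", "youtube"), ("hpi-vimeo", "vimeo")]:
    pkg_dir = os.path.join(root, repo, "demo_my")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, name + ".py"), "w") as f:
        f.write(f"def videos():\n    return ['{name} video']\n")
    sys.path.append(os.path.join(root, repo))

importlib.invalidate_caches()  # make sure the import system sees the new files

import demo_my  # noqa: E402
import demo_my.vimeo  # noqa: E402
import demo_my.youtube  # noqa: E402

# One package object spanning two directories, ordered like sys.path.
portions = list(demo_my.__path__)
```

In a real setup the directories would typically end up on sys.path via pip install -e or PYTHONPATH rather than sys.path.append, but the merging works the same way.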
Now, the tricky case is when you want to partially override something. The first option is: fork & apply your modifications on top. For example: https://github.com/seanbreckenridge/HPI
- effort: very straightforward
- interop: merging with the upstream is a bit manual, but if you use atomic commits and interactive rebase/cherry-pick, it should be manageable
- mypy friendliness: at least it’s not any more magical than the original repository
Not sure if there is much to discuss here, so straight to the second and a more flexible option.
Once again, we rely on namespace packages! I’ll just explain with a couple of examples, since it’s easier.
Example: mixing in a data source

The main idea is that you can override all.py (also some discussion here) and remove/add extra data sources. Since all.py is tiny, it’s not a big problem to just copy/paste it and apply your changes.

Some existing modules implemented with this approach in mind:
- https://github.com/karlicoss/HPI/blob/master/my/body/sleep/main.py
- https://github.com/karlicoss/HPI/blob/master/my/rss/all.py
- https://github.com/karlicoss/HPI/blob/master/my/twitter/all.py
- https://github.com/karlicoss/HPI/blob/master/my/github/all.py
(I still haven’t settled on the naming: all and main both kind of make sense as the entry point.)
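A minimal sketch of what an overridden all.py boils down to: chain the sources you want and deduplicate. The Tweet type, the two source functions and dedup-by-id are all assumptions for illustration; the real all.py files linked above differ in detail.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Tweet:
    id: str
    created_at: datetime
    text: str


# Hypothetical stand-ins: one source kept from upstream, and an extra source
# that only exists in your overlay.
def _upstream_tweets() -> Iterator[Tweet]:
    yield Tweet("1", datetime(2020, 1, 1), "hello")
    yield Tweet("2", datetime(2020, 1, 2), "world")


def _overlay_tweets() -> Iterator[Tweet]:
    yield Tweet("2", datetime(2020, 1, 2), "world")  # overlaps with upstream
    yield Tweet("3", datetime(2020, 1, 3), "from my extra source")


def _merge(*sources: Iterable[Tweet]) -> Iterator[Tweet]:
    """Chain the sources, skipping items whose id was already emitted."""
    seen: Set[str] = set()
    for t in chain(*sources):
        if t.id in seen:
            continue
        seen.add(t.id)
        yield t


def tweets() -> Iterator[Tweet]:
    # Adding or removing a source here is the whole point of overriding all.py.
    return _merge(_upstream_tweets(), _overlay_tweets())
```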
Example: my.calendar.holidays

As you can guess, this module is responsible for flagging days as holidays by exposing the is_holiday function. As a reasonable default, it just uses the user’s country of residence and flags national holidays. However, you might also want to mix in your work vacations; this is harder to make uniform for everyone, so it’s a good candidate for a custom user override:

```python
# my/calendar/holidays.py in the overlay
# (my.orig is a symlink to the original HPI checkout)
import my.orig.my.calendar.holidays as M
# re-export the rest of the original module (DateIsh is assumed to come from here)
from my.orig.my.calendar.holidays import *

is_holiday_orig = M.is_holiday

def is_holiday(d: DateIsh) -> bool:
    # if it's a public holiday, definitely a holiday?
    if is_holiday_orig(d):
        return True
    # then check private data of days off work
    # (is_day_off_work is your own helper, not part of HPI)
    if is_day_off_work(d):
        return True
    return False

M.is_holiday = is_holiday
```
Thanks to namespace packages, when I import my.calendar.holidays, it will hit my override first, monkey patch the is_holiday function, and expose the rest intact due to import *. For example, hpi doctor my.calendar.holidays will run against the override, reusing the stats function or any other original functions.

My personal HPI override has more example code, and I’ll gradually move some stuff from this repository there as well (for example, most things in my.body don’t make much sense for other people).
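Why reusing stats works: a small self-contained sketch (the module contents and DAYS_OFF_WORK are invented) showing that an original function which calls is_holiday by name picks up the patched version, because the name is looked up in the module’s globals at call time. This would not hold if the original had captured a reference before the patch, and other modules that already did from ... import is_holiday keep their own, unpatched binding.

```python
import types

# A made-up, heavily simplified stand-in for the upstream module: stats()
# calls is_holiday by name, so it resolves it in the module's globals.
M = types.ModuleType("orig_holidays")
exec(
    "def is_holiday(d):\n"
    "    return d in {'2021-12-25'}\n"
    "\n"
    "def stats():\n"
    "    days = ['2021-12-24', '2021-12-25', '2021-12-27']\n"
    "    return {'holidays': sum(1 for d in days if is_holiday(d))}\n",
    M.__dict__,
)

# Hypothetical private data: days you took off work.
DAYS_OFF_WORK = {'2021-12-24'}

is_holiday_orig = M.is_holiday


def is_holiday(d: str) -> bool:
    # Public holiday according to the original module, or a private day off.
    return is_holiday_orig(d) or d in DAYS_OFF_WORK


before = M.stats()
M.is_holiday = is_holiday  # the monkey patch
after = M.stats()  # the *original* stats() now sees the override
```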
Things I’m not sure about with this approach:
- To import the ‘original’ module and monkey patch it, you need some alternative way of referencing it.
  - For now, I’m using a symlink (/code/hpi-overlay/src/my/orig -> /code/hpi/src/my). This is simple enough, but maintaining the symlink manually and referencing the ‘original’ package through my.orig… meh. Also not sure what to do if there are multiple overrides, e.g. a ‘chain’ (although this is probably a bit extreme).
  - It’s probably possible to do something hacky and dynamic, e.g. take __path__, remove the first entry (which would be the ‘override’), and then use importlib to import the ‘original’ module. The downside is that it’s gonna be unfriendly to mypy (and generally a bit too magical?).
  - Another option is to have some sort of dynamic ‘hook’, which is imported before anything else. In the hook code, you import the original module and monkey patch it. Same downsides (a bit too dynamic and not mypy friendly), but possible.
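A rough sketch of the ‘hacky and dynamic’ option, using only standard importlib machinery (importlib.machinery.PathFinder.find_spec and importlib.util.module_from_spec). import_original is a hypothetical helper, not part of HPI, and it deviates slightly from the description: instead of dropping the first __path__ entry, it drops the directory the overlay file lives in, so it doesn’t depend on the overlay being first. As expected, mypy can’t see through this.

```python
import importlib.machinery
import importlib.util
import os
from types import ModuleType
from typing import Iterable


def import_original(fullname: str, search_path: Iterable[str], overlay_file: str) -> ModuleType:
    """Load the module that `overlay_file` is shadowing (hypothetical helper).

    `search_path` is the parent package's __path__, i.e. a namespace package
    spanning the overlay and the original checkout.
    """
    overlay_dir = os.path.realpath(os.path.dirname(overlay_file))
    remaining = [p for p in search_path if os.path.realpath(p) != overlay_dir]
    spec = importlib.machinery.PathFinder.find_spec(fullname, remaining)
    if spec is None or spec.loader is None:
        raise ImportError(f"no original {fullname!r} outside {overlay_dir}")
    module = importlib.util.module_from_spec(spec)
    # Deliberately not registered in sys.modules: that slot stays with the
    # overlay. __package__ is set, so relative imports in the original should
    # still resolve.
    spec.loader.exec_module(module)
    return module


# Intended use inside the overlay's my/calendar/holidays.py (hypothetical):
#     import sys
#     M = import_original(__name__, sys.modules[__spec__.parent].__path__, __file__)
#     is_holiday_orig = M.is_holiday
```

Each call re-executes the original module, so an overlay would call it once at import time and keep the result.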
Caveats I know of:
- packages can’t contain __init__.py, otherwise the whole namespace package thing doesn’t work
- you need to be careful about the namespace package resolution order. It seems that the last installed package will be the last in the import order.
  - so you’d need to run pip install -e /path/to/override and then pip install -e /path/to/original (even if it’s already installed)
  - another option is to reorder stuff in ~/.local/lib/python3.x/site-packages/easy-install.pth manually, but it’s not very robust either (although at least it clearly shows the order)
  - hpi doctor my.module displays some helpful info, but it’s still easy to forget or mess it up by accident:
      $ hpi doctor my.calendar.holidays
      ✅ import order: ['/code/hpi-overlay/src/my', '/code/hpi/my']
- import * doesn’t import functions that start with an underscore (‘private’). It’s possible to get around this dynamically, but it would be nice to cooperate with mypy somehow…
Happy to hear suggestions and thoughts on that. Once there’s been some discussion, I’ll perhaps move this to doc/.
TODOs:
- I also thought that it should be possible to reuse the configuration in ~/.config/my as the ‘default’ overlay. In fact, treating it like a proper namespace package (at the moment it’s a bit of dynamic hackery) might make everything even cleaner and simpler.
- find some good tutorial on monkey patching and link it? Wouldn’t want to duplicate the effort…
- add some examples of motivation for overrides, just for documentation purposes
- update docs here https://github.com/karlicoss/HPI/blob/master/doc/SETUP.org#addingmodifying-modules
Top GitHub Comments
Yeah, sorry about it, I moved some things recently (hopefully in the direction of simplifying). But yeah, symlinks are probably the easiest way.
Yes, I totally agree! (this is kind of what I mean about the data access layer). The main reason there is still some code which does data ‘interpretation’ in HPI is just laziness/the general hassle of scattering things across repositories. But for the most ‘common’ ones like Reddit/GitHub/Instapaper, the HPI bits (‘connection files’) are fairly thin. Definitely could do that more for GDPR modules as well. It’s kind of a shame that extracting into a separate repository has such overhead, e.g. setup.py, ideally GitHub Actions (and all the corresponding configs). Maybe there could be some ‘super-repository’ which imports all of them as submodules and runs CI checks, just for ease of config maintenance.
Yep, good point. In fact, it would probably be better to start gradually using my.core, since eventually it will be in a separate package (relative imports will still work, but would be kinda awkward).

Hmm. Maybe not a bad idea? Could be files or anything else that needs to be added to PYTHONPATH, basically. Yeah, mypy won’t work with these, but I guess many people don’t care about it anyway, so it could be nice to have this way of setting things up. Definitely easier than messing with editable installs for some people.
Yeah, at least certainly not reinvent from scratch!
I guess another important principle I keep in mind is that there shouldn’t be any ‘hard’ requirements for base modules like special base classes/interfaces/etc. They should be as code-agnostic as possible and rely on common Python mechanisms. That way it would be easy to plug in custom stuff with no hassle. So I’d avoid plugin systems that try to impose such structure, but it would be cool to try something that can simplify monkey patching (e.g. one example I like is patchy).
In that sense all of the approaches above: symlinks/overrides/dynamic imports work, so it’s up to the ‘downstream’ user what to choose. But still would be nice to have some ‘natural’ way of doing this.
Oh, just recalled: another issue with symlinks is that they’re not very pip install friendly (for non-editable installs). I guess it’s not that big of a problem for an overlay, but it might be annoying when an HPI overlay is used as a dependency (or even during continuous integration). It’s also possible to work around this by putting symlinks directly in site-packages/hpi-overlay as a ‘last resort’ measure…