allow regexp matching in workspace operations
IMO, at least the following operations on METS (both in the CLI and the API) should support regular expression matching:
- `remove_file`: match IDs by pattern instead of looking for one fixed ID
- `remove_file_group`: match group IDs by pattern instead of looking for one fixed ID
- `find_files`: for both the MIME type (e.g. just `image/`) and the ID
- `add_file_group`: for both file names and file IDs (possibly with back-references), cf. discussion here
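To illustrate the back-reference case for `add_file_group`, here is a minimal Python sketch that derives file IDs from file names through a capture group. The helper `derive_ids`, its parameters and the `OCR-D-IMG_` ID scheme are hypothetical and not part of OCR-D core. The sketch assumes full-match semantics, using the standard `re` module.

```python
import re
from typing import Dict, Iterable


def derive_ids(file_names: Iterable[str], pattern: str,
               id_template: str) -> Dict[str, str]:
    """Hypothetical helper: map each file name that fully matches `pattern`
    to an ID built from `id_template`, which may use back-references
    such as \\1 or \\g<name>."""
    regex = re.compile(pattern)
    ids = {}
    for name in file_names:
        m = regex.fullmatch(name)
        if m:  # names that do not match are skipped
            ids[name] = m.expand(id_template)
    return ids


names = ["0001.tif", "0002.tif", "README.txt"]
print(derive_ids(names, r"(\d{4})\.tif", r"OCR-D-IMG_\1"))
# {'0001.tif': 'OCR-D-IMG_0001', '0002.tif': 'OCR-D-IMG_0002'}
```

Non-matching names (here `README.txt`) are silently skipped. A real implementation would have to decide whether that should be an error instead.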
To minimise possible ambiguity between a verbatim string and its regex interpretation, while still keeping the existing argument/option names (and not introducing a new flavour each time), I recommend either using POSIX Basic Regular Expression syntax (which is perhaps hard to come by in Python) or allowing some kind of extra notation in the input, e.g. a `re:` prefix.
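A minimal sketch of the `re:` prefix idea, assuming Python's `re` module and full-match semantics (the pattern must match the whole value, not just a substring). `make_matcher` is a hypothetical helper, not an existing OCR-D function. Anything without the prefix is compared verbatim, so IDs containing regex metacharacters such as `.` keep their literal meaning.

```python
import re
from typing import Callable


def make_matcher(spec: str, prefix: str = "re:") -> Callable[[str], bool]:
    """Hypothetical helper: treat `spec` as a regex if it starts with
    `prefix`, otherwise as a verbatim string."""
    if spec.startswith(prefix):
        regex = re.compile(spec[len(prefix):])
        # The pattern has to match the whole value
        return lambda value: regex.fullmatch(value) is not None
    return lambda value: value == spec


ids = ["FILE_0001_IMG", "FILE_0002_IMG", "FILE_0001_OCR"]

# Prefixed: select every ID a regex-aware remove_file would act on
is_target = make_matcher("re:FILE_0001_.*")
print([i for i in ids if is_target(i)])  # ['FILE_0001_IMG', 'FILE_0001_OCR']

# No prefix: exact comparison, as with a fixed ID
is_exact = make_matcher("FILE_0002_IMG")
print([i for i in ids if is_exact(i)])  # ['FILE_0002_IMG']
```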
Issue Analytics
- Created 4 years ago
- Comments: 12 (5 by maintainers)
Top GitHub Comments
Sorry, just saw your response; I already had #507 queued as a fix.
I did without mirroring…
I agree these make sense and I can implement them. This is related to https://github.com/OCR-D/core/issues/446#issuecomment-590328336 (inefficient `find_files`) and #448, so I'm implementing it as part of #448. Have a look at https://github.com/kba/ocrd-core/commit/8b6d277640335bf8afa1d815f0cf26f5b9290060, which implements the regex search for `find_files` with a `re:` prefix, i.e. you can do `mets.find_files(mimetype="re:image/jpe?g")` or `mets.find_files(ID="re:.*0001.*")`. Since this is coupled to the "single-pass find_files changeset", I still need to do performance testing, but do let me know if this is going in the wrong direction.
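For illustration, a single-pass filter in the spirit of the `find_files` calls above might look like the following. This is a sketch under assumptions, not the code in the linked commit: `FileEntry` is a hypothetical stand-in for a METS file entry, criteria are combined with AND, and each regex is compiled once per call and applied with `re.fullmatch`.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class FileEntry:
    # Hypothetical stand-in for a METS file entry (not the OCR-D class)
    ID: str
    mimetype: str
    fileGrp: str


def _predicate(spec: Optional[str]) -> Callable[[str], bool]:
    """Turn one search criterion into a predicate."""
    if spec is None:  # criterion not given: accept any value
        return lambda value: True
    if spec.startswith("re:"):  # regex criterion, compiled once
        regex = re.compile(spec[len("re:"):])
        return lambda value: regex.fullmatch(value) is not None
    return lambda value: value == spec  # verbatim criterion


def find_files(files: Iterable[FileEntry], ID: Optional[str] = None,
               mimetype: Optional[str] = None,
               fileGrp: Optional[str] = None) -> Iterator[FileEntry]:
    """Yield every file that satisfies all given criteria, in one pass."""
    checks = [(attr, _predicate(spec)) for attr, spec in
              (("ID", ID), ("mimetype", mimetype), ("fileGrp", fileGrp))]
    for f in files:
        if all(check(getattr(f, attr)) for attr, check in checks):
            yield f


files = [
    FileEntry("IMG_0001", "image/jpeg", "OCR-D-IMG"),
    FileEntry("IMG_0002", "image/jpg", "OCR-D-IMG"),
    FileEntry("IMG_0003", "image/tiff", "OCR-D-IMG"),
    FileEntry("PAGE_0001", "application/vnd.prima.page+xml", "OCR-D-SEG"),
]
print([f.ID for f in find_files(files, mimetype="re:image/jpe?g")])
# ['IMG_0001', 'IMG_0002']
print([f.ID for f in find_files(files, ID="re:.*0001.*")])
# ['IMG_0001', 'PAGE_0001']
```

Building the predicates once, outside the loop, means the file list is scanned exactly once regardless of how many criteria are given.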
Not sure about the `re:` prefix because of possible conflicts. How about `~` or `@`, which are presumably rarer in digitization data than `re:`?
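Whether `~` or `@` collide less often than `re:` is an empirical question. One quick check against real data is to count the existing IDs or file names that already start with each candidate prefix, since those values would be misread as patterns. `prefix_collisions` is a hypothetical helper. The sample values below are invented and say nothing about actual digitization data.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def prefix_collisions(values: Iterable[str],
                      candidates: Sequence[str]) -> Dict[str, int]:
    """Count, per candidate prefix, how many literal values start with it
    (and would therefore be misread as patterns)."""
    counts: Counter = Counter()
    for value in values:
        for prefix in candidates:
            if value.startswith(prefix):
                counts[prefix] += 1
    # Report every candidate, including those with zero collisions
    return {prefix: counts[prefix] for prefix in candidates}


# Invented sample values, for illustration only
existing = ["OCR-D-IMG_0001", "re:scan_0002", "~tmp_0003", "PAGE_0001"]
print(prefix_collisions(existing, ["re:", "~", "@"]))
# {'re:': 1, '~': 1, '@': 0}
```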