postgresql - to_tsquery docs and implementation detail
See original GitHub issueMigrated issue, originally created by jvanasco (@jvanasco)
I recently realized some implementation details of to_tsquery
that are incompatible with the docs I drafted and how sqlalchemy integrates it.
Not sure how to handle this.
I based the docs/examples on existing tests and text. So we have this as the first bit of “Full Text Search” ( http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#full-text-search )
select([sometable.c.text.match("search string")])
SELECT text @@ to_tsquery('search string') FROM table
well, if we put this into psql…
badsql=> select to_tsquery('search string') ;
ERROR: syntax error in tsquery: "search string"
that’s because tsquery has a rigid enforcement of input. text must either be tokenized and joined with acceptable operators :
select to_tsquery('cat & rat');
be quoted :
select to_tsquery('''search string''');
or use the alternate function
select plainto_tsquery('search string');
So, these are acceptable:
select to_tsquery('search & string');
select to_tsquery('''search string''');
select plainto_tsquery('search string');
but this is not:
select to_tsquery('search string');
I’m not sure the best way to handle this nuance in the docs.
As far as the implementation detail goes…
this creates issues with Column.match
, because that generates (invalid) sql like this:
table.column @@ to_tsquery('search string')
not valid sql like:
table.column @@ to_tsquery('''search string''')
table.column @@ plainto_tsquery('search string')
This is an annoying implementation detail , but the content/type of string will affect the function or format t search on.
i thought about just regexing this into submission, but the multitude of possible operators and edge cases suggests that the text would need to be parsed for tokenization instead – otherwise throwing in a bit of text that has an operator will function as a real operator and probably trigger an invalid syntax.
i think the easiest scenario would be to replace match’s to_tsquery
with plainto_tsquery
Issue Analytics
- State:
- Created 9 years ago
- Reactions:1
- Comments:29 (3 by maintainers)
Joe Futrelle (@joefutrelle) wrote:
ELI5: psql to_tsquery fails with syntax error on typical freeform text queries (e.g., something a user typed into a form), and plainto_tsquery is provided for that case. Match in sqla uses to_tsquery and so there’s no obvious way in sqla to perform a full-text search on a freeform text query.
To add to jvanasco’s workaround above, here’s an ORM-based workaround, that should probably at least be documented:
jvanasco, should I be passing ‘english’ to the ‘text’ function like you are? This works for me without doing that.
jvanasco (@jvanasco) wrote:
Likely a 1.0
Wanted to bring this up for discussion; reach out to others who have been working on full text support.
Still wrapping my head around this, and what the proper flow should be. Perhaps no code change, but a lot of docs… But I’m not sure
There are times where “to_tsquery” would be appropriate, and others where it’s not. I think the bulk of usage scenarios in the orm would fall under a “plainto” match though.
Personally, I’ve replaced all my ‘col1.match(col2)’ with “col1.op(”@@“)(func.plainto_tsquery(col2))”