apply omission of "local" columns when applying aliasing to join conditions around a secondary

Hello! My issue is about creating a many-to-many relationship based on a column containing an array of ids (using PostgreSQL). I’ve seen issue https://github.com/sqlalchemy/sqlalchemy/issues/4472, and one of the last messages there seems to describe an error like mine, but I’m not sure, so here’s a new one.

TL;DR:

When creating a many-to-many relationship based on a column containing an array of ids (using PostgreSQL), using joinedload makes SQLAlchemy generate an invalid query (the raw SQL it generates is listed below). Yes, I know this violates normal SQL design, but it is present in a huge project and I’m unable to change it, and I still want to load the relations in one query.
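To make the data model concrete, here is a minimal pure-Python sketch of what `unnest` over the array column produces. The `authors_books` dict and the `unnest_pairs` helper are hypothetical stand-ins for illustration only; the values mirror the two authors created in the reproduction script.

```python
# Hypothetical in-memory stand-in for the authors table:
# author id -> contents of the "books" array column.
authors_books = {1: [1, 2], 2: [2, 3]}


def unnest_pairs(rows):
    """Flatten {author_id: [book_id, ...]} into (book_id, author_id) pairs.

    This imitates, in plain Python, the shape of
    SELECT unnest(authors.books) AS book_id, authors.id AS author_id FROM authors
    """
    return [
        (book_id, author_id)
        for author_id, book_ids in rows.items()
        for book_id in book_ids
    ]


pairs = unnest_pairs(authors_books)
print(pairs)
```

Each pair plays the role of one row in a conventional association table, which is why the subquery can be passed as `secondary` to `relationship()`.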


I’ve also seen this question and used the answers as a base for my solution: https://stackoverflow.com/questions/9729381/sqlalchemy-relationships-with-postgresql-array

Tested using SQLAlchemy==1.3.10

The fully-working standalone code:

from sqlalchemy import Column, Integer, String, create_engine, select, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.orm import relationship, sessionmaker, scoped_session, joinedload, contains_eager

#### Creating connection
engine = create_engine('postgres://test:test@127.0.0.1:5432/test')
# Session = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Session = scoped_session(
    sessionmaker(autocommit=False, autoflush=False, bind=engine)
)
Base = declarative_base()
Base.metadata.bind = engine
##

## Helper utility
from sqlalchemy.dialects.postgresql.psycopg2 import PGDialect_psycopg2

psycopg_dialect = PGDialect_psycopg2()

def compile_query(q):
    compiled = q.statement.compile(dialect=psycopg_dialect)
    return str(compiled) % compiled.params

##

class Author(Base):
    __tablename__ = 'authors'

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)
    books = Column(ARRAY(Integer), default=[], nullable=False)

    def __repr__(self):
        return f'Author(name={self.name!r}, books={self.books!r})'


class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)

    def __repr__(self):
        return f'Book(name={self.name!r})'


### creating relationships
## idea from:
# https://stackoverflow.com/questions/9729381/sqlalchemy-relationships-with-postgresql-array

authors_books_selectable = select([
    func.unnest(Author.books).label('book_id'),
    Author.id.label('author_id'),
]).alias()


join_primary = Book.id == authors_books_selectable.c.book_id  # book_id is a label for unnested ids
join_secondary = authors_books_selectable.c.author_id == Author.id  # author_id label from selectable

Book._relationship_inverse_authors_ = relationship(
    Author,
    secondary=authors_books_selectable,
    primaryjoin=join_primary,
    secondaryjoin=join_secondary,
    viewonly=True,
)


Author._relationship_books_ = relationship(
    Book,
    secondary=authors_books_selectable,
    primaryjoin=join_primary,
    secondaryjoin=join_secondary,
    viewonly=True,

    # # Does not work with backref :(
    # backref='_relationship_inverse_authors_',
    #
    # backref=backref(
    #     '_relationship_inverse_authors_',
    #     uselist=True,
    #     viewonly=True,
    # )
)

#


def create_tables():
    print('[Re]Creating tables')
    # Uncomment the drop_all() call below to clear the tables on each run;
    # leave it commented out to preserve them between runs.
    # Base.metadata.drop_all()
    Base.metadata.create_all()


def create_entities():
    print('Creating entities')

    b1 = Book(name='First')
    b2 = Book(name='Second')
    b3 = Book(name='Third')
    bn = Book(name='Nth')
    Session.add(b1)
    Session.add(b2)
    Session.add(b3)
    Session.add(bn)
    Session.commit()

    a1 = Author(name='A1', books=[b1.id, b2.id])
    a2 = Author(name='A2', books=[b2.id, b3.id])
    Session.add(a1)
    Session.add(a2)
    Session.commit()


def print_all_of_model(model):
    print()
    print('Querying', model.__name__, 'from', model.__tablename__)
    items = Session.query(model).all()
    print('The result is:')
    print(items)


def show_whats_inside():
    """
    Outputs

    Querying Author from authors
    The result is:
    [Author(name='A1', books=[1, 2]), Author(name='A2', books=[2, 3])]

    Querying Book from books
    The result is:
    [Book(name='First'), Book(name='Second'), Book(name='Third'), Book(name='Nth')]

    :return:
    """
    for model in (Author, Book):
        print_all_of_model(model)


def show_query_for_one_author(q):
    print('\nCompiled query:')
    print(compile_query(q))
    a = q.one()
    print('.\nauthor:', a)
    print('his books:', a._relationship_books_)


def just_show_one_author():
    """
    Outputs

    Querying author:

    Compiled query:
    SELECT authors.id, authors.name, authors.books
    FROM authors
    WHERE authors.id = 2
    .
    author: Author(name='A2', books=[2, 3])
    his books: [Book(name='First'), Book(name='Second'), Book(name='Third')]

    :return:
    """
    print('Querying author:')
    q = Session.query(Author).filter(Author.id == 2)
    show_query_for_one_author(q)


def get_author_and_books_using_joinedload():
    """
    This one fails, output is:

    getting one author and his books using joined load

    Compiled query:
    SELECT authors.id, authors.name, authors.books, books_1.id, books_1.name
    FROM authors LEFT OUTER JOIN ((SELECT unnest(authors.books) AS book_id, authors.id AS author_id
    FROM authors) AS anon_1 JOIN books AS books_1 ON anon_1.author_id = anon_1.author_id) ON books.id = anon_1.book_id
    WHERE authors.id = 2
    failed: (psycopg2.ProgrammingError) invalid reference to FROM-clause entry for table "books"
    LINE 3: ...ooks_1 ON anon_1.author_id = anon_1.author_id) ON books.id =...
                                                                 ^
    HINT:  Perhaps you meant to reference the table alias "books_1".

    [SQL: SELECT authors.id AS authors_id, authors.name AS authors_name, authors.books AS authors_books, books_1.id AS books_1_id, books_1.name AS books_1_name
    FROM authors LEFT OUTER JOIN ((SELECT unnest(authors.books) AS book_id, authors.id AS author_id
    FROM authors) AS anon_1 JOIN books AS books_1 ON anon_1.author_id = anon_1.author_id) ON books.id = anon_1.book_id
    WHERE authors.id = %(id_1)s]
    [parameters: {'id_1': 2}]
    (Background on this error at: http://sqlalche.me/e/f405)
    :return:
    """
    print('-----')
    print('getting one author and his books using joined load')
    q = Session.query(Author).filter(Author.id == 2)
    q = q.options(joinedload(Author._relationship_books_))
    show_query_for_one_author(q)


def get_author_and_books_using_contains_eager():
    """
    So, this is my solution. Output:

    getting one author and his books using contains_eager

    Compiled query:
    SELECT authors.id, authors.name, authors.books, books.id, books.name
    FROM books JOIN (SELECT unnest(authors.books) AS book_id, authors.id AS author_id
    FROM authors) AS anon_1 ON books.id = anon_1.book_id JOIN authors ON anon_1.author_id = authors.id
    WHERE authors.id = 2
    .
    author: Author(name='A2', books=[2, 3])
    his books: [Book(name='Second'), Book(name='Third')]
    :return:
    """
    print('---')
    print('getting one author and his books using contains_eager')

    q = (
        Session.query(Author).filter(Author.id == 2)
        .join(Book._relationship_inverse_authors_)
        .options(
            contains_eager(Author._relationship_books_)
        )
    )
    show_query_for_one_author(q)


def main():
    # # We need these two only for the first run
    create_tables()
    create_entities()

    show_whats_inside()
    just_show_one_author()

    try:
        get_author_and_books_using_joinedload()
    except Exception as e:
        Session.rollback()
        print('failed:', e)
        # outputs:
        """
        failed: (psycopg2.ProgrammingError) invalid reference to FROM-clause entry for table "books"
        LINE 3: ...ooks_1 ON anon_1.author_id = anon_1.author_id) ON books.id =...
                                                                     ^
        HINT:  Perhaps you meant to reference the table alias "books_1".
        
        [SQL: SELECT authors.id AS authors_id, authors.name AS authors_name, authors.books AS authors_books, books_1.id AS books_1_id, books_1.name AS books_1_name 
        FROM authors LEFT OUTER JOIN ((SELECT unnest(authors.books) AS book_id, authors.id AS author_id 
        FROM authors) AS anon_1 JOIN books AS books_1 ON anon_1.author_id = anon_1.author_id) ON books.id = anon_1.book_id 
        WHERE authors.id = %(id_1)s]
        [parameters: {'id_1': 2}]
        (Background on this error at: http://sqlalche.me/e/f405)
        """

    # This one works!
    get_author_and_books_using_contains_eager()


if __name__ == '__main__':
    main()

So!

The wrong compiled SQL (created using joinedload) is:

q = Session.query(Author).filter(Author.id == 2)
q = q.options(joinedload(Author._relationship_books_))

SELECT authors.id, authors.name, authors.books, books_1.id, books_1.name
FROM authors
         LEFT OUTER JOIN ((SELECT unnest(authors.books) AS book_id, authors.id AS author_id
                           FROM authors) AS anon_1 JOIN books AS books_1 ON anon_1.author_id = anon_1.author_id)
                         ON books.id = anon_1.book_id
WHERE authors.id = 2

As you can see, the ON clause uses books instead of the alias books_1: ON books.id. It also uses LEFT OUTER JOIN, which would return all of the books. Also, there’s a useless ON clause: ON anon_1.author_id = anon_1.author_id
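To show why the self-comparing condition matters, here is a minimal sketch using Python's built-in `sqlite3` rather than PostgreSQL. The `author_book` table is a hypothetical stand-in for the `anon_1` subquery (SQLite has no `unnest`), and `book_names` is an illustrative helper, not part of the original script. It compares the intended filter against the tautological one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical stand-in for the anon_1 subquery (SQLite has no unnest()).
    CREATE TABLE author_book (book_id INTEGER, author_id INTEGER);
    INSERT INTO books VALUES (1, 'First'), (2, 'Second'), (3, 'Third'), (4, 'Nth');
    INSERT INTO author_book VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")


def book_names(condition, params=()):
    """Return distinct book names reachable through author_book under `condition`."""
    sql = (
        "SELECT DISTINCT b.id, b.name "
        "FROM author_book AS ab JOIN books AS b ON b.id = ab.book_id "
        f"WHERE {condition} ORDER BY b.id"
    )
    return [name for _id, name in conn.execute(sql, params)]


# Intended condition: restrict the association rows to one author.
intended = book_names("ab.author_id = ?", (2,))
# Condition as emitted in the broken query: a column compared with itself.
self_compared = book_names("ab.author_id = ab.author_id")

print(intended)
print(self_compared)
```

The self-comparison is true for every row with a non-NULL `author_id`, so it no longer restricts the result to a single author; every book referenced by any author comes back.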

And the working solution (created using contains_eager) compiled SQL query is:

q = (
    Session.query(Author).filter(Author.id == 2)
    .join(Book._relationship_inverse_authors_)
    .options(
        contains_eager(Author._relationship_books_)
    )
)

SELECT authors.id, authors.name, authors.books, books.id, books.name
FROM books
         JOIN (SELECT unnest(authors.books) AS book_id, authors.id AS author_id
               FROM authors) AS anon_1 ON books.id = anon_1.book_id
         JOIN authors ON anon_1.author_id = authors.id
WHERE authors.id = 2

So, is it OK to use it like this? Am I missing something? Maybe I’m using joinedload incorrectly? Do I have to add any more aliases manually? Is this an actual issue, or just my fault?

Thanks in advance!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

zzzeek commented, Nov 10, 2019 (1 reaction)

You can use a relationship to a secondary, but the issue with “anon_1.id == anon_1.id” is fixed in the Gerrit change linked in this thread and will be in 1.3.11.
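Since the maintainer names 1.3.11 as the release carrying the fix, a project could guard against running on an older version. The following is a rough sketch only: `version_tuple` and `has_secondary_fix` are hypothetical helper names, and the parsing ignores pre-release suffixes (so a string such as "1.3.11b1" would be treated as 1.3.11).

```python
from itertools import takewhile

# Release named in the maintainer's comment as containing the fix.
FIXED_IN = (1, 3, 11)


def version_tuple(version):
    """Parse the leading 'major.minor.patch' digits of a version string."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_secondary_fix(version):
    """True if `version` is at or after the release said to contain the fix."""
    return version_tuple(version) >= FIXED_IN


# Usage with an installed SQLAlchemy:
#   import sqlalchemy
#   has_secondary_fix(sqlalchemy.__version__)
print(has_secondary_fix("1.3.10"))
```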

sqla-tester commented, Nov 10, 2019

Mike Bayer has proposed a fix for this issue in the master branch:

Exclude local columns when adapting secondary in a join condition https://gerrit.sqlalchemy.org/1571
