Limiting SQL query to defined fields/columns

A full working demo can be found at https://github.com/somada141/demo-graphql-sqlalchemy-falcon.

Consider the following SQLAlchemy ORM class:

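import sqlalchemy

# `Base` (the declarative base) and `OrmBaseMixin` are defined in the demo
# repository linked above.
class Author(Base, OrmBaseMixin):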
    __tablename__ = "authors"

    author_id = sqlalchemy.Column(
        sqlalchemy.types.Integer(),
        primary_key=True,
    )

    name_first = sqlalchemy.Column(
        sqlalchemy.types.Unicode(length=80),
        nullable=False,
    )

    name_last = sqlalchemy.Column(
        sqlalchemy.types.Unicode(length=80),
        nullable=False,
    )

It is then simply wrapped in a SQLAlchemyObjectType as follows:

from graphene_sqlalchemy import SQLAlchemyObjectType


class TypeAuthor(SQLAlchemyObjectType):
    class Meta:
        model = Author

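and exposed through the following field and resolver on the query type:

from typing import Union

import graphene


class Query(graphene.ObjectType):
    author = graphene.Field(
        TypeAuthor,
        author_id=graphene.Argument(type=graphene.Int, required=False),
        name_first=graphene.Argument(type=graphene.String, required=False),
        name_last=graphene.Argument(type=graphene.String, required=False),
    )

    @staticmethod
    def resolve_author(
        root,
        info,
        author_id: Union[int, None] = None,
        name_first: Union[str, None] = None,
        name_last: Union[str, None] = None,
    ):
        # Start from the base `Author` query built by graphene-sqlalchemy.
        query = TypeAuthor.get_query(info=info)

        # Apply only the filters that were supplied (`is not None` so that
        # falsy values such as `0` are not silently ignored).
        if author_id is not None:
            query = query.filter(Author.author_id == author_id)

        if name_first is not None:
            query = query.filter(Author.name_first == name_first)

        if name_last is not None:
            query = query.filter(Author.name_last == name_last)

        author = query.first()

        return author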

A GraphQL query such as:

query GetAuthor{
  author(authorId: 1) {
    nameFirst
  }
}

will cause the following raw SQL to be emitted (taken from the echo logs of the SQLA engine):

SELECT authors.author_id AS authors_author_id, authors.name_first AS authors_name_first, authors.name_last AS authors_name_last
FROM authors
WHERE authors.author_id = ?
 LIMIT ? OFFSET ?
2018-05-24 16:23:03,669 INFO sqlalchemy.engine.base.Engine (1, 1, 0)

As one can see, only the nameFirst field (i.e., the name_first column) was requested, yet the entire row is fetched. Of course, the GraphQL response only contains the requested fields:

{
  "data": {
    "author": {
      "nameFirst": "Robert"
    }
  }
}

but the entire row has still been fetched, which becomes a major issue when dealing with wide tables.

Is there a way to automagically tell SQLAlchemy which columns are needed, so as to preclude this form of over-fetching?
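To make the goal concrete, here is a minimal pure-Python sketch of the column selection the question is after. It assumes a hypothetical helper, build_select_by_pk, that maps the requested camelCase GraphQL fields to snake_case column names and always keeps the primary key so rows can still be identified. It only builds the SQL string; it does not use SQLAlchemy.

```python
import re
from typing import List


def camel_to_snake(name: str) -> str:
    """Convert a camelCase GraphQL field name (e.g. nameFirst) to snake_case."""
    # Insert "_" before every uppercase letter that is not the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def build_select_by_pk(table: str, columns: List[str], requested: List[str], pk: str) -> str:
    """Hypothetical helper: SELECT only the requested columns, filtered by primary key.

    The primary key is always included; requested names that do not map to a
    known column are ignored.
    """
    wanted = {camel_to_snake(field) for field in requested}
    selected = [col for col in columns if col == pk or col in wanted]
    column_list = ", ".join(f"{table}.{col}" for col in selected)
    return f"SELECT {column_list} FROM {table} WHERE {table}.{pk} = ?"


sql = build_select_by_pk("authors", ["author_id", "name_first", "name_last"], ["nameFirst"], "author_id")
print(sql)
# SELECT authors.author_id, authors.name_first FROM authors WHERE authors.author_id = ?
```

Keeping the primary key is a deliberate choice here: an ORM needs it to build identity-mapped objects, which is also why SQLAlchemy's load_only keeps primary-key columns loaded.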

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 15 (5 by maintainers)

Top GitHub Comments

somada141 commented, Jun 16, 2018 (8 reactions)

Should anyone land here wondering about the implementation, I got around to implementing my own version of the get_query_fields function as follows:

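from typing import Dict, List, Union

import graphql.execution.base
from graphene.utils.str_converters import to_snake_case
from graphql.language.ast import Field, FragmentSpread


def extract_requested_fields(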
    info: graphql.execution.base.ResolveInfo,
    fields: List[Union[Field, FragmentSpread]],
    do_convert_to_snake_case: bool = True,
) -> Dict:
    """Extracts the fields requested in a GraphQL query by processing the AST
    and returns a nested dictionary representing the requested fields.

    Note:
        This function should support arbitrarily nested field structures
        including fragments.

    Example:
        Consider the following query passed to a resolver and running this
        function with the `ResolveInfo` object passed to the resolver.

        >>> query = "query getAuthor{author(authorId: 1){nameFirst, nameLast}}"
        >>> extract_requested_fields(info, info.field_asts, True)
        {'author': {'name_first': None, 'name_last': None}}

    Args:
        info (graphql.execution.base.ResolveInfo): The GraphQL query info passed
            to the resolver function.
        fields (List[Union[Field, FragmentSpread]]): The list of `Field` or
            `FragmentSpread` objects parsed out of the GraphQL query and stored
            in the AST.
        do_convert_to_snake_case (bool): Whether to convert the fields as they
            appear in the GraphQL query (typically in camel-case) back to
            snake-case (which is how they typically appear in ORM classes).

    Returns:
        Dict: The nested dictionary containing all the requested fields.
    """

    result = {}
    for field in fields:

        # Set the `key` as the field name.
        key = field.name.value

        # Convert the key from camel-case to snake-case (if required).
        if do_convert_to_snake_case:
            key = to_snake_case(name=key)

        # Initialize `val` to `None`. Fields without nested-fields under them
        # will have a dictionary value of `None`.
        val = None

        # If the field is of type `Field` then extract the nested fields under
        # the `selection_set` (if defined). These nested fields will be
        # extracted recursively and placed in a dictionary under the field
        # name in the `result` dictionary.
        if isinstance(field, Field):
            if (
                hasattr(field, "selection_set") and
                field.selection_set is not None
            ):
                # Extract field names out of the field selections.
                val = extract_requested_fields(
                    info=info,
                    fields=field.selection_set.selections,
                    do_convert_to_snake_case=do_convert_to_snake_case,
                )
            result[key] = val
        # If the field is of type `FragmentSpread` then retrieve the fragment
        # from `info.fragments` and recursively extract its nested fields. As
        # we don't want the name of the fragment appearing in the result
        # dictionary (since it does not match anything in the ORM classes),
        # the extracted fields are merged directly into `result`.
        elif isinstance(field, FragmentSpread):
            # Retrieve the referenced fragment.
            fragment = info.fragments[field.name.value]
            # Extract field names out of the fragment selections.
            val = extract_requested_fields(
                info=info,
                fields=fragment.selection_set.selections,
                do_convert_to_snake_case=do_convert_to_snake_case,
            )
            # Merge rather than overwrite, so that fields collected before the
            # fragment spread are kept.
            result.update(val)

    return result

which parses the AST into a dict preserving the structure of the query and (hopefully) matching the structure of the ORM.
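The recursion is easier to follow on a toy AST. The sketch below uses hypothetical stand-in dataclasses (Field, FragmentSpread) rather than graphql-core's real AST nodes, and a plain dict of fragment selections in place of info.fragments. Fragment fields are merged into the surrounding result with dict.update, so a fragment spread does not discard sibling fields collected before it.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class Field:
    """Hypothetical stand-in for a GraphQL field node (not graphql-core's class)."""
    name: str
    selections: Optional[List["Node"]] = None


@dataclass
class FragmentSpread:
    """Hypothetical stand-in for a `...fragmentName` spread."""
    name: str


Node = Union[Field, FragmentSpread]


def to_snake(name: str) -> str:
    # camelCase -> snake_case, e.g. nameFirst -> name_first.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def extract(nodes: List[Node], fragments: Dict[str, List[Node]]) -> Dict:
    result: Dict = {}
    for node in nodes:
        if isinstance(node, Field):
            # Leaf fields map to None; fields with sub-selections recurse.
            val = extract(node.selections, fragments) if node.selections else None
            result[to_snake(node.name)] = val
        elif isinstance(node, FragmentSpread):
            # Inline the fragment's fields instead of keying them by fragment name.
            result.update(extract(fragments[node.name], fragments))
    return result


fragments = {
    "authorFields": [Field("nameFirst"), Field("nameLast")],
    "bookFields": [Field("title"), Field("year")],
}
# The spread comes *after* `books` here, to show that `books` survives the merge.
query = [Field("author", [
    Field("books", [FragmentSpread("bookFields")]),
    FragmentSpread("authorFields"),
])]
print(extract(query, fragments))
# {'author': {'books': {'title': None, 'year': None}, 'name_first': None, 'name_last': None}}
```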

Running extract_requested_fields on the info object of a query like:

query getAuthor{
  author(authorId: 1) {
    nameFirst,
    nameLast
  }
}

produces:

{'author': {'name_first': None, 'name_last': None}}

while a more complex query like this:

query getAuthor{
  author(nameFirst: "Brandon") {
    ...authorFields
    books {
      ...bookFields
    }
  }
}

fragment authorFields on TypeAuthor {
  nameFirst,
  nameLast
}

fragment bookFields on TypeBook {
  title,
  year
}

produces:

{'author': {'books': {'title': None, 'year': None},
  'name_first': None,
  'name_last': None}}

These dictionaries can now be used to tell apart fields on the primary table (Author in this case), which have a value of None (e.g., name_first), from fields on a relationship of that table, which are nested under the relationship name (e.g., title under the books relationship).
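That split can be sketched in plain Python, on the example dictionary shown above. It mirrors the table_fields / relationship_fieldsets logic of the function that follows; the name split_fields is illustrative only, and like the original it assumes a single top-level key naming the resolved resource.

```python
from typing import Dict, List, Tuple


def split_fields(requested: Dict) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate primary-table columns from relationship field sets.

    Keys whose value is None are columns on the primary table; keys whose
    value is a dict are relationships, and that dict's keys are the fields
    requested on the related table.
    """
    table_fields = [key for key, val in requested.items() if val is None]
    relationship_fields = {
        key: list(val.keys())
        for key, val in requested.items()
        if isinstance(val, dict)
    }
    return table_fields, relationship_fields


fields = {"author": {"books": {"title": None, "year": None},
                     "name_first": None,
                     "name_last": None}}
# Assume a single top-level key naming the resolved resource.
top_level = next(iter(fields))
table_fields, relationships = split_fields(fields[top_level])
print(table_fields)   # ['name_first', 'name_last']
print(relationships)  # {'books': ['title', 'year']}
```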

A simplistic approach to auto-applying those fields can take the form of the following function:

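from typing import Type

import graphql.execution.base
import sqlalchemy.orm


# `OrmBaseMixin` comes from the demo repository.
def apply_requested_fields(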
    info: graphql.execution.base.ResolveInfo,
    query: sqlalchemy.orm.Query,
    orm_class: Type[OrmBaseMixin]
) -> sqlalchemy.orm.Query:
    """Updates the SQLAlchemy Query object by limiting the loaded fields of the
    table and its relationship to the ones explicitly requested in the GraphQL
    query.

    Note:
        This function is fairly simplistic in that it assumes that (1) the
        SQLAlchemy query only selects a single ORM class/table and that (2)
        relationship fields are only one level deep, i.e., that requested fields
        are either table fields or fields of the table's relationships; it does
        not support fields of nested relationships (relationships of
        relationships).

    Args:
        info (graphql.execution.base.ResolveInfo): The GraphQL query info passed
            to the resolver function.
        query (sqlalchemy.orm.Query): The SQLAlchemy Query object to be updated.
        orm_class (Type[OrmBaseMixin]): The ORM class of the selected table.

    Returns:
        sqlalchemy.orm.Query: The updated SQLAlchemy Query object.
    """

    # Extract the fields requested in the GraphQL query.
    fields = extract_requested_fields(
        info=info,
        fields=info.field_asts,
        do_convert_to_snake_case=True,
    )

    # We assume that the top level of the `fields` dictionary only contains a
    # single key referring to the GraphQL resource being resolved.
    tl_key = list(fields.keys())[0]
    # We assume that any keys that have a value of `None` (as opposed to
    # dictionaries) are fields of the primary table.
    table_fields = [
        key for key, val in fields[tl_key].items()
        if val is None
    ]

    # We assume that any keys that have a value being a dictionary are
    # relationship attributes on the primary table with the keys in the
    # dictionary being fields on that relationship. Thus we create a list of
    # `[relationship_name, relationship_fields]` lists to be used in the
    # `joinedload` definitions.
    relationship_fieldsets = [
        [key, val.keys()]
        for key, val in fields[tl_key].items()
        if isinstance(val, dict)
    ]

    # Assemble a list of `joinedload` definitions on the defined relationship
    # attribute name and the requested fields on that relationship.
    options_joinedloads = []
    for relationship_fieldset in relationship_fieldsets:
        relationship = relationship_fieldset[0]
        rel_fields = relationship_fieldset[1]
        options_joinedloads.append(
            sqlalchemy.orm.joinedload(
                getattr(orm_class, relationship)
            ).load_only(*rel_fields)
        )

    # Update the SQLAlchemy query by limiting the loaded fields on the primary
    # table as well as by including the `joinedload` definitions.
    query = query.options(
        sqlalchemy.orm.load_only(*table_fields),
        *options_joinedloads
    )

    return query
aldiyar-zharkimbayev commented, Sep 6, 2021 (2 reactions)

Hi, I didn’t find a full answer, so I combined the answers above into a solution for the “n+1” problem, but a “cartesian product” appears instead.

  1. an edited version of the extract_requested_fields function from @somada141, which converts the requested fields to a dict
  2. my makeLoadOnlyOptions function, which converts that dict to query options
  3. a custom ConnectionField class, LoadOnlyConnectionField; if you don’t want a custom class, you can use this example to resolve the query manually
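
from typing import Dict, List, Union

import graphql.execution.base
from graphene.utils.str_converters import to_snake_case
from graphql.language.ast import Field, FragmentSpread


def extract_requested_fields(
        info: graphql.execution.base.ResolveInfo,
        fields: List[Union[Field, FragmentSpread]],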
        do_convert_to_snake_case: bool = True,
) -> Dict:
    result = {}
    for field in fields:

        key = field.name.value
        if do_convert_to_snake_case:
            key = to_snake_case(name=key)

        if key == "id":
            continue
        val = None
        if isinstance(field, Field):
            if (
                    hasattr(field, "selection_set") and
                    field.selection_set is not None
            ):
                val = extract_requested_fields(
                    info=info,
                    fields=field.selection_set.selections,
                    do_convert_to_snake_case=do_convert_to_snake_case
                )
            if val and 'edges' in val:
                val = val['edges']
            if val and 'node' in val:
                val = val['node']
            result[key] = val
        elif isinstance(field, FragmentSpread):
            fragment = info.fragments[field.name.value]
            val = extract_requested_fields(
                info=info,
                fields=fragment.selection_set.selections,
                do_convert_to_snake_case=do_convert_to_snake_case
            )
            # Merge rather than overwrite, so sibling fields are kept.
            result.update(val)
    return result
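The edges/node handling above collapses Relay connection wrappers, so that a connection field looks like a plain relationship to the rest of the code. A standalone sketch of that step, assuming the nested-dict shape produced by extract_requested_fields (unwrap_connection is an illustrative name):

```python
from typing import Dict, Optional


def unwrap_connection(val: Optional[Dict]) -> Optional[Dict]:
    """Collapse {'edges': {'node': {...}}} down to the node's field dict.

    Keys next to 'edges' (such as 'pageInfo') or next to 'node' (such as
    'cursor') are dropped, because they are connection metadata rather than
    model columns.
    """
    if val and "edges" in val:
        val = val["edges"]
    if val and "node" in val:
        val = val["node"]
    return val


requested = {"edges": {"cursor": None, "node": {"uuid": None, "code": None}}}
print(unwrap_connection(requested))  # {'uuid': None, 'code': None}
print(unwrap_connection(None))       # None
```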

from sqlalchemy.orm import joinedload, load_only


def makeLoadOnlyOptions(fieldsDict, model):
    if not fieldsDict:
        return []
    options = []
    modelFields = []
    relationshipFields = []
    for field in list(fieldsDict):
        if fieldsDict[field] is None:
            modelFields.append(field)
        else:
            relationshipFields.append(field)
    if modelFields:
        options.append(load_only(*modelFields))
    for relationshipField in relationshipFields:
        relationshipOptions = makeLoadOnlyOptions(fieldsDict[relationshipField],
                                                 getattr(model, relationshipField).property.mapper.class_)
        options.append(joinedload(getattr(model, relationshipField)).options(*relationshipOptions))
    return options
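The recursion in makeLoadOnlyOptions can be illustrated without a database. The sketch below replaces SQLAlchemy's load_only/joinedload objects with readable strings and uses a hypothetical RELATIONSHIPS mapping in place of getattr(model, name).property.mapper.class_; it shows how the nested dict turns into a tree of loader options, not the real option objects.

```python
from typing import Dict, List, Optional

# Hypothetical model metadata: model name -> {relationship attribute: target model}.
RELATIONSHIPS = {
    "Title": {"titleAuthors": "TitleAuthor", "chapters": "Chapter"},
    "TitleAuthor": {"user": "User"},
    "Chapter": {},
    "User": {},
}


def plan_options(fields: Optional[Dict], model: str) -> List[str]:
    """Describe, as strings, the loader options makeLoadOnlyOptions would build."""
    if not fields:
        return []
    # Columns (value None) on this model become a single load_only(...).
    columns = [name for name, val in fields.items() if val is None]
    options = [f"load_only({', '.join(columns)})"] if columns else []
    # Every relationship becomes a joinedload with its own nested options.
    for name, val in fields.items():
        if val is None:
            continue
        nested = plan_options(val, RELATIONSHIPS[model][name])
        suffix = f".options({', '.join(nested)})" if nested else ""
        options.append(f"joinedload({model}.{name}){suffix}")
    return options


requested = {
    "uuid": None,
    "code": None,
    "titleAuthors": {"uuid": None, "user": {"uuid": None, "name": None}},
    "chapters": {"uuid": None},
}
for option in plan_options(requested, "Title"):
    print(option)
# load_only(uuid, code)
# joinedload(Title.titleAuthors).options(load_only(uuid), joinedload(TitleAuthor.user).options(load_only(uuid, name)))
# joinedload(Title.chapters).options(load_only(uuid))
```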

import graphene_sqlalchemy


class LoadOnlyConnectionField(graphene_sqlalchemy.SQLAlchemyConnectionField):
    def __init__(self, connection, *args, **kwargs):
        super().__init__(connection, *args, **kwargs)

    @classmethod
    def get_query(cls, model, info: 'ResolveInfo', sort=None, **args):
        query = super().get_query(model, info, sort, **args)
        queryName = info.field_name

        fieldsDict = extract_requested_fields(info, info.field_asts, False)
        fieldsDict = fieldsDict[queryName]
        options = makeLoadOnlyOptions(fieldsDict, model)
        query = query.options(*options)
        # print(query)
        return query

import graphene
from graphene import relay


class Query(graphene.ObjectType):
    node = relay.Node.Field()

    searchTitlesRelay = LoadOnlyConnectionField(connection=Title, sort=Title.sort_argument())

Query:

{
    searchTitlesRelay{
        edges {
            cursor
            node {
                uuid
                code
                titleAuthors {
                    edges {
                        node {
                            uuid
                            user {
                                uuid
                                name
                            }
                        }
                    }
                }
                chapters {
                    edges {
                        node {
                            uuid
                        }
                    }
                }
            }
        }
    }
}

SQL:

SELECT	anon_1.title_id AS anon_1_title_id, anon_1.title_code AS anon_1_title_code,
	title_author_1.id AS title_author_1_id,
	users_1.id AS users_1_id, users_1.name AS users_1_name,
	chapter_1.id AS chapter_1_id
FROM (
	SELECT title.id AS title_id, title.code AS title_code
	FROM title ORDER BY title.id ASC
	LIMIT 2
) AS anon_1
LEFT OUTER JOIN title_author AS title_author_1 ON anon_1.title_id = title_author_1.title_id
LEFT OUTER JOIN users AS users_1 ON users_1.id = title_author_1.user_id
LEFT OUTER JOIN chapter AS chapter_1 ON anon_1.title_id = chapter_1.title_id
ORDER BY anon_1.title_id ASC
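The “cartesian product” mentioned at the top of this comment comes from joined-eager-loading two one-to-many collections (title_author and chapter) in the same statement: each title row is repeated once for every author/chapter combination. (The join to users is many-to-one from title_author, so it does not multiply rows further.) A rough pure-Python estimate of the row count, with made-up per-title collection sizes for illustration:

```python
from typing import List, Tuple


def joined_row_count(titles: List[Tuple[int, int]]) -> int:
    """Rows returned when two one-to-many collections are LEFT OUTER JOINed.

    Each tuple is (number of authors, number of chapters) for one title. An
    empty collection still contributes one row because of the outer join.
    """
    return sum(max(authors, 1) * max(chapters, 1) for authors, chapters in titles)


def separate_row_count(titles: List[Tuple[int, int]]) -> int:
    """Rows returned if titles and each collection are loaded by separate queries."""
    return len(titles) + sum(a for a, _ in titles) + sum(c for _, c in titles)


# Illustrative numbers only: one title with 2 authors and 30 chapters,
# and one title with 1 author and no chapters.
titles = [(2, 30), (1, 0)]
print(joined_row_count(titles))    # 61
print(separate_row_count(titles))  # 35
```

One commonly used alternative for collections is SQLAlchemy's selectinload, which loads each collection with an additional SELECT ... IN (...) query instead of a join, so rows grow additively rather than multiplicatively; whether it combines cleanly with the load_only options above is untested here.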