Unicode characters get transformed into surrogate pairs by graphql.print_ast()
I’ve run into an issue where serializing a GraphQL DocumentNode into a string and then parsing it back into a DocumentNode transforms certain Unicode characters into surrogate pairs, which makes them no longer UTF-8 encodable.
This code snippet demonstrates the problem:
import graphql

# A single code point outside the Basic Multilingual Plane.
value = "\U000a90e5"
print(f"Value before serializing: {value!r}")
encoded = value.encode("utf8")
print(f"UTF-8 encoded: {encoded!r}")

# Build a query equivalent to: { hello(user: "<value>") }
query = graphql.DocumentNode(
    definitions=[
        graphql.OperationDefinitionNode(
            operation=graphql.OperationType.QUERY,
            selection_set=graphql.SelectionSetNode(
                selections=[
                    graphql.FieldNode(
                        name=graphql.NameNode(
                            kind="name",
                            value="hello",
                        ),
                        arguments=[
                            graphql.ArgumentNode(
                                name=graphql.NameNode(value="user"),
                                value=graphql.StringValueNode(value=value),
                            )
                        ],
                    )
                ]
            ),
        )
    ]
)

# Round trip: AST -> text -> AST.
serialized_query = graphql.print_ast(query)
print(f"Serialized query: {serialized_query}")
parsed_query = graphql.parse(serialized_query)

# Pull the string argument back out of the re-parsed document.
value = parsed_query.definitions[0].selection_set.selections[0].arguments[0].value.value
print(f"Value after serializing: {value!r}")
encoded = value.encode("utf8")  # raises UnicodeEncodeError
print(f"UTF-8 encoded: {encoded!r}")
The Unicode character \U000a90e5 is UTF-8 encodable. Passing it into a DocumentNode tree and serializing the AST to text with graphql.print_ast() turns it into the escaped surrogate pair \uda64\udce5. Parsing that text back into a DocumentNode via graphql.parse() and extracting the argument value shows that the value has changed: it is now two surrogate code points instead of one character, and it is no longer UTF-8 encodable. The last line of the snippet raises the error: UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
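To make the transformation concrete, here is a minimal standard-library sketch of where \uda64\udce5 comes from and one possible way to repair an affected string on versions that show the problem. The helper names to_surrogate_pair and recombine_surrogates are hypothetical and not part of graphql-core. The json.dumps call is included only because its default ensure_ascii=True output has the same escaped form; whether print_ast uses it internally is an assumption, not something the report confirms.

```python
import json


def to_surrogate_pair(char: str) -> str:
    """Split one character above U+FFFF into its UTF-16 surrogate pair."""
    offset = ord(char) - 0x10000        # 20-bit offset into the supplementary planes
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return chr(high) + chr(low)


def recombine_surrogates(text: str) -> str:
    """Merge surrogate pairs back into single code points (hypothetical helper).

    "surrogatepass" lets the surrogates through the UTF-16 encoder; decoding
    the resulting bytes then reads each pair as one character.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


original = "\U000a90e5"
pair = to_surrogate_pair(original)
print(f"Surrogate pair: {pair!r}")                # '\uda64\udce5'
print(f"json.dumps: {json.dumps(original)}")      # same escapes, inside double quotes

repaired = recombine_surrogates(pair)
print(f"Repaired equals original: {repaired == original}")
print(f"UTF-8 encoded: {repaired.encode('utf8')!r}")
```

Applying recombine_surrogates to the value extracted from the re-parsed document should restore the original character. It raises UnicodeDecodeError if the text contains an unpaired surrogate, so it is only suitable for strings known to hold complete pairs.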
Environment:
- Python 3.8.5
- graphql-core 3.1.4
Regarding your question, there are good reasons (see also “Goals and Restrictions” in the README). First, GraphQL.js was originally made by Facebook and was written by Lee Byron, who is also the co-author of GraphQL. It is now developed as part of the GraphQL Foundation, so this library is probably the closest to the spec, and it is continually updated. Second, I needed to restrict the scope of the project in order to be able to maintain it for a long time in a sustainable way, since I work on this only in my spare time. It surely is possible to create a GraphQL implementation in Python that is more performant, more Pythonic, or has more features, but that’s outside the declared scope of this project. Maybe an idea for another project.
The PR mentioned here has since been ported to GraphQL-core and has been available since v3.2.0rc1.