BigQuery upload_from_file: a unicode file-like object must be opened in binary mode if it's larger than RESUMABLE_UPLOAD_THRESHOLD, otherwise in str mode.
Steps to reproduce
```python
import sys

from gcloud.bigquery import Client, SchemaField
from gcloud.bigquery.job import CreateDisposition, WriteDisposition

csv_filename = 'sandwiches.csv'
if len(sys.argv) > 1:
    csv_filename = sys.argv[1]

bq = Client()
ds = bq.dataset('test_unicode')
ds.location = 'EU'


def test_unicode_upload(filename, mode):
    if not ds.exists():
        print('Creating dataset: {}'.format(ds.name))
        ds.create()
    fields = [
        SchemaField('name', 'STRING'),
        SchemaField('main_ingredient', 'STRING'),
    ]
    table = ds.table('sandwiches', fields)
    print('Uploading CSV: {}, mode={!r}'.format(filename, mode))
    table.upload_from_file(
        open(filename, mode),
        encoding='UTF-8',
        source_format='CSV',
        write_disposition=WriteDisposition.WRITE_TRUNCATE,
        create_disposition=CreateDisposition.CREATE_IF_NEEDED)


test_unicode_upload(csv_filename, 'r')   # Works
test_unicode_upload(csv_filename, 'rb')
# Fails in http.client.HTTPConnection()._send_request:
#
# /usr/lib/python3.4/http/client.py in _send_request(self, method, url,
#                                                    body, headers)
#    1178         if isinstance(body, str):
#    1179             # RFC 2616 Section 3.7.1 says that text default has a
#    1180             # default charset of iso-8859-1.
# -> 1181             body = body.encode('iso-8859-1')
#    1182         self.endheaders(body)
#
# UnicodeEncodeError: 'latin-1' codec can't encode characters in
# position 649-650: ordinal not in range(256)
```
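The failing step can be shown in isolation. Per the RFC 2616 default cited in the traceback, `http.client` encodes `str` request bodies as ISO-8859-1. Note that ä, ö, and å are all within Latin-1, so the error at positions 649-650 suggests the bytes payload was mangled into a `str` containing code points above U+00FF somewhere upstream. A standalone illustration, not the library's actual code path:

```python
# Latin-1 covers the Swedish sample, so these characters alone are
# not the problem:
'Räksmörgås,Räkor'.encode('iso-8859-1')  # works

# ...but any code point above U+00FF fails the same way the
# traceback does:
try:
    'smörgås \u2603'.encode('iso-8859-1')  # U+2603 SNOWMAN, > 0xFF
except UnicodeEncodeError as exc:
    print(exc)  # ... ordinal not in range(256)
```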
sandwiches.csv

```csv
name,main_ingredient
Räksmörgås,Räkor
Baguette,Bröd
```
Expected behavior

When I pass in a binary-mode file-like object, I expect upload_from_file to pass the data through to BigQuery as-is, and the BigQuery load job to decode it for me using the encoding= parameter.
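Given that the text-mode call works, one hypothetical caller-side workaround is to wrap the binary stream in io.TextIOWrapper so upload_from_file only ever sees str. This is a sketch, not a fix for the library itself; the names (csv_filename, table, the dispositions) are from the repro script above:

```python
import io

# Hypothetical workaround: decode on the client side, so the
# multipart request body is assembled from str rather than bytes.
with open(csv_filename, 'rb') as raw:
    table.upload_from_file(
        io.TextIOWrapper(raw, encoding='utf-8'),
        encoding='UTF-8',
        source_format='CSV',
        write_disposition=WriteDisposition.WRITE_TRUNCATE,
        create_disposition=CreateDisposition.CREATE_IF_NEEDED)
```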
Environment

```console
$ python --version
Python 3.4.3+
$ pip freeze | egrep 'httplib2|gcloud'
gcloud==0.13.0
httplib2==0.9.2
```

@joar Thanks for your efforts. #1779 has my re-working of your patch: gcloud._helpers._to_bytes

@thobrla gsutil doesn't seem to run on Python 3, so it might be unaffected by this bug.
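For reference, a sketch of what a coercion helper like the gcloud._helpers._to_bytes mentioned above plausibly does; this is an illustration, not the actual gcloud source:

```python
def _to_bytes(value, encoding='ascii'):
    # Illustrative only: pass bytes through untouched, encode text,
    # reject everything else. The real gcloud._helpers._to_bytes may
    # differ in its details.
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError('%r could not be converted to bytes' % (value,))
```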
@tseaver I think that the use of six.StringIO in Upload._configure_multipart_request might be close to the root cause of this issue:

- On Python 2, six.StringIO == six.BytesIO == StringIO.StringIO, which accepts both str and unicode.
- On Python 3, six.StringIO == io.StringIO != io.BytesIO, so it rejects bytes.
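A minimal sketch of that divergence as it bites on Python 3 (only six is assumed installed; the boundary string and CSV bytes are illustrative):

```python
import six

# On Python 3, six.StringIO is io.StringIO, which only accepts str,
# so a multipart body built in it cannot take a raw bytes payload.
buf = six.StringIO()
buf.write('--boundary\r\n')  # text parts are fine
try:
    buf.write('Räksmörgås,Räkor\n'.encode('utf-8'))  # raw CSV bytes
except TypeError as exc:
    print(exc)  # string argument expected, got 'bytes'
```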