Unable to ingest data from BigQuery source
I’ve followed the Quickstart guide and loaded the sample data. The UI is up and running on port 9001, so all good there. Now I want to ingest data from my BigQuery instance, but I keep getting an error related to SQLAlchemy. I’ve tried authenticating with gcloud, but no luck.
Can anyone help with this? Happy to provide more info if it’ll help.
Full log:
(venv) root@datahub-test:~/datahub/metadata-ingestion# pip freeze
appdirs==1.4.4
attrs==20.3.0
avro-gen @ https://api.github.com/repos/rbystrit/avro_gen/tarball/master
avro-python3==1.10.1
bcrypt==3.2.0
black==20.8b1
cached-property==1.5.2
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
confluent-kafka==1.6.0
coverage==5.5
cryptography==3.4.6
-e git+https://github.com/linkedin/datahub.git@cda1ce458974dda1cdc59b2c0957369e9524ea5e#egg=datahub&subdirectory=metadata-ingestion
deepdiff==5.2.3
distro==1.5.0
docker==4.4.4
docker-compose==1.28.5
dockerpty==0.4.1
docopt==0.6.2
fastavro==1.3.2
flake8==3.8.4
frozendict==1.2
future==0.18.2
google-api-core==1.26.0
google-auth==1.27.0
google-cloud-bigquery==2.10.0
google-cloud-core==1.6.0
google-crc32c==1.1.2
google-resumable-media==1.2.0
googleapis-common-protos==1.53.0
grpcio==1.36.0
idna==2.10
iniconfig==1.1.1
isort==5.7.0
jsonschema==3.2.0
mccabe==0.6.1
mypy==0.812
mypy-extensions==0.4.3
ordered-set==4.0.2
packaging==20.9
paramiko==2.7.2
pathspec==0.8.1
pkg-resources==0.0.0
pluggy==0.13.1
proto-plus==1.14.2
protobuf==3.15.3
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybigquery==0.5.0
pycodestyle==2.6.0
pycparser==2.20
pydantic==1.8
pyflakes==2.2.0
PyMySQL==1.0.2
PyNaCl==1.4.0
pyparsing==2.4.7
pyrsistent==0.17.3
pytest==6.2.2
pytest-cov==2.11.1
pytest-docker==0.10.1
python-dotenv==0.15.0
python-tds==1.10.0
python3-ldap==0.9.8.4
pytz==2021.1
PyYAML==5.4.1
regex==2020.11.13
requests==2.25.1
rsa==4.7.2
six==1.15.0
SQLAlchemy==1.3.23
sqlalchemy-pytds==0.3.1
sqlalchemy-stubs==0.4
texttable==1.6.3
toml==0.10.2
typed-ast==1.4.2
typing-extensions==3.7.4.3
tzlocal==2.1
urllib3==1.26.3
websocket-client==0.57.0
(venv) root@datahub-test:~/datahub/metadata-ingestion# ls /
bigquery.yml boot etc lib lib64 lost+found mnt proc run sbin srv tmp var
bin dev home lib32 libx32 media opt root sa_credentials.json snap sys usr
(venv) root@datahub-test:~/datahub/metadata-ingestion# cat /bigquery.yml
source:
  type: bigquery
  config:
    project_id: <PROJECT_ID>
    options:
      credential_path: "/sa_credentials.json"
sink:
  type: console
(venv) root@datahub-test:~/datahub/metadata-ingestion# datahub ingest -c /bigquery.yml
[2021-03-02 09:15:27,558] DEBUG {datahub.entrypoints:64} - Using config: {'source': {'type': 'bigquery', 'config': {'project_id': '<PROJECT_ID>', 'options': {'credential_path': '/sa_credentials.json'}}}, 'sink': {'type': 'console'}}
[2021-03-02 09:15:27,558] DEBUG {datahub.ingestion.run.pipeline:63} - Source type:bigquery,<class 'datahub.ingestion.source.bigquery.BigQuerySource'> configured
[2021-03-02 09:15:27,558] DEBUG {datahub.ingestion.run.pipeline:69} - Sink type:console,<class 'datahub.ingestion.sink.console.ConsoleSink'> configured
[2021-03-02 09:15:27,558] DEBUG {datahub.ingestion.source.sql_common:152} - sql_alchemy_url=bigquery://<PROJECT_ID>
[2021-03-02 09:15:27,773] DEBUG {google.auth._default:203} - Checking None for explicit credentials as part of auth process...
[2021-03-02 09:15:27,774] DEBUG {google.auth._default:181} - Checking Cloud SDK credentials as part of auth process...
[2021-03-02 09:15:27,774] DEBUG {google.auth._default:187} - Cloud SDK credentials not found on disk; not using them
[2021-03-02 09:15:27,774] DEBUG {google.auth._default:223} - Checking for App Engine runtime as part of auth process...
[2021-03-02 09:15:27,775] DEBUG {google.auth._default:234} - No App Engine library was found so cannot authentication via App Engine Identity Credentials.
[2021-03-02 09:15:27,776] DEBUG {google.auth.transport._http_client:104} - Making request: GET http://169.254.169.254
[2021-03-02 09:15:27,778] DEBUG {google.auth.transport._http_client:104} - Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
Traceback (most recent call last):
  File "/root/datahub/metadata-ingestion/venv/bin/datahub", line 11, in <module>
    load_entry_point('datahub', 'console_scripts', 'datahub')()
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/root/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 70, in ingest
    pipeline.run()
  File "/root/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 81, in run
    for wu in self.source.get_workunits():
  File "/root/datahub/metadata-ingestion/src/datahub/ingestion/source/sql_common.py", line 153, in get_workunits
    engine = create_engine(url, **sql_config.options)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/__init__.py", line 520, in create_engine
    return strategy.create(*args, **kwargs)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 164, in create
    raise TypeError(
TypeError: Invalid argument(s) 'credential_path' sent to create_engine(), using configuration BigQueryDialect/QueuePool/Engine. Please check that the keyword arguments are appropriate for this combination of components.
(venv) root@datahub-test:~/datahub/metadata-ingestion#
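For reference, the options block in a recipe appears to be passed straight through to SQLAlchemy's create_engine() (see the sql_common.py frame in the traceback), which is why the unrecognised credential_path keyword comes back as a TypeError. One possible workaround, assuming the pybigquery dialect accepts a credentials_path keyword (the trailing "s" in the option name is an assumption on my part; please check the pybigquery docs), would be a recipe along these lines:

source:
  type: bigquery
  config:
    project_id: <PROJECT_ID>
    options:
      credentials_path: "/sa_credentials.json"   # note: credentials_path, not credential_path
sink:
  type: console

Alternatively, the options block can be dropped entirely and credentials supplied through Application Default Credentials, e.g. by exporting GOOGLE_APPLICATION_CREDENTIALS=/sa_credentials.json in the shell before running datahub ingest.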
Top GitHub Comments
@hsheth2 Nice! Confirming your fix works. Thanks for all the help on this!
@user5651 thanks for sending that along. I just pushed a tiny update to that branch which should fix it.
Can you try running
git pull
and then retry the ingestion?
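For anyone following along, the retry is roughly the following, run from the metadata-ingestion checkout (the reinstall step may be unnecessary with an editable install, and exact steps may vary):

git pull
pip install -e .
datahub ingest -c /bigquery.yml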