
Unable to ingest data from BigQuery sink

See original GitHub issue

I’ve followed the Quickstart guide and loaded sample data. The UI is up and running on port 9001 so all good there. Now I want to ingest data from my BigQuery instance but I keep getting an error related to SQLAlchemy. I’ve tried authenticating with gcloud but no luck.

Can anyone help with this? Happy to provide more info if it'll help.

Full log:

(venv) root@datahub-test:~/datahub/metadata-ingestion# pip freeze
appdirs==1.4.4
attrs==20.3.0
avro-gen @ https://api.github.com/repos/rbystrit/avro_gen/tarball/master
avro-python3==1.10.1
bcrypt==3.2.0
black==20.8b1
cached-property==1.5.2
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
confluent-kafka==1.6.0
coverage==5.5
cryptography==3.4.6
-e git+https://github.com/linkedin/datahub.git@cda1ce458974dda1cdc59b2c0957369e9524ea5e#egg=datahub&subdirectory=metadata-ingestion
deepdiff==5.2.3
distro==1.5.0
docker==4.4.4
docker-compose==1.28.5
dockerpty==0.4.1
docopt==0.6.2
fastavro==1.3.2
flake8==3.8.4
frozendict==1.2
future==0.18.2
google-api-core==1.26.0
google-auth==1.27.0
google-cloud-bigquery==2.10.0
google-cloud-core==1.6.0
google-crc32c==1.1.2
google-resumable-media==1.2.0
googleapis-common-protos==1.53.0
grpcio==1.36.0
idna==2.10
iniconfig==1.1.1
isort==5.7.0
jsonschema==3.2.0
mccabe==0.6.1
mypy==0.812
mypy-extensions==0.4.3
ordered-set==4.0.2
packaging==20.9
paramiko==2.7.2
pathspec==0.8.1
pkg-resources==0.0.0
pluggy==0.13.1
proto-plus==1.14.2
protobuf==3.15.3
py==1.10.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybigquery==0.5.0
pycodestyle==2.6.0
pycparser==2.20
pydantic==1.8
pyflakes==2.2.0
PyMySQL==1.0.2
PyNaCl==1.4.0
pyparsing==2.4.7
pyrsistent==0.17.3
pytest==6.2.2
pytest-cov==2.11.1
pytest-docker==0.10.1
python-dotenv==0.15.0
python-tds==1.10.0
python3-ldap==0.9.8.4
pytz==2021.1
PyYAML==5.4.1
regex==2020.11.13
requests==2.25.1
rsa==4.7.2
six==1.15.0
SQLAlchemy==1.3.23
sqlalchemy-pytds==0.3.1
sqlalchemy-stubs==0.4
texttable==1.6.3
toml==0.10.2
typed-ast==1.4.2
typing-extensions==3.7.4.3
tzlocal==2.1
urllib3==1.26.3
websocket-client==0.57.0
(venv) root@datahub-test:~/datahub/metadata-ingestion# ls /
bigquery.yml  boot  etc   lib    lib64   lost+found  mnt  proc  run                  sbin  srv  tmp  var
bin           dev   home  lib32  libx32  media       opt  root  sa_credentials.json  snap  sys  usr
(venv) root@datahub-test:~/datahub/metadata-ingestion# cat /bigquery.yml
source:
  type: bigquery
  config:
    project_id: <PROJECT_ID>
    options:
      credential_path: "/sa_credentials.json"
sink:
  type: console
(venv) root@datahub-test:~/datahub/metadata-ingestion# datahub ingest -c /bigquery.yml
[2021-03-02 09:15:27,558] DEBUG    {datahub.entrypoints:64} - Using config: {'source': {'type': 'bigquery', 'config': {'project_id': '<PROJECT_ID>', 'options': {'credential_path': '/sa_credentials.json'}}}, 'sink': {'type': 'console'}}
[2021-03-02 09:15:27,558] DEBUG    {datahub.ingestion.run.pipeline:63} - Source type:bigquery,<class 'datahub.ingestion.source.bigquery.BigQuerySource'> configured
[2021-03-02 09:15:27,558] DEBUG    {datahub.ingestion.run.pipeline:69} - Sink type:console,<class 'datahub.ingestion.sink.console.ConsoleSink'> configured
[2021-03-02 09:15:27,558] DEBUG    {datahub.ingestion.source.sql_common:152} - sql_alchemy_url=bigquery://<PROJECT_ID>
[2021-03-02 09:15:27,773] DEBUG    {google.auth._default:203} - Checking None for explicit credentials as part of auth process...
[2021-03-02 09:15:27,774] DEBUG    {google.auth._default:181} - Checking Cloud SDK credentials as part of auth process...
[2021-03-02 09:15:27,774] DEBUG    {google.auth._default:187} - Cloud SDK credentials not found on disk; not using them
[2021-03-02 09:15:27,774] DEBUG    {google.auth._default:223} - Checking for App Engine runtime as part of auth process...
[2021-03-02 09:15:27,775] DEBUG    {google.auth._default:234} - No App Engine library was found so cannot authentication via App Engine Identity Credentials.
[2021-03-02 09:15:27,776] DEBUG    {google.auth.transport._http_client:104} - Making request: GET http://169.254.169.254
[2021-03-02 09:15:27,778] DEBUG    {google.auth.transport._http_client:104} - Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
Traceback (most recent call last):
  File "/root/datahub/metadata-ingestion/venv/bin/datahub", line 11, in <module>
    load_entry_point('datahub', 'console_scripts', 'datahub')()
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/root/datahub/metadata-ingestion/src/datahub/entrypoints.py", line 70, in ingest
    pipeline.run()
  File "/root/datahub/metadata-ingestion/src/datahub/ingestion/run/pipeline.py", line 81, in run
    for wu in self.source.get_workunits():
  File "/root/datahub/metadata-ingestion/src/datahub/ingestion/source/sql_common.py", line 153, in get_workunits
    engine = create_engine(url, **sql_config.options)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/__init__.py", line 520, in create_engine
    return strategy.create(*args, **kwargs)
  File "/root/datahub/metadata-ingestion/venv/lib/python3.8/site-packages/sqlalchemy/engine/strategies.py", line 164, in create
    raise TypeError(
TypeError: Invalid argument(s) 'credential_path' sent to create_engine(), using configuration BigQueryDialect/QueuePool/Engine.  Please check that the keyword arguments are appropriate for this combination of components.
(venv) root@datahub-test:~/datahub/metadata-ingestion# 
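For context on the traceback above: everything under `options` in the recipe is forwarded as keyword arguments to SQLAlchemy's `create_engine()`, which rejects any keyword that the dialect, pool, or engine doesn't recognize. Below is a toy sketch of that validation, not SQLAlchemy's actual code; the accepted-argument set is purely illustrative. (Newer pybigquery releases document a `credentials_path` engine argument, note the "s"; whether the pinned 0.5.0 accepts it is not shown in this thread.)

```python
# Toy model of why the TypeError appears. Real SQLAlchemy collects the
# keyword arguments accepted by the dialect, pool, and engine classes;
# anything left over triggers the same "Invalid argument(s)" TypeError.
def create_engine_sketch(url, **kwargs):
    accepted = {"echo", "pool_size", "credentials_path"}  # illustrative set only
    unknown = sorted(set(kwargs) - accepted)
    if unknown:
        raise TypeError(
            f"Invalid argument(s) {unknown!r} sent to create_engine()"
        )
    return f"Engine({url})"

# The recipe's unsupported 'credential_path' option is rejected:
try:
    create_engine_sketch("bigquery://<PROJECT_ID>", credential_path="/sa_credentials.json")
except TypeError as exc:
    print(exc)
```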

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
user5651 commented, Mar 4, 2021

@hsheth2 Nice! Confirming your fix works. Thanks for all the help on this!

1 reaction
hsheth2 commented, Mar 3, 2021

@user5651 thanks for sending that along. I just pushed a tiny update to that branch which should fix it.

Can you try running git pull and then retry the ingestion?
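The fix in this thread landed in the datahub branch itself, but for readers hitting the same TypeError: the standard Google-auth fallback reads the GOOGLE_APPLICATION_CREDENTIALS environment variable, so a recipe that avoids passing credential keywords to `create_engine()` entirely might look like the following sketch (assuming the same console sink as the original report):

```yaml
# Sketch only: export GOOGLE_APPLICATION_CREDENTIALS=/sa_credentials.json
# in the shell before running `datahub ingest -c /bigquery.yml`, and drop
# the unsupported credential_path option entirely.
source:
  type: bigquery
  config:
    project_id: <PROJECT_ID>
sink:
  type: console
```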

Read more comments on GitHub

Top Results From Across the Web

Troubleshoot routing and sinks - Logging - Google Cloud
Logs streamed to the table in your BigQuery dataset don't match the current table's schema. Common issues include trying to route log entries...
unable to list google cloud logging sink for Bigquery
I have created a google cloud logging sink for Bigquery. (Reference) And I was able to share the BQ dataset with the Service...
[JIRA] (PLUGIN-678) BigQuery sink is not able to write to ...
If the input schema of bigquery sink contains integer field, the bigquery sink is not able to write the record to existing tables....
Google Cloud BigQuery Sink Connector for Confluent Cloud
The connector supports Avro, JSON Schema, Protobuf, or JSON (schemaless) input data formats. Schema Registry must be enabled to use a Schema Registry-based ......
Chapter 4. Loading Data into BigQuery - O'Reilly
BigQuery does not charge for loading data. Ingestion happens on a set of workers that is distinct from the cluster providing the slots...
