Docker Quickstart - Sample Data Loading Error
Describe the bug
Running the Sample data ingestion commands produces an error:
Collecting avro-python3==1.8.2
Downloading avro-python3-1.8.2.tar.gz (36 kB)
ERROR: Command errored out with exit status 1:
…
Complete output (7 lines):
Traceback (most recent call last):
…
  File "/tmp/pip-install-gmtSqW/avro-python3/setup.py", line 114, in Main
    ('Python version >= 3 required, got %r' % sys.version_info)
AssertionError: Python version >= 3 required, got sys.version_info(major=2, minor=7, micro=17, releaselevel='final', serial=0)
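The assertion in the output above fires because pip is running under Python 2.7 while avro-python3 only installs on Python 3. As a rough illustration of that kind of guard (a sketch only, not the package's actual setup.py; check_python3 is a hypothetical name), the same check can be reproduced like this:

```python
import sys


def check_python3(version_info=None):
    """Raise AssertionError unless the given interpreter version is 3 or newer.

    Hypothetical helper that mimics the guard seen in the traceback; it is
    not copied from avro-python3.
    """
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        # Wrap in a 1-tuple so %r formats the whole version object.
        raise AssertionError(
            "Python version >= 3 required, got %r" % (version_info,)
        )


if __name__ == "__main__":
    check_python3()
    print("Interpreter is Python %d.%d" % sys.version_info[:2])
```

Running this with the interpreter inside the ingestion image should show quickly whether the base image is still on Python 2.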
To Reproduce
Steps to reproduce the behavior:
- Successfully run the Quickstart.sh script
- Run the commands to import sample data:
docker build -t ingestion -f docker/ingestion/Dockerfile . && cd docker/ingestion && docker-compose up
Expected behavior
Sample Data is loaded into DataHub
Desktop (please complete the following information):
- OS: Ubuntu 18.04.4 LTS - 4.15.0-91-generic #92-Ubuntu
- Browser N/A
- Version 18.04.4
Additional context
I got around the above error by changing the line "--from=python:2.7" to "--from=python:3.6" in datahub/docker/ingestion/Dockerfile and rerunning the docker build command for the ingestion image.
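The Dockerfile change described above can also be scripted. This is a minimal sketch, assuming the image tag appears literally as python:2.7 in the file and that the path is given relative to the repository root; bump_python_image is a hypothetical helper, not part of DataHub:

```python
from pathlib import Path


def bump_python_image(dockerfile, old="python:2.7", new="python:3.6"):
    """Replace every occurrence of the old image tag and return how many were changed."""
    path = Path(dockerfile)
    text = path.read_text()
    count = text.count(old)
    if count:
        path.write_text(text.replace(old, new))
    return count


# Example call, assuming the current directory is the datahub checkout:
# bump_python_image("docker/ingestion/Dockerfile")
```

A return value of 0 means the tag was not found, in which case the file is left untouched and should be inspected by hand.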
The container was successfully built but when the container started, I received a different error message:
Attaching to ingestion
ingestion | Traceback (most recent call last):
ingestion | File "mce_cli.py", line 3, in <module>
ingestion | from confluent_kafka import avro
ingestion | File "/root/.local/lib/python3.6/site-packages/confluent_kafka/avro/__init__.py", line 9, in <module>
ingestion | from confluent_kafka.avro.cached_schema_registry_client import CachedSchemaRegistryClient
ingestion | File "/root/.local/lib/python3.6/site-packages/confluent_kafka/avro/cached_schema_registry_client.py", line 27, in <module>
ingestion | from requests import Session, utils
ingestion | ModuleNotFoundError: No module named 'requests'
ingestion exited with code 1
Then I got around that error by adding the line "requests==2.23.0" to datahub/metadata-ingestion/mce-cli/requirements.txt.
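A missing dependency like this one can be spotted before the container runs mce_cli.py. The sketch below assumes the required top-level modules are confluent_kafka, avro and requests, a list inferred only from the tracebacks in this issue and possibly incomplete; missing_modules is a hypothetical helper:

```python
import importlib.util

# Assumed from the tracebacks in this issue; not an authoritative list.
REQUIRED_MODULES = ["confluent_kafka", "avro", "requests"]


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any name it reports would need a matching entry in requirements.txt before the image is rebuilt.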
Again, container built, ingestion container ran, but I received the error:
ingestion | avro.io.AvroTypeException: The datum {'auditHeader': None, 'proposedSnapshot': ('com.linkedin.pegasus2avro.metadata.snapshot.CorpUserSnapshot', {'urn': 'urn:li:corpuser:datahub', 'aspects': [{'active': True, 'displayName': 'Data Hub', 'fullName': 'Data Hub', 'email': 'datahub@linkedin.com', 'title': 'CEO'}, {}]}), 'proposedDelta': None} is not an example of the schema
At this point I gave up, because I was mucking with too many components.
Issue Analytics
- Created 3 years ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
Solved for me too.
Step 1: Edit ingestion/Dockerfile and change Python from 2.7 to 3.6.
Step 2: Change /datahub/metadata-ingestion/mce-cli/requirements.txt as follows
This works for me now. Thank you!