MySqlToHiveOperator: "invalid path" while loading the extracted CSV to Hive

Apache Airflow version: 2.0

Kubernetes version (if you are using kubernetes) (use kubectl version):

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

What happened: I am trying to load data from MySQL into Hive with the following code:

t2 = MySqlToHiveOperator(
    task_id='mysql_to_hive',
    sql='select caseNo as case_num from t_ca_detected_case',
    hive_table='dsj.casenum_temp',
    create=True,
    recreate=True,
    delimiter=',',
    mysql_conn_id='mysql_dsj',
    hive_cli_conn_id='hive_cli_default',
    start_date=days_ago(2),
    owner='airflow',
    dag=dag
)

[2020-12-30 21:53:21,631] {taskinstance.py:1038} INFO - Executing <Task(MySqlToHiveOperator): mysql_to_hive> on 2020-12-30T13:53:20.354276+00:00
[2020-12-30 21:53:21,638] {standard_task_runner.py:51} INFO - Started process 54185 to run task
[2020-12-30 21:53:21,643] {standard_task_runner.py:75} INFO - Running: ['airflow', 'tasks', 'run', 'hello2', 'mysql_to_hive', '2020-12-30T13:53:20.354276+00:00', '--job-id', '44', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/test/hello2.py', '--cfg-path', '/tmp/tmp1qiqqk_2']
[2020-12-30 21:53:21,647] {standard_task_runner.py:76} INFO - Job 44: Subtask mysql_to_hive
[2020-12-30 21:53:21,696] {logging_mixin.py:103} INFO - Running <TaskInstance: hello2.mysql_to_hive 2020-12-30T13:53:20.354276+00:00 [running]> on host serv98.
[2020-12-30 21:53:21,744] {taskinstance.py:1232} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=hello2
AIRFLOW_CTX_TASK_ID=mysql_to_hive
AIRFLOW_CTX_EXECUTION_DATE=2020-12-30T13:53:20.354276+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2020-12-30T13:53:20.354276+00:00
[2020-12-30 21:53:21,763] {base.py:74} INFO - Using connection to: id: hive_cli_default. Host: 10.2.20.11, Port: 21066, Schema: default, Login: airflow, Password: XXXXXXXX, extra: XXXXXXXX
[2020-12-30 21:53:21,765] {mysql_to_hive.py:141} INFO - Dumping MySQL query results to local file
[2020-12-30 21:53:21,776] {base.py:74} INFO - Using connection to: id: mysql_dsj. Host: 192.168.2.178, Port: 3306, Schema: djzs_db, Login: root, Password: XXXXXXXX, extra: None
[2020-12-30 21:53:21,798] {mysql_to_hive.py:161} INFO - Loading file into Hive
[2020-12-30 21:53:21,798] {hive.py:445} INFO - DROP TABLE IF EXISTS dsj.casenum_temp; CREATE TABLE IF NOT EXISTS dsj.casenum_temp ( case_num STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS textfile ;
[2020-12-30 21:53:21,799] {hive.py:248} INFO - beeline -u "jdbc:hive2://10.2.20.11:21066/default" -n airflow -p Ygnet123# -hiveconf airflow.ctx.dag_id=hello2 -hiveconf airflow.ctx.task_id=mysql_to_hive -hiveconf airflow.ctx.execution_date=2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_run_id=manual__2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_owner=airflow -hiveconf airflow.ctx.dag_email= -f /tmp/airflow_hiveop_ka2jjicc/tmpnyz766bk
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: Class path contains multiple SLF4J bindings.
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:24,679] {hive.py:260} INFO - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[2020-12-30 21:53:24,692] {hive.py:260} INFO - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
[2020-12-30 21:53:26,776] {hive.py:260} INFO - Connecting to jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:27,079] {hive.py:260} INFO - Connected to: Apache Hive (version 3.1.0)
[2020-12-30 21:53:27,080] {hive.py:260} INFO - Driver: Hive JDBC (version 3.1.0)
[2020-12-30 21:53:27,080] {hive.py:260} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
[2020-12-30 21:53:27,188] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598): USE default–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,188] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,188] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:27,188] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598); Time taken: 0.003 seconds
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598): USE default–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213941_8e02d3e0-56e1-4ad6-aae4-35c2b7baf598); Time taken: 0.004 seconds
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:27,189] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,190] {hive.py:260} INFO - No rows affected (0.079 seconds)
[2020-12-30 21:53:27,364] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d): DROP TABLE IF EXISTS dsj.casenum_temp–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d); Time taken: 0.01 seconds
[2020-12-30 21:53:27,364] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d): DROP TABLE IF EXISTS dsj.casenum_temp–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213941_88bf0601-a224-4a6f-acd7-96bed9dc977d); Time taken: 0.149 seconds
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:27,365] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,365] {hive.py:260} INFO - No rows affected (0.165 seconds)
[2020-12-30 21:53:27,408] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > . . . . . . . . . . . . . . . . . . . . > INFO : Compiling command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c): CREATE TABLE IF NOT EXISTS dsj.casenum_temp (
[2020-12-30 21:53:27,408] {hive.py:260} INFO - case_num STRING)
[2020-12-30 21:53:27,408] {hive.py:260} INFO - ROW FORMAT DELIMITED
[2020-12-30 21:53:27,408] {hive.py:260} INFO - FIELDS TERMINATED BY ','
[2020-12-30 21:53:27,408] {hive.py:260} INFO - STORED AS textfile–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c); Time taken: 0.004 seconds
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c): CREATE TABLE IF NOT EXISTS dsj.casenum_temp (
[2020-12-30 21:53:27,409] {hive.py:260} INFO - case_num STRING)
[2020-12-30 21:53:27,409] {hive.py:260} INFO - ROW FORMAT DELIMITED
[2020-12-30 21:53:27,409] {hive.py:260} INFO - FIELDS TERMINATED BY ','
[2020-12-30 21:53:27,409] {hive.py:260} INFO - STORED AS textfile–0; Current sessionId=701dd932-ae51-4edc-84bf-c1575a5536a6
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:27,409] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213941_9258986c-dbe2-4f5f-8a45-41e9db9aea4c); Time taken: 0.028 seconds
[2020-12-30 21:53:27,410] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:27,410] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:27,410] {hive.py:260} INFO - No rows affected (0.041 seconds)
[2020-12-30 21:53:27,418] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> 0: jdbc:hive2://10.2.20.11:21066/default> Closing: 0: jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:27,445] {hive.py:459} INFO - LOAD DATA LOCAL INPATH '/tmp/tmpscnxvwz8' OVERWRITE INTO TABLE dsj.casenum_temp ;

[2020-12-30 21:53:27,447] {hive.py:248} INFO - beeline -u "jdbc:hive2://10.2.20.11:21066/default" -n airflow -p Ygnet123# -hiveconf airflow.ctx.dag_id=hello2 -hiveconf airflow.ctx.task_id=mysql_to_hive -hiveconf airflow.ctx.execution_date=2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_run_id=manual__2020-12-30T13:53:20.354276+00:00 -hiveconf airflow.ctx.dag_owner=airflow -hiveconf airflow.ctx.dag_email= -f /tmp/airflow_hiveop_hpm6bvo4/tmpuszfib5a
[2020-12-30 21:53:30,344] {hive.py:260} INFO - SLF4J: Class path contains multiple SLF4J bindings.
[2020-12-30 21:53:30,345] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:30,345] {hive.py:260} INFO - SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
[2020-12-30 21:53:30,345] {hive.py:260} INFO - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
[2020-12-30 21:53:30,349] {hive.py:260} INFO - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
[2020-12-30 21:53:32,333] {hive.py:260} INFO - Connecting to jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:32,673] {hive.py:260} INFO - Connected to: Apache Hive (version 3.1.0)
[2020-12-30 21:53:32,674] {hive.py:260} INFO - Driver: Hive JDBC (version 3.1.0)
[2020-12-30 21:53:32,674] {hive.py:260} INFO - Transaction isolation: TRANSACTION_REPEATABLE_READ
[2020-12-30 21:53:32,785] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Semantic Analysis Completed (retrial = false)
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Completed compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.003 seconds
[2020-12-30 21:53:32,785] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Starting task [Stage-0:DDL] in serial mode
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Completed executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.004 seconds
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : OK
[2020-12-30 21:53:32,786] {hive.py:260} INFO - INFO : Concurrency mode is disabled, not creating a lock manager
[2020-12-30 21:53:32,787] {hive.py:260} INFO - No rows affected (0.082 seconds)
[2020-12-30 21:53:32,859] {hive.py:260} INFO - 0: jdbc:hive2://10.2.20.11:21066/default> Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpscnxvwz8'': No files matching path file:/tmp/tmpscnxvwz8 (state=42000,code=40000)
[2020-12-30 21:53:32,862] {hive.py:260} INFO - Closing: 0: jdbc:hive2://10.2.20.11:21066/default
[2020-12-30 21:53:32,905] {taskinstance.py:1396} ERROR - SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/fi65clients/Hive/Beeline/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://10.2.20.11:21066/default
Connected to: Apache Hive (version 3.1.0)
Driver: Hive JDBC (version 3.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://10.2.20.11:21066/default> INFO : Compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.003 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332): USE default–0; Current sessionId=c4df2d8c-553b-41a4-9998-c1b3572fa0c8
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=omm_20201230213946_f6efcb01-0914-41c2-a8a6-88e8420c4332); Time taken: 0.004 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.082 seconds)
0: jdbc:hive2://10.2.20.11:21066/default> Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/tmpscnxvwz8'': No files matching path file:/tmp/tmpscnxvwz8 (state=42000,code=40000)
Closing: 0: jdbc:hive2://10.2.20.11:21066/default

What you expected to happen: The MySQL table should be loaded into Hive.

The query results should have been extracted to a file, but the file was not found at the expected path.
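
One possible reading of the log, offered as an assumption rather than something confirmed in this thread: the task ran on host serv98 and wrote its temporary file there, while the `LOAD DATA LOCAL INPATH` statement was sent through beeline to HiveServer2 at 10.2.20.11. If HiveServer2 resolves the `LOCAL` path on its own filesystem, it would not see a file that exists only on the Airflow worker. The sketch below is a small diagnostic aid under that assumption. The function names are hypothetical, and it only parses text copied from the log; it does not call Airflow or Hive.

```python
import os
import re
from typing import Optional


def extract_missing_path(hive_output: str) -> Optional[str]:
    """Return the path from a 'No files matching path file:<path>' message, if any."""
    match = re.search(r"No files matching path file:(\S+)", hive_output)
    return match.group(1) if match else None


def extract_jdbc_host(jdbc_url: str) -> Optional[str]:
    """Return the host part of a jdbc:hive2://host:port/db URL, if it parses."""
    match = re.match(r"jdbc:hive2://([^:/;]+)", jdbc_url)
    return match.group(1) if match else None


def summarize(hive_output: str, jdbc_url: str) -> str:
    """Describe which path Hive rejected and whether it exists on this machine."""
    path = extract_missing_path(hive_output)
    if path is None:
        return "no 'Invalid path' error found"
    host = extract_jdbc_host(jdbc_url)
    # This only checks the machine running the script (for example the worker).
    exists_here = os.path.exists(path)
    return f"path={path} hive_host={host} exists_on_this_machine={exists_here}"


if __name__ == "__main__":
    # Error text and JDBC URL copied from the log above.
    error_text = (
        "Error: Error while compiling statement: FAILED: SemanticException "
        "Line 1:23 Invalid path ''/tmp/tmpscnxvwz8'': No files matching path "
        "file:/tmp/tmpscnxvwz8 (state=42000,code=40000)"
    )
    print(summarize(error_text, "jdbc:hive2://10.2.20.11:21066/default"))
```

The existence check is only meaningful while the task is still running, because the temporary file may already have been removed once the task has ended.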

How to reproduce it:

Anything else we need to know:

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
pateash commented, Apr 29, 2021

@vikramkoka @eladkal could you please assign this to me. I would like to work on it.

0 reactions
fjmacagno commented, Oct 22, 2021

I am seeing the same issue using HiveCliHook.load_df():

    for design in designs:
        for platformSegment in design["platformSegments"]:
            design_rows.append(
                [design["designId"], design["fullName"], design["taxonomyId"], design["taxonomySegmentId"],
                 platformSegment["platformId"], platformSegment["platformSegmentId"]])

    hive = HiveCliHook(hive_cli_conn_id=constants.HIVE_CONN)
    hive.load_df(pandas.DataFrame(design_rows), table=segment_definition_table, delimiter="\t", create=False)

Error: Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/tmp/airflow_hiveop_twe9yc4g/tmpervj2lgz'': No files matching path file:/tmp/airflow_hiveop_twe9yc4g/tmpervj2lgz (state=42000,code=40000)
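
A possible workaround sketch, not taken from the thread: it assumes the failure happens because HiveServer2 cannot see a file that exists only on the Airflow worker. Under that assumption, one option is to stage the file in HDFS first and then load it with `LOAD DATA INPATH` (without `LOCAL`). The helper names and the staging path below are hypothetical. The code only builds the `hdfs dfs -put` command and the HQL statement; it does not execute either.

```python
import shlex
from typing import List


def build_hdfs_put_command(local_path: str, hdfs_path: str) -> List[str]:
    """Build the argument list that copies a local file into HDFS, overwriting any existing copy."""
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_path]


def build_load_hql(hdfs_path: str, table: str, overwrite: bool = True) -> str:
    """Build a LOAD DATA statement that reads from HDFS instead of a local path."""
    if "'" in hdfs_path:
        # Keep the sketch simple: refuse paths that would break the quoting.
        raise ValueError("path must not contain a single quote")
    overwrite_clause = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{hdfs_path}' {overwrite_clause}INTO TABLE {table};"


if __name__ == "__main__":
    # Hypothetical staging location; the table name is the one from the issue.
    staged = "/tmp/airflow_staging/casenum.csv"
    print(shlex.join(build_hdfs_put_command("/tmp/tmpscnxvwz8", staged)))
    print(build_load_hql(staged, "dsj.casenum_temp"))
```

The resulting statement could then be submitted the same way as any other HQL, for example through `HiveCliHook.run_cli`. Whether this fits depends on the cluster setup and on the Hive user's permissions for the staging directory.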
