[SUPPORT] Unable to sync to external Hive metastore via metastore URIs over the Thrift protocol
Have you gone through our FAQs? Yes
Describe the problem you faced

Unable to sync to the external Hive metastore via the Thrift protocol. Instead, the sync appears to happen against the local Hive store.
To Reproduce

Run the pyspark file below, which does the following:
- Connects to the Hive metastore via `hive.metastore.uris` using the Thrift protocol and prints the existing tables, showing that the existing setup can reach the metastore without any issues.
- Generates a sample df using the Hudi quickstart generator and writes the df to a Hudi table with Hive sync enabled.
- Reconnects to the Hive metastore and prints the tables. Can observe that the newly synced table does not show up.
- On opening a new pyspark shell, can see that the required table shows up in the local Spark warehouse dir via `spark.catalog.listTables()`.
- The log below shows `HiveMetastoreConnection version 1.2.1 using Spark classes`. Have tried connecting to the Hive metastore using Spark 3.0.1 and Hive 2.3.7 jars and was able to list the tables in the external metastore. However, that setup does not work with Hudi 0.6.0, hence Spark 2.4.7 is used for the example below.
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

metastore_uri = "thrift://localhost:9083"

spark = SparkSession.builder \
    .appName("test-hudi-hive-sync") \
    .enableHiveSupport() \
    .config("hive.metastore.uris", metastore_uri) \
    .getOrCreate()

print("Before {}".format(spark.catalog.listTables()))

tableName = "hive_hudi_sync"
basePath = "file:///tmp/hive_hudi_sync"

sc = spark.sparkContext
dataGen = sc._jvm.org.apache.hudi.QuickstartUtils.DataGenerator()
inserts = sc._jvm.org.apache.hudi.QuickstartUtils.convertToStringList(dataGen.generateInserts(10))
df = spark.read.json(spark.sparkContext.parallelize(inserts, 2)) \
    .withColumn("partitionpath", lit("partitionval"))
df.show()

hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'uuid',
    'hoodie.datasource.write.partitionpath.field': 'partitionpath',
    'hoodie.datasource.write.table.name': tableName,
    'hoodie.datasource.write.operation': 'insert',
    'hoodie.datasource.write.precombine.field': 'ts',
    'hoodie.upsert.shuffle.parallelism': 2,
    'hoodie.insert.shuffle.parallelism': 2,
    'hoodie.datasource.hive_sync.enable': True,
    'hoodie.datasource.hive_sync.use_jdbc': False,
    'hoodie.datasource.hive_sync.jdbcurl': metastore_uri,
    'hoodie.datasource.hive_sync.partition_fields': 'partitionpath',
    'hoodie.datasource.hive_sync.table': tableName,
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor'
}

df.write.format("hudi") \
    .options(**hudi_options) \
    .mode("overwrite") \
    .save(basePath)

print("After {}".format(spark.catalog.listTables()))
Expected behavior

- Expecting the table `hive_hudi_sync` to show up in the external Hive metastore after Hive sync. The Hive sync succeeds according to the logs, but the new table is not visible in the metastore.
- Instead, only the pre-existing tables are visible in the Hive metastore.
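The expectation can be stated as a check over the two table listings printed by the script (a sketch; `newly_synced_tables` is a hypothetical helper operating on table names taken from `spark.catalog.listTables()`):

```python
def newly_synced_tables(before, after):
    """Table names visible after the write that were absent before."""
    return sorted(set(after) - set(before))

# With a working Hive sync against the external metastore:
print(newly_synced_tables(["existing_tbl"], ["existing_tbl", "hive_hudi_sync"]))
# → ['hive_hudi_sync']
```

In this issue, the equivalent diff against the external metastore comes back empty; the new table only appears in the local Spark warehouse.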
Environment Description
- Hudi version : 0.6.0
- Spark version : 2.4.7
- Hive version : metastore uses Hive 3.1.0.3.1.0.0-78
- Storage (HDFS/S3/GCS…) : S3, but same for local too
- Running on Docker? (yes/no) : No
Additional context

Have attached the run logs:
- Can see that the native Spark connection to the Hive metastore works: the tables from the external Hive metastore are visible.
- However, in the Hudi Hive sync run, can observe that it does not make a connection to the external Hive metastore, but uses the local Spark warehouse dir instead.
- Have removed the logs from `org.apache.spark` because they were adding noise. Do let me know if I need to attach them.
.venv ❯ bin/spark-submit --master local[2] --deploy-mode client --packages org.apache.hudi:hudi-spark-bundle_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.4 --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' hive-metastore-pyspark.py
Ivy Default Cache set to: /Users/rakeshramakrishnan/.ivy2/cache
The jars for the packages stored in: /Users/rakeshramakrishnan/.ivy2/jars
:: loading settings :: url = jar:file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hudi#hudi-spark-bundle_2.11 added as a dependency
org.apache.spark#spark-avro_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-1ea4440b-ae8a-49c2-b638-7765bc189b84;1.0
confs: [default]
found org.apache.hudi#hudi-spark-bundle_2.11;0.6.0 in central
found org.apache.spark#spark-avro_2.11;2.4.4 in central
found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 300ms :: artifacts dl 6ms
:: modules in use:
org.apache.hudi#hudi-spark-bundle_2.11;0.6.0 from central in [default]
org.apache.spark#spark-avro_2.11;2.4.4 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 3 | 0 | 0 | 0 || 3 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-1ea4440b-ae8a-49c2-b638-7765bc189b84
confs: [default]
0 artifacts copied, 3 already retrieved (0kB/6ms)
294 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1245 [Thread-5] INFO org.apache.spark.SparkContext - Running Spark version 2.4.7
1268 [Thread-5] INFO org.apache.spark.SparkContext - Submitted application: test-hudi-hive-sync
1949 [Thread-5] INFO org.apache.spark.ui.SparkUI - Bound SparkUI to 0.0.0.0, and started at http://192.168.0.104:4040
1966 [Thread-5] INFO org.apache.spark.SparkContext - Added JAR file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar at spark://192.168.0.104:62151/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar with timestamp 1610556549207
1967 [Thread-5] INFO org.apache.spark.SparkContext - Added JAR file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar at spark://192.168.0.104:62151/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar with timestamp 1610556549208
1967 [Thread-5] INFO org.apache.spark.SparkContext - Added JAR file:///Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at spark://192.168.0.104:62151/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1610556549208
1992 [Thread-5] INFO org.apache.spark.SparkContext - Added file file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar at file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar with timestamp 1610556549232
1994 [Thread-5] INFO org.apache.spark.util.Utils - Copying /Users/rakeshramakrishnan/.ivy2/jars/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar to /private/var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/spark-af0a1237-22bd-4a2e-a29c-2d8af9d40aae/userFiles-11f1df4b-2ad9-427f-8beb-2bbc0c8639c6/org.apache.hudi_hudi-spark-bundle_2.11-0.6.0.jar
2118 [Thread-5] INFO org.apache.spark.SparkContext - Added file file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar at file:///Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar with timestamp 1610556549359
2118 [Thread-5] INFO org.apache.spark.util.Utils - Copying /Users/rakeshramakrishnan/.ivy2/jars/org.apache.spark_spark-avro_2.11-2.4.4.jar to /private/var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/spark-af0a1237-22bd-4a2e-a29c-2d8af9d40aae/userFiles-11f1df4b-2ad9-427f-8beb-2bbc0c8639c6/org.apache.spark_spark-avro_2.11-2.4.4.jar
2126 [Thread-5] INFO org.apache.spark.SparkContext - Added file file:///Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at file:///Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1610556549367
2126 [Thread-5] INFO org.apache.spark.util.Utils - Copying /Users/rakeshramakrishnan/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar to /private/var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/spark-af0a1237-22bd-4a2e-a29c-2d8af9d40aae/userFiles-11f1df4b-2ad9-427f-8beb-2bbc0c8639c6/org.spark-project.spark_unused-1.0.0.jar
2180 [Thread-5] INFO org.apache.spark.executor.Executor - Starting executor ID driver on host localhost
2237 [Thread-5] INFO org.apache.spark.util.Utils - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62152.
2238 [Thread-5] INFO org.apache.spark.network.netty.NettyBlockTransferService - Server created on 192.168.0.104:62152
2584 [Thread-5] INFO org.apache.spark.sql.internal.SharedState - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/spark-warehouse').
2585 [Thread-5] INFO org.apache.spark.sql.internal.SharedState - Warehouse path is 'file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/spark-warehouse'.
3140 [Thread-5] INFO org.apache.spark.sql.hive.HiveUtils - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
3642 [Thread-5] INFO hive.metastore - Trying to connect to metastore with URI thrift://localhost:9083
4753 [Thread-5] INFO hive.metastore - Connected to metastore.
5590 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/46498653-2d37-4043-aa85-93083a524fc0_resources
5597 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/46498653-2d37-4043-aa85-93083a524fc0
5605 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/46498653-2d37-4043-aa85-93083a524fc0
5615 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/46498653-2d37-4043-aa85-93083a524fc0/_tmp_space.db
5618 [Thread-5] INFO org.apache.spark.sql.hive.client.HiveClientImpl - Warehouse location for Hive client (version 1.2.2) is file:/Users/rakeshramakrishnan/OSS/spark/spark-2.4.7-bin-hadoop2.7/spark-warehouse
15852 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 1 finished: hasNext at NativeMethodAccessorImpl.java:0, took 0.041704 s
Before [Table(name='****', database='default', description=None, tableType='MANAGED', isTemporary=False), .... tables in hive metastore]
17183 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 4 finished: showString at NativeMethodAccessorImpl.java:0, took 0.035620 s
+-------------------+-------------------+----------+-------------------+-------------------+------------------+-------------+---------+---+--------------------+
| begin_lat| begin_lon| driver| end_lat| end_lon| fare|partitionpath| rider| ts| uuid|
+-------------------+-------------------+----------+-------------------+-------------------+------------------+-------------+---------+---+--------------------+
| 0.4726905879569653|0.46157858450465483|driver-213| 0.754803407008858| 0.9671159942018241|34.158284716382845| partitionval|rider-213|0.0|f0476ada-9d26-4a6...|
| 0.6100070562136587| 0.8779402295427752|driver-213| 0.3407870505929602| 0.5030798142293655| 43.4923811219014| partitionval|rider-213|0.0|2507bfa1-01ec-471...|
| 0.5731835407930634| 0.4923479652912024|driver-213|0.08988581780930216|0.42520899698713666| 64.27696295884016| partitionval|rider-213|0.0|f3951634-256a-46f...|
|0.21624150367601136|0.14285051259466197|driver-213| 0.5890949624813784| 0.0966823831927115| 93.56018115236618| partitionval|rider-213|0.0|f0e3fdc7-685d-45d...|
| 0.40613510977307| 0.5644092139040959|driver-213| 0.798706304941517|0.02698359227182834|17.851135255091155| partitionval|rider-213|0.0|92233d5f-f684-43e...|
| 0.8742041526408587| 0.7528268153249502|driver-213| 0.9197827128888302| 0.362464770874404|19.179139106643607| partitionval|rider-213|0.0|f683850c-2940-4e0...|
| 0.1856488085068272| 0.9694586417848392|driver-213|0.38186367037201974|0.25252652214479043| 33.92216483948643| partitionval|rider-213|0.0|47af2a09-264b-4bd...|
| 0.0750588760043035|0.03844104444445928|driver-213|0.04376353354538354| 0.6346040067610669| 66.62084366450246| partitionval|rider-213|0.0|43223a73-70e6-4ec...|
| 0.651058505660742| 0.8192868687714224|driver-213|0.20714896002914462|0.06224031095826987| 41.06290929046368| partitionval|rider-213|0.0|851b928f-f368-49c...|
|0.11488393157088261| 0.6273212202489661|driver-213| 0.7454678537511295| 0.3954939864908973| 27.79478688582596| partitionval|rider-213|0.0|2683968f-4b48-477...|
+-------------------+-------------------+----------+-------------------+-------------------+------------------+-------------+---------+---+--------------------+
17293 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Initializing file:///tmp/hive_hudi_sync as hoodie table file:///tmp/hive_hudi_sync
17297 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17323 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17325 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17330 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17336 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17336 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished initializing Table of type COPY_ON_WRITE from file:///tmp/hive_hudi_sync
17365 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Registered avro schema : {
"type" : "record",
"name" : "hive_hudi_sync_record",
"namespace" : "hoodie.hive_hudi_sync",
"fields" : [ {
"name" : "begin_lat",
"type" : [ "double", "null" ]
}, {
"name" : "begin_lon",
"type" : [ "double", "null" ]
}, {
"name" : "driver",
"type" : [ "string", "null" ]
}, {
"name" : "end_lat",
"type" : [ "double", "null" ]
}, {
"name" : "end_lon",
"type" : [ "double", "null" ]
}, {
"name" : "fare",
"type" : [ "double", "null" ]
}, {
"name" : "partitionpath",
"type" : "string"
}, {
"name" : "rider",
"type" : [ "string", "null" ]
}, {
"name" : "ts",
"type" : [ "double", "null" ]
}, {
"name" : "uuid",
"type" : [ "string", "null" ]
} ]
}
17458 [Thread-5] INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 14.138735 ms
17521 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17521 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieClient - Starting Timeline service !!
17522 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Overriding hostIp to (192.168.0.104) found in spark-conf. It was null
17524 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :MEMORY
17525 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating in-memory based Table View
17537 [Thread-5] INFO org.eclipse.jetty.util.log - Logging initialized @18965ms to org.eclipse.jetty.util.log.Slf4jLog
17646 [Thread-5] INFO io.javalin.Javalin -
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
17647 [Thread-5] INFO io.javalin.Javalin - Starting Javalin ...
17768 [Thread-5] INFO io.javalin.Javalin - Listening on http://localhost:62161/
17768 [Thread-5] INFO io.javalin.Javalin - Javalin started in 125ms \o/
17768 [Thread-5] INFO org.apache.hudi.timeline.service.TimelineService - Starting Timeline server on port :62161
17768 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Started embedded timeline server at 192.168.0.104:62161
17782 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: isEmpty at HoodieSparkSqlWriter.scala:164
17783 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Got job 5 (isEmpty at HoodieSparkSqlWriter.scala:164) with 1 output partitions
17783 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Final stage: ResultStage 5 (isEmpty at HoodieSparkSqlWriter.scala:164)
17783 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Parents of final stage: List()
17784 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Missing parents: List()
17784 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Submitting ResultStage 5 (MapPartitionsRDD[24] at map at HoodieSparkSqlWriter.scala:139), which has no missing parents
17789 [dag-scheduler-event-loop] INFO org.apache.spark.storage.memory.MemoryStore - Block broadcast_5 stored as values in memory (estimated size 28.8 KB, free 366.2 MB)
17795 [dag-scheduler-event-loop] INFO org.apache.spark.storage.memory.MemoryStore - Block broadcast_5_piece0 stored as bytes in memory (estimated size 13.3 KB, free 366.2 MB)
17796 [dispatcher-event-loop-0] INFO org.apache.spark.storage.BlockManagerInfo - Added broadcast_5_piece0 in memory on 192.168.0.104:62152 (size: 13.3 KB, free: 366.3 MB)
17796 [dag-scheduler-event-loop] INFO org.apache.spark.SparkContext - Created broadcast 5 from broadcast at DAGScheduler.scala:1184
17797 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - Submitting 1 missing tasks from ResultStage 5 (MapPartitionsRDD[24] at map at HoodieSparkSqlWriter.scala:139) (first 15 tasks are for partitions Vector(0))
17797 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Adding task set 5.0 with 1 tasks
17802 [dispatcher-event-loop-1] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 5.0 (TID 6, localhost, executor driver, partition 0, PROCESS_LOCAL, 9327 bytes)
17803 [Executor task launch worker for task 6] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 5.0 (TID 6)
17847 [Executor task launch worker for task 6] INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 14.283395 ms
17856 [Executor task launch worker for task 6] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 5.0 (TID 6). 2049 bytes result sent to driver
17863 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 5.0 (TID 6) in 65 ms on localhost (executor driver) (1/1)
17863 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Removed TaskSet 5.0, whose tasks have all completed, from pool
17865 [dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 5 (isEmpty at HoodieSparkSqlWriter.scala:164) finished in 0.079 s
17865 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 5 finished: isEmpty at HoodieSparkSqlWriter.scala:164, took 0.083123 s
17871 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17872 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17873 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17874 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17874 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
17885 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
17886 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
17886 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
17890 [Thread-5] INFO org.apache.hudi.client.HoodieWriteClient - Generate a new instant time 20210113221924
17890 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17891 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17892 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17892 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17892 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
17894 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
17897 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Creating a new instant [==>20210113221924__commit__REQUESTED]
17919 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
17921 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
17922 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
17923 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
17923 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
17926 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20210113221924__commit__REQUESTED]]
17930 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
17930 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
17933 [Thread-5] INFO org.apache.hudi.client.AsyncCleanerService - Auto cleaning is not enabled. Not running cleaner now
17984 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: countByKey at WorkloadProfile.java:73
18237 [Thread-5] INFO org.apache.hudi.table.action.commit.BaseCommitActionExecutor - Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=10, numUpdates=0}, partitionStat={partitionval=WorkloadStat {numInserts=10, numUpdates=0}}}
18278 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.commit.requested
18291 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.inflight
18293 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - AvgRecordSize => 1024
18432 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: collectAsMap at UpsertPartitioner.java:216
18523 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 7 finished: collectAsMap at UpsertPartitioner.java:216, took 0.089976 s
18525 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - For partitionPath : partitionval Small Files => []
18525 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - After small file assignment: unassignedInserts => 10, totalInsertBuckets => 1, recordsPerBucket => 122880
18526 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - Total insert buckets for partition path partitionval => [InsertBucket {bucketNumber=0, weight=1.0}]
18526 [Thread-5] INFO org.apache.hudi.table.action.commit.UpsertPartitioner - Total Buckets :1, buckets info => {0=BucketInfo {bucketType=INSERT, fileIdPrefix=114dfaba-3a25-4278-9e7b-f2784642f76c, partitionPath=partitionval}},
Partition to insert buckets => {partitionval=[InsertBucket {bucketNumber=0, weight=1.0}]},
UpdateLocations mapped to buckets =>{}
18585 [Thread-5] INFO org.apache.hudi.table.action.commit.BaseCommitActionExecutor - Auto commit disabled for 20210113221924
18796 [pool-18-thread-1] INFO org.apache.hudi.common.util.queue.IteratorBasedQueueProducer - starting to buffer records
18797 [pool-18-thread-2] INFO org.apache.hudi.common.util.queue.BoundedInMemoryExecutor - starting consumer thread
18806 [pool-18-thread-1] INFO org.apache.hudi.common.util.queue.IteratorBasedQueueProducer - finished buffering records
18811 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
18838 [pool-18-thread-2] INFO org.apache.hudi.table.MarkerFiles - Creating Marker Path=file:/tmp/hive_hudi_sync/.hoodie/.temp/20210113221924/partitionval/114dfaba-3a25-4278-9e7b-f2784642f76c-0_0-10-14_20210113221924.parquet.marker.CREATE
18897 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
18900 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
18901 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19008 [pool-18-thread-2] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.gz]
19434 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19434 [pool-18-thread-2] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: ], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19434 [pool-18-thread-2] INFO org.apache.hudi.io.HoodieCreateHandle - New CreateHandle for partition :partitionval with fileId 114dfaba-3a25-4278-9e7b-f2784642f76c-0
19445 [pool-18-thread-2] INFO org.apache.hudi.io.HoodieCreateHandle - Closing the file 114dfaba-3a25-4278-9e7b-f2784642f76c-0 as we are done with all the records 10
19445 [pool-18-thread-2] INFO org.apache.parquet.hadoop.InternalParquetRecordWriter - Flushing mem columnStore to file. allocated memory: 2179
19559 [pool-18-thread-2] INFO org.apache.hudi.io.HoodieCreateHandle - CreateHandle for partitionPath partitionval fileID 114dfaba-3a25-4278-9e7b-f2784642f76c-0, took 747 ms.
19559 [pool-18-thread-2] INFO org.apache.hudi.common.util.queue.BoundedInMemoryExecutor - Queue Consumption is done; notifying producer threads
19573 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 8 finished: count at HoodieSparkSqlWriter.scala:389, took 0.980784 s
19574 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - No errors. Proceeding to commit the write.
19654 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: collect at AbstractHoodieWriteClient.java:98
19729 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Committing 20210113221924
19729 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
19731 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19731 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
19732 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
19732 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
19733 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
19741 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
19742 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
19742 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
19743 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20210113221924__commit__INFLIGHT]]
19744 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
19744 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
19864 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Marking instant complete [==>20210113221924__commit__INFLIGHT]
19864 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.inflight
19887 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?file:/tmp/hive_hudi_sync/.hoodie/20210113221924.commit
19887 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Completed [==>20210113221924__commit__INFLIGHT]
19945 [Thread-5] INFO org.apache.spark.SparkContext - Starting job: foreach at MarkerFiles.java:97
20000 [Thread-5] INFO org.apache.hudi.table.MarkerFiles - Removing marker directory at file:/tmp/hive_hudi_sync/.hoodie/.temp/20210113221924
20005 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
20006 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
20007 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
20008 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
20008 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
20010 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]]
20011 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
20011 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
20019 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20210113221924__commit__REQUESTED], [==>20210113221924__commit__INFLIGHT], [20210113221924__commit__COMPLETED]]
20020 [Thread-5] INFO org.apache.hudi.table.HoodieTimelineArchiveLog - No Instants to archive
20021 [Thread-5] INFO org.apache.hudi.client.HoodieWriteClient - Auto cleaning is enabled. Running cleaner now
20021 [Thread-5] INFO org.apache.hudi.client.HoodieWriteClient - Cleaner started
20021 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
20022 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
20022 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
20023 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
20023 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:///tmp/hive_hudi_sync
20025 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]]
20025 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating View Manager with storage type :REMOTE_FIRST
20025 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote first table view
20032 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating remote view for basePath file:///tmp/hive_hudi_sync. Server=192.168.0.104:62161
20033 [Thread-5] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating InMemory based view for basePath file:///tmp/hive_hudi_sync
20066 [Thread-5] INFO org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView - Sending request : (http://192.168.0.104:62161/v1/hoodie/view/compactions/pending/?basepath=file%3A%2F%2F%2Ftmp%2Fhive_hudi_sync&lastinstantts=20210113221924&timelinehash=40aa81825cab43b9fe13e7d01121c08f8868e61fb6d6794c1fe9d0d7f43e449e)
20362 [qtp1464312295-98] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:///tmp/hive_hudi_sync
20363 [qtp1464312295-98] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
20364 [qtp1464312295-98] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
20364 [qtp1464312295-98] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:///tmp/hive_hudi_sync
20364 [qtp1464312295-98] INFO org.apache.hudi.common.table.view.FileSystemViewManager - Creating InMemory based view for basePath file:///tmp/hive_hudi_sync
20366 [qtp1464312295-98] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]]
20374 [qtp1464312295-98] INFO org.apache.hudi.timeline.service.FileSystemViewHandler - TimeTakenMillis[Total=13, Refresh=11, handle=2, Check=0], Success=true, Query=basepath=file%3A%2F%2F%2Ftmp%2Fhive_hudi_sync&lastinstantts=20210113221924&timelinehash=40aa81825cab43b9fe13e7d01121c08f8868e61fb6d6794c1fe9d0d7f43e449e, Host=192.168.0.104:62161, synced=false
20404 [Thread-5] INFO org.apache.hudi.table.action.clean.CleanPlanner - No earliest commit to retain. No need to scan partitions !!
20404 [Thread-5] INFO org.apache.hudi.table.action.clean.CleanActionExecutor - Nothing to clean here. It is already clean
20418 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20210113221924
20418 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Commit 20210113221924 successful!
20418 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Config.isInlineCompaction ? false
20419 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Compaction Scheduled is Option{val=null}
20420 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Syncing to Hive Metastore (URL: thrift://localhost:9083)
20547 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading HoodieTableMetaClient from file:/tmp/hive_hudi_sync
20547 [Thread-5] INFO org.apache.hudi.common.fs.FSUtils - Hadoop Configuration: fs.defaultFS: [file:///], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [org.apache.hadoop.fs.LocalFileSystem@19ee471f]
20548 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableConfig - Loading table properties from file:/tmp/hive_hudi_sync/.hoodie/hoodie.properties
20548 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from file:/tmp/hive_hudi_sync
20548 [Thread-5] INFO org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit timeline for file:/tmp/hive_hudi_sync
20550 [Thread-5] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20210113221924__commit__COMPLETED]]
20681 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
20712 [Thread-5] INFO org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
20850 [Thread-5] INFO DataNucleus.Persistence - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
20850 [Thread-5] INFO DataNucleus.Persistence - Property datanucleus.cache.level2 unknown - will be ignored
21860 [Thread-5] INFO org.apache.hadoop.hive.metastore.ObjectStore - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
22725 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22726 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22901 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
22901 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
22975 [Thread-5] INFO DataNucleus.Query - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
22978 [Thread-5] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is DERBY
22981 [Thread-5] INFO org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
23194 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - Added admin role in metastore
23196 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - Added public role in metastore
23238 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - No user is added in admin role, since config is empty
23333 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_all_databases
23334 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_all_databases
23355 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_functions: db=default pat=*
23355 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_functions: db=default pat=*
23357 [Thread-5] INFO DataNucleus.Datastore - The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
23408 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table hive_hudi_sync with base path file:/tmp/hive_hudi_sync of type COPY_ON_WRITE
23408 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
23408 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
23478 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201_resources
23485 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
23492 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
23500 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201/_tmp_space.db
23513 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 85 ms
23517 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
23517 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
23517 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
23556 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
23559 [Thread-5] INFO hive.ql.parse.ParseDriver - Parsing command: create database if not exists default
24543 [Thread-5] INFO hive.ql.parse.ParseDriver - Parse Completed
24545 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=parse start=1610556570797 end=1610556571786 duration=989 from=org.apache.hadoop.hive.ql.Driver>
24548 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
24609 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
24609 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=semanticAnalyze start=1610556571789 end=1610556571850 duration=61 from=org.apache.hadoop.hive.ql.Driver>
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=compile start=1610556570758 end=1610556571860 duration=1102 from=org.apache.hadoop.hive.ql.Driver>
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
24619 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting command(queryId=rakeshramakrishnan_20210113221930_0ed9aaa9-b2ee-4824-a8f4-178fda3cdd72): create database if not exists default
24653 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=TimeToSubmit start=1610556570758 end=1610556571894 duration=1136 from=org.apache.hadoop.hive.ql.Driver>
24653 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
24653 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
24658 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode
24665 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: create_database: Database(name:default, description:null, locationUri:null, parameters:null, ownerName:rakeshramakrishnan, ownerType:USER)
24665 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=create_database: Database(name:default, description:null, locationUri:null, parameters:null, ownerName:rakeshramakrishnan, ownerType:USER)
24671 [Thread-5] ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler - AlreadyExistsException(message:Database default already exists)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy35.create_database(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy36.createDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
at org.apache.hadoop.hive.ql.exec.DDLTask.createDatabase(DDLTask.java:3895)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:271)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLs(HoodieHiveClient.java:384)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQLUsingHiveDriver(HoodieHiveClient.java:367)
at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:357)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:121)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
at org.apache.hudi.HoodieSparkSqlWriter$.org$apache$hudi$HoodieSparkSqlWriter$$syncHive(HoodieSparkSqlWriter.scala:321)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:363)
at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$metaSync$2.apply(HoodieSparkSqlWriter.scala:359)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:359)
at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:417)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:205)
at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
24671 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=runTasks start=1610556571894 end=1610556571912 duration=18 from=org.apache.hadoop.hive.ql.Driver>
24671 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.execute start=1610556571860 end=1610556571912 duration=52 from=org.apache.hadoop.hive.ql.Driver>
OK
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - OK
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=releaseLocks start=1610556571913 end=1610556571913 duration=0 from=org.apache.hadoop.hive.ql.Driver>
24672 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.run start=1610556570758 end=1610556571913 duration=1155 from=org.apache.hadoop.hive.ql.Driver>
24673 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [create database if not exists default]: 1159 ms
24691 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Hive table hive_hudi_sync is not found. Creating it
24712 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync'
24728 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201_resources
24734 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
24741 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201/_tmp_space.db
24747 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 35 ms
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
24747 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
24748 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
24748 [Thread-5] INFO hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync'
24756 [Thread-5] INFO hive.ql.parse.ParseDriver - Parse Completed
24756 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=parse start=1610556571989 end=1610556571997 duration=8 from=org.apache.hadoop.hive.ql.Driver>
24756 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
24793 [Thread-5] INFO org.apache.hadoop.hive.ql.parse.CalcitePlanner - Starting Semantic Analysis
24802 [Thread-5] INFO org.apache.hadoop.hive.ql.parse.CalcitePlanner - Creating table default.hive_hudi_sync position=37
24812 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
24812 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
24813 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_database: default
24814 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_database: default
24832 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=semanticAnalyze start=1610556571997 end=1610556572074 duration=77 from=org.apache.hadoop.hive.ql.Driver>
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=compile start=1610556571988 end=1610556572074 duration=86 from=org.apache.hadoop.hive.ql.Driver>
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
24833 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting command(queryId=rakeshramakrishnan_20210113221931_93adf670-a860-4ed6-b873-35027fee5f4e): CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync'
24834 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=TimeToSubmit start=1610556571988 end=1610556572075 duration=87 from=org.apache.hadoop.hive.ql.Driver>
24834 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
24834 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
24835 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode
24892 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: create_table: Table(tableName:hive_hudi_sync, dbName:default, owner:rakeshramakrishnan, createTime:1610556572, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:_hoodie_commit_time, type:string, comment:null), FieldSchema(name:_hoodie_commit_seqno, type:string, comment:null), FieldSchema(name:_hoodie_record_key, type:string, comment:null), FieldSchema(name:_hoodie_partition_path, type:string, comment:null), FieldSchema(name:_hoodie_file_name, type:string, comment:null), FieldSchema(name:begin_lat, type:double, comment:null), FieldSchema(name:begin_lon, type:double, comment:null), FieldSchema(name:driver, type:string, comment:null), FieldSchema(name:end_lat, type:double, comment:null), FieldSchema(name:end_lon, type:double, comment:null), FieldSchema(name:fare, type:double, comment:null), FieldSchema(name:rider, type:string, comment:null), FieldSchema(name:ts, type:double, comment:null), FieldSchema(name:uuid, type:string, comment:null)], location:file:/tmp/hive_hudi_sync, inputFormat:org.apache.hudi.hadoop.HoodieParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:partitionpath, type:string, comment:null)], parameters:{EXTERNAL=TRUE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null), temporary:false)
24893 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=create_table: Table(tableName:hive_hudi_sync, dbName:default, owner:rakeshramakrishnan, createTime:1610556572, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:_hoodie_commit_time, type:string, comment:null), FieldSchema(name:_hoodie_commit_seqno, type:string, comment:null), FieldSchema(name:_hoodie_record_key, type:string, comment:null), FieldSchema(name:_hoodie_partition_path, type:string, comment:null), FieldSchema(name:_hoodie_file_name, type:string, comment:null), FieldSchema(name:begin_lat, type:double, comment:null), FieldSchema(name:begin_lon, type:double, comment:null), FieldSchema(name:driver, type:string, comment:null), FieldSchema(name:end_lat, type:double, comment:null), FieldSchema(name:end_lon, type:double, comment:null), FieldSchema(name:fare, type:double, comment:null), FieldSchema(name:rider, type:string, comment:null), FieldSchema(name:ts, type:double, comment:null), FieldSchema(name:uuid, type:string, comment:null)], location:file:/tmp/hive_hudi_sync, inputFormat:org.apache.hudi.hadoop.HoodieParquetInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:partitionpath, type:string, comment:null)], parameters:{EXTERNAL=TRUE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:{}, groupPrivileges:null, rolePrivileges:null), temporary:false)
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=runTasks start=1610556572075 end=1610556572305 duration=230 from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.execute start=1610556572074 end=1610556572305 duration=231 from=org.apache.hadoop.hive.ql.Driver>
OK
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - OK
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=releaseLocks start=1610556572305 end=1610556572305 duration=0 from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.run start=1610556571988 end=1610556572305 duration=317 from=org.apache.hadoop.hive.ql.Driver>
25064 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`hive_hudi_sync`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `begin_lat` double, `begin_lon` double, `driver` string, `end_lat` double, `end_lon` double, `fare` double, `rider` string, `ts` double, `uuid` string) PARTITIONED BY (`partitionpath` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 'file:/tmp/hive_hudi_sync']: 317 ms
25065 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for hive_hudi_sync
25065 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
25066 [Thread-5] INFO org.apache.hudi.sync.common.AbstractSyncHoodieClient - Last commit time synced is not known, listing all partitions in file:/tmp/hive_hudi_sync,FS :org.apache.hadoop.fs.LocalFileSystem@19ee471f
25089 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 1
25089 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_partitions : db=default tbl=hive_hudi_sync
25090 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_partitions : db=default tbl=hive_hudi_sync
25122 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - New Partitions [partitionval]
25122 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Adding partitions 1 to table hive_hudi_sync
25138 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201_resources
25144 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
25150 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /var/folders/v8/nx847jpd1452pyg64r15m_7w0000gn/T/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201
25156 [Thread-5] INFO org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/rakeshramakrishnan/2c4d7adf-7f48-47a8-b8d0-2ee0330fe201/_tmp_space.db
25157 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 35 ms
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
25157 [Thread-5] INFO hive.ql.parse.ParseDriver - Parsing command: ALTER TABLE `default`.`hive_hudi_sync` ADD IF NOT EXISTS PARTITION (`partitionpath`='partitionval') LOCATION 'file:/tmp/hive_hudi_sync/partitionval'
25161 [Thread-5] INFO hive.ql.parse.ParseDriver - Parse Completed
25161 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=parse start=1610556572398 end=1610556572402 duration=4 from=org.apache.hadoop.hive.ql.Driver>
25161 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
25162 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
25162 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=semanticAnalyze start=1610556572402 end=1610556572588 duration=186 from=org.apache.hadoop.hive.ql.Driver>
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=compile start=1610556572398 end=1610556572588 duration=190 from=org.apache.hadoop.hive.ql.Driver>
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting command(queryId=rakeshramakrishnan_20210113221932_5b183376-504e-4518-9c26-46d6773384df): ALTER TABLE `default`.`hive_hudi_sync` ADD IF NOT EXISTS PARTITION (`partitionpath`='partitionval') LOCATION 'file:/tmp/hive_hudi_sync/partitionval'
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=TimeToSubmit start=1610556572398 end=1610556572588 duration=190 from=org.apache.hadoop.hive.ql.Driver>
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
25347 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
25348 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode
25348 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
25348 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
25375 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: add_partitions
25375 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=add_partitions
25427 [Thread-5] WARN hive.log - Updating partition stats fast for: hive_hudi_sync
25428 [Thread-5] WARN hive.log - Updated size to 437811
25479 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=runTasks start=1610556572588 end=1610556572720 duration=132 from=org.apache.hadoop.hive.ql.Driver>
25480 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.execute start=1610556572588 end=1610556572721 duration=133 from=org.apache.hadoop.hive.ql.Driver>
OK
25480 [Thread-5] INFO org.apache.hadoop.hive.ql.Driver - OK
25480 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
25480 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=releaseLocks start=1610556572721 end=1610556572721 duration=0 from=org.apache.hadoop.hive.ql.Driver>
25480 [Thread-5] INFO org.apache.hadoop.hive.ql.log.PerfLogger - </PERFLOG method=Driver.run start=1610556572398 end=1610556572721 duration=323 from=org.apache.hadoop.hive.ql.Driver>
25480 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [ALTER TABLE `default`.`hive_hudi_sync` ADD IF NOT EXISTS PARTITION (`partitionpath`='partitionval') LOCATION 'file:/tmp/hive_hudi_sync/partitionval' ]: 323 ms
25482 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Changed Partitions []
25482 [Thread-5] INFO org.apache.hudi.hive.HoodieHiveClient - No partitions to change for hive_hudi_sync
25482 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=default tbl=hive_hudi_sync
25482 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=get_table : db=default tbl=hive_hudi_sync
25497 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: alter_table: db=default tbl=hive_hudi_sync newtbl=hive_hudi_sync
25497 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=alter_table: db=default tbl=hive_hudi_sync newtbl=hive_hudi_sync
25560 [Thread-5] INFO org.apache.hudi.hive.HiveSyncTool - Sync complete for hive_hudi_sync
25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Shutting down the object store...
25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=Shutting down the object store...
25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: Metastore shutdown complete.
25560 [Thread-5] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=rakeshramakrishnan ip=unknown-ip-addr cmd=Metastore shutdown complete.
25560 [Thread-5] INFO org.apache.hudi.HoodieSparkSqlWriter$ - Is Async Compaction Enabled ? false
25560 [Thread-5] INFO org.apache.hudi.client.AbstractHoodieClient - Stopping Timeline service !!
25560 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Closing Timeline server
25560 [Thread-5] INFO org.apache.hudi.timeline.service.TimelineService - Closing Timeline Service
25561 [Thread-5] INFO io.javalin.Javalin - Stopping Javalin ...
25575 [Thread-5] INFO io.javalin.Javalin - Javalin has stopped
25576 [Thread-5] INFO org.apache.hudi.timeline.service.TimelineService - Closed Timeline Service
25576 [Thread-5] INFO org.apache.hudi.client.embedded.EmbeddedTimelineService - Closed Timeline server
31552 [Thread-5] INFO org.apache.spark.scheduler.DAGScheduler - Job 13 finished: hasNext at NativeMethodAccessorImpl.java:0, took 0.021244 s
After [Table(name='****', database='default', description=None, tableType='MANAGED', isTemporary=False), .... tables in hive metastore]
31588 [Thread-1] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
31597 [Thread-1] INFO org.spark_project.jetty.server.AbstractConnector - Stopped Spark@2239cd56{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
31599 [Thread-1] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://192.168.0.104:4040
31665 [Thread-1] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext
######### New spark shell ################
~/OSS/spark/spark-2.4.7-bin-hadoop2.7 34s
.venv ❯ bin/pyspark
Python 3.7.5 (default, Dec 29 2020, 13:08:16)
SparkSession available as 'spark'.
>>> spark.catalog.listTables()
11457 [Thread-3] WARN org.apache.hadoop.hive.metastore.ObjectStore - Failed to get database global_temp, returning NoSuchObjectException
[Table(name='hive_hudi_sync', database='default', description=None, tableType='EXTERNAL', isTemporary=False)]
>>>
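The logs above show Hudi's embedded HiveMetaStore shutting down its local object store, which suggests the sync ran against the local Derby-backed metastore rather than the remote one at `thrift://localhost:9083`. A possible workaround (a sketch only, not a confirmed fix from the maintainers): in Hudi 0.6.0, hive sync defaults to a JDBC connection to HiveServer2; setting `hoodie.datasource.hive_sync.use_jdbc` to `false` makes the sync go through a metastore client built from the Hadoop configuration, which should then honour `hive.metastore.uris` if it is passed as `spark.hadoop.hive.metastore.uris`. Option names below are taken from Hudi 0.6.0 and should be verified against the version in use; the record-key/precombine fields match the QuickstartUtils generator schema from the repro.

```python
# Hedged sketch: Hudi write options with hive sync pointed at a remote
# metastore. Verify option names against the Hudi version actually in use.
tableName = "hive_hudi_sync"

hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'uuid',
    'hoodie.datasource.write.partitionpath.field': 'partitionpath',
    'hoodie.datasource.write.precombine.field': 'ts',
    # Hive sync settings
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.table': tableName,
    'hoodie.datasource.hive_sync.partition_fields': 'partitionpath',
    'hoodie.datasource.hive_sync.partition_extractor_class':
        'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    # Sync via the metastore client instead of a HiveServer2 JDBC
    # connection; the client then reads hive.metastore.uris from the
    # Hadoop configuration (set it as spark.hadoop.hive.metastore.uris
    # so it reaches the conf that HiveSyncTool sees).
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
}
```

With these options, the write itself is unchanged: `df.write.format("hudi").options(**hudi_options).mode("overwrite").save(basePath)`.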
Issue Analytics
- State:
- Created 3 years ago
- Comments: 17 (9 by maintainers)
Top GitHub Comments
Will go ahead and close this one out as we have a solution proposed. Feel free to re-open if you are still encountering issues.