Making the Hive metastore retry policy configurable has no effect because TException is converted to PrestoException
@findepi @electrum
I have made the ThriftHiveMetastore retry policy configurable in my PR (https://github.com/prestosql/presto/pull/240).
I used iptables to simulate the timeout successfully, but I found that no matter how many retry attempts I configured, RetryDriver did not retry.
Finally, from the source code, I found the root cause:
When getAllTables() times out, a TException is thrown (in fact it is a TTransportException), and this exception is converted to a PrestoException.
Below is StaticHiveCluster.createMetastoreClient(), which catches the TTransportException and converts it to a PrestoException:
    @Override
    public HiveMetastoreClient createMetastoreClient()
    {
        List<HostAndPort> metastores = new ArrayList<>(addresses);
        Collections.shuffle(metastores.subList(1, metastores.size()));
        TException lastException = null;
        for (HostAndPort metastore : metastores) {
            try {
                HiveMetastoreClient client = clientFactory.create(metastore);
                if (!isNullOrEmpty(metastoreUsername)) {
                    client.setUGI(metastoreUsername);
                }
                return client;
            }
            catch (TException e) {
                lastException = e;
            }
        }
        throw new PrestoException(HIVE_METASTORE_ERROR, "Failed connecting to Hive metastore: " + addresses, lastException);
    }
Below is the exception stack trace:
com.facebook.presto.spi.PrestoException: Failed connecting to Hive metastore: [metrics-hive-services-lb.prod.hulu.com:9083]
    at com.facebook.presto.hive.metastore.thrift.StaticHiveCluster.createMetastoreClient(StaticHiveCluster.java:86)
    at com.facebook.presto.hive.metastore.thrift.ThriftHiveMetastore.lambda$getAllTables$2(ThriftHiveMetastore.java:196)
    at com.facebook.presto.hive.metastore.thrift.HiveMetastoreApiStats.lambda$wrap$0(HiveMetastoreApiStats.java:42)
    at com.facebook.presto.hive.metastore.thrift.ThriftHiveMetastore.lambda$getAllTables$4(ThriftHiveMetastore.java:213)
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:140)
    at com.facebook.presto.hive.metastore.thrift.ThriftHiveMetastore.getAllTables(ThriftHiveMetastore.java:212)
    at com.facebook.presto.hive.metastore.thrift.BridgingHiveMetastore.getAllTables(BridgingHiveMetastore.java:122)
    at com.facebook.presto.hive.metastore.CachingHiveMetastore.loadAllTables(CachingHiveMetastore.java:358)
    at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
    at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:188)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2273)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2156)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2046)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3943)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3967)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4952)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4958)
    at com.facebook.presto.hive.metastore.CachingHiveMetastore.get(CachingHiveMetastore.java:214)
    at com.facebook.presto.hive.metastore.CachingHiveMetastore.getAllTables(CachingHiveMetastore.java:353)
    at com.facebook.presto.hive.metastore.CachingHiveMetastore.loadAllTables(CachingHiveMetastore.java:358)
    at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
    at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:188)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2273)
But PrestoException is passed to stopOn for RetryDriver, so RetryDriver stops on it instead of retrying:
    private RetryDriver retry()
    {
        return RetryDriver.retry()
                .exponentialBackoff(minBackoffDelay, maxBackoffDelay, maxRetryTime, backoffScaleFactor)
                .maxAttempts(maxRetries + 1)
                .exceptionMapper(exceptionMapper)
                .stopOn(PrestoException.class);
    }
So RetryDriver will not retry on this timeout exception.
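To make the interaction concrete, here is a minimal, self-contained sketch of a retry loop with a stop list. It is not the real RetryDriver: the class RetryStopOnDemo, the run helper, and the two exception types (stand-ins for TTransportException and PrestoException) are all hypothetical names made up for illustration. It only shows that wrapping a transient failure in a stop-listed exception leaves a single attempt regardless of the configured maximum.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified illustration only; NOT com.facebook.presto.hive.RetryDriver.
final class RetryStopOnDemo
{
    // Hypothetical stand-in for TTransportException (checked).
    static class TransportException extends Exception
    {
        TransportException(String message) { super(message); }
    }

    // Hypothetical stand-in for PrestoException (unchecked wrapper).
    static class WrappedException extends RuntimeException
    {
        WrappedException(String message, Throwable cause) { super(message, cause); }
    }

    // Runs the callable up to maxAttempts times, but rethrows immediately
    // when the failure is an instance of the stopOn class.
    static <V> V run(int maxAttempts, Class<? extends Exception> stopOn, Callable<V> callable)
            throws Exception
    {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return callable.call();
            }
            catch (Exception e) {
                if (stopOn.isInstance(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }

    // Returns how many times an always-failing call was attempted.
    static int countAttempts(boolean wrap)
    {
        AtomicInteger calls = new AtomicInteger();
        try {
            run(3, WrappedException.class, () -> {
                calls.incrementAndGet();
                TransportException timeout = new TransportException("connect timed out");
                if (wrap) {
                    throw new WrappedException("Failed connecting to Hive metastore", timeout);
                }
                throw timeout;
            });
        }
        catch (Exception expected) {
            // every attempt fails in this demo
        }
        return calls.get();
    }

    public static void main(String[] args)
    {
        System.out.println("wrapped attempts: " + countAttempts(true));
        System.out.println("unwrapped attempts: " + countAttempts(false));
    }
}
```

With the wrapper the loop gives up after the first call; with the raw transport exception it uses all three attempts.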
What is the reason for converting TException to PrestoException?
Could anyone give me some background or advice? Is this exception conversion necessary?
Top GitHub Comments
@electrum Oh yes, I am working from my company's Presto, which is based on Presto 0.212. After I cherry-picked the corresponding PR (07aa4558ae1917d1b14acf8a5dfd16f6ca690ddd), the retry behavior works as expected. Let me close this issue.
Ah, that makes sense now. I think you are looking at an older version of the code, because the current code throws TException to allow for retry.
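For readers who have not seen the newer code, the idea can be sketched roughly as follows. This is an assumption-laden illustration, not the actual Presto change: ThrowCheckedDemo, ClientFactory, and the local TException class (a stand-in for org.apache.thrift.TException) are hypothetical, and clients are represented by plain strings. The point is only that the method rethrows the checked exception type rather than an unchecked wrapper, so a retry loop that stops on the wrapper can still retry.

```java
import java.util.List;

// Illustrative sketch only; names and types here are hypothetical.
final class ThrowCheckedDemo
{
    // Hypothetical stand-in for org.apache.thrift.TException.
    static class TException extends Exception
    {
        TException(String message, Throwable cause) { super(message, cause); }
    }

    // Hypothetical stand-in for a factory that opens a metastore connection.
    interface ClientFactory
    {
        String create(String address) throws TException;
    }

    // Tries each address in order. If all fail, throws a TException
    // (not an unchecked wrapper) that carries the last failure as its cause.
    static String createClient(List<String> addresses, ClientFactory factory)
            throws TException
    {
        TException lastException = null;
        for (String address : addresses) {
            try {
                return factory.create(address);
            }
            catch (TException e) {
                lastException = e;
            }
        }
        throw new TException("Failed connecting to Hive metastore: " + addresses, lastException);
    }
}
```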