HBaseResultCoder fails to serialize KeyValue instances (#1227)
HBaseResultCoder (in bigtable-hbase-dataflow) works well for Result instances that use the RowCell implementation of HBase's Cell interface, but when given a Result containing HBase's KeyValue instead of Bigtable's RowCell, it corrupts the output Result.
I couldn't find exactly where the breakdown occurs, but it seems related to the fact that calling RowCell.toString() causes an ArrayIndexOutOfBoundsException.
A simple test for RowCell.toString():
import com.google.bigtable.repackaged.com.google.cloud.hbase.adapters.read.RowCell;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.junit.Test;

@Test
public void testStringifyRowCell() throws Exception {
    // CellUtil.createCell(row, family, qualifier) builds a KeyValue;
    // "value" becomes the qualifier here and the cell value is empty.
    Cell keyValue = CellUtil.createCell("key".getBytes(), "family".getBytes(), "value".getBytes());
    Cell rowCell = new RowCell("key".getBytes(), "family".getBytes(), "qualifier".getBytes(),
        System.currentTimeMillis(), "value".getBytes());
    keyValue.toString(); // works
    rowCell.toString();  // ArrayIndexOutOfBoundsException
}
Stack trace from test failure:
java.lang.ArrayIndexOutOfBoundsException: 27495
at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1231)
at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1190)
at com.google.bigtable.repackaged.com.google.cloud.hbase.adapters.read.RowCell.toString(RowCell.java:234)
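The index in the exception is consistent with RowCell.toString() handing its bare row bytes to KeyValue.keyToString(), which expects a serialized KeyValue key. The sketch below is an HBase-free reconstruction under that assumption, not HBase's code: it assumes the parser reads the first two bytes as a big-endian row length and then reads the family-length byte at offset 2 + rowLength. For the row "key", the bytes 'k' (0x6B) and 'e' (0x65) decode as a row length of 27493, so the next read lands at index 27495, the same index as in the stack trace. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: walks a byte array the way a KeyValue key parser
// would, to show why bare row bytes ("key") overrun the array.
class KeyParseSketch {

    // A serialized KeyValue key starts with a 2-byte big-endian row length.
    static int readRowLength(byte[] b, int offset) {
        return (short) (((b[offset] & 0xFF) << 8) | (b[offset + 1] & 0xFF));
    }

    // After the 2-byte row length and the row itself comes a 1-byte family length;
    // this returns the index a parser would read it from.
    static int familyLengthIndex(byte[] b, int offset) {
        return offset + 2 + readRowLength(b, offset);
    }

    public static void main(String[] args) {
        byte[] rowOnly = "key".getBytes(StandardCharsets.UTF_8);
        // 'k' = 0x6B, 'e' = 0x65, so the "row length" is 0x6B65 = 27493.
        System.out.println("parsed row length: " + readRowLength(rowOnly, 0));
        int idx = familyLengthIndex(rowOnly, 0);
        System.out.println("family-length index: " + idx + ", array length: " + rowOnly.length);
        try {
            byte familyLength = rowOnly[idx];
            System.out.println("unexpectedly read " + familyLength);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException at index " + idx); // 27495
        }
    }
}
```

If this reading is right, the fix is for RowCell.toString() to format its own row, family and qualifier fields rather than parse them as a KeyValue key.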
Another test illustrating HBaseResultCoder's incompatibility with KeyValue:
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellComparator;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import com.google.cloud.bigtable.dataflow.coders.HBaseResultCoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

@Test
public void testHBaseResultCoderWithKeyValue() throws Exception {
    // Given
    // -----
    // CellUtil.createCell(row, family, qualifier): "value" becomes the qualifier; the cell value is empty.
    Cell inputKeyValue = CellUtil.createCell("key".getBytes(), "family".getBytes(), "value".getBytes());
    Result inputResult = Result.create(new Cell[]{inputKeyValue});
    HBaseResultCoder coder = HBaseResultCoder.getInstance();

    // When
    // -----
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    coder.encode(inputResult, outputStream, null);
    ByteArrayInputStream inputStream = new ByteArrayInputStream(outputStream.toByteArray());
    Result outputResult = coder.decode(inputStream, null);
    Cell outputRowCell = outputResult.listCells().get(0);
    // Constructor for KeyValue(Cell c) located here https://hbase.apache.org/1.2/xref/org/apache/hadoop/hbase/KeyValue.html#747
    Cell keyValueFromOutputRowCell = new KeyValue(outputRowCell);

    // Then
    // -----
    // Print statements to show unusual byte string in the decoded Cell
    System.out.println(inputKeyValue.toString()); // prints: key/family:value/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
    System.out.println(keyValueFromOutputRowCell.toString()); // prints: key/\x00\x00\x00\x1A\x00\x00\x00\x00\x00\x03key\x06familyvalue\x7F\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD:\x00\x00\x00\x1A\x00\x00\x00\x00\x00\x03key\x06familyvalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF/-1/Put/vlen=34/seqid=0
    // Output doesn't equal the input, regardless of whether output and input are the same Cell implementation
    assertTrue(CellComparator.equals(inputKeyValue, outputRowCell)); // AssertionError
    assertTrue(CellComparator.equals(inputKeyValue, keyValueFromOutputRowCell)); // AssertionError
    // Input and output RowArrays are unequal
    // (for a KeyValue, getRowArray() returns its whole backing array, not just the row bytes)
    assertTrue(Arrays.equals(inputKeyValue.getRowArray(), outputRowCell.getRowArray())); // AssertionError
    // Input and output Result instances are unequal
    Result.compareResults(inputResult, outputResult); // ArrayIndexOutOfBoundsException caused by attempt to call RowCell.toString()
}
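The bytes printed for the decoded cell line up with the KeyValue serialization layout: \x00\x00\x00\x1A is a 26-byte key length, \x00\x00\x00\x00 a zero value length, \x00\x03key the row, \x06family the family, value the qualifier, then the 8-byte LATEST_TIMESTAMP (\x7F followed by seven \xFF) and the Maximum type byte (\xFF). So the decoded family, qualifier and value each appear to hold the input KeyValue's entire 34-byte backing array (hence vlen=34); the \xEF\xBF\xBD sequences in the family part are UTF-8 replacement characters, consistent with the family passing through a String. That pattern is what you would expect if the encoder copied getFamilyArray()/getQualifierArray()/getValueArray() without applying the matching offset and length, since a KeyValue returns the same whole buffer from all of them. The sketch below is a hypothetical, HBase-free model of that failure mode under the assumed layout; it is not the HBaseResultCoder code, and all names in it are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical model of a KeyValue backing array (assumed HBase 1.x layout):
// [4B key len][4B value len][2B row len][row][1B family len][family][qualifier][8B ts][1B type][value]
class KeyValueLayoutSketch {

    static byte[] serialize(byte[] row, byte[] family, byte[] qualifier,
                            long timestamp, byte type, byte[] value) {
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
        return ByteBuffer.allocate(4 + 4 + keyLength + value.length)
            .putInt(keyLength).putInt(value.length)
            .putShort((short) row.length).put(row)
            .put((byte) family.length).put(family)
            .put(qualifier).putLong(timestamp).put(type)
            .put(value)
            .array();
    }

    // Skip key/value lengths (8 bytes), row length (2), the row, and the family length (1).
    static int familyOffset(byte[] kv) {
        int rowLength = ((kv[8] & 0xFF) << 8) | (kv[9] & 0xFF);
        return 8 + 2 + rowLength + 1;
    }

    static int familyLength(byte[] kv) {
        return kv[familyOffset(kv) - 1] & 0xFF;
    }

    // Buggy pattern: copies everything a getFamilyArray()-style accessor returns.
    static byte[] copyFamilyIgnoringOffset(byte[] backingArray) {
        return backingArray.clone();
    }

    // Correct pattern: copies only [offset, offset + length).
    static byte[] copyFamily(byte[] backingArray) {
        int offset = familyOffset(backingArray);
        return Arrays.copyOfRange(backingArray, offset, offset + familyLength(backingArray));
    }

    public static void main(String[] args) {
        byte[] kv = serialize(
            "key".getBytes(StandardCharsets.UTF_8),
            "family".getBytes(StandardCharsets.UTF_8),
            "value".getBytes(StandardCharsets.UTF_8), // used as the qualifier, as in the test
            Long.MAX_VALUE,                            // LATEST_TIMESTAMP
            (byte) 0xFF,                               // Type.Maximum
            new byte[0]);                              // empty cell value
        System.out.println(copyFamilyIgnoringOffset(kv).length);                // 34
        System.out.println(new String(copyFamily(kv), StandardCharsets.UTF_8)); // family
    }
}
```

In HBase itself, CellUtil.cloneRow, cloneFamily, cloneQualifier and cloneValue perform this kind of offset-aware copy from any Cell.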
The relevant dependencies in our pom.xml:
<dependency>
<groupId>com.google.cloud.dataflow</groupId>
<artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
<version>1.9.0</version>
</dependency>
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
<version>1.22.0</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- This must be at least v0.9.3, due to a bug in the HBaseMutationCoder that
breaks serialization of Put objects -->
<dependency>
<groupId>com.google.cloud.bigtable</groupId>
<artifactId>bigtable-hbase-dataflow</artifactId>
<version>0.9.6-SNAPSHOT</version>
</dependency>
<!-- bigtable-hbase-dataflow and bigtable-hbase-1.2 are incompatible-->
<dependency>
<groupId>com.google.cloud.bigtable</groupId>
<artifactId>bigtable-hbase-shaded-for-dataflow</artifactId>
<version>[0.9.5.1, 1.0.0)</version>
</dependency>
Issue Analytics
- Created 7 years ago
- Comments:8 (4 by maintainers)
RowCell.toString() turned out to be easy to fix. I'll get a new -SNAPSHOT out today with the fix.

HBaseResultCoder has 2 problems:
1) There's a problem converting a KeyValue to a FlatRow.Cell in HBaseResultCoder.encode().
2) Even when a KeyValue is translated to a FlatRow.Cell correctly, HBaseResultCoder.decode() will create a RowCell instead of a KeyValue. That will still result in "// Input and output Result instances are unequal".

I use a Bigtable FlatRow to encode/decode HBase Results, which might be the wrong approach. We'll fix FlatRowAdapter for case 1), but that won't fix 2). I'm going to have to mull over these two issues.

The next release will have this fix.
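Problem 2 is about the implementation class rather than the bytes. A minimal, HBase-free sketch of the general pitfall follows, assuming an equality check that ends up relying on a concrete class's own equals(). KeyValueLike and RowCellLike are hypothetical stand-ins, not HBase's KeyValue or Bigtable's RowCell, and whether the real classes behave this way is not established here.

```java
import java.util.Arrays;

// Hypothetical stand-ins: two cell classes with identical fields whose
// equals() only accepts an instance of the same class.
class CellEqualitySketch {

    static class BaseCell {
        final byte[] row, family, qualifier, value;
        final long timestamp;

        BaseCell(byte[] row, byte[] family, byte[] qualifier, long timestamp, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        // Class-restricted equality: different implementation classes never compare equal.
        @Override
        public boolean equals(Object o) {
            return o != null && o.getClass() == getClass() && sameContent(this, (BaseCell) o);
        }

        @Override
        public int hashCode() {
            return 31 * Arrays.hashCode(row) + Long.hashCode(timestamp);
        }
    }

    static final class KeyValueLike extends BaseCell {
        KeyValueLike(byte[] r, byte[] f, byte[] q, long ts, byte[] v) { super(r, f, q, ts, v); }
    }

    static final class RowCellLike extends BaseCell {
        RowCellLike(byte[] r, byte[] f, byte[] q, long ts, byte[] v) { super(r, f, q, ts, v); }
    }

    // Implementation-agnostic comparison of the cell contents.
    static boolean sameContent(BaseCell a, BaseCell b) {
        return Arrays.equals(a.row, b.row) && Arrays.equals(a.family, b.family)
            && Arrays.equals(a.qualifier, b.qualifier) && a.timestamp == b.timestamp
            && Arrays.equals(a.value, b.value);
    }

    public static void main(String[] args) {
        byte[] r = {'k'}, f = {'f'}, q = {'q'}, v = {};
        BaseCell encoded = new KeyValueLike(r, f, q, 1L, v);
        BaseCell decoded = new RowCellLike(r, f, q, 1L, v);
        System.out.println(encoded.equals(decoded));       // false: different classes
        System.out.println(sameContent(encoded, decoded)); // true: same contents
    }
}
```

Comparing contents rather than relying on equals() sidesteps the implementation swap, which is one reason a content-based comparison may be the safer check for round-tripped Results.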