HBaseResultCoder fails to serialize KeyValue instances

HBaseResultCoder (in bigtable-hbase-dataflow) works well for Result instances whose cells use Bigtable's RowCell implementation of HBase's Cell interface, but when given a Result backed by HBase's KeyValue instead of Bigtable's RowCell, it corrupts the output Result.

I couldn’t find where exactly the breakdown occurs, but it seems related to the fact that a call to RowCell.toString() causes an ArrayIndexOutOfBoundsException.

A simple test for RowCell.toString():

import com.google.bigtable.repackaged.com.google.cloud.hbase.adapters.read.RowCell;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.junit.Test;

@Test
public void testStringifyRowCell() throws Exception {
    // CellUtil.createCell(row, family, qualifier): the third argument is the qualifier, despite the "value" literal
    Cell keyValue = CellUtil.createCell("key".getBytes(), "family".getBytes(), "value".getBytes());
    Cell rowCell = new RowCell("key".getBytes(), "family".getBytes(), "qualifier".getBytes(), System.currentTimeMillis(), "value".getBytes());

    keyValue.toString();  // succeeds
    rowCell.toString();   // ArrayIndexOutOfBoundsException
}

Stack trace from test failure:

java.lang.ArrayIndexOutOfBoundsException: 27495

	at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1231)
	at org.apache.hadoop.hbase.KeyValue.keyToString(KeyValue.java:1190)
	at com.google.bigtable.repackaged.com.google.cloud.hbase.adapters.read.RowCell.toString(RowCell.java:234)
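
The index in the exception is suggestive. A serialized KeyValue key begins with a two-byte, big-endian row length, followed by the row bytes and then a one-byte family length. If the bytes handed to KeyValue.keyToString() began with the raw row "key" instead of that length prefix, the first two bytes ('k', 'e') would be read as a row length of 27493, and the family-length byte would be looked up at index 2 + 27493 = 27495, which matches the trace. This is an inference from the number alone, not a confirmed reading of RowCell. The class and method names in the sketch below are hypothetical, and it uses no HBase code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: mimics only the first parsing steps of a KeyValue key.
public class KeyLayoutSketch {

    /** Reads the first two bytes as a big-endian short, as a KeyValue key stores its row length. */
    public static int rowLengthPrefix(byte[] key) {
        return ByteBuffer.wrap(key, 0, 2).getShort();
    }

    /** Index of the one-byte family length: the 2-byte prefix plus the row bytes. */
    public static int familyLengthIndex(byte[] key) {
        return 2 + rowLengthPrefix(key);
    }

    public static void main(String[] args) {
        // Raw row bytes with no length prefix (assumed input, for illustration).
        byte[] rawRow = "key".getBytes(StandardCharsets.UTF_8);

        System.out.println(rowLengthPrefix(rawRow));    // prints: 27493
        System.out.println(familyLengthIndex(rawRow));  // prints: 27495

        try {
            byte familyLength = rawRow[familyLengthIndex(rawRow)];
            System.out.println(familyLength);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds at index " + familyLengthIndex(rawRow));
        }
    }
}
```

If this reading is right, the fix on the RowCell side would be to build the string from the cell's own fields rather than delegating to a parser that expects the serialized KeyValue layout.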

Another test illustrating HBaseResultCoder's incompatibility with KeyValue:

import com.google.cloud.bigtable.dataflow.coders.HBaseResultCoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellComparator;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.junit.Test;
import static org.junit.Assert.assertTrue;

@Test
public void testHBaseResultCoderWithKeyValue() throws Exception {
    // Given
    // -----
    Cell inputKeyValue = CellUtil.createCell("key".getBytes(), "family".getBytes(), "value".getBytes());
    Result inputResult = Result.create(new Cell[]{inputKeyValue});

    HBaseResultCoder coder = HBaseResultCoder.getInstance();

    // When
    // -----
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    coder.encode(inputResult, outputStream, null);
    ByteArrayInputStream inputStream = new ByteArrayInputStream(outputStream.toByteArray());
    Result outputResult = coder.decode(inputStream, null);
    Cell outputRowCell = outputResult.listCells().get(0);

    // Constructor for KeyValue(Cell c) located here https://hbase.apache.org/1.2/xref/org/apache/hadoop/hbase/KeyValue.html#747
    Cell keyValueFromOutputRowCell = new KeyValue(outputRowCell);

    // Then
    // -----
    // Print statements to show unusual byte string in the decoded Cell
    System.out.println(inputKeyValue.toString());              // prints: key/family:value/LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0
    System.out.println(keyValueFromOutputRowCell.toString());  // prints: key/\x00\x00\x00\x1A\x00\x00\x00\x00\x00\x03key\x06familyvalue\x7F\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBD:\x00\x00\x00\x1A\x00\x00\x00\x00\x00\x03key\x06familyvalue\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF/-1/Put/vlen=34/seqid=0

    // Output doesn't equal the input, regardless of whether output and input are the same Cell implementation
    assertTrue(CellComparator.equals(inputKeyValue, outputRowCell));  // AssertionError
    assertTrue(CellComparator.equals(inputKeyValue, keyValueFromOutputRowCell));  // AssertionError

    // Input and output row arrays are unequal.
    // Note: getRowArray() returns the cell's backing array, not just the row bytes.
    assertTrue(java.util.Arrays.equals(inputKeyValue.getRowArray(), outputRowCell.getRowArray()));  // AssertionError

    // Input and output Result instances are unequal
    Result.compareResults(inputResult, outputResult);   // ArrayIndexOutOfBoundsException caused by attempt to call RowCell.toString()
}
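
The decoded cell printed in the test above is consistent with the encoder copying each cell's whole backing array rather than the slice described by its offset and length. The garbled family and qualifier both contain the full serialized KeyValue (length prefixes, row, family, qualifier, timestamp, type), and vlen=34 is exactly the size of that buffer: 8 bytes of length prefixes plus a 26-byte key. That is an inference from the output, not a confirmed diagnosis. The sketch below uses a hypothetical BackingArraySketch class with no HBase dependency to show the difference between the backing array and the family slice:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the KeyValue byte layout; not HBase code.
public class BackingArraySketch {

    /** Serializes a cell in KeyValue order with an empty value, the maximum timestamp, and type 0xFF. */
    public static byte[] serialize(byte[] row, byte[] family, byte[] qualifier) {
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength);
        buf.putInt(keyLength);             // 4-byte key length
        buf.putInt(0);                     // 4-byte value length (empty value)
        buf.putShort((short) row.length);  // 2-byte row length
        buf.put(row);
        buf.put((byte) family.length);     // 1-byte family length
        buf.put(family);
        buf.put(qualifier);
        buf.putLong(Long.MAX_VALUE);       // 8-byte timestamp
        buf.put((byte) 0xFF);              // 1-byte type
        return buf.array();
    }

    /** Offset of the first family byte inside the backing array. */
    public static int familyOffset(byte[] backing) {
        int rowLength = ByteBuffer.wrap(backing, 8, 2).getShort();
        return 8 + 2 + rowLength + 1;
    }

    /** Copies only the family slice, using the offset and the stored length. */
    public static byte[] cloneFamily(byte[] backing) {
        int offset = familyOffset(backing);
        int length = backing[offset - 1];
        return Arrays.copyOfRange(backing, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] backing = serialize(
            "key".getBytes(StandardCharsets.UTF_8),
            "family".getBytes(StandardCharsets.UTF_8),
            "value".getBytes(StandardCharsets.UTF_8));

        System.out.println(backing.length);  // prints: 34
        System.out.println(new String(cloneFamily(backing), StandardCharsets.UTF_8));  // prints: family
    }
}
```

In HBase itself, CellUtil.cloneRow(), cloneFamily(), cloneQualifier() and cloneValue() return such copies, so an encoder that needs standalone byte arrays can use them instead of the get*Array() accessors.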

The relevant dependencies in our pom.xml:

        <dependency>
            <groupId>com.google.cloud.dataflow</groupId>
            <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
            <version>1.9.0</version>
        </dependency>

        <dependency>
            <groupId>com.google.api-client</groupId>
            <artifactId>google-api-client</artifactId>
            <version>1.22.0</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- This must be at least v0.9.3, due to a bug in the HBaseMutationCoder that
             breaks serialization of Put objects -->
        <dependency>
            <groupId>com.google.cloud.bigtable</groupId>
            <artifactId>bigtable-hbase-dataflow</artifactId>
            <version>0.9.6-SNAPSHOT</version>
        </dependency>

        <!-- bigtable-hbase-dataflow and bigtable-hbase-1.2 are incompatible-->
        <dependency>
            <groupId>com.google.cloud.bigtable</groupId>
            <artifactId>bigtable-hbase-shaded-for-dataflow</artifactId>
            <version>[0.9.5.1, 1.0.0)</version>
        </dependency>

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
sduskis commented, Feb 24, 2017

RowCell.toString() turns out to be easy to fix. I’ll get a new -SNAPSHOT out today with the fix.

HBaseResultCoder has 2 problems.

  1. HBaseResultCoder.encode() is fundamentally broken when converting a KeyValue to a FlatRow.Cell.
  2. Even if a KeyValue is translated to a FlatRow.Cell correctly, HBaseResultCoder.decode() will create a RowCell instead of a KeyValue, so the check under “// Input and output Result instances are unequal” will still fail.

I use a Bigtable FlatRow to encode/decode HBase Results, which might be the wrong approach. We’ll fix FlatRowAdapter for case 1), but that won’t fix 2). I’m going to have to mull over these two issues.

0 reactions
sduskis commented, Mar 21, 2017

The next release will have this fix.
