Azure streaming binary data error
Problem description
I am trying to stream a binary file from Azure Blob Storage.
I expect to be able to iterate over chunks of the data set, but I see an error to do with the Azure readinto function.
I’m using the npTDMS library to read a LabVIEW data file in TDMS format (binary quantitative data files).
Steps/code to reproduce the problem
The code is something like this:
import azure.storage.blob
import smart_open
import nptdms

CONN_STR = '******************'
BLOB_URI = 'azure://test/my_data_file.tdms'

transport_params = dict(
    client=azure.storage.blob.BlobServiceClient.from_connection_string(conn_str=CONN_STR),
)

with smart_open.open(BLOB_URI, mode='rb', transport_params=transport_params) as file:
    with nptdms.TdmsFile.open(file) as tdms_file:
        for group in tdms_file.groups():
            for channel in group.channels():
                for chunk in channel.data_chunks():
                    pass
and the error I get is:
Traceback (most recent call last):
File "C:\Users\my_username\my_project\scripts\blob-tdms\smart.py", line 35, in <module>
main()
File "C:\Users\my_username\my_project\scripts\blob-tdms\smart.py", line 28, in main
for chunk in channel.data_chunks():
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms.py", line 564, in data_chunks
for raw_data_chunk in self._read_channel_data_chunks():
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms.py", line 758, in _read_channel_data_chunks
for chunk in self._reader.read_raw_data_for_channel(self.path):
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\reader.py", line 191, in read_raw_data_for_channel
for i, chunk in enumerate(
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 269, in read_raw_data_for_channel
for chunk in self._read_channel_data_chunks(f, data_objects, channel_path, chunk_offset, stop_chunk):
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\tdms_segment.py", line 367, in _read_channel_data_chunks
for chunk in reader.read_channel_data_chunks(file, data_objects, channel_path, chunk_offset, stop_chunk):
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 64, in read_channel_data_chunks
yield self._read_channel_data_chunk(file, data_objects, chunk_index, channel_path)
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 72, in _read_channel_data_chunk
data_chunk = self._read_data_chunk(file, data_objects, chunk_index)
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\daqmx.py", line 39, in _read_data_chunk
combined_data = read_interleaved_segment_bytes(file, raw_data_width, chunk_size)
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 159, in read_interleaved_segment_bytes
combined_data = fromfile(f, dtype=np.uint8, count=number_bytes)
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\nptdms\base_segment.py", line 147, in fromfile
bytes_read = file.readinto(buffer[offset:])
File "C:\Users\my_username\Miniconda3\envs\my_project\lib\site-packages\smart_open\azure.py", line 322, in readinto
b[:len(data)] = data
ValueError: invalid literal for int() with base 10: b'\x93\xad\x03\x00k\xf0\xff\xff\xfe\xee\xff\xffm\xfd\xff\xffd\xc1E\x00<\xad\x03\x00O\xf0\xff\xffI\xee\xff\xff\xd1\xfd\xff\xff\xbe\xc2E\x00\xe8\xac\x03\x00\xa6\xef\xff\xff\xe5\xed\xff\xff\x92\xfd\xff\x
It seems like it’s expecting a text file? Or is it not calculating the data index correctly while paging through the data set?
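As a diagnostic (a hedged sketch, not part of the original report; it reuses the BLOB_URI and transport_params from the snippet above), calling readinto directly with the two buffer types involved shows where it breaks: a plain bytearray target accepts a bytes source in a slice assignment, while a numpy uint8 array, which is what nptdms actually passes, does not:

import numpy as np
import smart_open

with smart_open.open(BLOB_URI, mode='rb', transport_params=transport_params) as f:
    # A bytearray target works: bytearray slice assignment accepts bytes.
    plain = bytearray(16)
    f.readinto(plain)

    # A numpy uint8 target (what nptdms passes via buffer[offset:]) hits
    # smart_open's b[:len(data)] = data and raises the ValueError above.
    arr = np.zeros(16, dtype=np.uint8)
    f.readinto(arr)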
Versions
>>> import platform, sys, smart_open
>>> print(platform.platform())
Windows-10-10.0.19042-SP0
>>> print("Python", sys.version)
Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:15:42) [MSC v.1916 64 bit (AMD64)]
>>> print("smart_open", smart_open.__version__)
smart_open 6.1.0
From pip list:
azure-core 1.23.0
azure-storage-blob 12.10.0
npTDMS 1.4.0
smart-open 6.1.0
I believe this is an issue under the hood with the readinto implementation. I run into this same error when using S3 and Linux. The problem seems to be assigning a binary string into a numpy array. Perhaps the exception that the next line catches should be ValueError instead of AttributeError?

For what possible values of b and data will b[:len(data)] = data (or parts of it) raise that exception? If you’re able to dig in with a debugger, it would be good to know what those values are.
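The traceback suggests an answer: nptdms hands a slice of a numpy uint8 array to readinto (fromfile in nptdms/base_segment.py, line 147 above), so b is evidently a numpy array slice and data is the bytes returned from the blob. A minimal standalone sketch (no Azure involved, assuming that buffer type) that reproduces the same ValueError:

import numpy as np

# nptdms allocates a numpy uint8 buffer and passes a slice of it to
# file.readinto() (base_segment.py line 147 in the traceback above).
buf = np.zeros(16, dtype=np.uint8)

# smart_open's readinto() then executes b[:len(data)] = data, where data
# is a bytes object. For a numeric dtype, numpy treats the bytes object
# as a scalar and calls int() on it, which fails:
data = b'\x93\xad\x03\x00'
buf[:len(data)] = data
# ValueError: invalid literal for int() with base 10: b'\x93\xad\x03\x00'

Wrapping the target in a memoryview first, i.e. memoryview(b)[:len(data)] = data, copies raw bytes instead of attempting a scalar conversion and works for bytearray, memoryview, and contiguous numpy uint8 targets alike, so that may be a reasonable fix inside readinto.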