Efficiency of run length encoding

Version: 3.9.1

Runtime Version: .NET 5

Expected behavior

Repeated values in a column are removed by run-length encoding.

Actual behavior

The file produced by the sample below contains a repeating byte sequence: 00 FE 04 00 FE 04 00 FE 04, and so on. Compressing the file with 7-Zip makes it almost 200 times smaller.

Is this expected behavior?

Code snippet reproducing the behavior

    // Requires the Parquet.Net package (version 3.9.1, as reported above).
    using System.IO;
    using Parquet;
    using Parquet.Data;

    // One million identical values: the best case for run-length encoding.
    var buf = new int[1024 * 1024];
    for (var i = 0; i < buf.Length; i++)
    {
        buf[i] = 1;
    }

    using (var stream = File.Create("_test.parquet"))
    {
        var field = new DataField<int>("Val");
        var schema = new Schema(field);
        using var parquetWriter = new ParquetWriter(schema, stream);
        using var groupWriter = parquetWriter.CreateRowGroup();
        var column = new DataColumn(field, buf);
        groupWriter.WriteColumn(column);
    }
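
For comparison, here is a minimal sketch of what plain run-length encoding does to the same buffer. It is only an illustration: `RleSketch` and `Encode` are hypothetical names, and the code implements neither Parquet's RLE/bit-packing hybrid format nor anything inside Parquet.Net. It just shows why a column of identical values is expected to collapse to almost nothing.

```csharp
using System;
using System.Collections.Generic;

public static class RleSketch
{
    // Collapses consecutive equal values into (value, count) pairs.
    // Hypothetical helper for illustration; not a Parquet.Net API.
    public static List<(int Value, int Count)> Encode(int[] data)
    {
        var runs = new List<(int Value, int Count)>();
        foreach (var value in data)
        {
            var lastIndex = runs.Count - 1;
            if (lastIndex >= 0 && runs[lastIndex].Value == value)
            {
                // Same value as the previous element: extend the current run.
                runs[lastIndex] = (value, runs[lastIndex].Count + 1);
            }
            else
            {
                // Different value (or first element): start a new run.
                runs.Add((value, 1));
            }
        }
        return runs;
    }

    public static void Main()
    {
        // Same input as the reproduction above: 1024 * 1024 copies of 1.
        var buf = new int[1024 * 1024];
        Array.Fill(buf, 1);

        var runs = Encode(buf);
        Console.WriteLine($"{buf.Length} values -> {runs.Count} run(s)");
    }
}
```

With this input the whole column becomes a single (1, 1048576) pair, which is the kind of reduction the 7-Zip result hints at.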

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

houseofcat commented, Feb 14, 2022 (1 reaction)

@aloneguid not that I have much time to work on someone else’s projects.

Plenty of time to benefit from someone else’s work though.

fandrei commented, Jan 8, 2022 (1 reaction)

@aloneguid well, it’s your project. And RLE is one of the important features of Parquet. If you decided not to make a quality implementation, it’s up to you 😃


