Modification to enable producing consistent binary output

I think it’d be very useful to have a way to make xlsxwriter produce an identical binary result every time a workbook with identical contents is generated.

Two scenarios I have personally in mind:

  • it would be useful to keep some slow-changing data in worksheets under version control without wasting space every time a report is regenerated,
  • it would be helpful to quickly check the binary hash of a report to verify that the same contents were generated, as a kind of unit test.
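The second scenario could be sketched as follows. This is a minimal illustration using only the standard library; the helper names `file_sha256` and `same_binary_output` are hypothetical and not part of xlsxwriter.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary_output(path_a, path_b):
    """True if both files are byte-for-byte identical (compared by hash)."""
    return file_sha256(path_a) == file_sha256(path_b)
```

A check like this is only meaningful when the generator writes the same bytes for the same contents.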

Unfortunately, that’s currently not possible. I’ve seen that there are two moving pieces here:

  1. one can set a constant creation/modification date that is written to the metadata using “document properties”, but

  2. the zipping process always sets file timestamps to the current time, which results in the archive being different on every run.

I’ve tracked the problem down to the zipping library, which internally uses the timestamps of the temporary files created by xlsxwriter, or the current time when in-memory mode is used.
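The behaviour described above can be observed with Python's standard `zipfile` module alone. The sketch below is only an illustration: the entry names and the `entry_timestamps` helper are made up, and no xlsxwriter code is involved.

```python
import io
import os
import tempfile
import time
import zipfile


def entry_timestamps():
    """Return the date_time tuples zipfile records for two kinds of entry."""
    # Give a temporary file a known modification time (local time, noon).
    known = time.mktime((2018, 3, 30, 12, 0, 0, 0, 0, -1))
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(b"<xml/>")
        os.utime(path, (known, known))

        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            # write() takes the timestamp from the file's modification time.
            zf.write(path, "from_file.xml")
            # writestr() with a plain name stamps the entry with the current time.
            zf.writestr("from_memory.xml", b"<xml/>")
        with zipfile.ZipFile(buf) as zf:
            return {info.filename: info.date_time for info in zf.infolist()}
    finally:
        os.remove(path)


stamps = entry_timestamps()
print(stamps["from_file.xml"])    # the modification time set above
print(stamps["from_memory.xml"])  # roughly "now", so it changes between runs
```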

In my pull request I propose a solution where creation time is taken from the metadata (if available) and:

  1. a ZipInfo structure with that date is used for in-memory zipping,

  2. the temporary files’ modification times are set to that date for file-based zipping.
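The two steps above can be sketched with the standard library as follows. This is not the code from the pull request: `CREATED`, `zip_in_memory` and `zip_from_temp_files` are hypothetical names, and `CREATED` stands in for the creation date taken from the document properties.

```python
import io
import os
import tempfile
import time
import zipfile

# Hypothetical stand-in for the "created" date from the document properties.
CREATED = (2018, 1, 1, 12, 0, 0)


def zip_in_memory(parts, date_time=CREATED):
    """Zip (name, bytes) pairs, giving every entry the same fixed timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts:
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()


def zip_from_temp_files(parts, date_time=CREATED):
    """Write each part to a temp file, fix its mtime, then zip the files."""
    stamp = time.mktime(date_time + (0, 0, -1))  # local time to epoch seconds
    buf = io.BytesIO()
    with tempfile.TemporaryDirectory() as tmp, \
            zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for index, (name, data) in enumerate(parts):
            path = os.path.join(tmp, str(index))
            with open(path, "wb") as fh:
                fh.write(data)
            # Without this call, the archive would record the time of writing.
            os.utime(path, (stamp, stamp))
            zf.write(path, name)
    return buf.getvalue()
```

With a fixed date, repeated calls with the same `parts` should yield the same bytes for each function, regardless of when they run. The two functions are not expected to match each other, since file-based entries also carry file permission bits.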

All of that can be done in Workbook’s _store_workbook method. I wondered whether setting the temporary files’ date in Packager would be more natural, but decided it’s better to have it all in one place for both file-based and in-memory modes.

I think the enhancement would be useful not only to me (if properly advertised), but I tried to keep a low profile: “constant output mode” is only activated when the creation date is set in the properties, which I expect to be rare.

For implementation details and usage code see my pull request https://github.com/jmcnamara/XlsxWriter/pull/495.

All existing unit tests passed (linux, py27). I’ve seen your encouragement to create new ones for any additional functionality, but all existing tests check internal XML data, and I haven’t spotted a natural place to add an “outside” binary-level test.

Last but not least, thanks for your great library. I use it for all kinds of things and am glad I can give something back.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
ziembla commented, Mar 30, 2018

Thanks, you’re right, it was stupid of me not to check Excel. What’s more, setting a constant is even simpler and cleaner. I’ve updated the pull request accordingly.

As I checked in the meantime, LibreOffice and Google Docs are not Excel-compliant 😄: they embed the current time as the zipped files’ timestamp, which makes the files different on every save.

I was worried when I realized that changing timestamps could have a side effect on non-temp files if any were used in the zipping, but that doesn’t seem to happen. For example, I suspected that embedding an external image could reuse the original file, but it looks like it doesn’t.

Now the Stack Overflow answer you mentioned would become true: setting the “created” property alone would make both files identical, without depending on a race condition (that the two files are created in such rapid succession that the tempfiles get the same filesystem timestamp) 😄

Happy Easter!

0 reactions
ziembla commented, Apr 23, 2018

Indeed! 😃
