
Issue with databricks

See original GitHub issue

Hi,

I am using XlsxWriter on Databricks to write an Excel file to an Azure blob, but I get the following error: FileCreateError: [Errno 95] Operation not supported.

I am using Python version 3.7.3 and XlsxWriter 1.2.8.

Here is some code that demonstrates the problem:


import xlsxwriter
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

writer = pd.ExcelWriter("output.xlsx", engine='xlsxwriter')

df.to_excel(writer, sheet_name='output_1')

writer.save()

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

14 reactions
pfernandez-sanofi commented, Oct 15, 2020

I managed to solve it by saving the file locally in Databricks, then copying it to the desired mount.

import datetime
import shutil

import pandas as pd

now_ = datetime.datetime.now()
timestamp_ = datetime.datetime.timestamp(now_)
tmp_path = f'temp_{timestamp_}.xlsx'  # save with a timestamp to avoid name clashes
writer = pd.ExcelWriter(tmp_path)
real_path = f'{YOURPATH}_filename.xlsx'
df.to_excel(writer, sheet_name='sheet_name')
writer.save()
shutil.copy(tmp_path, real_path)  # copy the temp file to the mount
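The same approach can be sketched with Python's tempfile module instead of a manual timestamp, which guarantees a unique local path. This is a minimal, self-contained variant, not the commenter's exact code; the DataFrame and the destination path are illustrative, and it assumes XlsxWriter is installed:

```python
import os
import shutil
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Write to a unique temp file on local disk first, so no random writes
# ever hit the FUSE-mounted path.
fd, tmp_path = tempfile.mkstemp(suffix='.xlsx')
os.close(fd)  # pandas reopens the path itself

with pd.ExcelWriter(tmp_path, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='output_1')

# Stand-in for the real mount destination, e.g. '/dbfs/mnt/<container>/output.xlsx'.
real_path = os.path.join(tempfile.gettempdir(), 'output_copy.xlsx')
shutil.copy(tmp_path, real_path)
os.remove(tmp_path)
```

The context manager closes (and saves) the workbook before the copy, so there is no need to call save() explicitly.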
1 reaction
jmcnamara commented, Apr 8, 2022

Resolved

Similar to the suggestion from @pfernandez-sanofi above, the Databricks documentation now recommends performing the operation on a local disk and then copying the output XlsxWriter file to dbfs.

From the docs https://docs.databricks.com/data/databricks-file-system.html#local-file-api-limitations:

Local file API limitations

FUSE V2 (default for Databricks Runtime 6.x and 7.x) does not support random writes. For workloads that require random writes, perform the I/O on local disk first and then copy the result to /dbfs. For example:

import xlsxwriter
from shutil import copyfile

workbook = xlsxwriter.Workbook('/local_disk0/tmp/excel.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write(0, 0, "Key")
worksheet.write(0, 1, "Value")
workbook.close()

copyfile('/local_disk0/tmp/excel.xlsx', '/dbfs/tmp/excel.xlsx')
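An alternative that avoids the local-disk detour is to build the workbook entirely in memory and then write the bytes out in a single sequential pass, since it is only random writes that the FUSE mount rejects. This is a sketch, not from the Databricks docs; it assumes pandas and XlsxWriter are available, and the output path here is a local stand-in for a /dbfs destination:

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Build the whole workbook in a BytesIO buffer; any seeking happens in
# memory, never on the filesystem.
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='output_1')

# One sequential write is all the destination needs to support.
# On Databricks this path would be e.g. '/dbfs/tmp/excel.xlsx'.
out_path = os.path.join(tempfile.gettempdir(), 'excel_inmemory.xlsx')
with open(out_path, 'wb') as f:
    f.write(buf.getvalue())
```

For very large workbooks the temp-file-then-copy pattern from the docs is safer, since the in-memory buffer holds the entire file at once.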

