
Large Table Creation Slow

See original GitHub issue

I saw a closed issue you couldn’t reproduce that sounded similar to what I’m seeing. I was trying to create a table with 7 columns and around 1,000 rows, and it took a very long time. I had been using a fork that added HTML tag support, so I wrote a test script, downloaded the most recent version of python-docx, and tried this on both Ubuntu and Windows machines running Python 2.7.

My test script is at the bottom; all I did was create a document, add a 7-column table, and then add blank rows. Here is the timing output on an i7 running Python 2.7.9 on 64-bit Windows 7. One core is pegged at 100% while this runs, with minimal memory usage, and you can see that the row additions take longer and longer as the table gets bigger.

-Thanks

SCRIPT OUTPUT:

docx Version: 0.8.5
    0.45s:     50 lines complete.  (  0.45 seconds for last 50 lines)
    1.54s:    100 lines complete.  (  1.09 seconds for last 50 lines)
    3.31s:    150 lines complete.  (  1.76 seconds for last 50 lines)
    5.74s:    200 lines complete.  (  2.43 seconds for last 50 lines)
    8.81s:    250 lines complete.  (  3.07 seconds for last 50 lines)
   12.56s:    300 lines complete.  (  3.74 seconds for last 50 lines)
   16.94s:    350 lines complete.  (  4.38 seconds for last 50 lines)
   22.01s:    400 lines complete.  (  5.07 seconds for last 50 lines)
   27.75s:    450 lines complete.  (  5.74 seconds for last 50 lines)
   34.16s:    500 lines complete.  (  6.41 seconds for last 50 lines)
   41.23s:    550 lines complete.  (  7.07 seconds for last 50 lines)
   48.92s:    600 lines complete.  (  7.69 seconds for last 50 lines)
   57.28s:    650 lines complete.  (  8.36 seconds for last 50 lines)
   66.38s:    700 lines complete.  (  9.10 seconds for last 50 lines)
   76.13s:    750 lines complete.  (  9.75 seconds for last 50 lines)
   86.52s:    800 lines complete.  ( 10.39 seconds for last 50 lines)
   97.58s:    850 lines complete.  ( 11.06 seconds for last 50 lines)
  109.37s:    900 lines complete.  ( 11.79 seconds for last 50 lines)
  121.71s:    950 lines complete.  ( 12.34 seconds for last 50 lines)
Total Runtime 134.46 seconds
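
The output is consistent with quadratic behavior: each 50-row block costs roughly 0.66 seconds more than the one before it, so the per-row cost grows linearly with the current table size and the total runtime grows with the square of the row count. A quick sanity check, re-keying the per-block deltas from the output above:

# Per-50-row deltas copied from the script output above
deltas = [0.45, 1.09, 1.76, 2.43, 3.07, 3.74, 4.38, 5.07, 5.74, 6.41,
          7.07, 7.69, 8.36, 9.10, 9.75, 10.39, 11.06, 11.79, 12.34]

# The growth from one block to the next is nearly constant (~0.66s),
# i.e. per-row cost is linear in table size, so total time is O(rows**2).
steps = [b - a for a, b in zip(deltas, deltas[1:])]
print("mean growth per block: %.2fs" % (sum(steps) / len(steps)))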

SCRIPT CODE:

import time
import docx

STEP = 50
ROWS = 1000

print "docx Version: %s" % docx.__version__
document = docx.Document()
table = document.add_table(rows=1, cols=7)
tstart = time.time()
t1 = tstart
for i in range(ROWS):
    row_cells = table.add_row().cells  # fetching .cells is what gets slower (see analysis below)
    if i and (i % STEP) == 0:
        t2 = time.time()
        print "%8.2fs:  %5d lines complete.  (%6.2f seconds for last %d lines)" % (t2 - tstart, i, t2-t1, STEP)
        t1 = t2

document.save("table_test.docx")
t2 = time.time()
print "Total Runtime %.2f seconds" % (t2 - tstart)

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 6
  • Comments: 13

Top GitHub Comments

23 reactions
stumpyyy commented, Apr 20, 2015

Looking at this a bit more, all of the time is taken up by the table._cells call, which happens every time I fetch row.cells. To retrieve a row, _cells has to iterate through every cell in the entire table in order to resolve merged cells, and it has no mechanism to rebuild the list only when the table changes.
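
For intuition, here is a minimal sketch of that access pattern in plain Python — not python-docx’s actual implementation, just a model of why fetching cells row-by-row is quadratic while fetching the flat list once is linear:

import time

def full_scan(rows, cols):
    # stands in for table._cells: visits every cell in the table
    return [(r, c) for r in range(rows) for c in range(cols)]

def per_row(rows, cols):
    # models repeated row.cells: a full scan for every row fetched
    for r in range(rows):
        row = full_scan(rows, cols)[r * cols:(r + 1) * cols]

def cached(rows, cols):
    # models the workaround below: scan once, then slice per row
    all_cells = full_scan(rows, cols)
    for r in range(rows):
        row = all_cells[r * cols:(r + 1) * cols]

for fn in (per_row, cached):
    t0 = time.time()
    fn(1000, 7)
    print("%s: %.2fs" % (fn.__name__, time.time() - t0))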

As a workaround, since I just need a simple table, I fetch all the cells once and index into the rows myself:

COLUMNS = 7
ROWS = 1000
# Note: the keyword argument is cols, not columns
table = document.add_table(rows=ROWS, cols=COLUMNS)
table_cells = table._cells  # one full-table scan, done exactly once
for i in range(ROWS):
    row_cells = table_cells[i*COLUMNS:(i+1)*COLUMNS]
    # Add text to row_cells

This takes around 4 seconds to populate 1000 rows.
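
Putting that together into the shape of the original benchmark — a sketch, assuming the same 7x1000 layout. Note that table._cells is a private attribute and could change between python-docx versions, while cell.text is the documented way to set cell contents:

import time
import docx

COLUMNS = 7
ROWS = 1000

document = docx.Document()
tstart = time.time()

# Create every row up front, then fetch the flat cell list exactly once
table = document.add_table(rows=ROWS, cols=COLUMNS)
table_cells = table._cells

for i in range(ROWS):
    row_cells = table_cells[i * COLUMNS:(i + 1) * COLUMNS]
    for j, cell in enumerate(row_cells):
        cell.text = "row %d col %d" % (i, j)

document.save("table_test_fast.docx")
print("Total Runtime %.2f seconds" % (time.time() - tstart))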

2 reactions
sorrowe commented, Feb 8, 2022

Another year, and another person this has helped. Just got a 6000+ row table generating in a few minutes, as opposed to hours.

Read more comments on GitHub

