
Large Table Creation Slow

See original GitHub issue

I saw a closed issue you couldn’t reproduce that sounded similar to what I’m seeing. I was trying to create a table with 7 columns and around 1,000 rows, and it took a very long time. I had been using a fork that added HTML tag support, so I wrote a test script, downloaded the most recent version of python-docx, and tried this on both Ubuntu and Windows machines running Python 2.7.

My test script is at the bottom; all I did was create a document, add a 7-column table, and then add blank rows. Here is the timing output on an i7 running Python 2.7.9 on 64-bit Windows 7. One core is pegged at 100% while this runs, with minimal memory usage, and you can see that the row additions take longer and longer as the table gets bigger.

-Thanks

SCRIPT OUTPUT:

docx Version: 0.8.5
    0.45s:     50 lines complete.  (  0.45 seconds for last 50 lines)
    1.54s:    100 lines complete.  (  1.09 seconds for last 50 lines)
    3.31s:    150 lines complete.  (  1.76 seconds for last 50 lines)
    5.74s:    200 lines complete.  (  2.43 seconds for last 50 lines)
    8.81s:    250 lines complete.  (  3.07 seconds for last 50 lines)
   12.56s:    300 lines complete.  (  3.74 seconds for last 50 lines)
   16.94s:    350 lines complete.  (  4.38 seconds for last 50 lines)
   22.01s:    400 lines complete.  (  5.07 seconds for last 50 lines)
   27.75s:    450 lines complete.  (  5.74 seconds for last 50 lines)
   34.16s:    500 lines complete.  (  6.41 seconds for last 50 lines)
   41.23s:    550 lines complete.  (  7.07 seconds for last 50 lines)
   48.92s:    600 lines complete.  (  7.69 seconds for last 50 lines)
   57.28s:    650 lines complete.  (  8.36 seconds for last 50 lines)
   66.38s:    700 lines complete.  (  9.10 seconds for last 50 lines)
   76.13s:    750 lines complete.  (  9.75 seconds for last 50 lines)
   86.52s:    800 lines complete.  ( 10.39 seconds for last 50 lines)
   97.58s:    850 lines complete.  ( 11.06 seconds for last 50 lines)
  109.37s:    900 lines complete.  ( 11.79 seconds for last 50 lines)
  121.71s:    950 lines complete.  ( 12.34 seconds for last 50 lines)
Total Runtime 134.46 seconds
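
The output is consistent with quadratic behavior: each 50-row block costs roughly 0.66 seconds more than the one before it, so the per-row cost grows linearly with the current table size and the total runtime grows with the square of the row count. A quick sanity check, re-keying the per-block deltas from the output above:

# Per-50-row deltas copied from the script output above
deltas = [0.45, 1.09, 1.76, 2.43, 3.07, 3.74, 4.38, 5.07, 5.74, 6.41,
          7.07, 7.69, 8.36, 9.10, 9.75, 10.39, 11.06, 11.79, 12.34]

# The growth from one block to the next is nearly constant (~0.66s),
# i.e. per-row cost is linear in table size, so total time is O(rows**2).
steps = [b - a for a, b in zip(deltas, deltas[1:])]
print("mean growth per block: %.2fs" % (sum(steps) / len(steps)))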

SCRIPT CODE:

import time
import docx

STEP = 50
ROWS = 1000

print "docx Version: %s" % docx.__version__
document = docx.Document()
table = document.add_table(rows=1, cols=7)
tstart = time.time()
t1 = tstart
for i in range(ROWS):
    row_cells = table.add_row().cells  # fetching .cells is what gets slower (see analysis below)
    if i and (i % STEP) == 0:
        t2 = time.time()
        print "%8.2fs:  %5d lines complete.  (%6.2f seconds for last %d lines)" % (t2 - tstart, i, t2-t1, STEP)
        t1 = t2

document.save("table_test.docx")
t2 = time.time()
print "Total Runtime %.2f seconds" % (t2 - tstart)

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 6
  • Comments: 13

Top GitHub Comments

23 reactions
stumpyyy commented, Apr 20, 2015

Looking at this a bit more, all of the time is taken up by the table._cells call, which happens every time I fetch row.cells. To retrieve a row, _cells has to iterate through every cell in the entire table in order to resolve merged cells, and it has no mechanism to rebuild the list only when the table changes.
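
For intuition, here is a minimal sketch of that access pattern in plain Python — not python-docx’s actual implementation, just a model of why fetching cells row-by-row is quadratic while fetching the flat list once is linear:

import time

def full_scan(rows, cols):
    # stands in for table._cells: visits every cell in the table
    return [(r, c) for r in range(rows) for c in range(cols)]

def per_row(rows, cols):
    # models repeated row.cells: a full scan for every row fetched
    for r in range(rows):
        row = full_scan(rows, cols)[r * cols:(r + 1) * cols]

def cached(rows, cols):
    # models the workaround below: scan once, then slice per row
    all_cells = full_scan(rows, cols)
    for r in range(rows):
        row = all_cells[r * cols:(r + 1) * cols]

for fn in (per_row, cached):
    t0 = time.time()
    fn(1000, 7)
    print("%s: %.2fs" % (fn.__name__, time.time() - t0))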

As a workaround, since I just need a simple table, I fetch all the cells once and index into the rows myself:

COLUMNS = 7
ROWS = 1000
# Note: the keyword argument is cols, not columns
table = document.add_table(rows=ROWS, cols=COLUMNS)
table_cells = table._cells  # one full-table scan, done exactly once
for i in range(ROWS):
    row_cells = table_cells[i*COLUMNS:(i+1)*COLUMNS]
    # Add text to row_cells

This takes around 4 seconds to populate 1000 rows.
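
Putting that together into the shape of the original benchmark — a sketch, assuming the same 7x1000 layout. Note that table._cells is a private attribute and could change between python-docx versions, while cell.text is the documented way to set cell contents:

import time
import docx

COLUMNS = 7
ROWS = 1000

document = docx.Document()
tstart = time.time()

# Create every row up front, then fetch the flat cell list exactly once
table = document.add_table(rows=ROWS, cols=COLUMNS)
table_cells = table._cells

for i in range(ROWS):
    row_cells = table_cells[i * COLUMNS:(i + 1) * COLUMNS]
    for j, cell in enumerate(row_cells):
        cell.text = "row %d col %d" % (i, j)

document.save("table_test_fast.docx")
print("Total Runtime %.2f seconds" % (time.time() - tstart))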

2 reactions
sorrowe commented, Feb 8, 2022

Another year, and another person this has helped. Just got a 6000+ row table generating in a few minutes, as opposed to hours.

Read more comments on GitHub

