.extract() is unable to get data properly from sparse tables
I created a manual table to reproduce the bug I am facing:
<!DOCTYPE html>
<html lang="en">
<table class="manual_table">
<thead>
<tr>
<th class="">Mar 2008</th>
<th class="">Mar 2009</th>
<th class="">Mar 2010</th>
</tr>
</thead>
<tbody>
<tr>
<td class="">8,626</td>
<td class="">8,427</td>
<td class="">11,525</td>
</tr>
<tr>
<td class="">16,408</td>
<td class="">19,582</td>
<td class=""></td>
</tr>
<tr>
<td class=""></td>
<td class="">22,574</td>
<td class="">21,755</td>
</tr>
</tbody>
</table>
When I run the code below on the HTML above, this is the output I get:
>>> rows = response.css(".manual_table tbody tr")
>>> rows[0].css("td::text").extract()
['8,626', '8,427', '11,525']
>>> rows[1].css("td::text").extract()
['16,408', '19,582']
>>> rows[2].css("td::text").extract()
['22,574', '21,755']
As you can see, nothing is returned for the empty data cells. All empty values are skipped, which looks like a bug.
Similarly, the code below gives confusing results; I did not expect these counts to differ:
>>> len(rows[2].css("td::text").extract())
2
>>> len(rows[2].css("td::text"))
2
>>> len(rows[2].css("td"))
3
Both .getall() and .extract() give the same result.
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
@shubham-MLwiz
xpath("normalize-space()").getall() returns a value for the empty data cells as well (an empty string), unlike text().
Full code
Output
I’m commenting on this old issue because I’ve faced it today.
I believe that is the right way to do it with Parsel.