<h1 id="extraction-of-tabular-data-from-pdfs-using-python">Extraction Of Tabular Data From PDFs Using Python</h1>
<p><img alt="Image for post" src="https://miro.medium.com/max/2400/1*8jIibN00fj3Xm28ZWVpBmQ.jpeg" />
Image source: Google</p>
<h2 id="how-using-python">How do we do it with Python?</h2>
<p>We can extract tabular data from PDFs using the <strong>Camelot</strong> library in Python with more than 90% accuracy, and save the result to a CSV or Excel file.</p>
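<p>The workflow described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function names, the file-naming scheme and the example path are assumptions made for illustration, while <code>camelot.read_pdf</code> and <code>table.to_csv</code> are Camelot calls.</p>

```python
from pathlib import Path


def csv_path_for(pdf_path, index, out_dir="."):
    """Build an output name such as report-table-1.csv (the naming is our own choice)."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}-table-{index}.csv"


def extract_tables_to_csv(pdf_path, pages="1", out_dir="."):
    """Read tables from a text-based PDF and write each one to its own CSV file."""
    # Imported lazily so the helper above can be used without Camelot installed.
    import camelot

    tables = camelot.read_pdf(str(pdf_path), pages=pages)
    written = []
    for index, table in enumerate(tables, start=1):
        target = csv_path_for(pdf_path, index, out_dir)
        table.to_csv(str(target))  # table.to_excel(...) would write an Excel file instead
        written.append(target)
    return written
```

<p>Calling <code>extract_tables_to_csv("report.pdf", pages="1-3")</code> (a hypothetical file) would write one CSV per detected table and return the list of paths. Each table is also available as a pandas DataFrame through <code>table.df</code> if you want to clean it before saving.</p>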
<h2 id="what-is-camelot"><strong>What is Camelot?</strong></h2>
<p>Camelot is a Python-based, MIT-licensed, open-source library with the following features:</p>
<ul>
<li>It works well and is configurable</li>
<li>Its output can be debugged and visualized with Python's matplotlib library</li>
<li>Tables can be exported to a CSV or Excel file</li>
<li>It has excellent documentation</li>
</ul>
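<p>To illustrate the debugging and visualization features listed above, here is a small sketch. It assumes matplotlib is installed alongside Camelot; the function names and the report formatting are our own, while <code>parsing_report</code> and <code>camelot.plot</code> come from Camelot.</p>

```python
def summarize_report(report):
    """Format Camelot's parsing report (a plain dict) as a single readable line."""
    return (
        f"page {report['page']}, table {report['order']}: "
        f"accuracy {report['accuracy']:.1f}%, whitespace {report['whitespace']:.1f}%"
    )


def inspect_first_table(pdf_path, pages="1"):
    """Print the parsing report of the first detected table and plot its grid."""
    # Imported lazily so summarize_report works without Camelot installed.
    import camelot

    tables = camelot.read_pdf(str(pdf_path), pages=pages)
    if len(tables) == 0:
        return None  # nothing detected, for example on a scanned PDF
    table = tables[0]
    print(summarize_report(table.parsing_report))
    # kind="grid" draws the detected table boundaries over the page.
    figure = camelot.plot(table, kind="grid")
    figure.show()
    return table
```

<p>A low accuracy or a high whitespace value in the report is usually a hint that the extraction settings need tuning for that page.</p>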
<h2 id="installation"><strong>Installation:</strong></h2>
<p>Using conda:</p>
<ul>
<li><code>conda install -c conda-forge camelot-py</code></li>
</ul>
<p>Using pip (after installing Tk and Ghostscript):</p>
<ul>
<li><code>pip install "camelot-py[cv]"</code></li>
</ul>
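<p>After installing, a quick check like the one below can confirm that the pieces are visible to Python. It is only a rough sketch: the function name is our own, and looking for a Ghostscript executable on the <code>PATH</code> is an assumption about how Ghostscript was installed, so a "missing" result is not conclusive.</p>

```python
import importlib.util
import shutil


def check_camelot_setup():
    """Report whether Camelot, Tk and a Ghostscript executable can be found."""
    # Common Ghostscript executable names on Linux/macOS and Windows.
    ghostscript_names = ("gs", "gswin64c", "gswin32c")
    return {
        "camelot": importlib.util.find_spec("camelot") is not None,
        "tkinter": importlib.util.find_spec("tkinter") is not None,
        "ghostscript": any(shutil.which(name) for name in ghostscript_names),
    }


for name, found in check_camelot_setup().items():
    print(f"{name}: {'found' if found else 'missing'}")
```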
<p><strong>Note</strong>: Camelot only works with text-based PDFs, not scanned documents.</p>
<h2 id="others-pdfs-extraction-tools-available"><strong>Other PDF Extraction Tools Available:</strong></h2>
<ul>
<li>Tabula: Java-based, open source</li>
<li>pdfplumber: Python, open source</li>
<li>pdftables: Python, proprietary and paid</li>
<li>Smallpdf: online, paid service</li>
</ul>
<h2 id="problems-with-these-solutions">Problems with these solutions:</h2>
<ul>
<li>We cannot save the output as a CSV or Excel file.</li>
<li>These tools are not scalable or maintainable.</li>
</ul>
<h1 id="conclusion">Conclusion:</h1>
<p>This article was inspired by Vinayak Mehta's talk at PyCon India 2019. Thank you for reading. Please give it a try, have fun, and let me know your feedback!</p>


A software developer who likes to write technical blogs and loves to share technical stuff.


Amit Kumar Manjhi's Blog
