extract text from scanned pdf python pytesseract

Maaz answered on December 3, 2021 Popularity 5/10 Helpfulness 2/10

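import glob
import pytesseract
from pdf2image import convert_from_path

# Collect every PDF in the folder (replace yourPath with the real directory)
pdfs = glob.glob(r"yourPath\*.pdf")

for pdf_path in pdfs:
    # Render each page to a PIL image at 500 DPI (pdf2image needs Poppler installed)
    pages = convert_from_path(pdf_path, 500)

    for page_num, image in enumerate(pages):
        # OCR the page image with the English language model
        text = pytesseract.image_to_string(image, lang='eng')

        # Write one text file per page, e.g. scan_page0.txt (numbering starts at 0)
        with open(f'{pdf_path[:-4]}_page{page_num}.txt', 'w', encoding='utf-8') as the_file:
            the_file.write(text)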
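
The `pdf_path[:-4]` slice above works only because `.pdf` is exactly four characters long. As a rough sketch, the output name could be built with `pathlib` instead, which drops the suffix whatever its length or case. `page_txt_path` is a hypothetical helper name used for illustration here; it is not part of pytesseract or pdf2image.

```python
from pathlib import Path


def page_txt_path(pdf_path, page_num):
    """Return the per-page .txt path next to the PDF (hypothetical helper)."""
    pdf = Path(pdf_path)
    # stem is the file name without its final suffix; with_name swaps only
    # the last path component, so the parent directory is preserved
    return pdf.with_name(f"{pdf.stem}_page{page_num}.txt")


if __name__ == "__main__":
    # Example file name is made up for illustration
    print(page_txt_path("scans/report.pdf", 0))
```

Inside the loop this would replace the f-string passed to `open`, for example `open(page_txt_path(pdf_path, page_num), 'w', encoding='utf-8')`.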

Source: stackoverflow.com

Tags: extract pdf python text

Contributed on Feb 15 2024

Closely Related Answers

pytesseract pdf to text

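import pytesseract
from pdf2image import convert_from_path

# cv2.imread cannot decode a PDF (it returns None), so render the pages to images first
pages = convert_from_path('/Users/user1/Desktop/folder1/pdf1.pdf')

# OCR each page image and join the results into one string
text = '\n'.join(pytesseract.image_to_string(page) for page in pages)
print(text)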
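
A PDF has to be rasterised before pytesseract can read it, so the two steps (render pages, then OCR each image) can be wrapped in one function. The version below is only a sketch: `ocr_pdf` and its `render` and `ocr` parameters are hypothetical names introduced here so the flow can be tried with stand-in callables, and the 300 DPI default is an assumption, not a recommendation from either library. When the hooks are left as `None` it falls back to `pdf2image.convert_from_path` and `pytesseract.image_to_string`.

```python
def ocr_pdf(pdf_path, render=None, ocr=None, dpi=300):
    """Return a list with the OCR text of each page of a PDF.

    render and ocr are hypothetical hooks (assumptions of this sketch):
    render(pdf_path, dpi=...) yields page images, ocr(image) returns text.
    """
    if render is None:
        # Imported lazily so the function can be tested without Poppler
        from pdf2image import convert_from_path
        render = convert_from_path
    if ocr is None:
        # Imported lazily so the function can be tested without Tesseract
        import pytesseract
        ocr = pytesseract.image_to_string

    # Step 1: rasterise every page; step 2: OCR each page image in order
    pages = render(pdf_path, dpi=dpi)
    return [ocr(page) for page in pages]
```

With both libraries installed, `"\n".join(ocr_pdf("/Users/user1/Desktop/folder1/pdf1.pdf"))` would give the whole document as one string.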

Popularity 9/10 Helpfulness 9/10 Language python

Source: stackoverflow.com

Tags: pdf text python

Contributed on Dec 03 2021

Terrible Teira
