# Python Manipulate PDF using PyPDF2

#### Installation

The snippets below use the PyPDF2 1.x/2.x class and method names (`PdfFileReader`, `getPage`, and so on). PyPDF2 3.0 removed these names, so pin an older release to run the code as written.

```bash
pip install "PyPDF2<3"
```

#### Usage

Get PDF number of pages

```python
from PyPDF2 import PdfFileReader

with open(PDF_NAME, 'rb') as f:
    pdf = PdfFileReader(f)
    pdfNumPages = pdf.getNumPages()
```

Read PDF page(s) content

```python
from PyPDF2 import PdfFileReader

with open(PDF_NAME, 'rb') as f:
    pdf = PdfFileReader(f)
    pdfPage = pdf.getPage(PAGE_NUM)  # PAGE_NUM is zero-based
```

Add PDF page to Writer buffer

```python
from PyPDF2 import PdfFileReader, PdfFileWriter

writer = PdfFileWriter()
# Keep the source file open until the writer has been written out
pdf = PdfFileReader(open(PDF_NAME, 'rb'))
for j in range(pdf.getNumPages()):
    writer.addPage(pdf.getPage(j))
```

Add blank page to Writer buffer

```python
from PyPDF2 import PdfFileWriter

writer = PdfFileWriter()
# An empty writer has no previous page to copy the size from,
# so give the size explicitly (595 x 842 pt is roughly A4)
writer.addBlankPage(width=595, height=842)
```

Merge multiple PDFs into a single PDF

```python
from PyPDF2 import PdfFileMerger

merger = PdfFileMerger()
for pdf in PDF_NAME_LIST:
    merger.append(pdf)
merger.write('merge.pdf')
merger.close()
```

#### Working Example

##### Aim:

- To combine a batch of dataset evaluation results into a single PDF
- To pad each dataset's PDF with blank pages so that its page count is divisible by 4, because the print layout places 4 pages on a single sheet
- To name the output PDF "Evaluation_yyyy-mm-dd.pdf"

##### Code:

```python
from PyPDF2 import PdfFileReader, PdfFileWriter
import os, glob, datetime

#### Define Functions ####
def gen_evaluation_pdf(target_dir=".", target_pattern="*dataset_evaluation.pdf"):
    """Merge every matching PDF in the subfolders of target_dir into one dated PDF."""
    now = datetime.datetime.now()
    current_date = now.strftime("%Y-%m-%d")
    output_filename = f"Evaluation_{current_date}.pdf"
    # Result folders are the entries without a "." in their name
    gse_results = [i for i in os.listdir(target_dir) if "." not in i]
    pdf_count = 0
    output = PdfFileWriter()
    for gse_result in gse_results:
        pdf_name_list = glob.glob(os.path.join(target_dir, gse_result, target_pattern))
        for pdf_name in pdf_name_list:
            pdf_count += 1
            # The source file must stay open until output.write() has run
            pdf = PdfFileReader(open(pdf_name, 'rb'))
            for j in range(pdf.getNumPages()):
                output.addPage(pdf.getPage(j))
            # Pad with blank pages up to the next multiple of 4
            for j in range((4 - pdf.getNumPages()) % 4):
                output.addBlankPage()
    print(f"There are {pdf_count} evaluation pdfs found.")
    with open(output_filename, "wb") as f_out:
        output.write(f_out)
    print("Job done!")

def get_evaluation_pdf(target_dir=".", target_pattern="*dataset_evaluation.pdf"):
    """Return (folder prefix, dataset name) pairs and the total page count."""
    gse_results = [i for i in os.listdir(target_dir) if "." not in i]
    pdfs = []
    page_sum = 0
    for gse_result in gse_results:
        pdf_name_list = glob.glob(os.path.join(target_dir, gse_result, target_pattern))
        for pdf_name in pdf_name_list:
            with open(pdf_name, 'rb') as f:
                page_sum += PdfFileReader(f).getNumPages()
            # Folder prefix before the first "_", and the file name without its suffix
            folder = os.path.basename(os.path.dirname(pdf_name))
            dataset = os.path.basename(pdf_name).replace("_dataset_evaluation.pdf", "")
            pdfs.append((folder.split("_")[0], dataset))
    return pdfs, page_sum
##########################

gen_evaluation_pdf()
for i, j in get_evaluation_pdf()[0]:
    print(i, "\t", j.replace("_WB", " vs Control").replace("_PBMC", " vs Control"), sep="")
```

#### Source

**Documentation link**: [PyPDF2 Documentation](https://pythonhosted.org/PyPDF2/)
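
#### Note: Blank-Page Padding

The working example pads each PDF with `(4 - pages) % 4` blank pages. The sketch below checks that arithmetic without PyPDF2. The helper names `blank_pages_needed` and `start_pages` and the `pages_per_sheet` parameter are hypothetical, introduced only for illustration; they are not part of the original script or of PyPDF2.

```python
def blank_pages_needed(num_pages, pages_per_sheet=4):
    """Blank pages to append so num_pages becomes a multiple of pages_per_sheet.

    Hypothetical helper: same formula as the working example, (4 - n) % 4.
    Python's % always returns a non-negative result for a positive divisor,
    so page counts larger than pages_per_sheet are handled too.
    """
    return (pages_per_sheet - num_pages) % pages_per_sheet


def start_pages(page_counts, pages_per_sheet=4):
    """1-based page number where each padded document starts in the merged PDF.

    Hypothetical helper, assuming the documents are appended in the given order.
    """
    starts = []
    next_page = 1
    for n in page_counts:
        starts.append(next_page)
        next_page += n + blank_pages_needed(n, pages_per_sheet)
    return starts


for n in (1, 4, 5, 8):
    print(n, "->", blank_pages_needed(n))
# 1 -> 3
# 4 -> 0
# 5 -> 3
# 8 -> 0

print(start_pages([1, 4, 5]))  # [1, 5, 9]
```

Because every document is padded to a multiple of 4, each one starts on a page of the form 4k + 1, which is what keeps datasets from sharing a printed sheet.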