Merging PDF Files In Linux Using PyPDF

PyPDF is a handy Python library for merging and splitting PDF files in Linux. It is a pure-Python library built as a PDF toolkit, and it is capable of:

  • extracting document information (title, author, …),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.
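
Of the capabilities listed above, splitting a document page by page can be sketched roughly as follows. This is only a minimal sketch, not a complete tool: the function names splitPDFFile and pageFileName and the report-001.pdf naming scheme are my own assumptions for illustration, and the pyPdf import is placed inside the function so that the naming helper can be used on its own.

```python
def pageFileName(prefix, pageNumber):
    # Hypothetical naming scheme: report-001.pdf, report-002.pdf, ...
    return '%s-%03d.pdf' % (prefix, pageNumber)


def splitPDFFile(inputFile, prefix):
    # Imported here so that pageFileName works without pyPdf installed
    from pyPdf import PdfFileWriter, PdfFileReader

    inputStream = open(inputFile, "rb")
    try:
        reader = PdfFileReader(inputStream)
        for n in range(reader.getNumPages()):
            # One writer per page, holding just that page
            output = PdfFileWriter()
            output.addPage(reader.getPage(n))
            outputStream = open(pageFileName(prefix, n + 1), "wb")
            try:
                output.write(outputStream)
            finally:
                outputStream.close()
    finally:
        # The input stays open until every page has been written
        inputStream.close()
```

Calling splitPDFFile('report.pdf', 'report') would then write one single-page PDF file per page of report.pdf into the current directory.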

PyPDF is used by many Python applications that handle PDF files directly. PDF-Shuffler is one of the tools built on PyPDF, and you can use it to merge PDF files easily in Linux. In Ubuntu you can install it using the following command.

sudo apt-get install pdfshuffler

Here is sample code that merges PDF files using the PyPDF library. In this code I have used the PdfFileWriter and PdfFileReader classes from the pyPdf module to read PDF files and append them together. This sample doesn’t contain complete error handling logic for file handling.

# Copyright (C) 2010 Milinda Pathirage

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys
from pyPdf import PdfFileWriter, PdfFileReader

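def mergePDFFiles(outputFile, filesToBeMerged):
    output = PdfFileWriter()

    if len(filesToBeMerged) == 0:
        print('Empty Input File List')
        return

    # Input streams stay open until the output is written, because the
    # page contents are read from them when the output is written
    inputStreams = []
    try:
        for inFile in filesToBeMerged:
            print('Adding file ' + inFile + ' to the output')
            # Read the input PDF file
            inputStream = open(inFile, "rb")
            inputStreams.append(inputStream)
            reader = PdfFileReader(inputStream)
            # Add every page in the input PDF file to the output
            for page in reader.pages:
                output.addPage(page)
        print('Writing the final output to the file system')
        # Output stream for the output file
        outputStream = open(outputFile, "wb")
        try:
            output.write(outputStream)
        finally:
            outputStream.close()
    finally:
        for inputStream in inputStreams:
            inputStream.close()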
   

if __name__ == '__main__':
    args = sys.argv[1:]
    outputFile = ''
    inputFiles = []

    # Getting the output file: the argument that follows -o
    if '-o' in args:
        outPos = args.index('-o')
        if outPos + 1 < len(args):
            outputFile = args[outPos + 1]
            print('Output File: ' + outputFile)

    # Extracting input files: everything after -i, up to -o if it comes later
    if '-i' in args:
        inPos = args.index('-i')
        if '-o' in args and args.index('-o') > inPos:
            inputFiles = args[inPos + 1:args.index('-o')]
        else:
            inputFiles = args[inPos + 1:]
        print('Input Files: ' + " ".join(inputFiles))

    # Stop early when either option is missing or has no value
    if not outputFile or not inputFiles:
        print('Usage: ' + sys.argv[0] + ' -i input1.pdf input2.pdf -o output.pdf')
        sys.exit(1)

    # Merging PDF files
    print('Merging PDF Files......')
    mergePDFFiles(outputFile, inputFiles)
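
The -i and -o options in the script above are picked out of sys.argv by hand. As an alternative, the same command line could probably be handled with argparse, which ships in the standard library from Python 2.7 on. The sketch below is only an illustration under that assumption; the function name parseArguments and the help texts are mine and are not part of the original script.

```python
import argparse


def parseArguments(argv):
    parser = argparse.ArgumentParser(description='Merge PDF files')
    # -i takes one or more input files, in the order they should be merged
    parser.add_argument('-i', dest='inputFiles', nargs='+', required=True,
                        help='PDF files to merge')
    # -o takes exactly one output file
    parser.add_argument('-o', dest='outputFile', required=True,
                        help='merged PDF file to write')
    return parser.parse_args(argv)


# Example with a fixed argument list; a real script would pass sys.argv[1:]
args = parseArguments(['-i', 'a.pdf', 'b.pdf', '-o', 'merged.pdf'])
print(args.inputFiles)
print(args.outputFile)
```

With this approach a missing -i or -o produces a usage message from argparse itself, and the two options can be given in either order.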

You can learn more about using PyPDF by exploring open source projects that use it, such as PDF-Shuffler.


January 16, 2010