the force that pushes object up

Merging PDF Files In Linux Using PyPDF

PyPDF is a handy Python library for merging and splitting PDF files in Linux. It is written in pure Python and built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, …),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

PyPDF is used by many Python applications that handle PDF files directly. PDF-Shuffler is one of the tools built on PyPDF, and you can use it to merge PDF files easily in Linux. In Ubuntu you can install it using the following command.

sudo apt-get install pdfshuffler

Here is sample code that merges PDF files using the PyPDF library. It uses the PdfFileWriter and PdfFileReader classes from the pyPdf module to read the input files and append their pages to a single output. The sample doesn’t contain complete error handling for file operations.

# Copyright (C) 2010 Milinda Pathirage

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import sys
from pyPdf import PdfFileWriter, PdfFileReader

def mergePDFFiles(outputFile, filesToBeMerged):
    output = PdfFileWriter()
   
    if not filesToBeMerged:
        print 'Empty Input File List'
        return
   
    for inFile in filesToBeMerged:
        print 'Adding file ' + inFile + ' to the output'
        # Read the input PDF file (named reader to avoid shadowing the built-in input)
        reader = PdfFileReader(open(inFile, "rb"))
        # Add every page in the input PDF file to the output
        for page in reader.pages:
            output.addPage(page)
    print 'Writing the final output to the file system'
    # Output stream for the merged file
    outputStream = open(outputFile, "wb")
    output.write(outputStream)
    outputStream.close()
   

if __name__ == '__main__':
    args = sys.argv[1:]
    outputFile = ''
    inputFiles = []

    # Getting the output file: the argument that follows -o
    if '-o' in args and args.index('-o') + 1 < len(args):
        outputFile = args[args.index('-o') + 1]
        print 'Output File: ' + outputFile

    # Extracting input files: everything after -i, up to -o if -o comes later
    if '-i' in args:
        start = args.index('-i') + 1
        end = len(args)
        if '-o' in args and args.index('-o') >= start:
            end = args.index('-o')
        inputFiles = args[start:end]
        print 'Input Files: ' + " ".join(inputFiles)

    # Stop early instead of failing later when no output file was given
    if not outputFile:
        print 'Usage: ' + sys.argv[0] + ' -i <input files> -o <output file>'
        sys.exit(1)

    # Merging PDF files
    print 'Merging PDF Files......'
    mergePDFFiles(outputFile, inputFiles)
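
The option handling in the script above can also be written with Python's standard argparse module. The following is only a sketch, assuming Python 3 and the same -i and -o options; parse_merge_args is a hypothetical helper name and is not part of PyPDF.

```python
import argparse


def parse_merge_args(argv):
    """Parse '-i in1.pdf in2.pdf -o out.pdf' style arguments.

    parse_merge_args is a hypothetical helper used for illustration only.
    """
    parser = argparse.ArgumentParser(description="Merge PDF files")
    # nargs="+" collects one or more file names following -i
    parser.add_argument("-i", dest="input_files", nargs="+", required=True,
                        help="PDF files to merge, in order")
    parser.add_argument("-o", dest="output_file", required=True,
                        help="path of the merged PDF")
    return parser.parse_args(argv)


# Demo with an explicit argument list; a script would pass sys.argv[1:]
args = parse_merge_args(["-i", "a.pdf", "b.pdf", "-o", "merged.pdf"])
print("Output File:", args.output_file)
print("Input Files:", " ".join(args.input_files))
```

With argparse the two options can appear in either order, and a missing -i or -o produces a usage message instead of an exception.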

You can learn more about how PyPDF is used by exploring open source projects built on it, such as PDF-Shuffler mentioned above.


January 16, 2010   1 Comment

Linux Utilities You Should Know About

Many utility applications in Linux were originally created for Unix and later ported to Linux, and they can make every Linux user’s life easier. It doesn’t matter whether you are a beginner or a hard-core Linux geek: if you know the tools, they will save you a lot of time.

In this post I introduce several utilities available on Linux that originated on Unix, based on recent posts by Peteris Krumins. I plan to update this post from time to time as he publishes articles about more tools.

Pipe Viewer

Pipe Viewer (written by Andrew Wood), or pv for short, can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

The default Ubuntu installation doesn’t come with this tool. You need to install it using the ‘sudo apt-get install pv’ command.

To get started with pv, let’s use it to monitor the progress of compressing a large file containing gigabytes of data into a smaller file. The normal way to do this using gzip is as follows.

gzip -c x.avi > x.avi.gz

But this command won’t tell you how long the compression will take, and it gives you no way to monitor its progress.

By using pv you can see how long it will take. Here is the same compression done through pv:

pv x.avi | gzip > x.avi.gz

69MB 0:00:04 [  15MB/s] [==================>         ]   74%  ETA 0:00:01

Pipe viewer acts as “cat” here, except it also adds a progress bar. We can see that 69MB of data has passed through to gzip in 4 seconds. That is 74% of all the data, and it will take about 1 more second to finish.
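
The idea of “cat with a progress report” can be sketched in a few lines of Python. This is only an illustration of the principle, not how pv itself is implemented; copy_with_progress and its parameters are hypothetical names, and the total size has to be supplied by the caller.

```python
import io
import sys


def copy_with_progress(src, dst, total_bytes, chunk_size=64 * 1024,
                       report=sys.stderr):
    """Copy src to dst in chunks, writing a percentage line to `report`.

    Hypothetical helper for illustration. Returns the number of bytes copied.
    """
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        if total_bytes:
            percent = 100 * copied // total_bytes
            # "\r" returns to the start of the line so the figure updates in place
            report.write("\r%d bytes %3d%%" % (copied, percent))
    report.write("\n")
    return copied


# Demo on in-memory streams; a real pipeline would read sys.stdin.buffer
# and write sys.stdout.buffer, keeping the progress report on stderr.
source = io.BytesIO(b"x" * 200000)
sink = io.BytesIO()
copy_with_progress(source, sink, total_bytes=200000)
```

The progress report goes to stderr so that it does not mix with the data flowing through the pipe.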

There are several advanced usage patterns of the pv command that I am not going to cover here. Please refer to the article from Peteris to explore more about pv.

lsof (list open files)

Peteris called this tool the Swiss Army Knife of Unix debugging. lsof is a utility that you can use to list information about files opened by Unix/Linux processes. In Unix/Linux everything is a file: pipes are files, IP sockets are files, Unix sockets are files, directories are files, devices are files, inodes are files… So you can see the advantage of a tool like this on a system where everything is a file.
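
To get a feel for the kind of information lsof reports, here is a small sketch that lists the files a single process has open by reading the Linux-specific /proc/<pid>/fd directory. It is not how lsof is implemented and covers only a fraction of what lsof shows; open_files_of is a hypothetical helper name.

```python
import os


def open_files_of(pid="self"):
    """Return the targets of the file descriptors a process has open.

    Hypothetical helper for illustration. Reads /proc/<pid>/fd (Linux only)
    and returns an empty list if that directory is missing or unreadable.
    """
    fd_dir = "/proc/%s/fd" % pid
    targets = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return targets
    for name in names:
        try:
            # Each entry is a symbolic link to the open file, pipe or socket
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink()
            continue
    return targets


# By default, inspect the current process
for target in open_files_of():
    print(target)
```

Passing the PID of a process owned by another user normally returns an empty list unless the script runs as root.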

Using lsof

Here are some common usage scenarios of the lsof command, taken from the catonmat.net blog. You must have root permission to get information about all open files using lsof. Otherwise you’ll only get information about the files that you have permission to access.

List all open files

Running lsof without any arguments lists all files opened by all processes.

# lsof

Find who’s using a file

Given the path to a file as an argument, lsof lists all the processes that are using the file in some way.

# lsof /path/to/file

You may also specify several files, in which case lsof lists all the processes that are using any of those files:

# lsof /path/to/file1 /path/to/file2

Find all open files in a directory recursively

With the +D argument, lsof finds all open files in the specified directory and all of its subdirectories.

# lsof +D /usr/lib

There are more use cases for the lsof command, such as listing open files by process, listing open files by user, listing all network connections, listing all TCP connections, and listing network activity by user. Please refer to the blog post from Peteris for those usage scenarios and further examples.

December 24, 2009   No Comments