Merging PDF Files In Linux Using PyPDF
PyPDF is a handy Python library for merging and splitting PDF files in Linux. It is a pure-Python library built as a PDF toolkit. It is capable of:
- extracting document information (title, author, …),
- splitting documents page by page,
- merging documents page by page,
- cropping pages,
- merging multiple pages into a single page,
- encrypting and decrypting PDF files.
PyPDF is used by many Python applications that handle PDF files directly. PDF-Shuffler is one of the tools built on PyPDF, and you can use it to merge PDF files easily in Linux. In Ubuntu you can install it using the following command.
Here is sample code that merges PDF files using the PyPDF library. It uses the PdfFileWriter and PdfFileReader classes from the pyPdf module to read the input files and append their pages to a single output. The sample doesn’t contain complete error handling for file operations.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from pyPdf import PdfFileWriter, PdfFileReader


def mergePDFFiles(outputFile, filesToBeMerged):
    if len(filesToBeMerged) == 0:
        print('Empty Input File List')
        return
    output = PdfFileWriter()
    # Keep the input files open until the output has been written
    openFiles = []
    for inFile in filesToBeMerged:
        print('Adding file ' + inFile + ' to the output')
        # Read the input PDF file
        inputStream = open(inFile, "rb")
        openFiles.append(inputStream)
        reader = PdfFileReader(inputStream)
        # Add every page in the input PDF file to the output
        for page in reader.pages:
            output.addPage(page)
    print('Writing the final output to the file system')
    # Output stream for the output file
    outputStream = open(outputFile, "wb")
    output.write(outputStream)
    outputStream.close()
    for inputStream in openFiles:
        inputStream.close()


if __name__ == '__main__':
    i = 0
    outputFile = ''
    inputFiles = []
    for arg in sys.argv:
        # After this increment, sys.argv[i] is the argument that follows arg
        i = i + 1
        # Getting the output file
        if arg == '-o':
            outputFile = sys.argv[i]
            print('Output File: ' + outputFile)
        # Extracting the input files: everything after -i, up to -o if -o comes later
        if arg == '-i':
            if '-o' in sys.argv and i < sys.argv.index('-o'):
                inputFiles = sys.argv[i:sys.argv.index('-o')]
            else:
                inputFiles = sys.argv[i:]
            print('Input Files: ' + " ".join(inputFiles))
    # Merging PDF files
    print('Merging PDF Files......')
    mergePDFFiles(outputFile, inputFiles)
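The hand-written handling of the -o and -i options could also be expressed with the argparse module from the Python standard library. The following is only a minimal sketch under that assumption, not part of the original script; the helper name parse_merge_args and the file names in the usage lines are hypothetical.

```python
import argparse


def parse_merge_args(argv):
    """Parse '-o OUTPUT -i INPUT [INPUT ...]', with the options in either order.

    parse_merge_args is a hypothetical helper, not part of the original script.
    """
    parser = argparse.ArgumentParser(description="Merge PDF files")
    parser.add_argument("-o", dest="output", required=True,
                        help="path of the merged PDF to write")
    parser.add_argument("-i", dest="inputs", nargs="+", required=True,
                        help="one or more PDF files to merge, in order")
    return parser.parse_args(argv)


# Usage with made-up file names; a real script would pass sys.argv[1:].
args = parse_merge_args(["-i", "a.pdf", "b.pdf", "-o", "merged.pdf"])
print("Output File: " + args.output)
print("Input Files: " + " ".join(args.inputs))
```

With this approach a missing -o or -i is reported by argparse itself, so the script never reaches the merge step with an empty output name.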
You can learn more about using PyPDF by exploring open source projects that use it. Here are some of the projects which use PyPDF.
January 16, 2010
Linux Utilities You Should Know About
Many utility applications on Linux were originally created for Unix and later ported, and they can make every Linux user’s life easier. It doesn’t matter whether you are a beginner or a hard-core Linux geek: if you know the tools, they’ll save you a lot of time.
In this post I am going to introduce several utilities available on Linux that originated on Unix, based on recent posts by Peteris Krumins. I plan to update this post from time to time as he adds new articles about more tools.
Pipe Viewer
Pipe viewer (written by Andrew Wood), or pv for short, can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.
The default Ubuntu installation doesn’t come with this tool. You need to install it using the ‘sudo apt-get install pv’ command.
To get started with pv, let’s use it to monitor the progress of compressing a large file containing gigabytes of information into a smaller file. The normal way to do this using gzip is as follows.
But this command won’t tell you how long the compression takes, and it doesn’t let you monitor its progress.
By using pv you can precisely time how long it will take. Take a look at doing the same through pv:
69MB 0:00:04 [ 15MB/s] [==================> ] 74% ETA 0:00:01
Pipe viewer acts as “cat” here, except it also adds a progress bar. We can see that gzip has processed 69MB of data in 4 seconds. That is 74% of all the data, and it will take 1 more second to finish.
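To make the idea concrete, here is a rough Python sketch of what pv contributes to such a pipeline: the data is passed through unchanged while the bytes are counted, which is enough to report a percentage, a throughput, and an estimated time remaining. This is not how pv itself is implemented; the function names, the chunk size, and the file names in the usage comment are assumptions made for illustration.

```python
import gzip
import os
import time


def progress_stats(done, total, elapsed):
    """Return (percent, bytes_per_second, eta_seconds) for a transfer."""
    percent = 100.0 * done / total if total else 100.0
    rate = done / elapsed if elapsed > 0 else 0.0
    eta = (total - done) / rate if rate > 0 else 0.0
    return percent, rate, eta


def gzip_with_progress(src_path, dst_path, chunk_size=1024 * 1024):
    """Compress src_path into dst_path, yielding progress after each chunk."""
    total = os.path.getsize(src_path)
    done = 0
    start = time.time()
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)  # pass the data through, like cat
            done += len(chunk)  # count it, like pv
            yield progress_stats(done, total, time.time() - start)


# Usage (hypothetical file names):
# for percent, rate, eta in gzip_with_progress("access.log", "access.log.gz"):
#     print("%3.0f%% %.1f MB/s ETA %.0fs" % (percent, rate / 1e6, eta))
```

The estimate here uses the average rate since the start, which is the simplest choice for a sketch.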
There are several advanced usage patterns of the pv command that I am not going to cover here. Please refer to the article from Peteris to explore more about the pv command.
lsof (list open files)
Peteris calls this tool the Swiss Army Knife of Unix debugging. lsof is a utility that you can use to list information about files opened by Unix/Linux processes. In Unix/Linux everything is a file: pipes are files, IP sockets are files, Unix sockets are files, directories are files, devices are files, inodes are files… So you can see the advantage of a tool like this on a system where everything is a file.
Using lsof
Here are some common usage scenarios of the lsof command, taken from the catonmat.net blog. You must have root permission to get information about all open files using lsof. Otherwise you’ll only get information about the files that you have permission to access.
List all open files
Running lsof without any arguments lists all open files by all processes.
Find who’s using a file
With a path to a file as its argument, lsof lists all the processes that are using the file in some way.
You may also specify several files; lsof then lists the processes that are using any of those files.
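On Linux, much of what lsof reports is also visible under /proc. As a rough illustration of the "who is using this file" lookup, the sketch below scans the /proc/<pid>/fd symlinks for descriptors that point at a given path. It is not how lsof is implemented, and it only covers open file descriptors, not memory-mapped files or working directories. The function name pids_with_open_file and the path in the usage comment are hypothetical.

```python
import os


def pids_with_open_file(path):
    """Return a sorted list of PIDs that hold an open descriptor for path.

    Linux only: relies on the /proc/<pid>/fd symlinks. Processes that we
    lack permission to inspect are skipped, so results are incomplete
    without root.
    """
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited or permission denied
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:  # descriptor was closed while we were looking
                continue
    return sorted(pids)


# Usage (hypothetical path):
# print(pids_with_open_file("/var/log/messages"))
```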
Find all open files in a directory recursively
With the +D argument, lsof finds all open files in the specified directory and all of its subdirectories.
There are more use cases for the lsof command, such as listing open files by process or by user, listing all network connections, listing all TCP connections, and listing network activity by user. Please refer to the blog post from Peteris for those usage scenarios. You can find some examples of lsof from this link also.
December 24, 2009