How to Add Watermark to a PDF File Using Python
The PDF format has become one of the most commonly used data formats ever. Up to PDF version 1. Unfortunately, the features from the newer PDF revisions, such as forms, are tricky to implement and still require further work to be fully functional in the tools. Using various Python libraries, you can create your own application in a comparatively easy way.

This article is part two of a little series on PDFs with Python. Part one gave an introduction to reading PDF documents using Python, starting with a summary of the various Python libraries. It then showed how to manipulate existing PDFs and how to read and extract their content, both text and images, and how to split a document into its single pages.

In this article you will learn how to add images to your PDF in the form of watermarks, stamps, and barcodes. This is helpful for marking documents that are intended for a specific audience only or that have draft quality, or for simply adding a barcode for identification purposes.

For a watermark with pdftk, you need a background image that carries the word "DRAFT" on a transparent layer, which you can then apply to an existing single-page PDF. The pdftk tool takes in the PDF file input. Figure 1 shows the output of this action. For more complex actions, like stamping a document with different stamps per page, have a look at the description at the PDF Labs project page. We also show the stamping use case later in this article, although our example uses the library pdfrw instead of pdftk.

To add a watermark with PyPDF2, we start by reading the first page of the original PDF document and the first page of the watermark. To read the files we use the PdfFileReader class. As a second step we merge the two pages by using the mergePage method. Finally, we write the result to the output file.
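The read, merge, and write steps for the watermark can be sketched as follows. This is a minimal sketch, not the article's original listing. It assumes the legacy PyPDF2 API (releases before 3.0), in which PdfFileReader, PdfFileWriter, getPage, mergePage, and addPage exist under these names. The function name add_watermark and its three path parameters are illustrative assumptions.

```python
def add_watermark(input_path, watermark_path, output_path):
    """Overlay the first page of watermark_path onto the first page of input_path."""
    # Assumption: PyPDF2 < 3.0 is installed. It is imported here, inside the
    # function, so that a missing library only matters when the function runs.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # PyPDF2 reads page content lazily, so both input files must stay open
    # until the output has been written.
    with open(input_path, "rb") as input_file, \
            open(watermark_path, "rb") as watermark_file:
        # Step 1: read the first page of the document and of the watermark.
        input_page = PdfFileReader(input_file).getPage(0)
        watermark_page = PdfFileReader(watermark_file).getPage(0)

        # Step 2: merge the watermark page into the document page.
        input_page.mergePage(watermark_page)

        # Step 3: create a writer, add the merged page, write the output file.
        writer = PdfFileWriter()
        writer.addPage(input_page)
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
```

A call such as add_watermark("input.pdf", "watermark.pdf", "output.pdf") would then produce a one-page result; the file names are placeholders. In PyPDF2 3.0 and later the same calls are named PdfReader, PdfWriter, merge_page, and add_page.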
Writing the output is done in three steps: creating an object based on the PdfFileWriter class, adding the merged page to this object using the addPage method, and writing the new content to the output file using the write method.

For PyMuPDF, the module that needs to be imported in your Python script is named fitz; this name goes back to the previous name of PyMuPDF. In this section we show how to add an image, using a barcode as an example since this is a pretty common task. The same steps can be applied to adding any kind of image to a PDF.

In order to decorate a PDF document with a barcode we simply add an image as another PDF layer at the desired position. The position of the image is defined as a rectangle using fitz.Rect, which requires two pairs of coordinates: (x1, y1) and (x2, y2). PyMuPDF interprets the upper-left corner of the page as (0, 0).

Having opened the input file and extracted the first page from it, we add the image containing the barcode using the insertImage method. This method requires two parameters: the position, delivered via imageRectangle, and the name of the image file to be inserted. Using the save method, the modified PDF is stored to disk. Figure 2 shows the barcode after it was added to the example PDF.

It faithfully reproduces vector formats without rasterization.
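The barcode steps described above can be sketched as follows. This is a hedged sketch rather than the article's original listing: the function names top_left_rect and add_barcode, the default size of 200 by 100 points, and the 20-point margin are all assumptions made for illustration. It uses the legacy method name insertImage mentioned in the text; newer PyMuPDF releases name this method insert_image.

```python
def top_left_rect(width, height, margin=20):
    """Return (x1, y1, x2, y2) for a box placed `margin` points away from
    the upper-left page corner, which PyMuPDF treats as (0, 0)."""
    return (margin, margin, margin + width, margin + height)


def add_barcode(input_path, barcode_path, output_path, width=200, height=100):
    """Place the image at barcode_path on the first page of input_path."""
    # Assumption: an older PyMuPDF release that still offers insertImage.
    # Imported inside the function so top_left_rect works without PyMuPDF.
    import fitz

    document = fitz.open(input_path)
    page = document[0]  # first page of the input file

    # The target position is a rectangle built from two coordinate pairs.
    image_rectangle = fitz.Rect(*top_left_rect(width, height))

    # Add the image as another layer at that position, then store to disk.
    page.insertImage(image_rectangle, filename=barcode_path)
    document.save(output_path)
    document.close()
```

Keeping the coordinate arithmetic in a small separate function makes the placement easy to check without opening a PDF: top_left_rect(200, 100) yields (20, 20, 220, 120).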
Portable Document Format, or PDF, is a file format that can be used to render and exchange documents reliably across operating systems. You can process pre-existing PDFs in Python by using the PyPDF2 package. PyPDF2 is a pure-Python package that can be used for many different types of PDF operations.

The original pyPdf package came first. About a year after its last official release, a company called Phaseit sponsored a fork of pyPdf named PyPDF2. The fork was written to be backward compatible with the original code, and it worked well for many years. After its last release there was a short series of releases of a package called PyPDF3, and then the project was renamed PyPDF4. There is also a separate Python 3 port of the original pyPdf, but it has not been maintained for many years.

Although PyPDF2 was recently abandoned, the new PyPDF4 does not have full backward compatibility with PyPDF2. Feel free to replace the PyPDF2 imports with PyPDF4 to see how it works.

Patrick Maupin created a package called pdfrw that does a lot of the same work as PyPDF2. With the notable exception of encryption, pdfrw can perform all of the PyPDF2 operations covered later in this article. The biggest difference is that pdfrw integrates with the ReportLab package, so you can build a new PDF using some or all of a pre-existing PDF.

PyPDF2 can be installed with pip (pip install PyPDF2).

We can use PyPDF2 to extract metadata and some text from PDFs, which is especially useful when automating work on pre-existing PDF files. You can use any PDF file on your own computer to try this.
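Metadata extraction of this kind can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's original listing: it assumes the legacy PyPDF2 API (releases before 3.0), where PdfFileReader and getDocumentInfo exist, and the names FIELDS, summarize, and extract_information are illustrative.

```python
# Attributes exposed by PyPDF2's DocumentInformation object.
FIELDS = ("author", "creator", "producer", "subject", "title")


def summarize(information):
    """Copy the metadata attributes of `information` into a plain dict.

    Returns an empty dict when the PDF carries no document information.
    """
    if information is None:
        return {}
    return {field: getattr(information, field, None) for field in FIELDS}


def extract_information(pdf_path):
    """Return the document metadata of the PDF at pdf_path as a dict."""
    # Assumption: PyPDF2 < 3.0 is installed. Imported inside the function
    # so that summarize can be used without the library.
    from PyPDF2 import PdfFileReader

    with open(pdf_path, "rb") as pdf_file:
        reader = PdfFileReader(pdf_file)
        # Read the values while the file is still open.
        return summarize(reader.getDocumentInfo())
```

Calling extract_information with the path of a local PDF returns a dictionary with the five keys listed in FIELDS; values that the file does not define come back as None.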
Here is how to write some code that opens the PDF and accesses these properties. In the example, we call .getDocumentInfo(), which returns an instance of DocumentInformation containing most of the information we are interested in. We can also call.