Friday, Aug 8, 2025

# Working with PDFs in Go

PDFs are a great way to read books, contracts, or vacuum cleaner manuals on your computer. The PDF format embeds everything you need to view the document, including fonts, images, formatting, form fields, page size, etc. They're convenient to print. And thanks to browser developers, everyone with a computing device from the past 10 years or so can read PDFs without extra software.

If you're a programmer, though, PDFs are a pain. The spec is proprietary (sold by ISO for around $300 US) and clocks in at over 1,000 pages. (You can read an older version from Adobe for free.)

Since you don't have 10 years to implement a PDF parser from scratch, you're going to want to use a library. But which one(s)? I'll cover a few Go libraries that you might want to use.

# What do you want to do with a PDF?

I've considered four discrete PDF-related tasks, based on things I've had to do:

1. Creating a PDF from scratch
2. Editing an existing PDF (removing pages, merging two PDFs together, or inserting content)
3. Reading content from a PDF
4. Rendering or viewing a PDF, such as converting a page of a PDF to PNG

Many libraries are only capable of performing one or two of these activities.

# Creating & Editing PDFs

You may want to create PDFs from scratch for invoicing or reporting, or modify existing PDFs to fill in forms, merge documents, extract pages, etc.

# `pdfcpu`

When working with PDFs in Go, all roads lead to pdfcpu/pdfcpu, which started in 2018 and is still actively developed. It has broad support for creating, editing, merging, and extracting content from PDF files.

Pure Go and licensed permissively via Apache 2.0.

# `jung-kurt/gofpdf`

jung-kurt/gofpdf is a PDF creation library written in pure Go. It has been unmaintained since 2021. However, if you are working primarily with your own PDFs and don't need to chase modern compatibility, it will likely continue working for you for quite some time.

Pure Go and MIT licensed.

# `llgcode/draw2d`

llgcode/draw2d is a 2D drawing library for Go. Its draw2dpdf package can be used to draw vector diagrams directly into PDFs, via gofpdf.

Pure Go and uses BSD 2-Clause license.

# `johnfercher/maroto`

johnfercher/maroto is a PDF creation library built on top of gofpdf. It provides a higher level API for creating PDF layouts by adding rows, columns, and content to your document.

It provides convenience functions for headers, footers, page numbers, barcodes and QR codes, table styling, and more. If you need to generate PDFs from scratch I would try this one first.

Pure Go and MIT licensed.

# Reading PDFs

Reading PDF data is generally a bit simpler than editing or creation.

You can use the aforementioned pdfcpu or gofpdf libraries to extract text, but there are some other, more specific options.

# ledongthuc/pdf

Russ Cox, one of Go's core developers for many years, wrote one of the earliest Go PDF libraries back in 2014. The original was archived, but the project continues on today as ledongthuc/pdf.

This library can extract text and formatting only. If you simply want to read text content inside a PDF, though, this may be adequate.

# PaddleOCR

Sometimes you encounter PDFs with substantial amounts of text that cannot be extracted using PDF parsing. For example, the text may be in images, graphics, or tables that can't be extracted from the PDF's text data, easily or at all.

In this case you'll need to use a technique called optical character recognition to pull out the data. OCR identifies text by identifying the shapes of letters in pixel data, with the help of machine learning models.

PaddleOCR is an excellent C++ library that performs OCR. It uses CUDA and GPU hardware for better performance, if you have them. PaddleOCR also provides a Python SDK which is quite easy to use.

There are numerous Go wrappers / bindings for PaddleOCR which I have not used, so I can't comment on which ones are good.

Licensed permissively via Apache 2.0.

# Viewing or Rendering PDFs

The primary use case for rendering a PDF with Go is to generate thumbnails of the document or of individual pages.

Rendering PDFs is a complex process so there are fewer choices for these libraries, and they are almost exclusively written in C or C++.

# `h2non/bimg`

The easiest library I've come across for creating thumbnails of PDFs is h2non/bimg, which is based on libvips. libvips supports conversion between a ton of formats, including JPEG, TIFF, PNG, WebP, HEIC, AVIF, PDF, SVG, GIF, and a dozen others I've never heard of.

bimg is great: it has given me the best results for the least effort. Here's an example of creating a thumbnail:

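```go
import (
	"fmt"

	"github.com/h2non/bimg"
)

// convertToPng converts the input document (for example, a PDF) to PNG
// using libvips via bimg.
func convertToPng(input []byte) ([]byte, error) {
	png, err := bimg.NewImage(input).Convert(bimg.PNG)
	if err != nil {
		return nil, fmt.Errorf("err during convert: %w", err)
	}

	// Sanity check: confirm the output really is a PNG.
	imageType := bimg.NewImage(png).Type()
	if imageType != "png" {
		return nil, fmt.Errorf("unexpected image type: %s", imageType)
	}

	return png, nil
}
```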

There are two caveats with bimg. First, it depends on libvips through cgo. The development experience is great on Linux, where you can install libvips from package managers and use the distro C/C++ build toolchain with go build, but expect more setup work on other operating systems. Second, while bimg is MIT licensed, libvips is LGPL licensed, which may be a red flag for you.

# `sunshineplan/imgconv`

sunshineplan/imgconv provides pure Go image conversions, such as converting TIFF to PNG. It includes functionality to convert PDFs to PNG, too.

However, unlike other libraries that render PDFs, sunshineplan/imgconv simply uses pdfcpu/pdfcpu to extract the first image it finds in the document. If you are trying to snag the cover from an ebook this could be adequate, but it will not work on text-only PDFs like reports, resumes, or research papers.

Here's an example of extracting an image from a PDF:

```go
import (
	"bytes"
	"fmt"

	"github.com/sunshineplan/imgconv"
)

// convertToPng decodes the input (for a PDF, the first embedded image)
// and re-encodes it as PNG.
func convertToPng(input []byte) ([]byte, error) {
	img, err := imgconv.Decode(bytes.NewReader(input))
	if err != nil {
		return nil, fmt.Errorf("error decoding preview image: %w", err)
	}

	imgBuf := &bytes.Buffer{}
	if err := imgconv.Write(imgBuf, img, &imgconv.FormatOption{
		Format: imgconv.PNG,
	}); err != nil {
		return nil, fmt.Errorf("error encoding preview image: %w", err)
	}

	return imgBuf.Bytes(), nil
}
```

At the time of writing, the release version of sunshineplan/imgconv panics if you try to convert a PDF that does not have any images. The fix is on the master branch but has not been tagged yet.

This could be an efficient first pass before falling back to full rendering using another library.
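
One way to structure that first pass is sketched below, using only the standard library. The `converter` type, `safely`, and `thumbnail` are hypothetical names, and the two stub functions in `main` stand in for real converters such as an imgconv-based extractor and a full renderer. The wrapper also turns a panic in the fast path into an ordinary error, which matters given the panic described above.

```go
package main

import "fmt"

// converter turns PDF bytes into PNG bytes.
type converter func(pdf []byte) ([]byte, error)

// safely wraps a converter so that a panic is reported as an error
// instead of crashing the program.
func safely(c converter) converter {
	return func(pdf []byte) (out []byte, err error) {
		defer func() {
			if r := recover(); r != nil {
				out, err = nil, fmt.Errorf("converter panicked: %v", r)
			}
		}()
		return c(pdf)
	}
}

// thumbnail tries the cheap converter first and falls back to the full
// renderer when the cheap one fails, panics, or returns nothing.
func thumbnail(pdf []byte, fast, full converter) ([]byte, error) {
	png, fastErr := safely(fast)(pdf)
	if fastErr == nil && len(png) > 0 {
		return png, nil
	}

	png, fullErr := full(pdf)
	if fullErr != nil {
		return nil, fmt.Errorf("fast path: %v; full render: %w", fastErr, fullErr)
	}
	return png, nil
}

func main() {
	// Hypothetical stand-ins for real converters.
	fast := func([]byte) ([]byte, error) { panic("no images in document") }
	full := func([]byte) ([]byte, error) { return []byte("rendered"), nil }

	png, err := thumbnail([]byte("%PDF-1.7"), fast, full)
	fmt.Println(string(png), err)
}
```

Passing the converters in as function values keeps the fallback logic independent of any particular library, so either side can be swapped out later.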

# `go-pdfium`

pdfium is the component from the Chrome browser responsible for rendering PDFs. Folks at Klippa have written a wrapper library called go-pdfium that embeds the pdfium component into Go programs to render PDFs.

pdfium is written in C++, so the build pipeline for go-pdfium is somewhat complex. However, the go-pdfium team have also built a platform-agnostic WASM version which supports Go cross-compilation and runs via Wazero. It's provided as a magic blob of WASM, but if you want to follow along with patches to pdfium-binaries and Emscripten (the LLVM-based compiler toolchain that targets WASM) you can compile the WASM blob yourself. You can also use the library via cgo.

go-pdfium, pdfium, and wazero are licensed permissively under the MIT, BSD, and Apache 2.0 licenses, respectively.

Using the WASM approach, go-pdfium is essentially the only pure Go PDF renderer I've found.

Obviously, the non-Go parts of pdfium are not strictly pure Go, but Wazero allows you to compile and deploy a single binary without CGO or dynamic linking, which vastly simplifies deployment and distribution.

If you deploy your software only to Linux servers or with Docker, this is less of a concern, but for development or distribution across multiple operating systems, using the native Go toolchain vastly simplifies the whole process.

# `go-fitz`

gen2brain/go-fitz is a wrapper around MuPDF, which can render PDFs (and other formats like EPUB and MOBI) to images or even convert them to HTML.

I mention this library for completeness, but both go-fitz and MuPDF are licensed under the AGPL, which is a cumbersome license.

# Non-Go Rendering Options

There are a variety of ways to render PDFs outside of Go. You can still invoke these from Go programs using the os/exec package.

I have used WeasyPrint and wkhtmltopdf in the past. These tools take HTML as input and produce a PDF. WeasyPrint is a Python library (rendering in Python) and wkhtmltopdf uses QtWebKit to render. These work quite well in my experience. There's even a Go wrapper for wkhtmltopdf.

There are several Go libraries which connect to a Chrome browser process and use Chrome to render PDFs, for example chromedp/chromedp, mafredri/cdp, and rapid7/pdf-renderer. If combined with foliate-js you could use this to render MOBI and EPUB formats as well.

The Node.js package pdf-to-png-converter is built on top of pdf.js, the PDF viewer built into Firefox, and the Skia graphics library, which powers Chrome and Firefox, by way of Brooooooklyn/canvas. As the name suggests, it converts PDFs to PNGs.

Many of the PDF libraries I reviewed, which claimed to be for Golang, were actually client libraries to web APIs that did remote PDF processing. I excluded these for privacy and cost reasons, but you can search for them if you like.

Finally, an approach which I have considered but have not tried is simply to use the user's browser to render the PDF. For example, to generate a thumbnail, load a PDF via mozilla/pdf.js and then capture an image of the PDF from the user's browser. Using the DataTransfer API, PDFs can be thumbnailed locally before they are uploaded. Of course, remember that client behavior is not trustworthy, so this may not be an appropriate solution for you.
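
If you do accept thumbnails generated in the browser, the server can at least check that what arrived is a plausible image. The sketch below uses only the Go standard library and assumes the client uploads a PNG. `validateThumbnail`, `samplePNG`, and the 512 pixel limit are hypothetical. It checks the file type and dimensions only, and cannot confirm that the thumbnail actually depicts the uploaded PDF.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"image"
	"image/png"
	"net/http"
)

// validateThumbnail checks that client-supplied bytes are a PNG whose
// width and height do not exceed maxDim (a hypothetical policy limit).
func validateThumbnail(data []byte, maxDim int) error {
	// Sniff the content type from the leading bytes, not from client headers.
	if ct := http.DetectContentType(data); ct != "image/png" {
		return fmt.Errorf("expected image/png, got %s", ct)
	}

	// DecodeConfig reads the image dimensions without decoding pixel data.
	cfg, err := png.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("reading PNG header: %w", err)
	}
	if cfg.Width > maxDim || cfg.Height > maxDim {
		return errors.New("thumbnail dimensions exceed limit")
	}
	return nil
}

// samplePNG encodes a blank w by h image, standing in for an upload.
func samplePNG(w, h int) []byte {
	var buf bytes.Buffer
	_ = png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h)))
	return buf.Bytes()
}

func main() {
	fmt.Println(validateThumbnail(samplePNG(64, 64), 512))
	fmt.Println(validateThumbnail([]byte("%PDF-1.7 not an image"), 512))
}
```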
