Parallel decryption and encryption with GPG in Python

Posted on 28 Jan 2025
Tags: python, encryption, gpg, security

For a project I needed to decrypt thousands of OpenPGP-encrypted files using Python. This is a task that can easily be sped up by means of parallel processing, using all the CPU cores in your machine, for example with Python’s multiprocessing Pool. However, the implementation was not as straightforward as I thought.

Let’s start with a UTF-8 encoded plain text file, plaintext.txt, that we encrypt using gpg on Linux (replace JANE DOE <JANEDOE@EXAMPLE.ORG> with the actual name and e-mail address from your GPG key identity – it’s really important to use the exact same name as in your GPG identity):

gpg --encrypt --sign -r 'JANE DOE <JANEDOE@EXAMPLE.ORG>' < plaintext.txt > encrypted.gpg
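
To follow along with the parallel experiments below, you need many such files. Here is a minimal sketch that creates a batch of encrypted copies with python-gnupg – the recipient string is the placeholder from above, and the count and the encrypted_*.gpg file names are arbitrary:

from gnupg import GPG

gpg = GPG()
with open('plaintext.txt', 'rb') as f:
    plaintext = f.read()

# encrypt 100 copies of the plain text file as test input
for i in range(100):
    result = gpg.encrypt(plaintext, 'JANE DOE <JANEDOE@EXAMPLE.ORG>',
                         armor=False, output=f'encrypted_{i:03d}.gpg')
    assert result.ok, result.status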

Using python-gnupg, decrypting this file is straightforward:

from gnupg import GPG

gpg = GPG()  # uses your default GPG home directory and keyring
decrypted = gpg.decrypt_file('encrypted.gpg')
if decrypted.ok:
    plain = decrypted.data.decode('utf-8')
    print(f'decrypted message: {plain}')
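
When ok is False, the result object also carries diagnostic output from the underlying gpg call via its status and stderr attributes, which helps with debugging:

if not decrypted.ok:
    # inspect gpg's status message and stderr output for the cause
    print('status:', decrypted.status)
    print(decrypted.stderr)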

So transforming this into a function and harnessing all CPU cores via parallel processing with Pool.map() should be quite simple, too:

import multiprocessing as mp
from glob import glob

from gnupg import GPG


def decrypt(fpath):
    gpg = GPG()  # create a separate GPG instance in each worker call

    decrypted = gpg.decrypt_file(fpath)
    if decrypted.ok:
        return fpath, decrypted.data.decode('utf-8')
    else:
        raise RuntimeError(f'failed to decrypt file: {fpath}')


if __name__ == '__main__':
    encrypted_files = glob('*.gpg')
    assert len(encrypted_files) > 0
    n_workers = min(len(encrypted_files), mp.cpu_count())
    print(f'will use {n_workers} worker processes') 
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(decrypt, encrypted_files)
    print('results:')
    print(results)

This works. However, when applying it to many files, you will notice that there’s no speed-up, even though we should be using all the CPU cores in parallel instead of just one. In fact, when we observe the CPU usage with a tool like htop while the script runs, we see that only one CPU core at a time is involved in processing. So something about the parallel execution of the function doesn’t work.
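
To make this measurable, you can time a sequential run against the pool inside the script’s main routine – a quick sketch with time.perf_counter(), using only the names already defined in the script above:

import time

t0 = time.perf_counter()
results = [decrypt(fpath) for fpath in encrypted_files]
print(f'sequential: {time.perf_counter() - t0:.2f} s')

t0 = time.perf_counter()
with mp.Pool(processes=n_workers) as pool:
    results = pool.map(decrypt, encrypted_files)
print(f'parallel:   {time.perf_counter() - t0:.2f} s')

If the parallelization worked, the second timing should be roughly n_workers times lower; with the script above, both come out nearly the same.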

I looked a bit into the python-gnupg package and saw that it’s basically a Python wrapper for the gpg executable, meaning that it executes gpg as a subprocess to en- or decrypt files, just as if you ran gpg -d encrypted.gpg in a terminal. I don’t know exactly what prevents running this in parallel, but I suppose that GPG has some kind of locking mechanism against parallel execution – modern GnuPG versions, for instance, delegate all secret-key operations to a single gpg-agent daemon.

I thought that there must be some way to decrypt OpenPGP-encrypted files other than using the python-gnupg package, and hence the gpg executable. For one, there’s the cryptography package, but implementing this task with its “low-level cryptographic primitives” requires a much deeper understanding of cryptography than I currently have. These primitives live in the “hazardous materials” submodule, and it has that name for a reason – keep away if you’re no cryptography expert! The only other useful package I could find was PGPy (installable from PyPI via pip install PGPy), and although it appears to have been unmaintained for three years, I gave it a try.

First of all, you need to export your GPG secret key (a.k.a. private key) to a file in ASCII-armored format. Make sure to save that file only to a folder that can be read solely by you (e.g. an encrypted home folder). Then run the following, again replacing YOUR_KEY_ID with the key identity used to encrypt the files as above:

gpg --export-secret-key --armor 'YOUR_KEY_ID' > PATH_TO_STORE_YOUR_SECRET_KEY.asc
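
To verify the export worked, you can already try loading the key file with PGPy – the path below is the placeholder from above, and fingerprint and is_protected are attributes of PGPy’s key objects:

import pgpy

key, _ = pgpy.PGPKey.from_file('PATH_TO_STORE_YOUR_SECRET_KEY.asc')
print(key.fingerprint)    # should match the fingerprint gpg shows for your key
print(key.is_protected)   # True if the key is password-protected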

Next, we can use that key to decrypt the files with PGPy. You would usually protect your GPG key with a password, so this script asks for that password and unlocks the key before using it for decryption. Make sure to adapt the path in key_fpath.

import multiprocessing as mp
from glob import glob
from getpass import getpass
from functools import partial

import pgpy


def decrypt(fpath, key_fpath, key_passwd):
    # load the encrypted message and the secret key on every call
    msg = pgpy.PGPMessage.from_file(fpath)
    key_loaded, _ = pgpy.PGPKey.from_file(key_fpath)

    # temporarily unlock the password-protected key and decrypt the message
    with key_loaded.unlock(key_passwd) as key:
        decrypted = key.decrypt(msg).message

    return fpath, decrypted.decode('utf-8')


if __name__ == '__main__':
    encrypted_files = glob('*.gpg')
    assert len(encrypted_files) > 0

    key_fpath = '/PATH/TO/YOUR/SECRET/KEY/IN/ASCII/FORMAT'
    key_passwd = getpass("Please provide the GPG key password: ")
    decrypt_w_key = partial(decrypt, key_fpath=key_fpath, key_passwd=key_passwd)

    n_workers = min(len(encrypted_files), mp.cpu_count())
    print(f'will use {n_workers} worker processes') 
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(decrypt_w_key, encrypted_files)
    print('results:')
    print(results)

Now this actually works and decrypts the files in parallel, harnessing all available CPU cores and speeding up the decryption when you’re working with many encrypted files! Note that the unlocking of the key takes place in the parallelized function, which is inefficient, because it is done for each encrypted file instead of just once at program start-up. I tried to move the key-unlocking code into the main routine and pass the unlocked pgpy.PGPKey object to the parallelized decrypt function, but unfortunately pgpy.PGPKey objects cannot be serialized.
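
One possible middle ground – just a sketch of the idea, not part of the scripts above – is to unlock the key once per worker process via Pool’s initializer argument. The init_worker function and the module-level _stack and _key names are my own additions; the trick assumes that PGPy’s unlock context can be held open for a worker’s lifetime (the key is re-locked when the context exits, which is why the ExitStack is deliberately never closed):

import contextlib

_stack = contextlib.ExitStack()  # keeps the unlock context open per worker
_key = None


def init_worker(key_fpath, key_passwd):
    # runs once in each worker process: load the key and unlock it,
    # keeping the unlock context open for the worker's lifetime
    global _key
    key, _ = pgpy.PGPKey.from_file(key_fpath)
    _key = _stack.enter_context(key.unlock(key_passwd))


def decrypt(fpath):
    msg = pgpy.PGPMessage.from_file(fpath)
    return fpath, _key.decrypt(msg).message.decode('utf-8')


# in the main routine:
with mp.Pool(processes=n_workers, initializer=init_worker,
             initargs=(key_fpath, key_passwd)) as pool:
    results = pool.map(decrypt, encrypted_files)

This way the expensive key unlocking happens only n_workers times instead of once per file, the PGPKey object never has to cross a process boundary, and the functools.partial wrapper is no longer needed.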

A word of caution: I’m not a security expert, but I suspect you shouldn’t use this script in a shared environment where other users on the same machine can observe your processes, as they may be able to snoop the GPG key password that is passed to the forked Python processes. Furthermore, it can be dangerous to rely on a package that appears to be no longer maintained, such as PGPy.

If you spotted a mistake or want to comment on this post, please contact me: post -at- mkonrad -dot- net.