For a project I needed to decrypt thousands of OpenPGP-encrypted files using Python. This is a task that can easily be sped up by means of parallel processing to use all the CPU cores in your machine, for example with Python's multiprocessing Pool. However, the implementation was not as straightforward as I thought.
Let's start with a UTF-8 encoded plain text file, plaintext.txt, that we encrypt using gpg on Linux (replace JANE DOE <JANEDOE@EXAMPLE.ORG> with the actual name and e-mail address used in your GPG key identity – it's really important to use exactly the same name as in your GPG identity):
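The encryption step might look like the following sketch; the recipient string is a placeholder for your own key identity, and gpg will write the result to plaintext.txt.gpg by default:

```shell
# Encrypt plaintext.txt for the given key identity.
# The recipient must match the name/e-mail in your GPG key exactly.
gpg --encrypt --recipient "JANE DOE <JANEDOE@EXAMPLE.ORG>" plaintext.txt
# → produces plaintext.txt.gpg next to the original file
```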
Using python-gnupg, decrypting this file is straightforward:
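A minimal sketch with python-gnupg could look like this; the file names and the passphrase handling are assumptions (depending on your gpg-agent setup, you may not need to pass the passphrase at all):

```python
import gnupg

# uses your default GPG home directory (~/.gnupg)
gpg = gnupg.GPG()

with open('plaintext.txt.gpg', 'rb') as f:
    # python-gnupg shells out to the gpg executable under the hood
    result = gpg.decrypt_file(f, output='plaintext_decrypted.txt')

if not result.ok:
    raise RuntimeError(f'decryption failed: {result.status}')
```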
So transforming this into a function and harnessing all CPU cores via parallel processing with Pool.map() should be quite simple, too:
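A sketch of that parallelized version might look like the following; the glob pattern and the passphrase are placeholders, and depending on your GPG version the passphrase may be handled by gpg-agent instead:

```python
import glob
from functools import partial
from multiprocessing import Pool

import gnupg


def decrypt(passphrase, encrypted_fpath):
    # each worker process creates its own GPG wrapper, which in turn
    # runs the gpg executable as a subprocess
    gpg = gnupg.GPG()
    with open(encrypted_fpath, 'rb') as f:
        result = gpg.decrypt_file(f, passphrase=passphrase)
    return encrypted_fpath, str(result)


if __name__ == '__main__':
    files = glob.glob('encrypted/*.gpg')   # placeholder pattern
    with Pool() as pool:   # one worker per CPU core by default
        results = dict(pool.map(partial(decrypt, 'my secret passphrase'),
                                files))
```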
This works. However, when applying this to many files, you will notice that there's no speed-up, even though we should be using all the CPU cores in parallel instead of just one. In fact, when we observe the CPU usage with a tool like htop, we will see that only one CPU core at a time is involved in processing while this script runs. So something about the parallel execution of the function isn't working.
I looked a bit into the python-gnupg package and saw that it's basically a Python wrapper for the gpg executable, meaning that it runs gpg as a subprocess to en- or decrypt files, just as if you ran gpg -d encrypted.gpg in a terminal. I don't know exactly what prevents running this in parallel, but I suppose that GPG has some kind of locking mechanism against parallel execution.
I thought that there must be some way to decrypt OpenPGP-encrypted files other than using the python-gnupg package and hence the gpg executable. For one, there's the cryptography package, but it requires a much deeper understanding of cryptography than I currently have to implement this task using its "low-level cryptographic primitives". These primitives live in the "hazardous materials" submodule, and it has this name for a reason – keep away if you're not a cryptography expert! The only other useful package I could find was the PGPy package, and although it seems to have been unmaintained for three years, I gave it a try.
First of all, you need to export your GPG secret key (aka private key) to a file in ASCII-armored format. Make sure to save that file only to a folder that only you can read (e.g. an encrypted home folder). Then run the following, again replacing YOUR_KEY_ID with the key identity used to encrypt the files as above:
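The export step could look roughly like this; the output path is a placeholder for a location only you can read:

```shell
# Export the secret key in ASCII-armored format to a private location.
gpg --export-secret-keys --armor YOUR_KEY_ID > ~/private/secret_key.asc

# Restrict the file permissions to the current user only.
chmod 600 ~/private/secret_key.asc
```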
Next, we can use that key to decrypt the files with the PGPy package. You would usually protect your GPG key with a password, so this script asks for that password and unlocks the key before it can be used to decrypt the files. Make sure to adapt the path in key_fpath.
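A sketch of such a script is shown below; the key path and glob pattern are placeholders, and it relies on Linux's fork start method so that the password read in the main routine is inherited by the worker processes:

```python
import glob
from getpass import getpass
from multiprocessing import Pool

import pgpy

key_fpath = '/home/you/private/secret_key.asc'   # adapt this path
key_password = None   # set in the main routine; inherited by forked workers


def decrypt(encrypted_fpath):
    # load and unlock the key inside each worker, since unlocked
    # pgpy.PGPKey objects cannot be pickled and passed to workers
    key, _ = pgpy.PGPKey.from_file(key_fpath)
    with key.unlock(key_password):
        msg = pgpy.PGPMessage.from_file(encrypted_fpath)
        return encrypted_fpath, key.decrypt(msg).message


if __name__ == '__main__':
    key_password = getpass('GPG key password: ')
    files = glob.glob('encrypted/*.gpg')   # placeholder pattern
    with Pool() as pool:   # pure-Python decryption, no gpg subprocess
        results = dict(pool.map(decrypt, files))
```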
Now this will actually work and decrypt the files in parallel, harnessing all available CPUs and speeding up the decryption if you're working with many encrypted files! Note that the unlocking of the key takes place in the parallelized function, which is inefficient, because it is done for each encrypted file instead of just once at program start-up. I've tried to move the key-unlocking code into the main routine and pass the unlocked pgpy.PGPKey object to the parallelized decrypt function, but unfortunately pgpy.PGPKey objects cannot be serialized.
A word of caution: I'm not a security expert, but I suspect you shouldn't use this script in a shared environment where other users on the same machine can observe your processes, as they may snoop the GPG key password that is passed to the forked Python processes. Furthermore, it can be dangerous to rely on a package that seems to be no longer maintained, such as PGPy.