The main increase comes from the -a flag, which base64-encodes your ciphertext.
From man enc:
NAME
enc - symmetric cipher routines
SYNOPSIS
openssl enc -ciphername [-in filename] [-out filename] [-pass arg] [-e] [-d] [-a] [-A] [-k password]
[-kfile filename] [-K key] [-iv IV] [-p] [-P] [-bufsize number] [-nopad] [-debug]
[...]
-a base64 process the data. This means that if encryption is taking place the data is base64 encoded
after encryption. If decryption is set then the input data is base64 decoded before being
decrypted.
Base64 encoding means that every three bytes of binary data (a byte is an 8-bit number, so it has a value from 0 to 2^8-1 = 255) are encoded as four 6-bit values (each from 0 to 2^6-1 = 63). Base64 is convenient because the symbols for the 64 values can be chosen to be printable ASCII characters (typically 0='A', 1='B', ..., 25='Z', 26='a', ..., 51='z', 52='0', ..., 61='9', 62='+', 63='/', though the last two are often defined differently in different variants). Note that three bytes hold 8*3 = 24 bits, as do four base64 symbols at 6*4 = 24 bits.
For example, if your ciphertext were the three bytes (in hexadecimal) f0 bb 5c
(240, 187, 92 in decimal), then in binary the bits grouped into three bytes would be:
11110000 10111011 01011100
in base64 it would be the same bits, except grouped into four groups of 6 bits:
111100 001011 101101 011100
These map to the values 60, 11, 45, 28, which in a typical base64 table correspond to the printable ASCII characters 8Ltc
, which take four bytes on disk (instead of the three bytes needed without base64 encoding).
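The regrouping above can be checked with a short Python sketch. It splits the 24 bits by hand into 6-bit groups and compares the result with the standard library's `base64.b64encode`; the variable names are only illustrative.

```python
import base64
import string

raw = bytes.fromhex("f0bb5c")  # the three example bytes

# Write the bytes as one 24-bit string, then cut it into 6-bit groups.
bits = "".join(f"{b:08b}" for b in raw)
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
values = [int(g, 2) for g in groups]

# Standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
manual = "".join(alphabet[v] for v in values)

# The library encoder should agree with the manual lookup.
encoded = base64.b64encode(raw).decode("ascii")

print(groups)   # 6-bit groups
print(values)   # table indices
print(manual, encoded)
```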
Thus base64 encoding alone should account for roughly a 33% increase in file size. It's slightly more than that, because openssl also adds a newline character after every 64 characters of base64-encoded ASCII (so the text wraps at 64 characters, and each full line takes 65 bytes). Together these two effects account for a general file size increase of (4/3 * 65/64 - 1) = 35.4%.
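As a rough check of that figure, here is a minimal Python sketch that counts the base64 characters and the newlines for a given input size. The helper name `base64_wrapped_size` is made up for this illustration, and it assumes the last line also ends with a newline.

```python
import math

def base64_wrapped_size(n_bytes, line_width=64):
    """Bytes on disk for n_bytes of binary data after base64 encoding,
    wrapped at line_width characters (assumes a newline ends every line,
    including the last one)."""
    b64_chars = 4 * math.ceil(n_bytes / 3)         # 4 symbols per 3 input bytes
    newlines = math.ceil(b64_chars / line_width)   # one newline per output line
    return b64_chars + newlines

n = 48000
overhead = base64_wrapped_size(n) / n - 1
print(f"{overhead:.1%}")
```

For large inputs the ratio approaches 4/3 * 65/64, which is where the 35.4% comes from.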
There's also a bit of overhead from your scheme. Specifying -salt makes openssl use a random eight-byte salt, which is combined with your password when deriving the key. The salt is stored at the start of the output, right after the eight-byte header Salted__ that signals a salt was used, and these 16 bytes are base64 encoded along with the ciphertext. (The purpose of the salt is to make it less cost-effective for an attacker to pre-compute rainbow tables for common passwords.) If I encrypt a random file with your scheme, fixing the salt to DEADBEEFDEADBEEF by running
openssl enc -aes-256-cbc -a -salt -S DEADBEEFDEADBEEF
the first line of my encrypted file is
U2FsdGVkX1/erb7v3q2+7ybJfdPaLlVzOp7lKpOljvNK8ONCrgFrQpaJHQ8EqO1X
which decodes to (using python):
>>> import base64
>>> base64.b64decode("U2FsdGVkX1/erb7v3q2+7ybJfdPaLlVzOp7lKpOljvNK8ONCrgFrQpaJHQ8EqO1X")
'Salted__\xde\xad\xbe\xef\xde\xad\xbe\xef&\xc9}\xd3\xda.Us:\x9e\xe5*\x93\xa5\x8e\xf3J\xf0\xe3B\xae\x01kB\x96\x89\x1d\x0f\x04\xa8\xedW'
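The same decoding can be taken one step further to separate the header, the salt, and the ciphertext. This is a small Python 3 sketch (where `b64decode` returns `bytes`, so the output is shown with a `b'...'` prefix); the variable names are only illustrative.

```python
import base64

first_line = "U2FsdGVkX1/erb7v3q2+7ybJfdPaLlVzOp7lKpOljvNK8ONCrgFrQpaJHQ8EqO1X"
raw = base64.b64decode(first_line)

magic = raw[:8]         # header marking a salted file
salt = raw[8:16]        # the eight-byte salt
ciphertext = raw[16:]   # start of the actual AES output

print(magic)            # header bytes
print(salt.hex())       # salt in hexadecimal
print(len(raw), len(ciphertext))
```

One 64-character base64 line carries 48 raw bytes, of which the first 16 are the header and salt.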
So combining the base64 encoding, the line breaks, the salt, the initialization vector (for CBC mode), and the padding (AES works on 128-bit blocks, so the data is padded to a multiple of 16 bytes), an overhead of ~35% seems perfectly reasonable.
EDIT: Actually, openssl doesn't store an initialization vector when deriving a key from a password with a salt. From man enc: "When a password is being specified using one of the other options, the IV is generated from this password." Using this and testing a couple of files, the file sizes match up perfectly. The 8-byte salt plus the 8-byte Salted__ header adds 16 bytes to the file. The plaintext is padded to a multiple of 16 bytes (adding between 1 and 16 bytes). If you don't base64 encode, the file size matches this exactly, and you can then get a file that exactly matches the base64 version by applying base64 --wrap=64 to it.
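The size accounting in the edit can be written down as a short Python sketch. The function names are hypothetical, and it assumes the default padding (which always adds 1 to 16 bytes) and a newline at the end of every base64 line, including the last; it only models sizes and does no encryption.

```python
import base64

HEADER_LEN = 16   # 8 bytes "Salted__" + 8 bytes of salt
BLOCK = 16        # AES block size in bytes

def raw_encrypted_size(plain_len):
    """Size of the output without -a: header plus padded ciphertext."""
    padded = (plain_len // BLOCK + 1) * BLOCK   # padding adds 1..16 bytes
    return HEADER_LEN + padded

def wrapped_base64_size(raw_len, width=64):
    """Size after base64 encoding raw_len bytes and wrapping at width chars."""
    text = base64.b64encode(bytes(raw_len)).decode("ascii")  # dummy zero bytes
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return sum(len(line) + 1 for line in lines)              # +1 per newline

plain_len = 1000
raw_len = raw_encrypted_size(plain_len)
print(raw_len, wrapped_base64_size(raw_len))
```

Comparing these predictions with `ls -l` on a test file, with and without -a, is a quick way to confirm where every byte comes from.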