46

According to the ownCloud documentation, if you enable encryption, file sizes can be ~35% larger than their unencrypted forms.

From my understanding of encryption, the file sizes should be more-or-less identical (perhaps some padded 0 bits at the end to make it a multiple of the key size).

Is that incorrect? If not, why?

warren
  • 2
    depends on the format that is used for encryption. Data expansion varies from 28 bytes to much more if you put more stuff in headers and footers (algorithm identifiers, key derivation parameters, salts, encrypted keys, ...) – SEJPM Mar 10 '16 at 18:41
  • 1
    @SEJPM but those are all small-ish (at least compared to, say, a video file), and do not scale with the amount of data being encrypted. I could see that stuff being ~35% of a single photo, but not ~35% of a folder full of photos, or a video file. – Mike Ounsworth Mar 10 '16 at 18:53
  • 4
    Possible duplicate of [Why does openssl enc -aes-256-cbc -a -salt increase the file size?](http://security.stackexchange.com/questions/58351/why-does-openssl-enc-aes-256-cbc-a-salt-increase-the-file-size) – dr jimbob Mar 10 '16 at 19:11
  • 3
    I wouldn't want a file to be the same size encrypted. That discloses information. Even if the size is consistently _almost_ the same, it discloses information about the contents. – kojiro Mar 11 '16 at 00:58
  • 9
    @kojiro So does a file size that's consistently about 35% bigger. – user253751 Mar 11 '16 at 07:56
  • @immibis true. I would expect a distribution of sizes. You would then need several differently sized encryptions of the same file to build an analysis of the probable actual size. – kojiro Mar 11 '16 at 11:47
  • It's a bit of ass covering combined with worst-case assumption. "Any encrypted file *can* be *up to* 35% larger than the original." – Shadur Mar 11 '16 at 13:34
  • Note: There are also cryptosystems that may increase the amount of data (even hundreds of times). Example: https://en.m.wikipedia.org/wiki/Goldwasser–Micali_cryptosystem – Daniel Jour Mar 11 '16 at 20:37
  • Why would it not? – Evan Carslake Mar 12 '16 at 17:01
  • 1
    @MikeOunsworth I'm guessing that "35%" is based on the average size of the kinds of files that they're expecting you to store on ownCloud. If you have larger files, I imagine that the percentage would be smaller; if you have very small files, it could well be larger. EDIT: seems that the answer has been found - still a good point to bear in mind though. – micheal65536 Mar 12 '16 at 21:17
  • @drjimbob - while that other question happens to have an apparently similar answer, it is not the same – warren Mar 14 '16 at 14:56
  • @warren This question is "Why would an encrypted file be 35% larger than an unencrypted one?" and the other is "Why does `openssl enc -aes-256-cbc -a -salt` increase the file size?" It doesn't matter if it's different products using encryption (though owncloud calls openssl functions through PHP's `openssl_encrypt`), both boil down to "Why does this encryption increase file size by about 33.3-35.4%?" Because of base64 encoding the ciphertext. Lots of apps use base64 encoding after encryption; it would be silly to keep repeating the same question in every separate case. – dr jimbob Mar 14 '16 at 18:05
  • @drjimbob, I disagree with your analysis: especially since figuring out the "why" is not something I was qualified to do. Understanding that some products do this (even if they do it in different ways) is important - and asking similar questions from different directions help put together a stronger body of knowledge surrounding the topic :) – warren Mar 14 '16 at 19:29
  • I don't get ownCloud. They say that they use AES256, but that algorithm only increases the size by a very little bit (a few bytes). Plus I wasn't able to verify this on my own ownCloud machine – BlueWizard Apr 04 '16 at 14:21
  • @JonasDralle - did you not read the accepted answer that indicates they're BASE64 converting it after AES256? – warren Apr 04 '16 at 16:41

3 Answers

82

Most likely, the encrypted file is base64 encoded, which would account for a 33.3% increase in file size (you encode three bytes of data in four bytes of base64 data). Inserting a newline after every 64 characters to make it easier to read (as is done by ASCII armor in openssl, GPG, PGP) increases the size by a further factor of 65/64.

Combining these two effects results in the new file being (4/3)*(65/64) = 135.4% of the size of the original or an increase in file size of 35.4%.
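The arithmetic can be checked with a minimal Python sketch. It assumes random bytes as a stand-in for ciphertext, and the helper name `armored_size` is made up for illustration; it wraps the base64 text by hand at 64 characters because Python's own `base64.encodebytes` wraps at 76.

```python
import base64
import os


def armored_size(raw: bytes, line_width: int = 64) -> int:
    """Size of the base64 text with a newline after every `line_width` characters."""
    encoded = base64.b64encode(raw)
    lines = [encoded[i:i + line_width] for i in range(0, len(encoded), line_width)]
    return len(b"".join(line + b"\n" for line in lines))


# 48,000 random bytes stand in for ciphertext (an assumption for the demo).
raw = os.urandom(48_000)
plain_b64 = len(base64.b64encode(raw))
wrapped = armored_size(raw)

print(f"base64 only:       {plain_b64 / len(raw):.4f}x")
print(f"base64 + newlines: {wrapped / len(raw):.4f}x")
```

The input length is chosen as a multiple of 48 bytes so that every output line is exactly 64 characters and no `=` padding is needed.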

I've gone through the calculation in this answer here.

You are correct though that encryption should not need to significantly change the file size. It possibly adds a couple of blocks of data if there is a header, an initialization vector/nonce, some padding to make it a full block and/or a MAC to check integrity, though these changes will be insignificant for large files (e.g., adding four blocks to an AES-encrypted file that is 1 MB would make the file 0.006% larger).
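As a rough illustration of why that fixed overhead stops mattering, here is a small sketch. The four-block figure comes from the example above; the function name `overhead_percent` and the 16-byte AES block size default are the only things added.

```python
def overhead_percent(file_size: int, overhead_blocks: int = 4, block_size: int = 16) -> float:
    """Percentage growth caused by a fixed number of extra cipher blocks."""
    overhead = overhead_blocks * block_size  # 64 bytes with the defaults
    return 100 * overhead / file_size


for label, size in [("1 KB", 1_000), ("1 MB", 1_000_000), ("1 GB", 1_000_000_000)]:
    print(f"{label}: {overhead_percent(size):.6f}% larger")
```

The same 64 bytes are a noticeable fraction of a 1 KB file and a negligible fraction of anything large, which is why a flat "35%" cannot be explained by headers, IVs, padding, or MACs alone.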

However, to not increase the file size, you need to be fine with storing and passing around the encrypted data as an arbitrary binary. Arbitrary binaries are often blocked over email to prevent the spread of computer viruses, and are often difficult to open outside of hex editors. Base64-encoded files are easier to pass around and are a more portable format than binary files of an unknown file type.

dr jimbob
  • 13
    Why does owncloud do this? – Moby Disk Mar 10 '16 at 21:19
  • Any references to the source code? – Deer Hunter Mar 10 '16 at 21:50
  • 10
    @DeerHunter - I've never used OwnCloud. Briefly looking at their source code it seems they use the [poorly documented PHP function `openssl_encrypt`](http://php.net/manual/en/function.openssl-encrypt.php) to do the bulk of their encryption work. The fourth parameter [`$options` is hard-coded to false](https://github.com/owncloud/core/blob/90810cc6052b38ac03dd8f08a200c5928a355d20/apps/encryption/lib/crypto/crypt.php#L204-L219) in owncloud's source code. The parameter `$options` used to be called `$raw_output` and when it's set to false it base64 encodes the ciphertext output. – dr jimbob Mar 11 '16 at 00:50
  • 9
    Playing around with `openssl_encrypt` in a PHP fiddle, it seems to base64 encode the data (and not insert linebreaks every 64 characters), but as php.net calls this an undocumented function, it wouldn't surprise me if this changed between PHP versions (if it's not 100% clear, I am not a PHP fan). Without the linebreaks, I'd expect encrypted files to be consistently 33.33% larger up to 1-3 blocks for padding, an IV, and a MAC. So saying 35% may just be rounding up to be safe (and I shouldn't have assumed line breaks). – dr jimbob Mar 11 '16 at 00:56
  • 22
    @drjimbob Being utterly clueless about details like this pretty much ensures I will never trust ownCloud for any kind of security guarantees ever. – Stephen Touset Mar 11 '16 at 01:22
  • @MobyDisk - Maybe using binary files caused issues (e.g., used across firewalls that block random binary blobs)? Or just initially used as base64 is the default for PHP's openssl_encrypt/openssl_decrypt? There's a [ticket](https://github.com/owncloud/core/issues/10831) to migrate to binary encryption from Sep 2014, though it's still open. Going to binary just requires switching `false` to `true` at [L208](https://github.com/owncloud/core/blob/90810cc6052b38ac03dd8f08a200c5928a355d20/apps/encryption/lib/crypto/crypt.php#L208) and L574 of crypt.php though backward compatibility makes it harder. – dr jimbob Mar 11 '16 at 02:17
  • 1
    @drjimbob - thanks for finding the exact file, this 'feature' alone dissuades people from using ownCloud. – Deer Hunter Mar 11 '16 at 05:10
  • 2
    @drjimbob No need to speculate, or make assumptions about PHP changing behaviour in incompatible ways; the default for that function has always been to output in base64. The boolean was changed to a flags param to avoid lengthening the signature, in this commit, in a completely backward-compatible way: https://github.com/php/php-src/commit/9e7ae3b2d0e942b816e3836025456544d6288ac3 The intention was probably to match hash functions (md5, sha1, etc) which output in printable form by default. – IMSoP Mar 11 '16 at 16:54
  • @IMSoP When openssl is called from the command line to encrypt with base64 encoding (-a flag), it inserts `\n` every 64 chars (see: `openssl enc -aes-128-cbc -a -in file2encrypt`). I expected a function named `openssl_encrypt` with flags set to base64 to act similarly. I don't have a problem with maintaining backward compatibility, but the lack of documentation explaining that options=false means base64-encoded output is a problem. The only documentation is `options can be one of OPENSSL_RAW_DATA, OPENSSL_ZERO_PADDING`, which gives no indication of what happens when options=false, unlike raw_output=false. – dr jimbob Mar 11 '16 at 20:15
  • 1
    @drjimbob Well, false is the same as zero, so means "no flags". But yes, it would be good if someone [edited the documentation](https://edit.php.net/?project=PHP&perm=en/function.openssl-encrypt.php) to clarify. – IMSoP Mar 12 '16 at 10:19
  • @IMSoP - While 0 and false have the same value, they have different semantic meanings. I understand options is now a bit mask and the LSB of options being 0 means base64 encode. I just think the design and documentation are poor. The documentation should state the default behavior (b64 encode and PKCS#7 padding). You should never have to consult the language's source code to understand how the parameters of a built-in function work. Maybe add a new constant `OPENSSL_BASE64_PKCS7` (value 0) and document the difference between zero padding and PKCS7 padding. Also rename password to key. – dr jimbob Mar 12 '16 at 17:02
  • 1
    @MobyDisk - Don't know about ownCloud, but I've done the same thing in the past. For instance, when storing password hashes in a DB. Because although the database/underlying storage layer supports directly storing binary data, many of its API's for working with binary data are clunky/less convenient than the API's for dealing with text data. Thus it's more convenient to base64-encode the binary stuff and store it as text in cases where space-efficiency isn't the overriding concern. – aroth Mar 13 '16 at 01:36
  • @DeerHunter - I bet this *doesn't* dissuade people from using ownCloud, any more than not knowing how other cloud storage systems handle their encryption (or if they even do any). – warren Mar 14 '16 at 14:55
7

If the files are being compressed then you might see this discrepancy.

Compression algorithms work best on non-random data. Encryption aims to generate randomness from information. Information is generally easy to compress as it has patterns. However, if you encrypt it, you are generally erasing any patterns (and information).

Example: 2.75GB of email archive files can be easily compressed down to <.5GB. If these email archives were encrypted, however, then the compressed version would be much closer to 2.75GB.
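A quick way to see the effect is the sketch below. It assumes `os.urandom` output as a stand-in for well-encrypted data, and the repeated email header is made-up sample text, so the exact ratios will differ from a real mail archive.

```python
import os
import zlib

# Highly repetitive, made-up "email-like" text (about 1 MB).
text = b"Subject: weekly status report\nFrom: alice@example.com\n" * 20_000

# Random bytes of the same length stand in for ciphertext (an assumption).
random_like = os.urandom(len(text))

text_ratio = len(zlib.compress(text)) / len(text)
random_ratio = len(zlib.compress(random_like)) / len(random_like)

print(f"patterned text compresses to {text_ratio:.2%} of its size")
print(f"random-looking data compresses to {random_ratio:.2%} of its size")
```

Note that this shows why encrypted data resists compression; it does not by itself make an encrypted file larger than the uncompressed original.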

d1str0
  • 6
    This is true, but you can get around that by just encrypting the compressed collection of files. The fact that it increases by roughly 35% indicates it's base64 encoding with linebreaks, to avoid passing around arbitrary binary files of encrypted data. If it had to do with compression, there's no reason to suspect it would always be 35% (instead of sometimes being a 90% difference, as in your example). – dr jimbob Mar 10 '16 at 19:20
  • Definitely agree. But ignoring the specific product and answering OP's question, this is one example where encryption might be leading to larger files. – d1str0 Mar 10 '16 at 19:27
  • 1
    Owncloud is an enterprise file-sharing application (which will store arbitrary types of documents, which may be highly redundant or not). Their [documentation](https://doc.owncloud.org/server/8.1/admin_manual/configuration_files/encryption_configuration.html) says that "Encrypting files increases their size by roughly 35%, so you must take this into account when you are provisioning storage and setting storage quotas. User quotas are based on the unencrypted file size, and not the encrypted file size." So this isn't talking about their specific files, but user files being 35% larger. – dr jimbob Mar 10 '16 at 19:43
  • 1
    Well this is sec.se not crypto and I read and answered the question in the title "Why would an encrypted file be ~35% larger than an unencrypted one?". Though I guess you fairly answered the question in the body "From my understanding of encryption, file *sizes* should be more-or-less identical (perhaps some padded 0 bits at the end to make it a multiple of the key size). Is that incorrect?" – dr jimbob Mar 10 '16 at 19:50
  • Lol. I'm an idiot. The app doesn't separate the two sites very well. – d1str0 Mar 10 '16 at 19:50
  • @drjimbob also I fully agree with your answer, and it's the most likely by far. Again, just offering an alternative given OP's question. – d1str0 Mar 10 '16 at 19:51
  • No - the only thing compression could factor into is whether files end up smaller when compressed after being encrypted (as you correctly point out, compression relies on non-randomness). As others have said, compressing-then-encrypting doesn't account for the size differences, whereas Base64 does – warren Mar 14 '16 at 19:33
  • You could compress it *before* encryption – BlueWizard Apr 04 '16 at 14:23
4

Normally, a percentage figure like this suggests that the file is Base64 encoded after encryption, and possibly also gets a checksum over each block to detect corruption. Base64 re-encodes 8-bit bytes as characters that carry only 6 bits each, so every 3 bytes become 4 characters and the file grows by about 33 % because more characters are needed to represent the whole file. Add a per-block checksum and you are up to 35 %.

Normally, the encryption itself adds some overhead: header and footer, possibly an encrypted key, parameters, salts, a checksum, and up to one block size minus 1 of padding, because if the data is not evenly divisible by the block size it has to be padded to a full block.

But all of the data listed above adds a fixed amount to every file, regardless of its size, whether it is 1 or 100 GB large.

An enlargement expressed as a percentage indicates a re-encoding process such as Base64 or something similar.
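The regrouping of 8-bit bytes into 6-bit characters can be sketched by hand. The helper `encode_three_bytes` is hypothetical and only handles one full 3-byte group (no `=` padding); the result is compared against Python's standard `base64` module.

```python
import base64
import string

# The standard base64 alphabet: 64 printable characters, one per 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes (24 bits) as four base64 characters."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    value = int.from_bytes(chunk, "big")
    # Read the 24 bits six at a time, most significant group first.
    indices = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


sample = b"Man"
manual = encode_three_bytes(sample)
library = base64.b64encode(sample).decode("ascii")
print(manual, library)
```

Each output character is stored as a full byte even though it carries only 6 bits of payload, which is where the 4/3 growth comes from.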

sebastian nielsen
  • 1
    This says so too: https://github.com/owncloud/core/issues/10831 . But you've mixed something up: Base64 makes 3 byte => 4 chars, it won't encode 1 byte in 6 bit. And it's 25%, not 20%. – deviantfan Mar 10 '16 at 19:05
  • "Base64 encodes 8 bits into 6 bits, which means the file in question gets about 20 % larger." This doesn't make sense. If b64 encodes 8bits to 6bits then the file would get smaller. – d1str0 Mar 10 '16 at 19:07
  • 1
    @d1str0 6 bits per character. Which means the file gets larger as more characters are required to render the file. – sebastian nielsen Mar 10 '16 at 19:10
  • That doesn't clarify your answer. "Base64 encodes 8 bits into 6 bits" suggests it is reducing the size. – d1str0 Mar 10 '16 at 19:12
  • @d1str0 : Did clarify that now. – sebastian nielsen Mar 10 '16 at 19:14
  • 1
    @d1str0 "8 bits encode to 6 bits" doesn't mean 2 bits get discarded. Every leftover 2-bits get collected to create **additional** 6-bit "characters". So for every 3 8-bit characters, there are 4 6-bit characters generated. The valid 6-bit characters that are allowed to come out of base64 encoding are characters that consist of bit patterns that are also valid 8-bit characters; they're simply the 8-bit characters that have 2 bits that are both zeros. It so happens that each of those characters is one that is "printable" ASCII, and that makes the output file acceptable to SMTP, etc. – user2338816 Mar 13 '16 at 13:21
  • @user2338816 I understand, but as it was written his answer was quite confusing. – d1str0 Mar 13 '16 at 17:29