
This was, before someone helpfully fixed it after seeing this question, a relatively unassuming and tiny photo of a ̶f̶i̶s̶h̶ nudibranch, with 283,620 pixels (435 × 652). It has some metadata: text Exif tags, 8.6 kB of Color Profile information, a 5,557-byte Thumbnail, a 648,534-byte Preview Image (which I cannot read), and some other random things (like Face Detect Area) that take up little space.

Using

exiftool -a -b -W %d%f_%t%-c.%s -u -g1 -ee -api RequestAll=3 temp.jpg

extracts less than 650 KiB in total, a small fraction of the file's size of over 6 MB.

Are there any strategies or tools that one might use to discover what is going on, and whether something has been hidden in the file?

In case it makes things easier, the same or very similar inclusions appear to affect multiple files by the same Flickr user: 2, 3, 4, 5

David
  • The term you are looking for is "steganography". – schroeder May 18 '20 at 14:22
  • @schroeder I would have thought a steganographic method would not attempt to put its payload in a separate blob... the methods I've seen all seem to manipulate pixel colors, which would not drastically increase the image size, but I haven't looked at it for over 10 years. This seems more akin to appending a GIF to a ZIP, making the file conform to both specs. Is that really steganography? – David May 18 '20 at 14:38
  • Steganography is any means of hiding data. JPEG does not store "pixels," but different quanta could be fudged a small amount from some reference image in the same way. Appending data is also steganography. – bonsaiviking May 18 '20 at 15:24
  • For a JPEG, the first thing I'd try is unzipping it. JPEG doesn't care what comes after the end of the image data, while Zip stores its catalog at the end of the file and works backwards from there. If you concatenate a JPEG and a Zip together, everything works fine. – Mark May 18 '20 at 23:51
  • That animal is not a fish; it is a species of nudibranch, a type of marine mollusc. – Brady Gilg May 19 '20 at 21:48
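Mark's point about JPEG/Zip polyglots can be checked with a few lines of standard-library Python. This is a sketch: the "JPEG" below is a hypothetical stand-in (an SOI marker, filler bytes and an EOI marker) rather than a decodable image, and `make_zip` is a helper defined only for the example.

```python
import io
import zipfile


def make_zip(files):
    """Build an in-memory Zip archive from a {name: bytes} dict."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Stand-in for a JPEG: SOI, filler, EOI. Not a decodable image.
fake_jpeg = b"\xff\xd8" + b"\x00" * 64 + b"\xff\xd9"
polyglot = fake_jpeg + make_zip({"hidden.txt": b"hello"})

# A JPEG reader starts at the front; a Zip reader locates the central directory at the end.
print(polyglot[:2] == b"\xff\xd8")               # True
print(zipfile.is_zipfile(io.BytesIO(polyglot)))  # True
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    print(zf.namelist(), zf.read("hidden.txt"))  # ['hidden.txt'] b'hello'
```

On a real file, `zipfile.is_zipfile(path)` gives the same quick yes/no answer before digging any deeper.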

4 Answers


Short answer: It's an artifact of Nikon Picture Project

I had difficulty finding "Nikon Picture Project" but finally found version 1.5 to try. The last version released was 1.7.6.

It turns out that "Nikon Picture Project" does indeed implement non-destructive editing with undo and versioning capabilities. Unlike any other photo editing software I've seen, it does this by altering the JPG file structure itself, embedding edit controls and versions directly in the JPG. The software has an Export JPEG function that flattens the image and removes the history, but it looks like the native, munged JPGs were posted instead of exported ones.

I loaded up your first reference image (resized here)

Reference.

Sure enough, "Nikon Picture Project" showed it as an edit and crop of a much larger picture (resized here)

Original.

Comparing the file structures before and after confirms the weird artifacts.

Thanks for the puzzle!

user10216038
  • Wow, nice! That seems to be the same picture I was able to isolate, but mine wasn't recognized as a fully functioning JPEG. Any insight on why this might differ from standard JPEG? – Esa Jokinen May 19 '20 at 19:58
  • @EsaJokinen Wild guess on my part, but given they're storing multiple similar images in one file, I wonder if there's some custom compression going on - either storing partial images to be reassembled (like a video codec with key frames and deltas) or just detecting similar sections of data across the multiple versions. – IMSoP May 19 '20 at 20:39
  • Adobe Fireworks does a similar thing for PNG images. – gparyani May 19 '20 at 22:44
  • The actual JPEG format supports a lot of quirky modes, and I can very much see this being possible within the technical standard. The simple images we usually share are actually the much smaller JFIF subset, with most software not implementing any more than that. – SE - stop firing the good guys May 21 '20 at 14:05
  • Would be interesting to poke around [this user's other contributions](https://commons.wikimedia.org/wiki/Special:Contributions/Josuevg) to see if there are any other similarly "unflattened" images. – Captain Man May 21 '20 at 14:45
  • This was the best answer at the time, but I think @nneonneo's answer below details a more general method of investigating this in case the culprit software was not available. Thanks for a wonderful answer, though! – David May 26 '20 at 15:54
  • @David - No problem! I just didn't see a reason to reinvent the wheel when there was a perfectly good truck available with the keys in the ignition. – user10216038 May 26 '20 at 20:56

This was less interesting than it seemed at first. The user might just have a broken camera, a broken memory card, or malfunctioning photo editing software that fails to save the full-resolution image but is able to save working thumbnails of various sizes, including the 435 × 652 "original" picture.

The file size of your example picture is explained by a broken 4032 × 3024 pixel, 5.47 MB JPEG stream which, scaled down, looks like this:

Broken image scaled down

It begins here with the FF D8 SOI (Start Of Image):

Start Of Image from HxD

And ends here with the FF D9 EOI (End Of Image):

End Of Image from HxD

There is also another, differently broken 1920 × 1440 thumbnail of the same image, as well as a thumbnail of this broken image. If something interesting is hidden in the gray, it's between offsets 0x006A4F and 0x5812A2. However, I wouldn't bet on it.
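Locating those SOI/EOI pairs doesn't strictly need a hex editor. A minimal sketch, assuming the whole file fits in memory: it reports every `FF D8 FF` (SOI followed by another marker) and every `FF D9`. These are only candidates, since a plain byte search cannot tell a real marker from the same bytes inside another segment's payload. It also ignores container framing: if an embedded JPEG is split across several APP10 segments, as the other answers show here, a stream carved from an SOI hit to an EOI hit would still contain the segment and chunk headers between the pieces, which would likely be enough to make it decode as "broken". The function names are hypothetical.

```python
def find_all(buf, needle):
    """Return every offset at which `needle` occurs in `buf` (overlaps allowed)."""
    hits, pos = [], buf.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = buf.find(needle, pos + 1)
    return hits


def jpeg_marker_candidates(buf):
    """Candidate SOI offsets (FF D8 FF) and EOI offsets (FF D9) from a raw byte search."""
    return find_all(buf, b"\xff\xd8\xff"), find_all(buf, b"\xff\xd9")


# Synthetic example (not real file data): two SOI candidates, two EOI candidates.
sample = (b"\xff\xd8\xff\xe1" + b"\x00" * 6
          + b"\xff\xd8\xff\xdb" + b"\x11" * 4 + b"\xff\xd9" + b"\xff\xd9")
print(jpeg_marker_candidates(sample))  # ([0, 10], [18, 20])
```

Printing the offsets with `hex()` makes them easy to compare with what HxD shows.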

Esa Jokinen
  • There is a lot more weirdness here. Initially the Exif data indicates an Olympus E-PL1. It goes on to describe the Olympus lens and settings. Then it suddenly has a Nikon profile and a Primary Platform of Apple Computer. In the hex file structure starting at 6A2F there's a block labeled "Nikon Image Info". Following this there are an additional 105 "Nikon Image Info" blocks. This doesn't seem like a plausible failure mechanism. It's interesting but I have no good answers. – user10216038 May 18 '20 at 23:16
  • It looks like the images may have been edited on an Apple computer using "Nikon Picture Project". I'm not familiar with Picture Project, but I wonder if it saves multiple **undos** in the file? – user10216038 May 18 '20 at 23:54
  • It could be some problem with the camera or with the software, but as there were multiple similarly broken images from the same user, something is failing quite systematically. The metadata may be related or not, and doesn't alone explain the filesize. I've clarified this in an edit. The *Nikon Picture Project* was a nice catch, though! – Esa Jokinen May 19 '20 at 06:42
  • I did try strings and saw a large number of "Nikon Image Info" strings in the file... I assumed this was Olympus licensing Nikon firmware... however I had no evidence of that, and trying strings on a photo from https://www.dpreview.com/reviews/olympusepl1/9 shows no Nikon strings at all, so if anything, editing software seems likely. – David May 19 '20 at 07:29
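For anyone without `strings` at hand, or who wants offsets alongside the text, here is a rough Python equivalent. It is a simplified sketch of what the Unix tool does, treating only bytes 0x20 to 0x7E as text; `printable_strings` is a hypothetical name.

```python
import re


def printable_strings(buf, min_len=4):
    """Yield (offset, text) for each run of at least `min_len` printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for m in pattern.finditer(buf):
        yield m.start(), m.group().decode("ascii")


# Synthetic bytes loosely shaped like the start of an APP10 segment (not copied from the file).
sample = b"\xff\xea\x00\x10Nikon Image Info\x00\x02"
print(list(printable_strings(sample)))    # [(4, 'Nikon Image Info')]
print(sample.count(b"Nikon Image Info"))  # 1
```

On the real file, `data.count(b"Nikon Image Info")` gives the number of blocks directly, which is a quick way to check the "additional 105 blocks" figure above.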

As other commenters have mentioned, the file contains data from Nikon Picture Project. What if you couldn't run that software, but you still wanted to know what was hidden inside?

Nikon's Picture Project format seems to be entirely undocumented, which is no surprise given that it's a custom format for a particular app and was never designed for interchange. That said, the format seems to be extremely simple and can be worked out by examining the APP10 chunks (FF EA markers) embedded in the binary. I looked at the chunks with Hachoir (a general-purpose file parsing library) using the following code:

from hachoir.parser.image.jpeg import JpegFile
from hachoir.stream import FileInputStream
import struct

p = JpegFile(FileInputStream('20200519221417!Goniobranchus_aureomarginatus_2.jpg'))
for i in p.array('chunk'):
    print(i['data'].value[:100].hex())

Just by lining up all the chunks like this, one immediately sees patterns:

4e696b6f6e20496d61676520496e666f000200000001f00000618396ffd8ffdb0084000101010101010101010101010101010201010101010202020102020202020202020202030303020303030202030403030304040404020304040404040304040401
4e696b6f6e20496d61676520496e666f000200000002f000bdcc1b6d3b9c535cb2bf520b2bff00340964d84ab6dc03cb7bf3c8ce6bd5bc1fae18562188d5e194bb9597040e36820f5e99e4f7fad7979b41bfebe67a5867785cf6e1e30c5b6e92621d8ef6
4e696b6f6e20496d61676520496e666f000200000003f000e0753fe7debf986355e1d34cfea696b17639dfb088ae1434600070a0fe7c57456f6931450a62507e47431072c3af04e3079af2b1152cf9bab65538dd5999b77a32f9991103d4739ce49e7eb5
4e696b6f6e20496d61676520496e666f000200000004f0000948036296da18e4e78e2bd98d292a577bbfebf1382b452bdcd28ef448cd8904a91a95f2cae368ee73d4fad4134b0ac68e082cd2336d033839ea7fbd9cf35c9384bda5dbd422a37b1fffd3fc
4e696b6f6e20496d61676520496e666f000200000005f000aa47dbc746ce9c2569c612aab7b9ffd3fcc2d67c0bf1b3e2d7c42bff00106972b695e0fd1ef46a1e25d7a4dc16360f84b7deea57730380b9dd5f6b7876730e8b6d664bce2c20581e590e7715
4e696b6f6e20496d61676520496e666f000200000006f000a0aa5a99ffd1fc2e5bb3ba2937c46471fc07210e73f89ae82c6e0163299631b8e58e793827afeb5fcc74aa3a57b1fd758cf7a3a1e8fa19230102b051921864306e7b9f7af54b1558e18dc310
4e696b6f6e20496d61676520496e666f000200000007f0009ddd707f957e974e7f5887b56f563a10743961d57e274be1ed7fed266b4b53219236659f703b78273863c139eb8ef5da695aba5a6b3610ddc2f3594ab219f6b0c162328727d0f6ef5e0d6c23
4e696b6f6e20496d61676520496e666f000200000008f000375ab993f790cd188651874393939cee0dd7a6411f4ae7478836811db4eac9972c4e41f94fcc416e5b9e01afa4861a528b99e34a6a261ea1e2268edc012399d0923692d9dc4920fe679ae12f
4e696b6f6e20496d61676520496e666f000200000009f000cd6fc7e6ee6de32bf75492727be7f0e6bf8be10536dbef73fb25c6317643f50958d9b9190318720124d73ba5c71c97af15e42df67b88c46252721893cf07ebfcfd2b2745467ccf7b9e950925
4e696b6f6e20496d61676520496e666f00020000000af000b0f9659df1b0f5c903a9c73f98aee03c5344b0c368bf31cf981f25f3fdecd44b1156524e5b1e156a692777a9c77882e65b547b60db6220b9dbd171ea7b579aa91a8de189519b24072b260e72
4e696b6f6e20496d61676520496e666f00020000000bf000d4fceeb5d124f262789d622cfb08924cf24e7a1e6bad8b462234ce245fe251b8063cf39afe48c5d48ceab6fccfe9ba5074e96a6c59db3c0ca8f1b850a18b2938662581e7f0fd6ba9b5b958fe
...
4e696b6f6e20496d61676520496e666f000200000068f0001acec0e2a791b919b9d91fffd6c432c611ce79c71594cf1cb202d8241af9849ec7b37b97ed648e59d60de067a8cd67f8816350d120048ef4a707a32a9cd5ec729a4de8d1b53576190c7a1af4
4e696b6f6e20496d61676520496e666f00020000006903960f515ce93b9d6e57d0cfbb94953c74eb58372df31e7f0ae983b239ea22a32a95e4d4ba7a057c139ad5dec713dcffd7f8f6f13692cc7807818a8609c4732b7615e7ad51dcb73a55bd82e60f9c
4e696b6f6e20496d61676520496e666f00030000000100000bbb0bbb40a9867a1be9d211a90a00aa00b1c1b70200a90b00000032a476a217d411a90a00aa00b1c1b70100050000000161512be4df5dd211a90a00aa00b1c1b7020005000000000132a476

We can see that there's a fixed header (4e696b6f6e20496d61676520496e666f: Nikon Image Info in ASCII), followed by either 0002 or 0003, then what seems to be an incrementing number (starting at 00000001 and ending at 00000069), and finally some kind of length field (f000 for most chunks except the last two, which have 0396 and 0000). After that it looks like data.

So, I guessed the header was something like this:

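struct nikon_chunk_header {      /* hypothetical name; all fields big-endian */
    char     magic[16];         /* "Nikon Image Info", no terminating NUL */
    uint16_t chunktype;         /* 0x0002 or 0x0003 */
    uint16_t unknown;           /* always zero */
    uint16_t serial;
    uint16_t datasize;
    uint8_t  payload[];
};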
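As a quick check of the guessed layout, the first 28 bytes of the first chunk in the hex listing above can be decoded by hand. This assumes big-endian fields (the `>` in the `struct` format used in the dump code) and reads the 4 bytes after the header as a separate 32-bit value.

```python
import struct

# First 28 bytes of the first chunk, copied from the hex listing above.
raw = bytes.fromhex("4e696b6f6e20496d61676520496e666f" "000200000001f000" "00618396")

magic, ctype, unknown, serial, size = struct.unpack(">16sHHHH", raw[:24])
(first_word,) = struct.unpack(">I", raw[24:28])

print(magic, ctype, unknown, serial, hex(size))  # b'Nikon Image Info' 2 0 1 0xf000
print(size, first_word)                          # 61440 6390678
```

So each full chunk declares 61,440 bytes of data, and the first 32-bit value after the header is the 0x618396 figure discussed next.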

and then dumped out all the payload bits to a file:

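# Write the payload of every image-data (type 2) chunk, in file order, to one file.
with open('dump.bin', 'wb') as out:
    for i in p.array('chunk'):
        data = i['data'].value
        magic, ctype, unknown, serial, size = struct.unpack('>16sHHHH', data[:24])
        print(magic, ctype, serial, size, len(data[24:]))
        if ctype == 2:
            # Keep everything after the header: the first chunk also carries
            # the 4-byte length prefix, which its size field does not count.
            out.write(data[24:])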

The resulting file starts with four bytes 00 61 83 96 (0x618396) which matches the total length of the data (0x618396 = 6390678 bytes). Next is FF D8 FF DB, the start of a JPEG, so stripping the length field off reveals a 4032x3024 JPEG. This is presumably the original photo from the camera. Here's the photo, resized to fit within the upload limit:
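The reassembly step can also be written without Hachoir. This sketch takes the APP10 payloads as a list of byte strings (however they were extracted) and rests on the guessed header layout. One further assumption: it treats every byte after the 24-byte header as data rather than trusting the size field, because the segment lengths in the last answer suggest the first chunk carries 4 bytes more than the others (61468 vs. 61464), exactly the size of the length prefix. `reassemble` and `_chunk` are hypothetical names, and the demo input is synthetic.

```python
import struct

MAGIC = b"Nikon Image Info"


def reassemble(app10_payloads):
    """Concatenate type-2 Nikon chunks in serial order and strip the 4-byte length prefix."""
    parts = []
    for data in app10_payloads:
        if not data.startswith(MAGIC):
            continue                               # some other APP10 user
        _magic, ctype, _unknown, serial, _size = struct.unpack(">16sHHHH", data[:24])
        if ctype == 2:                             # type 3 appears to hold edit metadata instead
            parts.append((serial, data[24:]))      # assumption: everything after the header is data
    stream = b"".join(chunk for _, chunk in sorted(parts, key=lambda p: p[0]))
    (length,) = struct.unpack(">I", stream[:4])    # big-endian total length, e.g. 0x618396
    body = stream[4:]
    if len(body) != length:
        raise ValueError(f"length prefix says {length} bytes, got {len(body)}")
    return body


def _chunk(ctype, serial, payload):
    """Build a synthetic chunk with the guessed header (demo only)."""
    return MAGIC + struct.pack(">HHHH", ctype, 0, serial, len(payload)) + payload


fake_jpeg = b"\xff\xd8" + bytes(10) + b"\xff\xd9"
stream = struct.pack(">I", len(fake_jpeg)) + fake_jpeg
demo = [_chunk(2, 2, stream[8:]), _chunk(2, 1, stream[:8]), _chunk(3, 1, b"edit history")]
print(reassemble(demo) == fake_jpeg)  # True
```

Checking the prefix against the reassembled length is a cheap way to confirm the chunk layout guess before trusting the output; on the real file the result should begin with FF D8 as described above.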

first image - 4032x3024

A trip to Hachoir shows that the JPEG is quite normal in structure, but it's been stripped of all metadata. Curiously, Hachoir also shows that it ends after 5742120 bytes. Dumping out the data after the end reveals a second JPEG, 1920x1440 in size:
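Hachoir found the end of the first JPEG here. The same can be done with a short marker walk: follow the length fields from SOI to SOS, then scan the entropy-coded data for the first 0xFF that is neither a stuffed FF 00 nor a restart marker (FF D0 to FF D7), and repeat until EOI. A minimal sketch, assuming a well-formed baseline or progressive stream; `jpeg_end` is a hypothetical helper, and the demo input is synthetic, not decodable image data.

```python
import struct


def jpeg_end(buf, start=0):
    """Return the offset just past the EOI marker of the JPEG that begins at `start`."""
    if buf[start:start + 2] != b"\xff\xd8":
        raise ValueError("no SOI marker at start")
    pos = start + 2
    while pos + 1 < len(buf):
        if buf[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos:#x}")
        marker = buf[pos + 1]
        if marker == 0xFF:                               # fill byte before a marker
            pos += 1
        elif marker == 0xD9:                             # EOI: done
            return pos + 2
        elif 0xD0 <= marker <= 0xD7 or marker == 0x01:   # markers without a length field
            pos += 2
        else:
            (length,) = struct.unpack(">H", buf[pos + 2:pos + 4])
            pos += 2 + length                            # length includes its own 2 bytes
            if marker == 0xDA:                           # SOS: skip entropy-coded data
                while pos + 1 < len(buf) and not (
                    buf[pos] == 0xFF and buf[pos + 1] != 0x00
                    and not 0xD0 <= buf[pos + 1] <= 0xD7
                ):
                    pos += 1
    raise ValueError("ran out of data before EOI")


# Two copies of a tiny synthetic "JPEG": SOI, DQT, SOS, scan data (with FF 00 and RST0), EOI.
first = (b"\xff\xd8" + b"\xff\xdb\x00\x04\x00\x00" + b"\xff\xda\x00\x03\x00"
         + b"\x12\xff\x00\x34\xff\xd0\x56" + b"\xff\xd9")
blob = first + first
end = jpeg_end(blob)
print(end, jpeg_end(blob, end))  # 22 44
```

Calling it again with `start` set to the returned offset walks the second image, which is one way the 1920 × 1440 copy could be split off from the 4032 × 3024 one.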

second image - 1920x1440

Sadly, it's not some exciting spy stuff; it's just another, somewhat downscaled version of the original picture. It's still much, much larger than the actual cropped photo data, though! This time there's nothing after the end, so we've extracted all the images from the file.

All that remains is the last chunk of data, which is 3008 bytes long. This chunk appears to contain the actual picture project info, presumably including a history of edits, detailed edit information, etc. The format is a lot more irregular, although I recognize quite a few double-precision floating point numbers and some things that look like magic numbers (65 D4 11 D1 91 94 44 45 53 54). With a little more work it should be possible to reverse engineer these chunks too - but there does not appear to be anything interesting hidden here steganographically :)
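One rough way to spot candidate doubles in an undocumented blob like this is to decode every 8-byte window in both byte orders and keep values that look "human": finite, of moderate magnitude, and with few decimal places. This is a heuristic sketch; the thresholds (`lo`, `hi`, six decimal places) are arbitrary assumptions, `find_doubles` is a hypothetical name, and any hits still have to be judged by eye against the surrounding structure.

```python
import math
import struct


def find_doubles(buf, lo=1e-6, hi=1e6):
    """List (offset, byte_order, value) where 8 bytes decode to a plausible-looking double.

    Heuristic only: finite, lo <= |value| <= hi, and no more than six decimal places.
    Both byte orders are tried because the format's endianness is unknown.
    """
    hits = []
    for off in range(len(buf) - 7):
        for order in "<>":
            (value,) = struct.unpack(order + "d", buf[off:off + 8])
            if math.isfinite(value) and lo <= abs(value) <= hi and value == round(value, 6):
                hits.append((off, order, value))
    return hits


# Synthetic example: a little-endian 1.5 surrounded by zero bytes.
sample = bytes(8) + struct.pack("<d", 1.5) + bytes(8)
print(find_doubles(sample))  # [(8, '<', 1.5)]
```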

nneonneo

It's not corrupt; it's just filled with APP10 segments containing some sort of application-specific data, probably Nikon-specific, because there are Nikon references in the APP1/EXIF segment at the start. After about 6 MB of APP10 segments, there are 103,001 bytes of actual JPEG image data. All the segment markers are in the right place, meaning each one appears exactly where the previous segment's payload length says it should, so it appears to be a valid image with 6 MB of Nikon-specific data:

Byte 0x00000000 (0): marker 0xD8 found: SOI (Start Of Image)

Byte 0x00000002 (2): marker 0xE1 found: APP1 (EXIF data)
        Payload length: 18523 bytes

Byte 0x00004861 (18529): marker 0xE2 found: APP2 (ICC profile)
        Payload length: 8650 bytes

Byte 0x00006A2F (27183): marker 0xEA found: APP10 (Application marker 10)
        Payload length: 61468 bytes

Byte 0x00015A4F (88655): marker 0xEA found: APP10 (Application marker 10)
        Payload length: 61464 bytes

Byte 0x00024A6B (150123): marker 0xEA found: APP10 (Application marker 10)
        Payload length: 61464 bytes

(... this goes on and on, 6 MB of APP10 segments...)

Byte 0x00610577 (6358391): marker 0xEA found: APP10 (Application marker 10)
        Payload length: 61464 bytes

Byte 0x0061F593 (6419859): marker 0xEA found: APP10 (Application marker 10)
        Payload length: 942 bytes

Byte 0x0061F945 (6420805): marker 0xEA found: APP10 (Application marker 10)
        Payload length: 3032 bytes

Byte 0x00620521 (6423841): marker 0xDB found: DQT (Define Quantization Table)
        Payload length: 130 bytes

Byte 0x006205A7 (6423975): marker 0xC4 found: DHT (Define Huffman Table)
        Payload length: 168 bytes

Byte 0x00620653 (6424147): marker 0xC0 found: SOF0 (Start Of Frame (Baseline DCT))
        Payload length: 15 bytes

Byte 0x00620666 (6424166): marker 0xDA found: SOS (Start Of Scan)
        Reading image data... 103001 bytes read.

Byte 0x006398C1 (6527169): marker 0xD9 found: EOI (End Of Image)
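The tool behind this listing is an unfinished personal analyzer (see the comments below), so here is a minimal Python sketch of the same idea: start at SOI, use each segment's length field to jump to the next marker, and skip the entropy-coded data after SOS. The reported payload length excludes the 2-byte length field, which is what the offsets above imply (0x4861 - 0x0002 = 18527 = 2 + 2 + 18523 bytes of marker, length field and payload). The marker-name table covers only the markers seen here, and `seg` and `demo` are hypothetical helpers for a synthetic example, not real file data.

```python
import struct

NAMES = {0xD8: "SOI", 0xD9: "EOI", 0xE1: "APP1", 0xE2: "APP2", 0xEA: "APP10",
         0xDB: "DQT", 0xC4: "DHT", 0xC0: "SOF0", 0xDA: "SOS"}


def walk_segments(buf):
    """Yield (offset, marker, payload_length) for each top-level JPEG marker.

    payload_length excludes the 2-byte length field and is None for markers
    that carry no length (SOI, EOI, RSTn, TEM).
    """
    if buf[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: no SOI at offset 0")
    yield 0, 0xD8, None
    pos = 2
    while pos + 1 < len(buf):
        if buf[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos:#x}")
        marker = buf[pos + 1]
        if marker == 0xFF:                       # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9 or marker == 0x01 or 0xD0 <= marker <= 0xD7:
            yield pos, marker, None
            if marker == 0xD9:
                return
            pos += 2
            continue
        (length,) = struct.unpack(">H", buf[pos + 2:pos + 4])
        yield pos, marker, length - 2
        pos += 2 + length
        if marker == 0xDA:                       # skip entropy-coded scan data
            while pos + 1 < len(buf) and not (
                buf[pos] == 0xFF and buf[pos + 1] != 0x00
                and not 0xD0 <= buf[pos + 1] <= 0xD7
            ):
                pos += 1


def seg(marker, payload):
    """Build one length-prefixed marker segment (helper for the demo only)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


demo = (b"\xff\xd8" + seg(0xEA, b"Nikon Image Info") + seg(0xDB, bytes(4))
        + seg(0xDA, bytes(3)) + b"\x12\xff\x00\x34" + b"\xff\xd9")
for off, marker, n in walk_segments(demo):
    extra = "" if n is None else f", payload {n} bytes"
    print(f"Byte {off:#010x}: {NAMES.get(marker, hex(marker))}{extra}")
```

To run it on a real file, pass in the bytes from `open(path, "rb").read()`. It stops at the first EOI, so anything appended after the image would need a separate check.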
Gerben
  • While this is interesting, similar information had already been posted [as a comment yesterday](https://security.stackexchange.com/questions/231831/why-is-this-435-%c3%97-652-pixel-jpeg-over-6-mb#comment473662_231845), and the meaning of these blocks discovered [in the answer posted 3 hours ago](https://security.stackexchange.com/a/231891/51961). – IMSoP May 19 '20 at 20:34
  • Well, that commenter said: "There is a lot more weirdness here." I'm pointing out there's no weirdness at all; the file structure is crystal clear. While it's great that he found out the file belongs to the 'Nikon Picture Project', it still did not explain why the file is 6 MB. This is not the [Photography](https://photo.stackexchange.com/) section of SE, this is the Information Security/Forensics section. People want to dig deeper. I'm still not showing why it's 6 MB, but at least I'm showing an overview of the structure of the file, and that the EXIF file is technically valid. – Gerben May 19 '20 at 20:48
  • Yes, but between that comment and your answer, another answer was posted, which is currently accepted, with a score of 32, and explains exactly what the extra data contains. Your answer feels like a clue that could have come between those two, but arrived 3 hours late. As I say, it's kind of interesting anyway, although it would be more so if you explained what tools you'd used to get this information. – IMSoP May 19 '20 at 20:50
  • Yeah maybe I shouldn't spend more time on this question then. The tool is a little JFIF/EXIF analyzer I started a few years ago but never finished. I could put it up on Github but it's only half a tool. It doesn't go much deeper than dumping a segment overview. – Gerben May 19 '20 at 20:58
  • This answer explains why the file is so big, and how "extra information" has been stored in the file in a standards-compliant manner. The accepted answer covers the far less interesting question of what did it and why. – JCRM May 20 '20 at 01:38
  • @JCRM A lot of file formats are extensible containers, and I knew JPEGs were one because they can contain multiple types of metadata, embedded thumbnails, etc; so the "how" didn't surprise me particularly. A bit more technical detail in the accepted answer would be nice though. – IMSoP May 20 '20 at 07:01