-
Notifications
You must be signed in to change notification settings - Fork 18k
image/gif: gif decoding fails with too much image data
#16146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
CC @nigeltao This happens with every Go release since 1.1, so it is not a new problem. With 1.0.3 I get |
with too much data
with too much image data
Any updates on this, we process thousands of images daily, and run into this everyday, albeit on a very small number of them. Thanks. |
No update, but it would be helpful if you could attach a small (in terms of file size) GIF that exhibits this problem. The original post linked to something weighing 1 MB but that's still kind of unwieldy to debug. The smallest 'bad' GIF you've got would be great. |
Most of our images come from giphy.com where, a typical GIF is > 1MB, so I havent been able to find one thats smaller - will post if I find one. There is one that's quite wonky - its smaller but fails with a different message - |
@nigeltao here is one that's under 500kb and less than 15 frames: https://j.gifs.com/Z6KoKJ.gif It seems much more prominent on larger gifs though. |
I've debugged this. I have an explanation and a recommendation. The problem gifs in question (tx.gif from @odeke-em and Z6KoKJ.gif from @montanaflynn) indeed have extra/too much data. In both cases, the extra data is a data sub-block in the image data with size 1 containing 0x00. In both cases, there is an LZW End of Information Code, then this extra sub-block, and then the Block Terminator. In tx.gif this happens in the second image in the gif. In Z6KoKJ.gif it happens in the 11th image in the gif. In well formed gifs, the last sub-block would not be present. It does not need to be there since we completed decompression. Currently the decoder fully reads/decompresses the image's data, and then uses the block reader to check whether there is more. It sees there is more data (in this case, there is a block of 1 byte found) and raises this error. If we don't decompress the data and simply look at the image data as a series of data sub-blocks, it is not extra. That is, the data sub-blocks are well formed. It is extra only when we take into account reaching the LZW End of Information code. The question is what we should be doing. Other decoders (GIFLIB, browsers) apparently accept this. As far as the specification goes, it does not say one way or the other. In Appendix F it says "the code that terminates the LZW compressed data must appear before Block Terminator". It does not state what to do if there is data after the LZW data ends, nor that the code must be the last piece of data. Section 22 states that "[t]he sequence of indices is encoded using the LZW Algorithm [...]" so I think this data should not be treated as an index. But it is not useful and presumably is technically invalid. I looked at what the most recent version of GIFLIB (5.1.4) does. After it fully decompresses an image, it silently reads and discards any extra data such as this. It stops when it reads the Block Terminator. You can see this in its As for where these problem gifs are coming from, given the transformation using Note I didn't analyze @vsiv's extra_large_cropped_wino.gif. It appears corrupted to me in Firefox and is showing a different error, so possibly it is either truly corrupted or a different issue. I wrote a patch with a possible solution. I realize this will need to be submitted differently, but I thought I would include it and get feedback here before doing so. https://github.com/horgh/go/commit/ab144b8f336c0b99eec9b416a7577a305190dbdc Thanks for any feedback! |
Hi @horgh, nice debugging work. I suggest you submit a CL. We cannot review code without a signed CLA, so a Gerrit CL is the only format for discussing patches. |
@horgh - thanks for the debugging! Another option is to work out an external gif decoder lib here on github for concerning users to use if needed and leave the stricter lib as part of the golang distribution. thanks. |
CL https://golang.org/cl/37258 mentions this issue. |
Thanks all! I've uploaded the possible patch to Gerrit now too: https://go-review.googlesource.com/37258 (oh I see @gopherbot beat me!) |
Could it be limited somehow, how much stray data is allowed? Otherwise it feels like a new attack vector. What do other gif libraries do to limit it? |
Good point @nightlyone. Other gif libraries:
That is not to say we need to use that same approach here. In the two samples mentioned in this issue, only a single byte is extra. We could be conservative and allow as little as 1 byte or some arbitrary small limit. |
@horgh great debugging indeed. I'd like to limit the slack to at most one extra byte. We can continue the conversation on the code review. |
Change https://golang.org/cl/68350 mentions this issue: |
golang.org/cl/37258 was committed to fix issue #16146. This patch seemed intent to allow at most one dangling byte. But, as implemented, many more bytes may actually slip through. This is because the LZW layer creates a bufio.Reader which will itself consume data beyond the end of the LZW stream, and this isn't accounted for anywhere. This change means to avoid the allocation of the bufio.Reader by making blockReader implement io.ByteReader. Further, it adds a close() method which detects extra data in the block sequence. To avoid any regressions with poorly encoded GIFs which may have worked accidentally, there are no restrictions on how many extra bytes may exist in the final full sub-block that contained LZW data. If the end of the LZW stream happened to align with the end of a sub-block, at most one more sub-block with a length of 1 byte may exist before the block terminator. This change aims to be at least as performant as the prior implementation. But the primary gain is avoiding the allocation of a bufio.Reader per frame: name old time/op new time/op delta Decode-8 276µs ± 0% 275µs ± 2% ~ (p=0.690 n=5+5) name old speed new speed delta Decode-8 55.9MB/s ± 0% 56.3MB/s ± 2% ~ (p=0.690 n=5+5) name old alloc/op new alloc/op delta Decode-8 49.2kB ± 0% 44.8kB ± 0% -9.10% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Decode-8 269 ± 0% 267 ± 0% -0.74% (p=0.008 n=5+5) Change-Id: Iec4f9b895561ad52266313fbc73ec82c070c3349 Reviewed-on: https://go-review.googlesource.com/68350 Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Nigel Tao <nigeltao@golang.org>
Please answer these questions before submitting your issue. Thanks!
go version
)?go version devel +1f44643 Wed Jun 22 00:12:55 2016 +0000 darwin/amd64
go env
)?GOARCH="amd64"
GOBIN="/Users/emmanuelodeke/go/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/emmanuelodeke/go"
GORACE=""
GOROOT="/Users/emmanuelodeke/go/src/go.googlesource.com/go"
GOTOOLDIR="/Users/emmanuelodeke/go/src/go.googlesource.com/go/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/kn/s30g1srd4h50bh6sd322tppm0000gn/T/go-build038095302=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
Run program at https://play.golang.org/p/NqqvAkiIao with source of http://ualberta.ca/~odeke/tx.gif
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
Successful decoding of a gif with no errors
gif: too much image data
/cc @montanaflynn who reported this issue first.
Code inlined
The text was updated successfully, but these errors were encountered: