Preventing (and fixing) parser mismatch vulnerabilities

As I discussed previously, parser mismatches are an underappreciated class of vulnerabilities. In that post I described what they are and showcased a few examples, but today I'd like to talk about what to do about them.

I'll present a few options, with advice on when to use each technique.

Today's example

First, we'll need an example to use throughout. None of the real examples I had was suitable for illustrating every technique, so here's an imaginary (but still realistic) one.

Let's imagine a microservice architecture where an authorization server sits in front of a backend server. The auth server accepts signed requests in the form of a JSON document with an embedded signature and checks whether the signature matches the user and request. If the signature is valid for that request, the auth server sends the JSON on to the backend server so it can act on the request. For instance, if Bob's client needed to load his account data, it would send {"user": "bob", "do": "get-account-data", "sig": "Tm90...IHJlYWw="} where the sig is a signature over the rest of the object, using Bob's key.

And here's what an exploit might look like, with a repeated user key:

{
  "user": "alice",
  "user": "bob",
  "do": "get-account-data",
  "sig": "Tm90...IHJlYWw="
}

The JSON standard explicitly does not indicate how repeated keys should be handled; an implementation may take the first value, take the last value, throw an error, anything! So perhaps the authorization server takes the last value, "bob", and checks the signature using Bob's key. Some internal call like check_sig(user="bob", do="get-account-data", sig="Tm90...IHJlYWw=") returns true, so the JSON is passed along to the backend server. If the backend server has a different JSON parser, it might take the first (or lexicographically first!) value, "alice", and interpret the request as "get Alice's account data". In this way, Bob can illegitimately gain access to Alice's account data.
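To make the disagreement concrete, here's a sketch in Python. The standard json module happens to keep the last value for a repeated key, while a parser built on its object_pairs_hook (the keep_first helper below is my own, not from any library) can just as legitimately keep the first:

```python
import json

doc = '{"user": "alice", "user": "bob", "do": "get-account-data"}'

# Python's json module silently keeps the *last* value for a repeated key...
last_wins = json.loads(doc)

# ...but another parser could just as legitimately keep the *first* value.
def keep_first(pairs):
    result = {}
    for key, value in pairs:
        result.setdefault(key, value)  # ignore later duplicates
    return result

first_wins = json.loads(doc, object_pairs_hook=keep_first)

print(last_wins["user"])   # bob
print(first_wins["user"])  # alice
```

Two parsers, one document, two different users: exactly the mismatch the exploit relies on.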

(The rabbit hole goes much deeper than just repeated keys; you may not even want a signature, and definitely not an embedded signature. See How (not) to sign a JSON object for a much more in-depth treatment. This toy example is also missing some other essential security features. Basically, don't do anything like this.)

So, how will we fix this?

Technique 0: "Don't do that, then"

We have to start here: Since all parser mismatch vulnerabilities involve the use of multiple parsers (that disagree about certain inputs)... why not just use the same parser in both places?

And yes, if you can do this, it should work. But if you're already in the situation of asking "how do I prevent parser mismatch", there's probably some good reason this isn't an option. Maybe the two code locations are in different programming languages, in different applications that are not deployed in lockstep, or simply owned by different people. For instance, one of the parsers might be in your web server and the other in the user's browser. Or multiple peer clients implement a protocol (e.g. bittorrent), and you have only authored one of them.

Also consider change over time. Even if the two locations use the same parser now, one might be changed to use a different one later, and parser mismatch could be reintroduced.

Using the same parser everywhere would be great, don't get me wrong. And I think it's very much worth using parser-generators like ANTLR to turn formal grammars into executable code, since the same grammar can be reused across multiple languages. (I've used this approach in my URL parsing library to good effect: In goes a grammar lifted straight from the RFCs, and out comes parser code that perfectly implements the spec.) Avoiding handwritten parsers, striving to make unambiguous specs, providing formal grammars for data formats—all of these will reduce parser mismatch. But none of these is a guarantee, and often these things aren't under your control anyhow.

Technique 1: Pass the parse

On to the first technique. The essence of parser mismatch is that the parsers disagree on the parse for certain inputs. So one option is to have the first code location do its parsing, then pass the parsed elements to the second code location.

Here's a strawman solution to the JSON example that nevertheless clearly illustrates the principle: The authorization server parses the input, verifies the signature, and then places the parsed pieces into an HTTP request to the backend server, like so:

POST /api/get-account-data
As-User: bob

Well, barring URL and HTTP header injections, there's certainly no opportunity for disagreement, now; parsing of the JSON happens just once, and the extraneous alice field never reaches the backend. But we'd have to dramatically change the format the authorization server speaks to the backend! Why can't we just send the parsed pieces as JSON, like we were doing originally? And my answer is that it simply would have been confusing from a pedagogical standpoint. ;-) But that's not a good architectural reason, so let's stick with sending JSON instead: The authorization server explicitly constructs {"user": "bob", "do": "get-account-data"} and sends that to the backend. The code might look like this:

val request_data = parse_json(request.body);
if check_sig(request_data) {
    send_to_backend(to_json({
        'user': request_data['user'],
        'do': request_data['do'],
    }))
}

That redundancy is still a little awkward, and we'll see a cleaner option in the next section. Before that, though, some quick notes on when this option is appropriate:

  • Always works... if you can use it.
  • Only possible when one code location is passing data to another. Doesn't work if e.g. two peers in a network are parsing the same piece of data. Works great if both locations are in the same application and a parsed data structure can be passed around, but people are usually already doing that for the sake of performance. (Big exception: URLs are often parsed multiple times in the same application.)
  • Requires that you have control over how the data is conveyed from one location to the next.

Technique 2: Reserialize

Notice that the effectiveness of the code sample in the pass-the-parse section relied entirely on the parse_json/to_json pair of calls. The code happened to drop the sig field, since that's irrelevant to the backend server, but including it probably wouldn't cause any harm. If we didn't mind including it, then this code would be functionally equivalent:

val request_data = parse_json(request.body);
if check_sig(request_data) {
    send_to_backend(to_json(request_data))
}

The request data gets round-tripped through a parser and then a serializer. The parser removes ambiguities, and the serializer re-encodes the data back into the original format. The assumption here is that even if the parser accepts ambiguous inputs, the serializer is extremely unlikely to produce ambiguous outputs. I refer to this technique as reserialization.
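As a minimal runnable sketch of reserialization in Python (check_sig is a hypothetical stub standing in for the real signature check):

```python
import json

def check_sig(request_data):
    # Hypothetical stub: a real implementation would verify the signature.
    return True

def forward_body(raw_body):
    # Parse once; any repeated keys are resolved by *this* parser's rules.
    request_data = json.loads(raw_body)
    if not check_sig(request_data):
        raise PermissionError("bad signature")
    # Reserialize: the output contains exactly one value per key, so the
    # backend's parser has nothing left to disagree about.
    return json.dumps(request_data)

attack = '{"user": "alice", "user": "bob", "do": "get-account-data"}'
print(forward_body(attack))  # {"user": "bob", "do": "get-account-data"}
```

Note that the duplicate "user" key never reaches the backend; whichever value this parser chose is the only one that gets serialized.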

It looks a lot cleaner than pass-the-parse, and the resulting code is completely general. The vulnerability is prevented by code that doesn't know anything about the specifics of the original attack, or even which fields or ambiguities would have enabled the vulnerability.

There's some risk that without the explicit construction that occurs in pass-the-parse, some extraneous information could be sent to the backend and be acted on inappropriately. This can itself result in vulnerabilities, but it's no worse than the original situation and at least the parser mismatch is prevented.

A few points on how this differs from pass-the-parse, since they can look very similar:

  • Reserialization always uses the same input format for both the authorization and backend servers, while pass-the-parse can use different ones.
  • Pass-the-parse always uses explicit construction, while reserialization does not require it (though it can, optionally).

Sometimes it's not clear which is in use: Code that performed a filtering step on the parsed request data before passing it to the backend could be described as implementing either technique.

So, when does this make sense to use? Here's a look at the benefits and constraints of this technique:

  • No need to change how the first location speaks to the second location.
  • Only requires changes to the first location.
  • Patch may not require any knowledge of business logic.
  • Allows (and requires) same input format for both code locations.
  • Requires having a full parser and a complete data model. This almost certainly wouldn't work for the HTTP header splitting or HTTP request smuggling examples from my last post. What tool would you reach for that could reserialize an entire HTTP request, down to fixing the ambiguities in those examples? Even HTML would be a little iffy; in the industry, very few tools model HTML as data, preferring to pass it around as text (or templates, at best). But for JSON and URLs? Works great.

Technique 3: Be strict

So far we've looked at two very similar techniques that both rely on parsing (and accepting) ambiguous inputs. But there's another clear option: Reject ambiguous inputs! Use a strict parser and just reject anything that doesn't match the grammar. This will prevent disagreements on malformed inputs by simply not letting those inputs through.

"Strict" can mean different things. If the format in question has a well-defined grammar, such as RFC 3986 for URLs, then a parser generated from the grammar (e.g. using ANTLR) will reject invalid inputs. In the example with URLs and hostnames, the server could reject a backslash-containing URL outright, so the browser's lax parser never gets a chance to see it.

In the JSON example, the authorization server could simply turn on strict-mode in its JSON parser and reject the malicious input. This would be a case of being stricter than the spec, and is probably the right option.

No one can reasonably complain about their API call with duplicate JSON keys being rejected, but people might complain about not being allowed to use URLs with illegal characters. For example, Wikipedia articles often have parentheses in their URL: https://en.wikipedia.org/wiki/Mars_(disambiguation). Copying this from Firefox's address bar, I get bare parentheses in my clipboard, which is interesting because Firefox would never send it that way to the server; it would instead send /wiki/Mars_%28disambiguation%29, since parentheses are not legal in URLs. However, Firefox keeps backslashes encoded (except for display), so if there were an article called Back\Slash the copied URL would contain Back%5CSlash. So, a reasonable compromise might be to preprocess the URL to replace parentheses and a few other relatively safe characters with their percent-encodings and only then run the strict parser. (A safer but more involved alternative would be to make a lax variation on the parser.) There still might be some room for parser mismatch here—some piece of software might very reasonably stop reading at the first illegal character, and come up with https://en.wikipedia.org/wiki/Mars_—but it would reduce the area of exposure.
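One way that preprocessing step might look in Python, assuming a hypothetical safe-list of just parentheses (a real deployment would choose its own list, and would still run the strict parser on the result):

```python
from urllib.parse import quote

# Hypothetical safe-list: characters that are illegal in a URL path but
# common enough in the wild to be worth rescuing via percent-encoding.
SAFE_TO_ENCODE = "()"

def preprocess(path):
    # Percent-encode only the rescued characters; anything else that's
    # illegal is left alone for the strict parser to reject.
    return "".join(quote(c) if c in SAFE_TO_ENCODE else c for c in path)

print(preprocess("/wiki/Mars_(disambiguation)"))
# /wiki/Mars_%28disambiguation%29
```

The point of the narrow safe-list is that the lenient step stays small and auditable, rather than quietly becoming a second lax parser.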

In the other direction, you may have the freedom to be much more strict than the spec, and only accept a narrow subset of the spec. If you accept URLs, do you need to accept all protocols? Do you need to accept URLs with a userinfo component? What about a host component with non-ASCII characters? Do you need to allow a hostname to end in a period? It's always good to consider edge cases that represent possible legitimate user behavior, but they don't always need to be supported. Sometimes it's OK to draw a line and say that the caller needs to take on some responsibility for clean inputs.

Besides the issue of how strict you can be, sometimes this technique just isn't available at an architectural level. If you have a guard/actor pair as in today's example (and as described in the previous post) then having either the guard or the actor do the rejection is likely to be acceptable and effective. But sometimes the two locations are parsing "in parallel", not in sequence with one another, and having one reject and the other accept the input may count as a failure. This might be the case for nodes in a cryptocurrency network.

Summing up the characteristics of this technique:

  • May not always have the freedom to just reject questionable inputs, although "strict" isn't just one thing.
  • If code locations are in a sequence, rejecting inputs in either location may work. Unlike the other techniques, this one can sometimes work even when you can't control the first location in a sequence.
  • Not a guaranteed fix. You can always imagine the second location using a parser written so terribly that some input will be ambiguous or parsed incorrectly.

Conclusions

Depending on how you think about it, there are anywhere from two to four options presented here. Despite the similarity of pass-the-parse and reserialization, they're actually on opposite sides of a divide:

  • Technique 0, using the same parser in both places, seemed like a trivial and often-useless suggestion. But pass-the-parse is actually a sneaky way of doing just that: The data is only parsed once, for a given format, and so only one code location needs a parser. Now the "same" parser is used in all (one) locations it is needed.
  • Reserialization and strict parsing, on the other hand, are resigned to the idea that there will be multiple parsers. They instead try to narrow the space of possible inputs that an attacker can make use of. One technique alters ambiguous inputs to be in a safer subset of the spec, while the other simply rejects them.

Beyond such questions of ontology, there are a variety of practical concerns involved in choosing which technique to use. This is also not an exhaustive list. (I'd love to provide a flowchart or table to help with decision-making, but it seems impractical. Let me know if you can come up with one!) More importantly, though, I hope that this has given you a better sense of how to grapple with parser mismatches at a structural level and that you'll be better able to recognize them and work around them in whatever way is most appropriate for your code and architecture.

