Parse, Don't Validate (2019)


SOURCE: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
Summary

I’m going to explain precisely what I mean in gory detail, but first, we need to practice a little wishful thinking.

One of the wonderful things about static type systems is that they can make it possible, and sometimes even easy, to answer questions like “is it possible to write this function?” For an extreme example, consider a Haskell function foo whose type signature declares a return type of Void. Is it possible to implement foo? Trivially, the answer is no, as Void is a type that contains no values, so it’s impossible for any function to produce a value of type Void.

That example is pretty boring, but the question gets much more interesting if we choose a more realistic example: head :: [a] -> a. This function returns the first element from a list. A list may be empty, though, in which case there is no element to return. In Haskell, we express this possibility using the Maybe type, giving head :: [a] -> Maybe a. This buys us the freedom we need to implement head: it allows us to return Nothing when we discover we can’t produce a value of type a after all.

Problem solved, right? Not entirely, because every caller now has to deal with the Nothing case. Instead of weakening the return type, we can strengthen the argument type, eliminating the possibility of head ever being called on an empty list in the first place.

To do this, we need a type that represents non-empty lists, such as NonEmpty from Data.List.NonEmpty. The calling code constructs a NonEmpty a from a [a] using the nonEmpty function from the same module, which has the type [a] -> Maybe (NonEmpty a). The Maybe is still there, but this time, we handle the Nothing case very early in our program: right in the same place we were already doing the input validation. Put another way, you can think of a value of type NonEmpty a as being like a value of type [a], plus a proof that the list is non-empty.

By strengthening the type of the argument to head instead of weakening the type of its result, we’ve completely eliminated all the problems from the previous section. The code has no redundant checks, so there can’t be any performance overhead. Furthermore, if the function that performs the check, getConfigurationDirectories, changes to stop checking that the list is non-empty, its return type must change, too.
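
The contrast between weakening the result and strengthening the argument can be sketched roughly as follows. The names safeHead, neHead, and parseDirs are hypothetical and chosen only for illustration; nonEmpty and the NonEmpty type come from Data.List.NonEmpty in base. Returning Either from the boundary function is an assumption of this sketch, not something the article prescribes.

```haskell
import Data.List.NonEmpty (NonEmpty (..), nonEmpty)

-- Weakened result: every caller has to handle Nothing,
-- even when it already knows the list is non-empty.
safeHead :: [a] -> Maybe a
safeHead (x : _) = Just x
safeHead []      = Nothing

-- Strengthened argument: total, because NonEmpty has no empty case.
neHead :: NonEmpty a -> a
neHead (x :| _) = x

-- Hypothetical boundary function: the emptiness check happens once,
-- and the result type records that it succeeded.
parseDirs :: [FilePath] -> Either String (NonEmpty FilePath)
parseDirs dirs =
  case nonEmpty dirs of
    Just ne -> Right ne
    Nothing -> Left "expected at least one directory"
```

Code that receives the NonEmpty FilePath can call neHead without any further check, whereas code built on safeHead would need to handle Nothing at each call site.
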
Often, the input to a parser is text, but this is by no means a requirement, and parseNonEmpty is a perfectly cromulent parser: it parses lists into non-empty lists, signaling failure by terminating the program with an error message.

Under this flexible definition, parsers are an incredibly powerful tool: they allow discharging checks on input up-front, right on the boundary between a program and the outside world, and once those checks have been performed, they never need to be checked again! Haskellers are well aware of this power, and they use many different types of parsers on a regular basis. The aeson library provides a Parser type that can be used to parse JSON data into domain types. Likewise, optparse-applicative provides a set of parser combinators for parsing command-line arguments. Database libraries like persistent and postgresql-simple have a mechanism for parsing values held in an external data store. The servant ecosystem is built around parsing Haskell datatypes from path components, query parameters, HTTP headers, and more.

The common theme between all these libraries is that they sit on the boundary between your Haskell application and the external world. With a static type system, keeping the parsing and processing logic in sync becomes marvelously simple, as demonstrated by the NonEmpty example above: if the two go out of sync, the program will fail to even compile.

Hopefully, by this point, you are at least somewhat sold on the idea that parsing is preferable to validation, but you may have lingering doubts. Suppose a function needs a list of key-value tuples that contains no duplicate keys. Rather than checking for duplicates separately, a better solution is to choose a data structure that disallows duplicate keys by construction, such as a Map. Adjust your function’s type signature to accept a Map instead of a list of tuples, and implement it as you normally would.

Once you’ve done that, the call site of your new function will likely fail to typecheck, since it is still being passed a list of tuples.
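
As a minimal sketch of the Map refactoring, assuming the containers package: parseUniqueKeys and lookupPort are hypothetical names, and reporting the first duplicated key through Either is a choice made for this example. Note that Map.fromList on its own would not reject duplicates; it keeps the last value for a repeated key, which is why the sketch inserts keys one at a time.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Parse a list of pairs into a Map, failing with the offending key
-- as soon as a duplicate is seen.
parseUniqueKeys :: Ord k => [(k, v)] -> Either k (Map k v)
parseUniqueKeys = go Map.empty
  where
    go acc [] = Right acc
    go acc ((k, v) : rest)
      | k `Map.member` acc = Left k
      | otherwise          = go (Map.insert k v acc) rest

-- Hypothetical downstream function: it accepts a Map, so it can
-- never be handed a structure containing duplicate keys.
lookupPort :: Map String Int -> Maybe Int
lookupPort = Map.lookup "port"
```

A call site that still passes a list of tuples to lookupPort fails to typecheck, which points at exactly the place where parseUniqueKeys needs to be called.
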
Don’t be afraid to refactor code to use the right data representation: the type system will ensure you’ve covered all the places that need changing, and it will likely save you a headache later.

Treat functions that return m () with deep suspicion.
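
One way to see why m () deserves suspicion is to put a validator next to a parser that performs the same check. This is a sketch under assumptions: parseNonEmpty is the name used above, its body here is reconstructed from the description (fail with an error message on an empty list), and validateNonEmpty and tryParse are illustrative names.

```haskell
import Control.Exception (IOException, throwIO, try)
import Data.List.NonEmpty (NonEmpty (..))

-- Validation: the result is (), so the knowledge gained by the check
-- is thrown away and the caller still holds a plain list.
validateNonEmpty :: [a] -> IO ()
validateNonEmpty (_ : _) = pure ()
validateNonEmpty []      = throwIO (userError "list cannot be empty")

-- Parsing: the same check, but the result type preserves what was learned.
parseNonEmpty :: [a] -> IO (NonEmpty a)
parseNonEmpty (x : xs) = pure (x :| xs)
parseNonEmpty []       = throwIO (userError "list cannot be empty")

-- Run the parser and capture the failure as a value instead of crashing.
tryParse :: [a] -> IO (Either IOException (NonEmpty a))
tryParse = try . parseNonEmpty
```

Both functions do identical work at run time; only the parser gives later code something it can rely on without checking again.
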

As said here by Alexis King