State modeling with algebraic data types
By Nathaniel Cox
I recently had the task of implementing a one-time-password input view. We’ve all seen this before; You want to verify your identity, so some website or app sends you a six-digit code via text message or email. You then go to your inbox, copy the code, paste it into some UI, maybe press a button to confirm, and then go about your day, confident and happy that your account could never possibly be hacked.
It’s an experience we’ve all done countless times before, and give very little thought to. Such a simple little piece of UI can actually have quite a lot going on though. There were a number of interesting challenges in this small view, but I’m going to focus on just the logic for resending the one-time-code, and how best to manage the state.
The requirements I was given specified that the user should be able to request a new code if for whatever reason they didn’t get the first one (or maybe they just didn’t like it). Ok, so we’ve got a little label that says “Don’t see a text?” and a button next to it that says “Send a new code.” Seems simple enough.
But what should happen after the user presses the button? Can they just keep pressing it and generate hundreds of codes? What happens if there is a network error and the code never gets sent? Should we show any kind of error message? Like so many things in software development, the questions often answer themselves–the tricky thing is knowing when to ask which questions!
So after thinking through some of these questions with the design team, we update the requirements to make the user wait 30 seconds after requesting a new code before they can request another. While they wait, the label should read “Code sent.” and the button should be disabled and say “Send a new code in 30s.”
The button label should count down to 0, and then be re-enabled. If there is an error generating the code, the label should say “Failed to send code.” and the button should be enabled (of course we want to let them try again if the request failed).
I want you now to stop and think about how you would represent the state required to implement this logic.
If you primarily work in an imperative language, you may have come up with something like this:
When the user requests a new code, we set
canRequestCode to false, set
remainingSeconds to 30, and probably set up a timer to decrement
remainingSeconds every second. When
remainingSeconds reaches zero we set
canRequestCode to true.
If we get an error from the server we set
hasError to true. Simple, right? Well, no. How many bugs did we just introduce? When do we set
hasError to false? Oops, forgot about that. We should probably have done that when the user requested the code. If we get an error, did we set
canRequestCode back to true?
Oops. Probably should have done that too. What happens to
remainingSeconds if we get an error? Should we set it to 0? Does it matter? Well, that depends entirely on how much our view knows about our state; We might be fine or we might display something entirely unexpected.
Ok, this is a really simple (almost trivial) example. We can fix all these bugs and even write some unit tests (you are writing unit tests, right?) and still be done by lunch. The problem is that the model of our state is capable of representing invalid states.
For example we pretty clearly stated that whenever
remainingSeconds is non-zero we should not be able to request a new code, and yet nothing prevents us from having a non-zero value for
remainingSeconds at the same time that
canRequestCode is true. Are there other examples? Let’s try to calculate the total number of possible states that our model allows, and then try to figure out how many of these possible states are actually valid.
If we have three properties, the total number of states that those three properties can be in is the product of the number of possible states for each property.
canRequestCode are both booleans, which of course each have two possible states: true and false.
So the total number of states that these two properties together can be in is (2 * 2) 4. We can easily verify this by enumerating (true, true), (true, false), (false, true), (false, false). Now if we add in
remainingSeconds things get a bit more complicated. Technically the number of possible states depends a lot on our implementation, but if we are using 64-bit integers (which seem pretty common these days) we have 18,446,744,073,709,551,616 possible states. Give or take.
Ok, realistically we know that it’s actually going to be a number between 0 and 30. But even more concretely what we really care about here is whether or not
remainingSeconds is zero, so we can say that there are two meaningful states for the purposes of this discussion. So if we add
canRequestCode we now double the total number of states for our model, resulting in eight.
- has error, can request code, zero
- has error, can request code, non zero <- invalid
- has error, cannot request code, zero <- invalid
- has error, cannot request code, non zero <- invalid
- no error, can request code, zero
- no error, can request code, non zero <- invalid
- no error, cannot request code, zero <- invalid
- no error, cannot request code, non zero
As you can see, there are eight possible states that our code can be in, and only three of those states have a valid meaning in our program. This is not good. In practice this means that our code has to do extra work to handle these states that should never happen, and there are all sorts of opportunities for bugs and undefined behavior to creep in.
For this simple example it isn’t the end of the world, but every additional piece of state will multiply our total number of cases, which will very quickly get out of hand. If we doubled the number of booleans that we want to simultaneously track to six in total we’d suddenly have 64 possible states.
There must be a better way, right?
Let’s take a brief step back from this practical problem. In type theory, we would refer to our initial effort to represent our model as a product type. It’s pretty clear why it has this name, as we just demonstrated that the total number of representable states of this type (also known as the type’s domain) is the product of the domains of its members (2 * 2 * 2 = 8).
If there are product types, could there also be sum types? Of course! With a sum type, the number of inhabitants of the type is the sum of the inhabitants of its members, rather than the product. We can think of a product type as representing a logical AND (we have this thing AND we have this other thing too), whereas a sum type represents logical OR (we either have this thing OR that thing, but not both).
Just as logical conjuction (AND) and disjunction (OR) can be used to combine logical propositions into more complicated statements, product and sum types can be composed together to form compound types, which are often referred as algebraic data types.
The similarity to logic is no mere coincidence; The Curry-Howard Correspondence demonstrates that there is a 1-to-1 mapping between computer programs and logical proofs. For example, a function from type A to type B is isomorphic to the logical implication “If A then B.” It’s fascinating if you are into that sort of thing, but let’s try not to get too distracted by theory. Sum types aren’t just really cool, they are also incredibly helpful.
Swift is one of a growing number of mainstream languages that has full support for both product types (
class) and sum types (
enum). Until recently, only functional programming languages like Haskell, OCaml and F# were able to represent algebraic data types, but a new generation of programming languages (Swift, Rust, and Kotlin*) is bringing these types to a much wider audience.
As a Swift developer (who has dabbled in Rust, Kotlin and Haskell), I find it surprising that it has taken this long for mainstream programming languages to embrace sum types, because they are so incredibly useful for accurately modeling state in programs. Once you have used a language with sum types, writing programs without them feels a bit like driving with the parking brake on; It’s very inefficient, probably not very safe, and definitely not any fun.
So how would this model look like represented as a sum type in Swift? Here is what I came up with:
With this version we only represent the three valid states that we identified earlier: Ready to go, failed to send a code, and sent a code. Notice how the third state has an associated value — the date when we will be ready again (I could have also chosen to store the number of remaining seconds, but this proved to be a more elegant solution).
This ability to store associated values with a specific case or state is what gives sum types their expressive power (and separates them from the primitive enums of most imperative languages). We only have a date if we are in that third state.
This model makes it impossible to represent invalid states, such as being ready and simultaneously having a date in the future when we will be ready. The compiler will enforce this for us, so not only do we not need to write defensive code to handle invalid states, the compiler will reject code that tries to!
Once I found this way to model my state, everything else in the code fell into place. First our view model stores our current state:
If you’re not familiar with SwiftUI, the
@published property wrapper basically says that whenever the value of this property changes, any view that is “observing” changes to this view model should re-render itself automatically to reflect the new value.
Pretty cool stuff. We mark this state as private, because we don’t actually want the view to know about this state. Everything that the view needs to know about can easily be computed from the state. That’s the job of the view model!
Here is our derived view state. It’s pretty clear to see how everything derives from our stored state, and also quite easy to verify that it matches the requirements that we were given.
And finally here is all of our business logic.
The details of the code might not be clear if you are unfamiliar with Swift concurrency, Combine, and SwiftUI, but I hope you can at least appreciate the simplicity of this code. There are no conditional statements. The only branching code path is the
catch block to handle the failure to request a verification code, and even then there is no code on the “happy path” after the branching occurs.
Crucially, every state update is a single assignment to the property
resendCodeStatus. There are fewer opportunities to make mistakes because so much of the business logic is encoded at the type level.
I find it very helpful to start by trying to describe the business rules in plain English; If the word “or” is ever used, it is very likely that a sum type could be part of an elegant solution. Keep in mind that there is never one single correct solution. Just as 6 can be produced by any of the expressions 2 + 2 + 2, 3 + 3, 4 + 2, 2 * 3, etc, there are multiple ways to represent a system with six possible states.
One crucial (and not immediately obvious) point is that every type with a domain of N is isomorphic to every other type with domain N. For example, an
enum with cases
sendPushNotifications carries exactly the same amount of information as a boolean
shouldForward, but it is more expressive in that it explicitly says what the alternative to forwarding is. It is our job as programmers to find the representation that most clearly and efficiently solves the problem at hand.
I hope this has been an interesting look into how sum types can help us to model our application state more clearly and expressively. Most of us are very familiar with product types, but those of us lucky enough to be able to work in a language with sum types should not neglect this powerful tool in our toolkit. I’ve found that having access to sum types very often allows me to find clear and efficient solutions.
*Kotlin allows an emulation of sum types via inheritance with sealed classes. This amounts to pretty much the same thing, but it isn’t baked into the type system like it is in Swift, Rust and FP languages.