Do I really need that database?
I’m pretty sure that every time I’ve built a web application of any sort in the last decade, I’ve reached for a database. Usually SQLite, at least to start, because it lets me get going fast without any infrastructure in place. It’s not even a question, really: if I’m building a web app, it needs a database of some kind behind it.
Actually, let me rewind just a smidge. I really dig Clojure lately, and my favorite thing about it is the immutable data structures: once you’ve created your map, or vector, or whatever, you can’t change it. Instead, you can derive new things from it. Now that I’m comfortable working with immutable-by-default things, working without them feels almost like I’m building on shaky foundations.
As much as I like Clojure, the chances I can convince my team to start using it are effectively nil. As an experiment, I started sketching out what some of our data classes might look like if I could at least start using things in an immutable way by default, and I came up with something like this (we’re a Java shop, so it’s in Java):
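A minimal sketch of what such a class could look like, assuming a hypothetical User with name and email fields (the names and the validation rules are illustrative assumptions, not taken from the real codebase):

```java
import java.util.Objects;

// Hypothetical data class for illustration; field names are assumptions.
final class User {
    // Public final fields: assigned once in the constructor, no getters needed.
    public final String name;
    public final String email;

    User(String name, String email) {
        // All validation lives here, so every instance is valid by construction.
        this.name = Objects.requireNonNull(name, "name");
        this.email = Objects.requireNonNull(email, "email");
        if (name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
    }

    // Each with* method returns a new object with just one field switched out.
    User withName(String newName) {
        return new User(newName, email);
    }

    User withEmail(String newEmail) {
        return new User(name, newEmail);
    }
}
```

Calling withEmail on an existing User leaves the original untouched and hands back a second object, and because every with* method goes through the constructor, the validation runs again on each derived copy.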
All fields on that class are final, so we can only assign to them once. The constructor applies what constraints we need, and then there’s a with*(..) method for each field in the class that’ll create and return a new object, with just the one field switched out.
In use, it works effectively the same way as a Clojure map, except that I have to write all those with* methods. They’re tedious, but not really any more so than writing setters. Plus, I can keep all the data validation in one place (the constructor), I don’t have to write getters, and I don’t have to write copy methods anymore, because these can’t be changed by accident. (And we have had issues with things changing by accident.)
So they work, but how far can I take this? Using groov as an example, I modelled out our entire project structure:
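A rough sketch of the shape such a model might take. Only the Project class and its users field come from this post; the other fields are hypothetical, and the sketch uses the JDK's List.copyOf and Map.copyOf (Java 10+) rather than Guava so that it runs without any dependencies:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical top-level model; everything except "users" is an assumption.
final class Project {
    public final String name;
    public final List<String> users;
    public final Map<String, String> pages; // page id -> page title

    Project(String name, List<String> users, Map<String, String> pages) {
        this.name = name;
        // Defensive, unmodifiable copies: later changes to the caller's
        // collections cannot leak into this instance.
        this.users = List.copyOf(users);
        this.pages = Map.copyOf(pages);
    }

    Project withName(String newName) {
        return new Project(newName, users, pages);
    }

    Project withUsers(List<String> newUsers) {
        return new Project(name, newUsers, pages);
    }

    // Deriving a changed collection: copy, modify the copy, build a new Project.
    Project withPage(String id, String title) {
        Map<String, String> copy = new HashMap<>(pages);
        copy.put(id, title);
        return new Project(name, users, copy);
    }
}
```

Copying in the constructor is what keeps the whole tree immutable: even if a caller holds on to the list it passed in, mutating that list has no effect on the Project.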
I lean on Guava to keep my collections immutable, and we’re good to go: I can model out the entire server state easily with immutable data.
That’s great, but things have to change sometime, else it’s kind of useless. So in this prototype I created a little container to hold just the most recent instance of this project, and a way to change it:
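A minimal sketch of such a container, built on AtomicReference. The class name StateHolder and its methods are hypothetical, and it is generic here only so the sketch stands alone; in the prototype the type parameter would be the Project class:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical container holding the most recent immutable snapshot.
final class StateHolder<T> {
    private final AtomicReference<T> current;

    StateHolder(T initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Readers get a snapshot; since the value is immutable, it can never
    // change underneath them, even if a newer one is swapped in later.
    T get() {
        return current.get();
    }

    // Writers supply a function from the old value to a new one.
    // updateAndGet may re-apply the function if another thread wins the
    // race, so the function should be free of side effects.
    T update(UnaryOperator<T> change) {
        return current.updateAndGet(change);
    }
}
```

A caller would hand update a function that builds a new instance from the old one. Anything that grabbed the previous snapshot keeps working against it, unaffected by the swap.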
I now have a container for all of the important server state that I can hand off to things that need it (servlets, background processing, etc.), and they can’t screw it up for anyone else. Once something grabs a reference to Project.users or whatever, no one can change those objects out from underneath them.
The only bit of synchronization I’ve needed to this point is that AtomicReference.
Now, getting back to that database: my data set is going to fit in memory, easily. The objects are all immutable, and the entire server is built around swapping out that Project instance. When the time came to start persisting this thing to disk, I didn’t reach for SQLite this time. I reached for Jackson and just wrote the whole thing to disk as JSON whenever the project changed.
As of this writing, I have a fully functional storage system that takes up a whopping 152 lines of code. It only writes the changed bits of the project (I could have written the whole thing in one go, but I like splitting it up a bit), and if something goes wrong I can easily inspect the data in a text editor.
I seriously don’t need a database this time around. It’s awesome. But it only works as long as I keep everything immutable: the moment something within that Project can change, the whole house of cards falls apart. For now though, I’m pretty happy with this, and it’s probably how I’ll start anything new from now on.
Granted, this won’t scale forever: eventually I’m not going to be able to keep everything in memory and it’s gonna have to spill to disk. When that happens, I think I’m going to give Datomic a strong look. As far as I can tell, this is what Datomic already does: it makes it look like your database is just a single immutable instance in memory. You pass around a database instance like I would pass around that Project instance above, and the data read from that instance is completely immutable. Changing the database (i.e. transacting against it) returns a new instance, so again, you can’t change the data accidentally while something else is using it. Datomic handles the details of reading, writing, and caching behind the scenes so you don’t have to worry about it.