3 Steps to Designing Better Data Structures

Photo by Alexas_Fotos on Unsplash

As developers, one of the most valuable things we can do before we get to the keyboard and start coding, is to figure out what types of data structures our functions and modules will operate on. Why? As you will see below, when we use the right data structures it becomes much easier to write functions that operate on them, which in turn leads to fun and profit!

The goal of this article is to show you an example of how to approach coming up with good data structures. We'll do this by comparing three different data structures for a Tic Tac Toe board.

Here's a sneak peek at them. Which one would you bet your lunch money on?

A quick real-world example

But first, let me describe a problem I ran into in the past. I was building an Elixir/Phoenix application that served as a backend to a React client. One of the API endpoints was designed to serve up a large chunk of the current user's state, because    we wanted the React application to be very fast after the initial page load. The initial JSON payload looked something like this:

{
  "user": {
    "id": 1,
    "full_name": "Bob Dobolina",
    "email": "bob@dobo.inc",
    "projects": [
      {
        "id": 1,
        "name": "Project 1",
        "proofs": [
          {
            "id": 1, 
            "title": "A design proof",
            "comments": [...]
          }
        ]
      }
    ]
  }
}

On the user's dashboard we show a list of projects, and that's fairly easy to do with this structure. Where things get hairy, is when we render the page that displays a proof along with the comments for that proof. With this deeply nested data structure it would have been a real pain for the React client to traverse the tree and pull the data for a specific proof. The solution was to flatten this data structure and make look-up by id easier. The data structure had to make sense for the main operation we were performing on it, which was data lookup by id.

The three steps

The examples below are in Elixir, but the general approach applies to any programming language. Here is the general approach:

  1. Make a list of the operations you will need to perform on the data structure
  2. Brainstorm some data structures
  3. Evaluate each data structure by going over the operations you need to perform on them

At this stage you want to get a sense of whether or not the code will be easy to write and the performance is OK in terms of CPU and memory usage.

If you are thinking: "I work on web applications for businesses. What could I possibly learn from a Tic Tac Toe game?", I ask for your patience. The takeaways from this article are very relevant to the algorithmic problems we have to solve in our day-to-day work.

What are the operations that will be performed on the board?

First, let's put together a list of the operations that we will perform on the data structure, because without knowing this, we cannot evaluate if the data structures we're coming up with are any good. There are many ways we could represent a 3x3 board, but which one is a good fit for a Tic Tac Toe board? Well, what are some things we need to do with a Tic Tac Toe board?

  • Read and write the value at a given coordinate
  • Display the board as a 3 x 3 grid in the terminal
  • Check the 8 different positions that indicate a win
  • Know if the board is full, so we can see if the game is over

Brainstorm data structures and evaluate them

For the rest of the article we will be starting off with this board:

X _ _
_ O _
_ X _

Let's try out some data structures and evaluate the pros and cons.

First attempt: Nested tuples of rows and columns

Intuitively, when you think of a 3x3 grid you might think of a matrix or two-dimensional array. One way to implement that in Elixir is by using nested Tuples.

Side note: Why not use nested Lists here? For functional data structures, it's recommended to use tuples in situations where the position of a value means something. In this case, the position of the outer tuples represents rows, and the inner tuples represent columns.

{
  {"X", nil, nil},
  {nil, "O", nil},
  {nil, "X", nil}
}

Seems like a good start.

Let's look at our list of operations and get a sense of how easy it will be to write these as functions.

Reading and writing the value at a given coordinate.

It's the second player's turn and they want to write an O in the top corner of the board at row 1, col 3.

To do this we might write a function like write(board, row: 1, col: 3, move: "O").

How would you implement this function?

Since data structures in Elixir are immutable, it means we have to re-create the tuple for row 1 and re-create the board tuple to return an updated board. This is not hard to do, but it's clunky and the code is not very readable.

We don't have to write out the function to know that this would be unwieldy to implement.

Displaying the board as a 3x3 grid

This seems like it will be pretty straightforward if we flatten the nested tuples into a list and iterate over each element.

Checking the 8 different positions that indicate a win

Thanks to Elixir's pattern matching, checking if a board has a winning line is fairly easy to read and implement. For example, if you wanted to check if the middle row has a winning line you could write.

def find_winner(
  {
    {_, _, _},
    {a, b, c},
    {_, _, _},
  }
) when a == b and b == c and !is_nil?(a), do: a

Knowing if the board is full

To check if the board is full we can count up all the free tiles, which are represented as nil:

def board_full?(board)
  num_free_tiles = board
    |> Tuple.to_list
    |> Enum.reduce(0, fn(row, free_spots) ->
      free_in_row = row |> Tuple.to_list |> Enum.count(&is_nil/1)
      free_in_row + free_spots
    end)
  num_free_tiles == 0
end

Maybe there's a better implementation here, but so far, the code above is not looking very easy to read and maintain.

Overall, one of the weaknesses of this data structure seems to be that it is nested and accessing a value requires reading two tuples. Also tuples are not that easy to iterate over as we have to convert them to lists, if we want to use those juicy Enum functions.

Second attempt: Map with coordinates as keys

In the tuple data structure our biggest issue seemed to be making it easy to access values at a given row and column. Let's try making that easier by creating a Map where the keys are the {row, col} coordinate.

{
  {1, 1} => "X",
  {1, 2} => nil,
  {1, 3} => nil,
  {2, 1} => nil,
  {2, 2} => "O",
  {2, 3} => nil,
  {3, 1} => nil,
  {3, 2} => "X",
  {3, 3} => nil
}

Reading and writing to a position

This becomes easier with this data structure.

For reading the item at row 1, column 1 we can simply do a lookup by key:

%{{1, 1} => val} = board

To update the value at row 1, column 3 we can use Map.put:

Map.put(board, {1, 3}, "O")

Checking the 8 different positions to win

We can apply the same strategy for reading a single value to reading a group of values. For example, here's what it would look like if we used pattern matching to check if the first row had equal values.

@doc ~S"""
  Takes in a board and returns either "X" or "O" if there is a winner or nil.
  
  Example
  
  iex> TicTacToe.find_winner(
      {
        {1, 1} => "X", {1, 2} => "O", {1, 3} => nil,
        {2, 1} => nil, {2, 2} => "X", {2, 3} => nil,
        {3, 1} => nil, {3, 2} => "O", {3, 3} => "X"
      }
    )
  "X"

"""
# checks if the first row has the same tile 
def find_winner(%{{1, 1} => a, {1, 2} => b, {1, 3} => c})
  when a == b and b == c and !is_nil(a), do: a # check first row
def find_winner(%{{1, 1} => a, {2, 2} => b, {3, 3} => c})
  when a == b and b == c and !is_nil(a), do: a # check \ diagonal
def find_winner(%{{1, 3} => a, {2, 2} => b, {3, 1} => c})
  when a == b and b == c and !is_nil(a), do: a # check / diagonal
# ... and so on for the remaining 5 cases
def find_winner(_board), do: nil

This is easy enough to implement, but the code itself is not all that easy to read, in my opinion. From reading the first clause, it's not that easy to see that it deals with checking if the first row has the same tiles. It's always a bit of  a code smell to me when I have to add a comment to explain what a line does. Also, the board itself is not very readable in the doctest, not to mention that when you IO.inspect it in IEx it will be even less readable, because each key will be printed on a new line.

Displaying the board as a 3x3 grid

Maps in Elixir are not ordered, even though it seems like they are because IEx displays them in order. This means we can't just iterate over the keys and print out the values.

To display the board we can use a nested loop of row and column ids and look up the values that way.

This is not ideal. Let's see if we can come up with something better.

Third attempt: List of positions

Instead of thinking of the board in terms of rows and columns, what if we think of the board with positions from 0 to 8?

0 1 2
3 4 5
6 7 8

In Elixir, we could use a List to represent this.

[
  "X", nil, nil,
  nil, "O", nil,
  nil, "X", nil
]

This feels like a better fit than the other two! Are your intuition butterflies fluttering yet? Let's double check this intuition by reviewing the operations.

Reading and writing to a position

Instead of saying "Player 1 marked X at row 1 column 3", we would think of it as "Player 2 marked O at position 2", which is as simple as List.update_at(board, 2, "X").

To check if position 2 has already been taken, it's as easy as Enum.at(board, 2), which is an O(n) operation, but we don't need to worry about the time complexity for this problem at all, because we are dealing with such a small data set.

Checking the 8 different positions to win

Just as with the nested tuple example, checking if there is a winning line on the board we can use pattern matching. Let's say you wanted to check if the second row has all the same numbers:

[
  _, _, _,
  a, b, c,
  _, _, _
] = board

How about a diagonal?

[
  a, _, _,
  _, b, _,
  _, _, c
] = board

Knowing if the board is full & Displaying the board

Displaying the board and checking if the board is full are both also really easy to implement.

def board_full?(board)
  !Enum.any?(board, &is_nil/1)
end

def display(board)
  board |> 
    Enum.map(&tile_representation/1)
    |> Enum.chunk_every(3)
    |> Enum.join("\n")
    |> IO.puts
end

defp tile_representation(nil): "_ "
defp tile_representation(tile): "#{tile} "

I think we got a winner!

Keep data structures flat

As a general rule, when you are designing your data structures, flatter structures will be easier to work with. Nested structures like our Tuple and Map boards, are harder to update. Since Elixir is immutable, whenever we update a data structure, we have to make our change piece together the original structure. This becomes harder the more nested the data structures are.

Think about the difference in implementation between the three approaches. Which one would you rather work with? In my opinion the List implementation is much easier to work with.

Final thoughts

A good data structure can be the difference between having clean, maintainable, bug-free code and working with a janky code base.

I hope you enjoyed this article and if you have any cool ideas for how to store a Tic Tac Toe board, post them in the comments!

This blog post was inspired by the chapter on Data Structures in "Designing Elixir Systems with OTP" by James Gray and Bruce Tate, where they also go over a Tic Tac Toe example. It's a great book and I recommend you check out Chapter 3 if you want more tips on working with data structures in Elixir.