One weird trick, bugs HATE him

Tyler Adams

Mar 02, 2021

Today, we’re not talking about code. We’re talking about data.

SPOT Rule

Within a program’s data, every fact has a Single Point Of Truth.

That’s it. If your data does this, it’s impossible to write a whole species of bug.

A Quick Example

This program tracks if people are friends (assuming no one-way friendships):

friends = {}

def add_friends(a, b):
  if not are_friends(a, b):
    if a not in friends:
      friends[a] = set()
    friends[a].add(b)

    if b not in friends:
      friends[b] = set()
    friends[b].add(a)

def are_friends(a, b):
  # look up the variable a, not the literal string "a"
  return b in friends.get(a, set())

def main():
  add_friends("bob", "jim")
  print(are_friends("jim", "bob"))

Facts

The “facts” in this code are whether two people are friends. Are “bob” and “jim” friends? That’s a fact.

Points of Truth

For a given fact (say whether “bob” and “jim” are friends), where is it stored?

Two places:

whether “jim” is in friends["bob"]
whether “bob” is in friends["jim"]

This violates the SPOT rule.

Writing a bug

Consider this new code to remove friends:

def remove_friends(a, b):
  if a in friends:
    friends[a].remove(b)

At first glance, it looks fine. It’s not. It needs to update the second source of truth.

  if b in friends:
    friends[b].remove(a)

Code that looks right but is wrong slows us down a lot. It’s much slower to debug than code that’s wrong and looks wrong.

The only reason this code looks okay when it isn’t is that there are two sources of truth. If there were only one, then wrong code would look wrong. Right code would look right.

More importantly, the data lied: only one source of truth was updated to remove the friendship, so the two now disagree. This makes debugging very slow. A programmer could see the program say they aren’t friends, consult the other source of truth, and see that they are. That’s really hard to debug. Removing this possibility is why SPOT helps us code fast.
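To make the lie concrete, here is a minimal sketch of the dictionary-based design with the incomplete removal. It is a condensed stand-in rather than the exact code above: it uses setdefault and discard for brevity, and the name remove_friends_buggy is made up for this illustration.

```python
# Dictionary-based design: every friendship is stored in two places.
friends = {}

def add_friends(a, b):
    friends.setdefault(a, set()).add(b)
    friends.setdefault(b, set()).add(a)

def are_friends(a, b):
    return b in friends.get(a, set())

def remove_friends_buggy(a, b):
    # Hypothetical buggy version: updates only one of the two points of truth.
    if a in friends:
        friends[a].discard(b)

add_friends("bob", "jim")
remove_friends_buggy("bob", "jim")

# The same fact now has two different answers depending on who you ask.
print(are_friends("bob", "jim"))  # False
print(are_friends("jim", "bob"))  # True
```

Nothing crashes and nothing looks wrong at the call site. The only symptom is that the answer depends on the argument order.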

Refactoring to use SPOTty data

Let’s fix the data format (and therefore the code) to follow the SPOT rule.

We’ll redesign our data to store each friend pair as a set inside one master set. This makes it impossible to write a bug like the one we had above.

friends = set()

def add_friends(a, b):
  # frozensets, unlike sets, can be members of a set
  fset = frozenset([a, b])
  friends.add(fset)

def remove_friends(a, b):
  fset = frozenset([a, b])
  if fset in friends:
    friends.remove(fset)

def are_friends(a, b):
  fset = frozenset([a, b])
  return fset in friends

def main():
  add_friends("jim", "bob")
  print(are_friends("jim", "bob"))

The code looks a bit funky because a and b are mapped to frozen sets, but it’s worth it for SPOT. Writing remove_friends is almost impossible to get wrong.

def remove_friends(a, b):
  fset = frozenset([a, b])
  if fset in friends:
    friends.remove(fset)

And since there’s only one point of truth, there’s no chance the data will lie.

(Those of you wondering why I don’t just enforce a < b and store b in friends[a] should see the advanced section below.)

SPOT Violations

There are two types of SPOT violations: explicit and implicit.

Explicit

A fact is explicitly stored in multiple places, like we saw above.

Implicit

A fact is implicitly stored in multiple places.

Let’s see a quick example: strings with a stored length (ignoring that Python does this under the hood).

class FastString:
  def __init__(self):
    self.l = 0
    self.v = ""

  def set(self, v):
    self.v = v

  def __len__(self):
    return self.l

  def __repr__(self):
    return self.v

def main():
  s = FastString()
  s.set("a")
  print(s)
  print(len(s))

If we run this, we see s has value “a”, but length 0. Why is the data lying to us!?

The problem is when we set self.v, we also need to update self.l.

  def set(self, v):
    self.v = v
    self.l = len(v)

We could only write this bug, and have the data lie to us, because the length of s has two sources of truth: self.l, which stores it explicitly, and self.v, which stores it implicitly (the length can always be computed from the value).
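One way to restore a single point of truth here is to drop the stored length and derive it from the value on demand. The sketch below is only one possible fix, and the class name DerivedLengthString is made up for illustration.

```python
class DerivedLengthString:
    """Hypothetical variant of FastString that stores only the value."""

    def __init__(self):
        self.v = ""

    def set(self, v):
        self.v = v

    def __len__(self):
        # The length is computed from the single stored fact,
        # so there is no second copy that can go stale.
        return len(self.v)

    def __repr__(self):
        return self.v

s = DerivedLengthString()
s.set("a")
print(s)       # a
print(len(s))  # 1
```

The trade-off is that the length is recomputed on every call. For a Python str that is cheap, but for an expensive computation this is exactly the pressure that leads to caching.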

Caches

Most violations of SPOT come from caches.

Explicit violations come from caching data with slow lookup times (ex. network calls).

Implicit violations come from storing computations.

By definition, a cache violates SPOT. A cache is a source of truth that gets consulted in place of some real source of truth. There’s the real truth and the cached truth, and it’s easy to write code that doesn’t update both. Avoid caches if you can.
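As a small illustration of the real truth and the cached truth drifting apart, here is a sketch using functools.lru_cache from the standard library. The prices dictionary and the price_of function are made-up names for the example.

```python
from functools import lru_cache

# The real source of truth (hypothetical data for illustration).
prices = {"apple": 1}

@lru_cache(maxsize=None)
def price_of(item):
    # The cache becomes a second source of truth for the same fact.
    return prices[item]

first = price_of("apple")   # 1, and the answer is now cached
prices["apple"] = 2         # the real truth changes
stale = price_of("apple")   # still 1: the cached truth was not updated
price_of.cache_clear()      # explicit invalidation
fresh = price_of("apple")   # 2

print(first, stale, fresh)  # 1 1 2
```

If a cache can't be avoided, every write to the real data has to be paired with an invalidation like cache_clear(). That pairing is the part that is easy to forget.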

Advanced idea: Generalizing SPOT

For my more advanced readers, you might enjoy this insight: SPOT is a special case of reducing data invariants.

Earlier we said SPOT works because there’s a bug species which lives between two sources of truth. If there’s only one, the bug has nowhere to live.

The species’ genus hides out in all data with invariants.

For example, sorted lists have an invariant that a_i <= a_j for all i < j.

If any one piece of code breaks the sorted order invariant, then other code interacting with the list will behave in unpredictable ways. There are ways to mitigate these bugs and make them easier to find, but the root cause is the invariant itself.
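A minimal sketch of that failure mode, using the standard bisect module. The contains helper and the scores list are made up for the example.

```python
import bisect

def contains(sorted_items, x):
    """Binary search: only correct if sorted_items really is sorted."""
    i = bisect.bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x

scores = []
for value in (30, 10, 20):
    bisect.insort(scores, value)  # this writer maintains the invariant

ok = contains(scores, 20)         # True

scores.append(5)                  # this writer silently breaks it
broken = contains(scores, 5)      # False, even though 5 is in the list

print(ok, broken, 5 in scores)    # True False True
```

The append is the buggy line, but the wrong answer surfaces later in contains, which is correct code relying on a promise that was broken elsewhere.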

These bugs are nastier than multiple sources of truth. You can compare two sources of truth to see they’re not the same. Invariants can be violated more subtly.

To avoid this, rewrite data formats to reduce the invariants. This can be harder than removing implicit invariants. Escalate the problem.
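This is my reading of how the idea applies to the earlier question about enforcing a < b. Storing each friendship once as an ordered pair does follow SPOT, but it adds an ordering invariant that every writer must uphold. The frozenset format has no such invariant. The sketch below uses made-up names (pairs, add_pair, are_friends_pair, fsets) and is an illustration, not the post's code.

```python
# Hypothetical alternative: one ordered pair per friendship.
# Invariant every writer must uphold: pair[0] < pair[1].
pairs = set()

def add_pair(a, b):
    pairs.add((min(a, b), max(a, b)))  # upholds the invariant

def are_friends_pair(a, b):
    return (min(a, b), max(a, b)) in pairs

add_pair("jim", "bob")
pairs.add(("jim", "al"))  # a careless writer skips the ordering step

found_good = are_friends_pair("bob", "jim")  # True
found_bad = are_friends_pair("jim", "al")    # False: lookup uses ("al", "jim")

# The frozenset format has no ordering to get wrong.
fsets = set()
fsets.add(frozenset(["jim", "al"]))
found_fset = frozenset(["al", "jim"]) in fsets  # True

print(found_good, found_bad, found_fset)  # True False True
```

Both formats store each fact once. The difference is that the pair format can be written incorrectly, while the frozenset format cannot.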

Conclusion

In this post we saw:

the SPOT rule makes bad code look bad
bugs due to SPOT violations are slow to debug
one example of fixing a SPOT violation
the two violation types: explicit and implicit
all caches violate the SPOT rule
(Advanced) the SPOT rule is a special case of the “no data invariants” rule

If you liked this post, give it a like or share it with a friend.

If you want to know more about any of these topics, let me know in the comments.

Andrew Judson

Mar 2, 2021

What is the correct solution to the FastString code block? Private fields and using a setter to modify it rather than direct access?

Would also be curious about situations where you have to break this condition (e.g. need another data structure with different read performance, or a denormalized database) - how do you mitigate it?

CodeFaster
