Andrew’s Distributed Systems Blog

Apr 9, 2021

Introduction

I am a senior software engineer with an interest in large-scale distributed systems in the platform/infrastructure space. My current interests center on databases, streaming platforms, workflow engines, and other systems that are, at their core, responsible for data management.

I am starting this blog as a place to record my learnings about distributed systems. For the foreseeable future, each blog post will center on some interesting technology used in industry.

Upcoming Posts

Spanner: A highly available, externally consistent, horizontally scalable database from Google.
Bigtable: This came before Spanner, and was one of the pieces of tech that led Google to realize they needed Spanner.
Megastore: This was a database that came out at Google before Spanner, and it shares some loose similarities with Spanner.
Colossus: This is the file system that backs Spanner.
Google File System: This is the file system that came before Colossus.
CockroachDB: This is an open source database based on Spanner. The goal is to provide similar abstractions without requiring the use of atomic clocks.
RocksDB: A key-value store that CockroachDB uses.

Technologies to Write About

Cadence
Spanner
Kafka
ZooKeeper
Chubby
S3
GFS
MapReduce
Elasticsearch
Redis
Bigtable
RocksDB
DynamoDB
Cassandra
MemSQL
CockroachDB
Megastore
M3
BigQuery
Firestore
MongoDB
Kinesis
QLDB
AWS SQS
Apache Kudu
Apache HBase
Voldemort DB
Neo4j
Memcached
Facebook Gorilla Time Series DB
Vitess
Presto
Apache Spark
Apache Flink
Apache Pinot
Kubernetes

To find more things to write about, and as a starting point for understanding these technologies, I can use the following resources:

https://projects.apache.org/projects.html?category#database
YouTube videos from AWS re:Invent
YouTube videos from Google Cloud Next
https://aws.amazon.com/ (pick a technology and click on resources)
https://cloud.google.com/products
https://github.com/gregsramblings/google-cloud-4-words

For each of these technologies, there are fundamentally three questions I want to try to answer.

What problem does it solve?

What is the problem space?
What are sample use cases?
What interface does it expose to users?
What other projects are similar to this one?
What are the guarantees provided by the system?

How is it built?

What is the high level architecture?
How are failures handled?
How is replication handled?
How are the guarantees provided?

What can we learn from it?

What did this system do well?
What could it have done better?
What future features might be added to this system?
What learnings about industry/software can we take from this system?

Exploring

While right now I want this blog to focus on technologies in industry, there are several topics I am interested in that do not fit squarely into the above framework. The following is a brain dump of things for me to explore and potentially write posts about.

SWIM protocol
Raft protocol
Paxos protocol
Distributed Priority Queuing
Time Series Databases
Consistent Hashing
Graph Databases
Location Databases
Automatically consistent types like counters or sets
Blockchain

Written by Andrew Dawson