I am a senior software engineer with an interest in large-scale distributed systems in the platform/infrastructure space. My current interests center on databases, streaming platforms, workflow engines, and other systems that are, at their core, responsible for data management.
I am starting this blog as a place to record what I learn about distributed systems. For the foreseeable future, each post will center on an interesting technology from industry.
- Spanner: Highly available, externally consistent, horizontally scalable database from Google.
- Bigtable: This came before Spanner, and was one of the pieces of tech that led Google to realize they needed Spanner.
- Megastore: A database that came out of Google before Spanner and shares some loose similarities with it.
- Colossus: This is the file system that backs Spanner.
- Google File System: This is the file system that came before Colossus.
- CockroachDB: An open source database based on Spanner. The goal is to provide similar abstractions without requiring atomic clocks.
- RocksDB: A key-value store that CockroachDB uses.
Technologies to Write About
- RocksDB
- DynamoDB
- CockroachDB
- AWS SQS
- Apache Kudu
- Apache HBase
- Voldemort DB
- Facebook Gorilla Time Series DB
- Apache Spark
- Apache Flink
- Apache Pinot
To find more things to write about, and as a starting point for understanding these technologies, I can use the following resources:
- YouTube videos from AWS re:Invent
- YouTube videos from Google Cloud Next
- https://aws.amazon.com/ (pick a technology and click on its resources)
For each of these technologies, there are fundamentally three questions I want to try to answer:
What problem does it solve?
- What is the problem space?
- What are sample use cases?
- What interface does it expose to users?
- What other projects are similar to this one?
- What are the guarantees provided by the system?
How is it built?
- What is the high level architecture?
- How are failures handled?
- How is replication handled?
- How are the guarantees provided?
What can we learn from it?
- What did this system do well?
- What could it have done better?
- What future features might be added to this system?
- What lessons about industry/software can we take from this system?
While right now I want this blog to focus on technologies in industry, there are several topics I am interested in that do not fit squarely into the above framework. The following is a brain dump of things for me to explore and potentially write posts about.
- SWIM protocol
- Raft protocol
- Paxos protocol
- Distributed Priority Queuing
- Time Series Databases
- Consistent Hashing
- Graph Databases
- Location Databases
- Automatically consistent types like counters or sets