Paper Notes
  1. benchmarks
    1.1. ssb
  2. bigdata
    2.1. mapreduce
    2.2. nephele
    2.3. dataflow model
    2.4. flink
    2.5. flink state management
  3. databases
    3.1. cloudnative
      3.1.1. aurora
      3.1.2. taurus
    3.2. columnstores vs rowstores
    3.3. kv
      3.3.1. rocksdb cidr17
      3.3.2. wisckey
    3.4. mmdb
      3.4.1. mmdb overview
    3.5. oltp
      3.5.1. through the looking glass
      3.5.2. staring into the abyss
    3.6. olap
      3.6.1. lakehouse
      3.6.2. delta lake
      3.6.3. vertica
      3.6.4. duckdb
    3.7. htap
      3.7.1. greenplum
    3.8. vector db
      3.8.1. hnsw
      3.8.2. ivf-hnsw
      3.8.3. diskann
      3.8.4. product quantization
    3.9. graph db
      3.9.1. kuzu
    3.10. citus
    3.11. optimizer
    3.12. executor
      3.12.1. volcano
    3.13. concurrency control
      3.13.1. evaluation of in-memory mvcc
    3.14. cdc
      3.14.1. dblog
    3.15. rum conjecture
  4. datalayout
    4.1. cstore
    4.2. cstore compression
    4.3. dremel
    4.4. rcfile
    4.5. orc
    4.6. table placement methods
  5. data structures
    5.1. btree family
      5.1.1. bw-tree
    5.2. hash table
      5.2.1. linear hashing
    5.3. trie family
      5.3.1. art
      5.3.2. hot
    5.4. bitmaps
      5.4.1. roaring bitmaps
    5.5. skip list
    5.6. bloom filter
  6. distributed system
    6.1. consensus
      6.1.1. flp
      6.1.2. paxos made simple
      6.1.3. paxos made live
      6.1.4. viewstamped replication
      6.1.5. zab
      6.1.6. paxos vs. vr vs. zab
      6.1.7. raft
      6.1.8. paxos vs raft
    6.2. scheduler
      6.2.1. borg
    6.3. primary backup
    6.4. chain replication
    6.5. bolosky
    6.6. holy grail
    6.7. chandy lamport
    6.8. asynchronous barrier snapshotting
    6.9. zookeeper
  7. filesystem
    7.1. gfs
    7.2. polarfs
  8. llm
  9. storage
    9.1. kv store
      9.1.1. dynamo
    9.2. kudu
    9.3. bluestore

Paper Reading Notes

distributed system

A Brief Introduction to Distributed Systems defines a distributed system as follows:

A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.

Turing Award winner Leslie Lamport describes a distributed system as:

one in which the failure of a computer you did not even know existed can render your own computer unusable.

  • consensus
    • Impossibility of Distributed Consensus with One Faulty Process
    • Paxos Made Simple
    • Paxos Made Live
    • Viewstamped Replication
    • Zab: High-performance broadcast for primary-backup systems
    • Vive La Différence: Paxos vs. Viewstamped Replication vs. Zab
    • In Search of an Understandable Consensus Algorithm
    • Paxos vs Raft: have we reached consensus on distributed consensus?
  • A principle for resilient sharing of distributed resources
  • Chain Replication for Supporting High Throughput and Availability
  • Paxos Replicated State Machines as the Basis of a High-Performance Data Store
  • Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail
  • Distributed Snapshots: Determining Global States of Distributed Systems
  • Lightweight Asynchronous Snapshots for Distributed Dataflows
  • ZooKeeper: wait-free coordination for internet-scale systems
  • scheduler
    • Large-scale cluster management at Google with Borg