Paper Notes
  1. benchmarks
    1.1. tpch-chokepoints
    1.2. ssb
  2. bigdata
    2.1. mapreduce
    2.2. nephele
    2.3. dataflow model
    2.4. flink
    2.5. flink state management
  3. compiler
  4. databases
    4.1. optimizer
      4.1.1. overview
      4.1.2. system r access path selection
      4.1.3. volcano
      4.1.4. cascades
    4.2. executor
      4.2.1. volcano
    4.3. concurrency control
      4.3.1. evaluation of in-memory mvcc
    4.4. oltp
      4.4.1. through the looking glass
      4.4.2. staring into the abyss
      4.4.3. system r
    4.5. olap
      4.5.1. lakehouse
      4.5.2. delta lake
      4.5.3. vertica
      4.5.4. duckdb
    4.6. htap
      4.6.1. greenplum
    4.7. cloudnative
      4.7.1. aurora
      4.7.2. taurus
    4.8. columnstores vs rowstores
    4.9. kv
      4.9.1. rocksdb cidr17
      4.9.2. wisckey
    4.10. mmdb
      4.10.1. mmdb overview
    4.11. vector db
      4.11.1. hnsw
      4.11.2. ivf-hnsw
      4.11.3. diskann
      4.11.4. product quantization
    4.12. graph db
      4.12.1. kuzu
    4.13. citus
    4.14. cdc
      4.14.1. dblog
    4.15. rum conjecture
  5. datalayout
    5.1. cstore
    5.2. cstore compression
    5.3. dremel
    5.4. rcfile
    5.5. orc
    5.6. table placement methods
  6. data structures
    6.1. btree family
      6.1.1. bw-tree
    6.2. hash table
      6.2.1. linear hashing
    6.3. trie family
      6.3.1. art
      6.3.2. hot
    6.4. bitmaps
      6.4.1. roaring bitmaps
    6.5. skip list
    6.6. bloom filter
  7. distributed system
    7.1. consensus
      7.1.1. flp
      7.1.2. paxos made simple
      7.1.3. paxos made live
      7.1.4. viewstamped replication
      7.1.5. zab
      7.1.6. paxos vs. vr vs. zab
      7.1.7. raft
      7.1.8. paxos vs raft
    7.2. scheduler
      7.2.1. borg
    7.3. primary backup
    7.4. chain replication
    7.5. bolosky
    7.6. holy grail
    7.7. chandy lamport
    7.8. asynchronous barrier snapshotting
    7.9. zookeeper
  8. filesystem
    8.1. gfs
    8.2. polarfs
  9. llm
  10. storage
    10.1. kv store
      10.1.1. dynamo
    10.2. kudu
    10.3. bluestore

Paper reading notes

Optimizer

  • Michael Jungmair et al., PVLDB 2022: Designing an Open Framework for Query Optimization and Compilation
    • Proposes extending the scope of query compilers so that query optimization is also performed as a sequence of compiler passes
  • An Overview of Query Optimization in Relational Systems
  • Access path selection in a relational database management system
  • The Volcano Optimizer Generator: Extensibility and Efficient Search
  • The Cascades Framework for Query Optimization

Optional readings

  • Bailu Ding et al., Foundations and Trends® in Databases, Vol. 14, 2024: Extensible Query Optimizers in Practice
  • Michael S. Kester et al., SIGMOD '17: Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?