Paper Reading Notes

data layout

C-Store: A Column-oriented DBMS
Integrating Compression and Execution in Column-Oriented Database Systems
Dremel: Interactive Analysis of Web-Scale Datasets
RCFile: A Fast and Space-Efficient Data Placement Structure in MapReduce-Based Warehouse Systems
Major Technical Advancements in Apache Hive
Table Placement Methods

Further reading

[1] Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation? by Daniel Abadi, 2017