Rules of Thumb of Data Engineering
This paper documents some of the main rules we use in engineering database systems.
2. Storage performance and price
Moore’s Law, which posits that circuit densities increase four fold every three years. Memories get four times larger each three years, or about 100x per decade. “ an extra bit of addressing every 18 months”
Moore’s Law originally applied only to random access memory (RAM). It has been generalized to apply to microprocessors and to disk storage capacity. Number of transistors in a dense integrated circuit doubles about every two years from wikipedia.
Summarizing the Storage rules of thumb:
- Moore’s Law: Things get 4x better every three years.
- You need an extra bit of addressing every 18 months.
- Storage capacities increase 100x per decade.
- Storage device throughput increases 10x per decade.
- Disk data cools 10x per decade.
- Disk page sizes increase 5x per decade.
- NearlineTape:OnlineDisk:RAM storage cost ratios are approximately 1:3:300.
- In ten years RAM will cost what disk costs today.
- A person can administer a million dollars of disk storage: that is 30TB of storage today.
And two observations:
- Disks are replacing tapes as backup devices.
- On random workloads, disk mirroring is preferable to RAID5 parity because it spends disk space (which is plentiful) to save disk accesses (which are precious)
3. Amdahl’s system balance rules
Gene Amdahl is famous for many rules of thumb.
- Amdahl’s parallelism law: If a computation has a serial component S and a parallel component P, then the maximum speedup is (S+P)/S.
- Amdahl’s balanced system law: A system needs a bit of IO per second for each instruction per second: about 8 MIPS per MBps.
- Amdahl’s memory law: ? ? 1: that is, in a balanced system the MB/MIPS ratio, called alpha (? ), is 1.
- Amdahl’s IO law: Programs do one IO per 50,000 instructions.
4. Networking: Gilder’s Law
George Gilder predicted in 1995 that network bandwidth would triple every year for the next 25 years.
- Gilder’s law: Deployed bandwidth triples every year.
- Link bandwidth improves 4x every 3 years.