Preface
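Go 1.19 shipped with an updated memory consistency model, which makes this a good moment to work through it alongside the explainer articles Russ Cox wrote earlier. The articles do not cover the memory consistency models of GPUs or the Linux kernel, but they are very readable overall, and material this good deserves a set of notes.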
P.S. The term "memory model" can mean entirely different things depending on context, so I personally recommend not overusing it.
memory model:
- address space layout
- memory addressing scheme
- memory allocation scheme
- memory object model
- memory ordering
- memory segmentation
- type layout
References
- Memory Models
- The Go Memory Model
- Linux-Kernel Memory Model (LKMM)
- Standard library:
Notes
memory consistency model:
- about the visibility and consistency of memory operations in multithreaded contexts
- cross-disciplinary contract: hardware <-> compilers <-> programmers
valid optimizations do not change the behavior of valid programs
- processor optimizations: largely revolves around how writes are propagated to other threads
- compiler optimizations: largely revolves around reordering of instructions
Data Race ⊂ Race Condition
Every race involves at least one write: two uncoordinated reads do not race with each other.
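As a minimal sketch of what "coordinated" means here, the Go program below has many goroutines write to one shared counter. The names `countWithMutex`, `mu`, and `counter` are made up for illustration. Because every write is guarded by the same `sync.Mutex`, the accesses are ordered and there is no data race; removing the lock would leave concurrent writes uncoordinated.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex increments a shared counter from n goroutines.
// The mutex coordinates the writes, so the program has no data race.
func countWithMutex(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // read and write of shared state, guarded by mu
			mu.Unlock()
		}()
	}
	wg.Wait() // all goroutines have finished before counter is read
	return counter
}

func main() {
	fmt.Println(countWithMutex(1000)) // prints 1000
}
```

Running the unguarded variant under `go run -race` is a practical way to see the race detector report the conflicting writes.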
SCPV (sequential consistency per variable), AT (Atomicity), HB (Happens-before), PB (Propagates-before)
The total order over all the synchronizing operations is separate from the happens-before relationship. It is not true that there is a happens-before edge in one direction or the other between every lock, unlock, or volatile variable access in a program: you only get a happens-before edge from a write to a read that observes the write.
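The point about edges running from a write to the read that observes it can be sketched in Go, whose `sync/atomic` operations are synchronizing. The function name `publishAndObserve` and the variables `data` and `ready` are hypothetical. The reader only gains a happens-before edge once its atomic load actually returns the value stored by the writer; loads that still see 0 establish nothing, which is why the reader keeps polling.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publishAndObserve writes plain data, then publishes it with an atomic store.
// The reader may touch data only after its atomic load has observed the store.
func publishAndObserve() int {
	var (
		data  int   // plain, non-atomic variable
		ready int32 // accessed only through sync/atomic
	)
	result := make(chan int)

	go func() {
		// Loads that return 0 create no happens-before edge; keep polling.
		for atomic.LoadInt32(&ready) == 0 {
			runtime.Gosched()
		}
		// This load observed the store of 1, so the write to data
		// happens before the read below.
		result <- data
	}()

	data = 42
	atomic.StoreInt32(&ready, 1) // publish

	return <-result
}

func main() {
	fmt.Println(publishAndObserve()) // prints 42
}
```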
Key atomic instructions
- read–modify–write (RMW)
- compare-and-swap (CAS)
- load-linked/store-conditional (LL/SC) / load-reserved/store-conditional (LR/SC)
- load-acquire/store-release (LDAR/STLR)
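Go does not expose these instructions directly, but `sync/atomic` offers compare-and-swap, which is enough to sketch how a read-modify-write can be built from a CAS retry loop. The helpers `casAdd` and `parallelAdd` are illustrative names, not part of any library; in real code `atomic.AddInt64` already does this job.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// casAdd adds delta to *addr using a compare-and-swap retry loop
// and returns the new value.
func casAdd(addr *int64, delta int64) int64 {
	for {
		old := atomic.LoadInt64(addr)
		// The swap succeeds only if *addr still holds old;
		// otherwise another goroutine won the race and we retry.
		if atomic.CompareAndSwapInt64(addr, old, old+delta) {
			return old + delta
		}
	}
}

// parallelAdd runs n goroutines that each add 1 through casAdd.
func parallelAdd(n int) int64 {
	var (
		total int64
		wg    sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			casAdd(&total, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&total)
}

func main() {
	fmt.Println(parallelAdd(1000)) // prints 1000
}
```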
Hardware memory models (Symmetric multiprocessing)
- Sequential Consistency: the ideal model
- Total Store Order (TSO): x86
- Weak Consistency: RISC-V
- Relaxed Consistency: ARM
- Data-Race-Free Sequential Consistency (DRF-SC): current consensus
The gap between what is allowed and what is observed makes for unfortunate future surprises: hardware implementing a stronger model than technically guaranteed encourages dependence on the stronger behavior and means that future, weaker hardware will break programs, validly or not.
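To make the difference between sequential consistency and TSO more concrete, here is a small Go sketch that simulates the classic store-buffering litmus test (thread 1: `x = 1; r1 = y`, thread 2: `y = 1; r2 = x`). It does not run real threads; it enumerates every interleaving that sequential consistency allows and collects the resulting `(r1, r2)` pairs. The types and names (`state`, `scOutcomes`) are made up for this illustration. Under sequential consistency the outcome `r1 = 0, r2 = 0` never appears, whereas a TSO machine such as x86 can produce it because each write may sit in a store buffer while the following read executes.

```go
package main

import "fmt"

// state holds the shared memory (x, y) and the per-thread registers.
type state struct {
	x, y   int
	r1, r2 int
}

// Thread 1: x = 1; r1 = y
var thread1 = []func(*state){
	func(s *state) { s.x = 1 },
	func(s *state) { s.r1 = s.y },
}

// Thread 2: y = 1; r2 = x
var thread2 = []func(*state){
	func(s *state) { s.y = 1 },
	func(s *state) { s.r2 = s.x },
}

// scOutcomes explores every interleaving that preserves each thread's
// program order and returns the set of reachable (r1, r2) pairs.
func scOutcomes() map[[2]int]bool {
	seen := map[[2]int]bool{}
	var step func(s state, i, j int)
	step = func(s state, i, j int) {
		if i == len(thread1) && j == len(thread2) {
			seen[[2]int{s.r1, s.r2}] = true
			return
		}
		if i < len(thread1) { // let thread 1 take the next step
			next := s
			thread1[i](&next)
			step(next, i+1, j)
		}
		if j < len(thread2) { // let thread 2 take the next step
			next := s
			thread2[j](&next)
			step(next, i, j+1)
		}
	}
	step(state{}, 0, 0)
	return seen
}

func main() {
	out := scOutcomes()
	fmt.Println(len(out))          // prints 3
	fmt.Println(out[[2]int{0, 0}]) // prints false
}
```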
Programming language memory models (Concurrency)
All modern hardware guarantees coherence, which can also be viewed as sequential consistency for the operations on a single memory location. It turns out that, because of program reordering during compilation, modern languages do not even provide coherence.
Coherence is easier for hardware to provide than for compilers because hardware can apply dynamic optimizations: it can adjust the optimization paths based on the exact addresses involved in a given sequence of memory reads and writes. In contrast, compilers can only apply static optimizations: they have to write out, ahead of time, an instruction sequence that will be correct no matter what addresses and values are involved.
Threads Cannot Be Implemented As a Library: languages cannot be silent about the semantics of multithreaded execution.
- DRF-SC
- happens-before relation through synchronization operations
- total order with interleaved execution
- atomics (atomic variables/atomic operations)
- non-synchronizing
- relaxed: for hiding races, provide no ordering, cannot be used to build new synchronization primitives
- synchronizing (message receive/message send)
- sequentially consistent (strong)
- acquire/release (weak): coherence-only, provide limited ordering, create happens-before relation but do not provide DRF-SC
- non-synchronizing
- memory barriers/fences
- high-level synchronization mechanisms
- semaphore (binary semaphore/counting semaphore)
- spinlock
- barrier
- mutex
- readers–writer lock
- condition variable (signal/notify_one and broadcast/notify_all)
- monitor
- channel (buffered/unbuffered)
- atomic reference counting
- once
- pool
- future (explicit/implicit)
- futex
- sequence lock
- read-copy-update
- semantics for racy programs
- defines the behavior and possible results
- as undefined behavior: DRF-SC or Catch Fire
- distinguish invalid compiler optimizations
- prohibit paradoxes like out-of-thin-air values (acausality)
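Two of the high-level mechanisms in the list above, channels and once, can be sketched in Go to show how they create happens-before edges for ordinary, non-atomic data. The function names `handoff` and `onceInit` are hypothetical. In the Go memory model, closing a channel is synchronized before a receive that returns because the channel is closed, and the completion of the function passed to `once.Do` is synchronized before any `once.Do` call returns.

```go
package main

import (
	"fmt"
	"sync"
)

// handoff passes a plain string between goroutines using a channel close
// as the synchronizing operation.
func handoff() string {
	var msg string
	done := make(chan struct{})
	go func() {
		msg = "hello" // plain write
		close(done)   // synchronized before the receive below returns
	}()
	<-done
	return msg // safe: the write to msg happens before this read
}

// onceInit lets n goroutines compete to run an initializer
// and reports how many times it actually ran.
func onceInit(n int) int {
	var (
		once  sync.Once
		wg    sync.WaitGroup
		calls int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			once.Do(func() { calls++ }) // body runs exactly once
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println(handoff())    // prints hello
	fmt.Println(onceInit(10)) // prints 1
}
```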
Closing thoughts
When it comes to programs with races, both programmers and compilers should remember the advice: don't be clever. (Clear is better than clever.)