Scaling Laws, Carefully
Lilian Weng publishes a ~25-minute technical essay on scaling laws. She traces the opposing conclusions of the Kaplan and Chinchilla analyses to how embedding parameters are counted in small models, and warns against extrapolating fitted laws into the data-constrained regime.
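To see why the counting convention matters most for small models, here is a rough sketch, not taken from the essay. It assumes the common approximation of 12 * n_layer * d_model^2 non-embedding parameters for a standard transformer, a GPT-2-sized vocabulary of 50257 tokens, and three hypothetical model shapes; positional embeddings are ignored.

```python
# Rough sketch: share of parameters that sit in the token embedding matrix.
# Assumptions (not from the essay): non-embedding params ~ 12 * n_layer * d_model^2,
# vocabulary of 50257 tokens, positional embeddings ignored.

def non_embedding_params(n_layer: int, d_model: int) -> int:
    """Approximate attention + MLP parameter count (d_ff = 4 * d_model)."""
    return 12 * n_layer * d_model ** 2


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameters in the token embedding matrix."""
    return vocab_size * d_model


def embedding_fraction(n_layer: int, d_model: int, vocab_size: int = 50257) -> float:
    """Fraction of total parameters that are embedding parameters."""
    emb = embedding_params(vocab_size, d_model)
    return emb / (emb + non_embedding_params(n_layer, d_model))


# Hypothetical model shapes, chosen only for illustration.
configs = [("tiny", 2, 128), ("small", 12, 768), ("large", 48, 1600)]

for name, n_layer, d_model in configs:
    share = embedding_fraction(n_layer, d_model)
    print(f"{name:>5}: embeddings are {share:.1%} of total parameters")
```

Under these assumptions the embedding share falls steeply as the model grows, so whether embeddings are included in N changes the fitted curve mainly at the small end of a sweep.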