Random Note
Sep 5th 2025
About Random Matrix Theory: Jiaoyang Huang has a good course on it in the spring.
Good classical paper to refer to: Universality Laws for High-Dimensional Learning with Random Features.
Song Mei’s course
Sep 8 2025
NNGP (slightly before NTK was born) https://berkan.xyz/digitalGarden/NNGP
Deep Information Propagation https://arxiv.org/pdf/1611.01232 is worth reading.
The long-range consistency conditions (co-coercivities) in the silver stepsize analysis are necessary and sufficient for smoothness and convexity (chat with Jinho).
https://arxiv.org/pdf/1502.05666
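For reference, I believe the inequality in question is the smooth convex interpolation condition from the paper above: a set of triples $(x_i, g_i, f_i)$ can be interpolated by a convex, $L$-smooth function $f$ (with $f(x_i) = f_i$, $\nabla f(x_i) = g_i$) if and only if, for every pair $i, j$,
$$
f_i \;\ge\; f_j + \langle g_j,\, x_i - x_j \rangle + \frac{1}{2L}\,\| g_i - g_j \|^2 .
$$
The "long-range" part is that this must hold for all pairs of iterates, not only consecutive ones.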
Sep 15 2025
Stochastic Approximation http://ndl.ethernet.edu.et/bitstream/123456789/23707/1/Harold%20J.%20Kushner.pdf
Conjugate Kernel https://proceedings.neurips.cc/paper_files/paper/2017/file/489d0396e6826eb0c1e611d82ca8b215-Paper.pdf
https://arxiv.org/pdf/1602.05897 Table 1 for the kernels corresponding to different activations.
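As one example from that table (quoting from memory, worth double-checking): for ReLU, normalized so that $\hat\sigma(1) = 1$, the dual kernel of the correlation $\rho$ between unit-norm inputs is the degree-1 arc-cosine kernel,
$$
\hat\sigma(\rho) \;=\; \frac{\sqrt{1-\rho^2} + \left(\pi - \cos^{-1}\rho\right)\rho}{\pi}.
$$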
Interesting paper: https://arxiv.org/pdf/1606.05340.
Signal propagation: consider $q_l = \frac{1}{N_l} \| h^{(l)}(x) \|_2^2$ (the per-unit squared norm of the activation) and plot $q_l$ against $q_{l-1}$; the map looks like a concave curve. It is interesting to see that the norm converges to a fixed point $q_*$ after about $L = 4$ iterations (4 layers; everything we consider here is at initialization, Figure 1 of the paper). A small simulation of this length map is sketched below.
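A minimal sketch of my own (not code from the paper): push a single input through a random fully connected tanh network at initialization and record $q_l$ layer by layer. The width, depth, and the variances $\sigma_w^2, \sigma_b^2$ are arbitrary choices.

```python
import numpy as np

# Empirical length map: q_l = (1/N) * ||h^{(l)}(x)||_2^2 for a random
# fully connected tanh network at initialization (assumed setup).
rng = np.random.default_rng(0)
N, L = 1000, 10                    # width and depth (arbitrary)
sigma_w, sigma_b = 1.5, 0.1        # weight/bias std devs (arbitrary)

x = rng.normal(size=N)
h = x.copy()
qs = []
for l in range(L):
    W = rng.normal(scale=sigma_w / np.sqrt(N), size=(N, N))  # W_ij ~ N(0, sigma_w^2 / N)
    b = rng.normal(scale=sigma_b, size=N)
    h = W @ np.tanh(h) + b         # next layer's pre-activation
    qs.append(h @ h / N)           # q_l

print(qs)  # settles near a fixed point q_* within a few layers
```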
We also study how the correlation between two inputs changes, i.e. $c_l = \frac{1}{q_* N_l} h^{(l)}(x)^\top h^{(l)}(x')$. Plot $c_l$ against $c_{l-1}$: if the curve is concave, the fixed point is stable, gradients vanish, and NNGP performance is poor; if it is convex, gradients explode and the correlations go to zero. This corresponds to Figure 2 of the paper. Maybe we can observe how $c_l$ changes during training, since this paper only studies the network at initialization; a quick way to estimate $c_l$ empirically is sketched below.
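Continuing the sketch above (again my own, same arbitrary $\sigma_w, \sigma_b$): feed two correlated inputs through the same random weights and track $c_l$ as defined above, using the last $q_l$ as a crude stand-in for $q_*$.

```python
import numpy as np

# Empirical correlation map: c_l = (1/(q_* N)) * h^{(l)}(x)^T h^{(l)}(x')
# for the same random tanh network as in the previous sketch (assumed setup).
rng = np.random.default_rng(1)
N, L = 1000, 10
sigma_w, sigma_b = 1.5, 0.1                          # same arbitrary choices

x = rng.normal(size=N)
x2 = 0.5 * x + np.sqrt(0.75) * rng.normal(size=N)    # second input, correlation 0.5 with x
h, h2 = x.copy(), x2.copy()
qs, overlaps = [], []
for l in range(L):
    W = rng.normal(scale=sigma_w / np.sqrt(N), size=(N, N))
    b = rng.normal(scale=sigma_b, size=N)
    h, h2 = W @ np.tanh(h) + b, W @ np.tanh(h2) + b  # shared weights and biases
    qs.append(h @ h / N)
    overlaps.append(h @ h2 / N)

q_star = qs[-1]                                      # crude estimate of q_*
cs = [o / q_star for o in overlaps]
print(cs)  # watch whether c_l drifts toward 1 or toward 0
```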
https://dkarkada.xyz/posts/critical-signalprop-nn/