Model Complexity
thoughts on this paper?
3 Replies
I think it's a very interesting paper, but it seems to use only Ridge to get its result. I was expecting them to use a deep net and, instead of Ridge-type shrinkage, just adjust weight decay upwards.
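To make that comparison concrete, here is a minimal sketch (not the paper's code) of what "adjusting weight decay" amounts to when only the output layer of a network is trained. It assumes a squared-error loss, full-batch gradient descent, and a fixed random hidden layer; the synthetic data, the sizes, and names such as `Phi`, `lam`, and `w_gd` are made up for illustration. Under those assumptions, gradient descent with a weight-decay term converges to the closed-form Ridge solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, p = 40, 3, 120                      # observations, inputs, hidden units (p > n)
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Fixed random hidden layer with cosine activation; these weights are never trained.
W = rng.normal(size=(d, p))
b = rng.uniform(0, 2 * np.pi, size=p)
Phi = np.sqrt(2.0 / p) * np.cos(X @ W + b)

lam = 1.0  # Ridge penalty, playing the role of the weight-decay coefficient

# Closed-form Ridge solution of 0.5*||Phi w - y||^2 + 0.5*lam*||w||^2.
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

# Full-batch gradient descent on the output weights with weight decay.
step = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + lam)   # 1 / Lipschitz constant of the gradient
w_gd = np.zeros(p)
for _ in range(5000):
    grad = Phi.T @ (Phi @ w_gd - y)                # gradient of the unpenalized squared loss
    w_gd = w_gd - step * (grad + lam * w_gd)       # weight decay adds lam * w to the gradient

max_gap = float(np.max(np.abs(w_gd - w_ridge)))
print("largest coefficient gap:", max_gap)
```

With a fully trained deep net, or with an adaptive optimizer that uses decoupled weight decay such as AdamW, the correspondence is no longer exact, and the mapping between a library's weight-decay coefficient and the Ridge penalty depends on how the loss is scaled.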
According to my analysis, a deep network with weight decay is mathematically equivalent to the paper's Ridge framework when the hidden layer is fixed and only the output layer is trained.

1. Structural Net Equivalence
Random Fourier features (RFF) function as a wide neural network. Fixed random weights control the hidden layer, and Ridge regression optimizes the final output layer. Wide networks behave like high-dimensional linear models.

2. Regularization Identity
Under plain gradient descent, weight decay is exactly L2 regularization, the penalty used to prevent overfitting. L2 regularization is what defines Ridge regression shrinkage, so increasing weight decay corresponds to raising the Ridge penalty parameter. Early stopping of gradient descent provides an implicit, Ridge-like shrinkage.

3. Mathematical Proof Constraints
Random matrix theory needs an analytically tractable estimator. Ridge regression allows clean closed-form proofs, whereas SGD training dynamics make an analytical derivation much harder. The authors prioritize exact limits for Sharpe ratios.
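The structural claim in point 1 can be sketched in a few lines. This is an illustration under assumed choices (Gaussian random weights, a cosine activation, synthetic data, and the hypothetical helper names `rff_features` and `ridge_fit`), not the paper's implementation. It builds random Fourier features as a fixed hidden layer, fits only the output weights by closed-form Ridge, and records how the weight norm shrinks as the penalty grows.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features: a fixed, untrained hidden layer with cosine activation."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def ridge_fit(Phi, y, lam):
    """Closed-form Ridge output layer: solve (Phi'Phi + lam*I) w = Phi'y."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

rng = np.random.default_rng(0)
n, d, p = 50, 3, 200                      # more features than observations (p > n)
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

W = rng.normal(size=(d, p))               # hidden-layer weights, drawn once and frozen
b = rng.uniform(0, 2 * np.pi, size=p)     # random phases
Phi = rff_features(X, W, b)               # hidden-layer activations, shape (n, p)

# Norm of the fitted output weights for increasing Ridge penalties.
norms = {lam: float(np.linalg.norm(ridge_fit(Phi, y, lam))) for lam in (0.01, 1.0, 100.0)}
for lam, norm in norms.items():
    print(f"lam={lam}: ||w|| = {norm:.4f}")
```

Only the output weights are estimated here, so the "network" is linear in its trainable parameters, which is what makes the closed-form solution available.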
"Your point about weight decay is excellent because, in standard machine learning libraries such as PyTorch (for example, through the weight_decay setting of the SGD or AdamW optimizers), weight decay is how Ridge-style (L2) shrinkage is implemented in deep nets to prevent overfitting."