
Rotating Signals: Why We Accidentally Re-Discovered the Dead Zone
So, I had this slightly left-field idea to create new signals for free out of old ones with a bit of geometry and linear algebra.
In the back of my mind, I suspected that whatever I was doing might somehow be violating the finance/statistics version of “there is no perpetual motion machine.” Still, things aren’t quite that axiomatic. There are nonlinear tricks, regime tricks, kernel methods from the noughties, and all sorts of things that can sometimes squeeze structure out of an apparently exhausted signal.
But that’s not what this post is about.
This post is about rotations.
The idea goes like this.
Suppose you had a crystal ball.
You have some cross-sectional signal $s_t$ with positive IC (information coefficient) against a future-return target $r_t$. At each timestamp $t$, both are vectors in $\mathbb{R}^N$, one coordinate per asset.
Now suppose you rotate $s_t$ by an orthogonal transformation $R_t$ that leaves $r_t$ fixed.
Then something slightly magical happens.
The component of $s_t$ aligned with $r_t$ remains unchanged, while the orthogonal component rotates around it. So the new signal $\tilde s_t = R_t\,s_t$ has:
- exactly the same IC with the target,
- but potentially much lower correlation with the original signal.
In other words, if we could rotate around the target axis itself, we could manufacture a whole family of “decorrelated cousins” of the original signal without losing predictive power.
Sounds amazing.
Unfortunately, we do not have the crystal ball.
Which is a bit inconvenient.
Still, the geometry itself is interesting.
The Geometry of Decorrelated Signals
For simplicity, suppose that at each timestamp:
- the cross section is demeaned,
- and cross-sectionally z-scored.
Then each signal lives on a sphere:

$$ s_t \in S^{N-2} $$
Why $S^{N-2}$?
Because demeaning removes one dimension, and the z-scoring fixes the norm, so each signal is a unit vector inside an $(N-1)$-dimensional hyperplane.
For a universe of $N = 20$ assets, signals live on:

$$ S^{18} $$
Now suppose we take one signal vector $s \in S^{18}$.
How many other vectors $v$ exist with exactly 50% correlation to $s$?
A huge number.
In fact, the set of all such vectors is itself another sphere of one lower dimension:

$$ \{\, v \in S^{18} : \langle v, s \rangle = 0.5 \,\} \;\cong\; S^{17} $$
Geometrically, you can think of it as a latitude shell around the original signal.
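To make the shell concrete, here is a minimal numpy sketch (the universe size and seed are illustrative assumptions, not from any real dataset) that samples a demeaned, normalized vector with exactly 50% correlation to a given signal:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20  # universe size, matching the running example

def to_sphere(x):
    """Demean, then scale to unit norm: a point on S^(N-2)."""
    x = x - x.mean()
    return x / np.linalg.norm(x)

s = to_sphere(rng.standard_normal(N))

# A point on the latitude shell: corr(v, s) = 0.5 exactly.
rho = 0.5
u = rng.standard_normal(N)
u = to_sphere(u - (u @ s) * s)         # unit vector orthogonal to s, still demeaned
v = rho * s + np.sqrt(1 - rho**2) * u  # rho toward s, the rest orthogonal

print(np.corrcoef(s, v)[0, 1])         # 0.5 up to floating point
```

Varying $u$ sweeps out the whole $S^{17}$ of such cousins.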
And that’s just for a single timestamp.
Once you consider tens of thousands of timestamps simultaneously, the dimensionality becomes absurdly large.
So purely from a geometric point of view, there is absolutely no shortage of possible “different-looking” signals.
Furthermore, if we require that all generated signals are pairwise 50% correlated with each other, then the problem becomes related to spherical packing and coding theory on high-dimensional manifolds.
Again: enormous combinatorial capacity.
So the geometry looked promising.
Very promising.
The Crystal-Ball Construction
Suppose:

$$ s_t = \alpha\,\hat r_t + \varepsilon_t $$

where:
- $\hat r_t = r_t / \lVert r_t \rVert$ is the normalized target direction,
- $\varepsilon_t$ lies orthogonal to it.
Now apply a rotation $R_t$ that fixes $\hat r_t$, i.e. $R_t\,\hat r_t = \hat r_t$.
Then:

$$ \tilde s_t = R_t\,s_t = \alpha\,\hat r_t + R_t\,\varepsilon_t $$
The predictive component remains untouched.
Only the orthogonal residual rotates.
Therefore:

$$ \mathrm{IC}(\tilde s_t,\, r_t) = \mathrm{IC}(s_t,\, r_t) $$
Exactly.
Meanwhile, the correlation between $s_t$ and $\tilde s_t$ can be controlled by the rotation angle.
For example, rotating the orthogonal component by $60^\circ$ gives:

$$ \mathrm{corr}(s_t,\, \tilde s_t) = \alpha^2 + (1 - \alpha^2)\cos 60^\circ = \tfrac{1}{2} + \tfrac{1}{2}\alpha^2 $$

So if $\alpha$ is small, the new signal is roughly 50% correlated with the old one.
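Here is a small numpy sanity check of the construction (all quantities synthetic and illustrative): rotate the residual by $60^\circ$ inside a plane orthogonal to $\hat r_t$, and both identities above fall out.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20
theta = np.pi / 3                     # rotate the residual by 60 degrees

def to_sphere(x):
    x = x - x.mean()
    return x / np.linalg.norm(x)

r_hat = to_sphere(rng.standard_normal(N))            # normalized target direction
s = to_sphere(0.3 * r_hat + rng.standard_normal(N))  # signal with some alignment

alpha = s @ r_hat                     # predictive component along r_hat
eps = s - alpha * r_hat               # orthogonal residual

# Rotate eps by theta inside a plane orthogonal to r_hat, so r_hat stays fixed.
e1 = eps / np.linalg.norm(eps)
w = to_sphere(rng.standard_normal(N))
e2 = w - (w @ r_hat) * r_hat - (w @ e1) * e1
e2 /= np.linalg.norm(e2)
eps_rot = np.linalg.norm(eps) * (np.cos(theta) * e1 + np.sin(theta) * e2)
s_tilde = alpha * r_hat + eps_rot

print(s_tilde @ r_hat - s @ r_hat)    # ~0: the predictive component is untouched
print(s @ s_tilde, alpha**2 + (1 - alpha**2) * np.cos(theta))  # these agree
```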
This is the dream construction.
But again: we don’t know the future target direction.
So the actual problem becomes:
can we learn one fixed rotation $R$ that approximately achieves this effect?
The Orthogonal Procrustes Problem
Now things started becoming unexpectedly elegant.
We project into the demeaned hyperplane:

$$ H = \{\, x \in \mathbb{R}^N : \textstyle\sum_i x_i = 0 \,\} $$

which has dimension $N - 1$.
For $N = 20$, that gives a 19-dimensional space.
Let:

$$ S,\, Y \in \mathbb{R}^{T \times (N-1)} $$

be the historical signal and target trajectories projected into this hyperplane.
We then solve:

$$ \min_{R} \;\lVert S R - Y \rVert_F^2 $$

subject to:

$$ R^\top R = I, \qquad \mathrm{corr}(S R,\, S) = \rho, $$

where $\rho$ is the target correlation with the original signal (here $\rho = 0.5$).
This is a constrained orthogonal Procrustes problem.
If you whiten things so that:

$$ S^\top S = I, $$
then the geometry becomes especially clean.
The overlap condition simplifies to:

$$ \operatorname{tr}(R) = \rho\,(N - 1) $$
So the admissible rotations become a kind of latitude shell inside the orthogonal group.
The actual optimization is solved by taking:

$$ M = S^\top Y, $$

doing an SVD:

$$ M = U \Sigma V^\top, $$

and then setting:

$$ R = U V^\top $$
Very neat.
Very satisfying.
And alarmingly effective in-sample.
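For reference, a minimal numpy sketch of the closed-form step (the data here is synthetic, whitened, and purely illustrative; in the real problem the overlap constraint further restricts which $R$ are admissible):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 5000, 19                        # timestamps x demeaned-hyperplane dimension

# Synthetic whitened signal history, and a target that is signal plus noise.
S = np.linalg.qr(rng.standard_normal((T, n)))[0] * np.sqrt(T)  # S.T @ S = T * I
Y = 0.1 * S + rng.standard_normal((T, n))

# Orthogonal Procrustes closed form: with M = S^T Y = U diag(sigma) V^T,
# R = U V^T minimizes ||S R - Y||_F over all orthogonal R.
M = S.T @ Y
U, _, Vt = np.linalg.svd(M)
R = U @ Vt

print(np.allclose(R.T @ R, np.eye(n)))  # R is orthogonal
print(np.trace(R) / n)                  # the overlap tr(R)/n with the original signal
```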
Why Stop at One Signal?
At this point I thought:
why stop at one rotated signal?
Why not recursively generate an entire ensemble:

$$ \tilde s^{(k)}_t = R_k\, s_t, \qquad k = 1, \dots, K, $$

such that every pair has exactly 50% correlation?
So we solved:

$$ \min_{R_1, \dots, R_K} \sum_{k=1}^{K} \lVert S R_k - Y \rVert_F^2 \quad \text{subject to} \quad R_k^\top R_k = I, \;\; \mathrm{corr}(S R_j,\, S R_k) = 0.5 \;\; \text{for all } j \neq k. $$
The single-rotation closed form breaks down at this point, because we now have multiple nonlinear constraints.
So we parameterized rotations using the Cayley transform:

$$ R = (I - A)(I + A)^{-1}, $$

where $A$ is skew-symmetric.
For $n = 19$ dimensions, this gives:

$$ \frac{n(n-1)}{2} = 171 $$

degrees of freedom.
Then we solved the constrained optimization numerically.
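A short sketch of the parameterization itself (the solver is omitted; dimensions follow the setup above): any skew-symmetric $A$ maps to a proper rotation, so a generic optimizer can search the 171 free parameters without ever leaving the rotation group.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 19

# Cayley transform: skew-symmetric A  ->  R = (I - A) @ inv(I + A), orthogonal.
theta = rng.standard_normal(n * (n - 1) // 2)   # the 171 free parameters

A = np.zeros((n, n))
A[np.triu_indices(n, k=1)] = theta
A = A - A.T                                     # enforce A.T == -A

I = np.eye(n)
R = (I - A) @ np.linalg.inv(I + A)

print(np.allclose(R.T @ R, I))                  # True: R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))        # True: a proper rotation
```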
And it worked.
Beautifully.
In-sample we got:
- positive IC,
- controlled pairwise correlations,
- and improved ensemble Sharpe.
Everything looked fantastic.
Then Out-of-Sample Happened
Out-of-sample, things deteriorated badly.
Not catastrophically.
But enough to reveal the underlying issue.
And that issue is explained in the previous post: The Linear Dead Zone.
The key realization is that every rotated signal has the form:

$$ \tilde s^{(k)}_t = R_k\, s_t $$
Therefore any linear blend becomes:

$$ \sum_k w_k\, \tilde s^{(k)}_t = \Big( \sum_k w_k R_k \Big) s_t $$
So the entire ensemble collapses to:

$$ M s_t $$

for one matrix:

$$ M = \sum_k w_k R_k $$
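The collapse is pure linearity; a three-line numpy check (with stand-in rotations and weights) makes it explicit:

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 19, 5

# Stand-in rotations and blend weights for a fitted ensemble.
Rs = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(K)]
w = rng.dirichlet(np.ones(K))
s = rng.standard_normal(n)

blend = sum(wk * (Rk @ s) for wk, Rk in zip(w, Rs))  # blend of rotated signals...
M = sum(wk * Rk for wk, Rk in zip(w, Rs))            # ...is one fixed matrix
print(np.allclose(blend, M @ s))                     # True, by linearity
```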
This was the moment where the whole thing suddenly became much less magical.
We were not manufacturing new alpha sources.
We were merely applying different linear transformations to the same original signal.
And because every $R_k$ was calibrated against the same target covariance structure, all the rotations clustered around the same predictive eigendirections.
Geometrically diverse?
Absolutely.
Predictively diverse?
Not really.
The orthogonal residuals generated by the rotations mostly lived in weak-alpha directions.
So although the signals were decorrelated geometrically, they sat very close to the dead zone economically.
And out-of-sample estimation noise destroyed most of the apparent diversification benefit.