AlphaNova

From Signals to Cities: Compression and the Geometry of Novelty

May 4, 2026

In our previous post, Compression, Non-Commutativity, and Information Coefficients, we explored how averaging and normalization interact in nontrivial ways on the sphere. In particular, we saw that compressing a sequence of vectors into a single direction introduces a systematic distortion.

In this post, we take that idea one step further — and connect it directly to how we think about novelty in AlphaNova competitions.


Signals, cities, and novelty

In AlphaNova’s next competition, we encourage scientists to search for signals that are novel.

Concretely:

A proposed signal should not be more than 0.5 correlated with any existing signal in the test set.

There is plenty of elbow room to achieve this — the space of signals is large, and true novelty is attainable.


The practical constraint

There is, however, a challenge.

We cannot expose all signals in the database directly.

Instead, we expose a compressed representation of signals, which we call cities.

  • A signal is a time series of cross-sectional forecasts (a sequence of points on $S^M$)
  • A city is the compressed version of that signal:

$$\text{city}(X) = \frac{\frac{1}{N}\sum_{t=1}^N X_t}{\left\|\frac{1}{N}\sum_{t=1}^N X_t\right\|}$$

So each signal — a rich time series — is mapped to a single point on the sphere.
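As a sketch, assuming each signal is stored as an $N \times M$ NumPy array with one unit-norm forecast vector per date (the function name and data layout are illustrative, not AlphaNova's API):

```python
import numpy as np

def city(X: np.ndarray) -> np.ndarray:
    """Compress a signal (N x M array, one unit forecast vector per date)
    into a single point on the sphere."""
    mean = X.mean(axis=0)              # time-average the forecasts
    norm = np.linalg.norm(mean)
    if norm == 0:
        raise ValueError("signal averages to the zero vector; city undefined")
    return mean / norm                 # renormalize onto the unit sphere

# Example: forecasts that hover around one persistent direction.
rng = np.random.default_rng(0)
base = np.array([1.0, 0.0, 0.0])
X = base + 0.1 * rng.standard_normal((50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # put each date on the sphere
c = city(X)
print(round(float(np.linalg.norm(c)), 6))       # 1.0: the city is a unit vector
```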


How scientists use cities

Given a candidate signal $X$, a scientist can:

  1. Compute its city (on the validation set),
  2. Compare it to existing cities,
  3. Check whether it is “too close” to any of them.

This provides a practical proxy for novelty.
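The three steps can be sketched as follows; `is_novel`, the 0.5 default, and the use of absolute correlation are illustrative choices, not the competition's exact implementation:

```python
import numpy as np

def is_novel(candidate_city: np.ndarray, existing_cities: np.ndarray,
             max_corr: float = 0.5) -> bool:
    """True if the candidate's city is at most `max_corr` correlated
    (in absolute value) with every existing city (all unit vectors)."""
    corrs = existing_cities @ candidate_city   # inner products = cosines
    return bool(np.max(np.abs(corrs)) <= max_corr)

cand = np.array([1.0, 0.0, 0.0])
cities = np.array([[0.0, 1.0, 0.0],    # orthogonal: correlation 0.0
                   [0.8, 0.6, 0.0]])   # correlation 0.8: too close
print(is_novel(cand, cities))          # False
print(is_novel(cand, cities[:1]))      # True
```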


The key empirical observation

Here is the interesting part.

Empirically, we observe that:

If two signals are close in city space, they are typically less close in signal space.

Equivalently:

Compression tends to overstate similarity, and therefore to understate novelty.

(Insert conceptual compression figure here)


What the data says

We can make this precise.

The figure below compares city novelty (distance in compressed space) with global novelty (distance in full signal space) across 223 signals.

(Figure: city_vs_global_novelty.png — city novelty vs. global novelty, one point per signal)

Two things stand out immediately:

  • The vast majority of points lie above the diagonal
  • The gap is not small — it is systematic and material

In fact:

74% of signals are more novel in global space than in city space.


Quantifying the gap

Let

$$\text{gap} = \text{global novelty} - \text{city novelty}.$$

Then:

  • Median gap: +12.9°
  • 95% CI: [+10.7°, +16.8°]
  • Mean gap: +14.4°

So on average, compression underestimates novelty by about 10–15 degrees.


Statistical significance

We tested whether this effect could be due to chance.

  • Paired t-test: $p = 1.8 \times 10^{-25}$
  • Wilcoxon signed-rank: $p = 2.4 \times 10^{-22}$
  • Sign test (74% > 50%): $p = 1.5 \times 10^{-13}$
  • Cohen’s d: 0.79 (large effect)

All tests reject the null at extremely high significance.

The effect is not subtle — it is large, stable, and highly statistically significant.
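The tests above can be reproduced with SciPy. The sketch below runs them on synthetic stand-in data (the real per-signal novelty angles are not public), so only the shape of the analysis, not the numbers, matches the post:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 223
# Synthetic stand-ins: global novelty sits ~14 degrees above city novelty.
city_novelty = rng.normal(55.0, 10.0, n)
global_novelty = city_novelty + rng.normal(14.0, 12.0, n)
gap = global_novelty - city_novelty

t_res = stats.ttest_rel(global_novelty, city_novelty)      # paired t-test
w_res = stats.wilcoxon(gap)                                # Wilcoxon signed-rank
sign_res = stats.binomtest(int((gap > 0).sum()), n, 0.5)   # sign test
cohens_d = gap.mean() / gap.std(ddof=1)                    # paired Cohen's d

print(t_res.pvalue < 0.05, w_res.pvalue < 0.05, sign_res.pvalue < 0.05)
```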


Interpreting the units

We measure novelty in degrees (angles on the sphere).

If you prefer correlations:

$$\langle x, y \rangle = \cos(\theta),$$

so angle and correlation are directly related.

For reference:

  • $60^\circ \approx 0.5$ correlation
  • $90^\circ = 0$ correlation
  • smaller angles → higher correlation

So the 0.5 correlation threshold used in the competition corresponds to roughly 60° separation.

A gap of $10^\circ$–$15^\circ$ therefore represents a meaningful change in correlation — especially near this operating range.
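Both directions of the conversion are one-liners in NumPy:

```python
import numpy as np

# Correlation threshold -> angular separation in degrees.
print(round(float(np.degrees(np.arccos(0.5))), 3))   # 60.0

# A 15-degree gap near the threshold, expressed as a correlation change.
print(round(float(np.cos(np.radians(60.0)) - np.cos(np.radians(75.0))), 3))   # 0.241
```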


What this means in practice

This gives a clear operational takeaway:

City space is conservative.

  • If two signals look close in city space, they are usually close in full signal space as well; the observed gap is modest.
  • If they look far apart in city space, they are typically even farther apart in reality.

So compression does not just reduce information — it systematically overstates similarity, which puts the measurement error on the safe side.


Why does this happen?

To understand this, we return to the geometry from the previous post.

Let $X = (x_1,\dots,x_N)$ and $Y = (y_1,\dots,y_N)$ be two signals, with $x_t, y_t \in S^M$.

Define:

  • average similarity:

$$D_N = \frac{1}{N}\sum_{t=1}^N \langle x_t, y_t\rangle$$

  • compressed similarity:

$$C_N = \left\langle \frac{\bar x}{\|\bar x\|}, \frac{\bar y}{\|\bar y\|} \right\rangle$$

where

$$\bar x = \frac{1}{N}\sum_{t=1}^N x_t, \qquad \bar y = \frac{1}{N}\sum_{t=1}^N y_t.$$

What’s the difference?

  • $D_N$ measures average, time-local agreement
  • $C_N$ measures agreement of persistent directions

Compression removes:

  • temporal variation
  • regime shifts
  • rotating structure

and keeps only the stable component.
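Both quantities are easy to compute from $N \times M$ arrays of unit-norm forecasts (a sketch; function names are illustrative):

```python
import numpy as np

def avg_similarity(X: np.ndarray, Y: np.ndarray) -> float:
    """D_N: average of the date-by-date inner products <x_t, y_t>."""
    return float(np.mean(np.sum(X * Y, axis=1)))

def compressed_similarity(X: np.ndarray, Y: np.ndarray) -> float:
    """C_N: inner product of the two normalized time-averages (the cities)."""
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    return float(x_bar @ y_bar / (np.linalg.norm(x_bar) * np.linalg.norm(y_bar)))

# Toy data: 100 dates of 5-dimensional unit forecasts.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = rng.standard_normal((100, 5))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
print(avg_similarity(X, Y), compressed_similarity(X, Y))
```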


A useful identity

We have:

$$\langle \bar x, \bar y\rangle = \frac{1}{N^2}\sum_{i,j}\langle x_i, y_j\rangle$$

So compression replaces:

  • the diagonal average (matched times)

with

  • the full cross-average (all pairs)

This dilutes time-specific alignment.
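The identity is easy to check numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 8, 5
X = rng.standard_normal((N, M))
Y = rng.standard_normal((N, M))

lhs = X.mean(axis=0) @ Y.mean(axis=0)   # <x_bar, y_bar>
rhs = (X @ Y.T).sum() / N**2            # (1/N^2) * sum_{i,j} <x_i, y_j>
print(bool(np.isclose(lhs, rhs)))       # True
```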


A stylized mathematical result

There is no universal inequality between $C_N$ and $D_N$.

However, in a natural probabilistic model, we can explain the observed effect.

Suppose:

$$x_t = \mu_x + \varepsilon_t, \qquad y_t = \mu_y + \eta_t$$

where:

  • $\mu_x, \mu_y$ are persistent components
  • $\varepsilon_t, \eta_t$ are zero-mean fluctuations

Then:

$$D_N \approx \langle \mu_x, \mu_y\rangle + \mathbb{E}[\langle \varepsilon_t, \eta_t\rangle]$$

$$C_N \approx \frac{\langle \mu_x, \mu_y\rangle}{\|\mu_x\|\,\|\mu_y\|}$$

So:

  • $D_N$ captures persistent + transient similarity
  • $C_N$ captures persistent similarity only

Key implication

Each forecast $x_t$ lies on the unit sphere, so nontrivial fluctuations force $\|\mu_x\|, \|\mu_y\| < 1$, and the normalization in $C_N$ amplifies the persistent alignment (assuming $\langle \mu_x, \mu_y\rangle > 0$). Whenever this amplification outweighs the transient covariance $\mathbb{E}[\langle \varepsilon_t, \eta_t\rangle]$, we get

$$C_N > D_N$$

for large $N$: cities look more similar than the underlying signals, so city novelty understates global novelty, matching the sign of the gap in the data.
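Since there is no universal inequality, the ordering of $C_N$ and $D_N$ depends on how the transient covariance compares with the normalization of the persistent components. A simulation with weak, positively aligned persistent components and purely idiosyncratic fluctuations (all numbers are illustrative choices, not calibrated to the competition data) reproduces a positive novelty gap:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 5000, 20

# Weak persistent components with some overlap (illustrative values).
mu_x = np.zeros(M); mu_x[:2] = [0.25, 0.10]
mu_y = np.zeros(M); mu_y[:2] = [0.10, 0.25]

# Zero-mean fluctuations, independent across the two signals.
s = 0.215                                  # keeps ||x_t|| close to 1
X = mu_x + s * rng.standard_normal((N, M))
Y = mu_y + s * rng.standard_normal((N, M))

D_N = np.mean(np.sum(X * Y, axis=1))       # average time-local similarity
x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
C_N = (x_bar @ y_bar) / (np.linalg.norm(x_bar) * np.linalg.norm(y_bar))

gap = np.degrees(np.arccos(D_N)) - np.degrees(np.arccos(C_N))
print(C_N > D_N, gap > 0)                  # True True: cities look more similar
```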


Interpretation

This matches what we see in practice:

  • shared exposures and short-term factors create some transient co-movement, which adds to $D_N$
  • but fluctuations leave each signal's long-run average direction weak, and normalizing that weak average amplifies the alignment of the persistent components, which adds more to $C_N$

So:

Compression cancels transient variation and amplifies the stable, shared structure.


Final takeaway

Cities are compressed signals — and compression makes signals look more similar than they really are.

This is precisely why city space works so well:

  • it is computationally simple
  • it preserves persistent structure
  • and it provides a conservative test for novelty

If your signal is sufficiently novel in city space, there is even more room in full signal space.


Looking ahead

This raises interesting questions:

  • Can we characterize exactly when compression preserves similarity?
  • Can we design better “cities” that retain more structure?
  • What does this imply for portfolio construction?

But for now, the message is simple:

Compression is not neutral — it reshapes geometry in a useful way.

Closing thought

What looks like a limitation — compressing rich signals into a single point — turns out to be a feature.

Compression strips away time-local noise and exposes what actually persists. In doing so, it reshapes geometry in a way that is both mathematically subtle and operationally useful: it makes similarity harder to hide and genuine novelty easier to certify.

Cities are not just a proxy. They are a filter.

And if your signal is truly novel, it will survive that filter.
