AlphaNova
Compression, Non-Commutativity, and Information Coefficients


May 4, 2026

Averaging vs normalization on the sphere


Introduction

Let $X_1, \dots, X_N \in S^M \subset \mathbb{R}^{M+1}$ be a sequence of unit vectors. Define the empirical mean:

$$B = \frac{1}{N} \sum_{i=1}^N X_i, \quad B_N = \frac{B}{\|B\|}.$$

Fix the “north pole” $e_0 = (1,0,\dots,0)$, and define:

$$c = \frac{1}{N} \sum_{i=1}^N (X_i)_0 = B_0, \quad b = (B_N)_0 = \frac{B_0}{\|B\|}.$$

We study the quantity:

$$\kappa := c - b.$$

This simple difference encodes:

  • a non-commutativity phenomenon
  • a geometric correction for dispersion
  • a meaningful distinction in financial forecasting
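These quantities are straightforward to compute. A minimal NumPy sketch (the random unit vectors, seed, and variable names are ours, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# N unit vectors in S^M ⊂ R^(M+1), drawn by normalizing Gaussian samples
N, M = 50, 4
X = rng.standard_normal((N, M + 1))
X /= np.linalg.norm(X, axis=1, keepdims=True)

B = X.mean(axis=0)               # empirical mean (inside the ball, not on the sphere)
B_N = B / np.linalg.norm(B)      # normalized mean direction

c = B[0]                         # mean of the e_0-coordinates
b = B_N[0]                       # e_0-coordinate of the mean direction
kappa = c - b

print(c, b, kappa)
```

Note that `B` generically has norm strictly less than 1, which is exactly what makes `c` and `b` differ.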

Averaging is compression

The averaging map

$$A : (S^M)^N \to \mathbb{R}^{M+1}, \quad A(X_1,\dots,X_N) = \frac{1}{N}\sum_{i=1}^N X_i$$

is inherently many-to-one.

It compresses an entire sequence into a single vector, discarding:

  • temporal order
  • variation and dispersion
  • rotation of directions

Normalization introduces a second compression:

$$B \mapsto B_N = \frac{B}{\|B\|}$$

which removes magnitude and keeps only direction.

So we have a two-stage compression:

sequence → mean vector → mean direction

  • $c$: computed after the first compression
  • $b$: computed after the second

Thus

$$\kappa = c - b$$

measures the effect of compressing a vector into a direction.
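Both stages lose information: different sequences can share the same mean direction while reporting different values of $c$. A small hand-constructed example on the circle $S^1$ (the specific vectors are ours):

```python
import numpy as np

theta = np.pi / 3  # 60 degrees

# Sequence A: both vectors already sit at the north pole e_0 = (1, 0)
XA = np.array([[1.0, 0.0], [1.0, 0.0]])

# Sequence B: two vectors placed symmetrically about e_0
XB = np.array([[np.cos(theta),  np.sin(theta)],
               [np.cos(theta), -np.sin(theta)]])

def c_and_b(X):
    B = X.mean(axis=0)
    return B[0], B[0] / np.linalg.norm(B)

cA, bA = c_and_b(XA)
cB, bB = c_and_b(XB)
print(cA, bA)   # sequence A
print(cB, bB)   # sequence B: same direction as A, smaller coordinate average
```

Both sequences compress to the same mean direction ($b = 1$), yet their coordinate averages differ ($c = 1$ versus $c = \cos\theta$): the direction alone cannot recover the dispersion.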


A subtle non-commutative diagram

Consider:

            A_s
(S^M)^N ─────────→  S^M
    │                 │
  P │                 │ P
    ↓       A_e       ↓
   R^N ─────────→     R

Where:

  • Top map $A_s$ (spherical averaging):
$$A_s(X_1,\dots,X_N) = \frac{\frac{1}{N}\sum_i X_i}{\left\|\frac{1}{N}\sum_i X_i\right\|}$$
  • Bottom map $A_e$ (Euclidean averaging):
$$A_e(x_1,\dots,x_N) = \frac{1}{N}\sum_{i=1}^N x_i$$
  • Left map $P$ (coordinate-wise):
$$P(X_1,\dots,X_N) = \big((X_1)_0,\dots,(X_N)_0\big)$$
  • Right map $P$ (single vector):
$$P(X) = X_0$$

Two paths

Down then right:

$$(X_1,\dots,X_N) \to ((X_1)_0,\dots,(X_N)_0) \to c$$

Right then down:

$$(X_1,\dots,X_N) \to B_N \to b$$

The diagram does not commute

$$P \circ A_s \ne A_e \circ P$$

and the gap is exactly:

$$\kappa = c - b.$$
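The non-commutativity can be checked directly: going down then right yields $c$, going right then down yields $b$, and the two generically disagree. A sketch with the maps named after the diagram (random data and names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def A_s(X):                  # spherical average: mean, then project back to the sphere
    B = X.mean(axis=0)
    return B / np.linalg.norm(B)

def A_e(x):                  # plain Euclidean average of scalars
    return x.mean()

def P(X):                    # extract the e_0-coordinate(s)
    return X[..., 0]

N, M = 100, 9
X = rng.standard_normal((N, M + 1))
X /= np.linalg.norm(X, axis=1, keepdims=True)

c = A_e(P(X))   # down then right
b = P(A_s(X))   # right then down
print(c, b, c - b)   # the gap is kappa
```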

Two “averages” that are the same—and not

Both $A_s$ and $A_e$ are “averages”:

  • Same idea: aggregate many objects into one
  • Different reality:
    • $A_e$: a linear space
    • $A_s$: a linear average + projection onto a curved space

Same abstraction, different geometry.


Geometry: coordinates vs directions

We have:

$$b = \frac{c}{\|B\|}, \quad \kappa = c\left(1 - \frac{1}{\|B\|}\right)$$

and:

$$\|B\|^2 = c^2 + \sum_{j=1}^M B_j^2$$

Define:

$$V_\perp = \sum_{j=1}^M B_j^2$$

Then $V_\perp$ captures the structure of the mean that is orthogonal to the target: it is what pulls the aggregate direction $B_N$ away from $e_0$, and together with $c$ it determines $\|B\|$ and hence the discrepancy $\kappa$.
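Both identities, the norm decomposition and the closed form for $\kappa$, are easy to verify numerically (a sketch; `V_perp` is our name for $V_\perp$):

```python
import numpy as np

rng = np.random.default_rng(2)

N, M = 200, 5
X = rng.standard_normal((N, M + 1))
X /= np.linalg.norm(X, axis=1, keepdims=True)

B = X.mean(axis=0)
c = B[0]
b = B[0] / np.linalg.norm(B)
V_perp = np.sum(B[1:] ** 2)          # energy of the mean orthogonal to e_0

print(np.linalg.norm(B) ** 2, c ** 2 + V_perp)          # ||B||^2 = c^2 + V_perp
print(c - b, c * (1 - 1 / np.linalg.norm(B)))           # kappa, two ways
```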


Financial interpretation: IC vs directional IC

Let $X_t$ be a time series of cross-sectional forecasts, each normalized to unit length.

Let $e_0$ be the target.

Then:

$$(X_t)_0 = \langle X_t, e_0 \rangle = \mathrm{IC}_t$$

So:

$$c = \frac{1}{N} \sum_{t=1}^N \mathrm{IC}_t$$

is the time-average IC.


And $b$?

$$B = \frac{1}{N}\sum_{t=1}^N X_t, \quad b = \langle B_N, e_0 \rangle$$

So:

  • $b$ = the IC of the aggregate direction

Compare and contrast

| Quantity | Interpretation |
| --- | --- |
| $c$ | Average IC across time |
| $b$ | IC of the persistent direction |
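A toy simulation makes the contrast concrete: a fixed loading on the target plus rotating noise. The signal model below is ours, purely illustrative, not a claim about any real forecast:

```python
import numpy as np

rng = np.random.default_rng(3)

T, M = 250, 20            # T periods; target rotated to the e_0 coordinate
persistent = np.zeros(M + 1)
persistent[0] = 0.3       # stable loading on the target direction

# Forecast at time t: persistent component plus noise, normalized to the sphere
X = persistent + 0.5 * rng.standard_normal((T, M + 1))
X /= np.linalg.norm(X, axis=1, keepdims=True)

IC = X[:, 0]              # IC_t = <X_t, e_0>
c = IC.mean()             # time-average IC

B = X.mean(axis=0)
b = B[0] / np.linalg.norm(B)   # IC of the persistent direction

print(f"average IC c = {c:.3f},  directional IC b = {b:.3f}")
```

Because the noise partially cancels under time-averaging while the persistent component does not, $b$ comes out larger than $c$ here: the aggregate direction is better aligned with the target than the typical single-period forecast.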

What compression signifies

$$\kappa = c\left(1 - \frac{1}{\|B\|}\right)$$

where $\|B\|$ measures the temporal coherence of the signal.


Cases

  • Stable signal → $\|B\| \approx 1$ → small $\kappa$
  • Rotating signal → $\|B\| \ll 1$ → large $\kappa$
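The two regimes can be contrasted side by side: a signal that stays near one direction versus one whose direction sweeps through an arc. The construction below (noise level, arc, dimensions) is ours:

```python
import numpy as np

rng = np.random.default_rng(4)

def stats(X):
    """Return c, ||B||, and kappa = c * (1 - 1/||B||) for unit vectors X."""
    B = X.mean(axis=0)
    normB = np.linalg.norm(B)
    return B[0], normB, B[0] * (1 - 1 / normB)

M, N = 5, 100

# Stable signal: small perturbations of the fixed direction e_0
base = np.zeros(M + 1)
base[0] = 1.0
stable = base + 0.05 * rng.standard_normal((N, M + 1))
stable /= np.linalg.norm(stable, axis=1, keepdims=True)

# Rotating signal: direction sweeps a quarter-turn in the (e_0, e_1) plane
angles = np.linspace(0, np.pi / 2, N)
rotating = np.zeros((N, M + 1))
rotating[:, 0] = np.cos(angles)
rotating[:, 1] = np.sin(angles)

c_s, n_s, k_s = stats(stable)
c_r, n_r, k_r = stats(rotating)
print(f"stable:   ||B|| = {n_s:.3f}  kappa = {k_s:+.4f}")
print(f"rotating: ||B|| = {n_r:.3f}  kappa = {k_r:+.4f}")
```

The rotating signal cancels itself under averaging, so its $\|B\|$ is visibly smaller and its $\kappa$ larger in magnitude than the stable signal's.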


One-line takeaway

Average IC is not the same as IC of the average direction—and compression measures the difference.
