Evaluation

desirablebusboy

1mo ago · competition-5

Hello everyone, this is my first contribution here. I'd like to better understand which evaluation standard matters most. Is it the Sharpe ratio, or something else?

15 Replies

Nonius 1mo ago

Hey desirablebusboy,

Welcome, glad you're here!

Short answer: it's two things, not one — Sharpe and how different your signal is from what's already known. Either one alone isn't enough.

A bit more concretely, here's roughly how we pick the "quality" set each round:

1. We line up every competition signal by Sharpe, highest to lowest.
2. We start with a "background wall" of signals that already exist (internal signals + previously-accepted ones).
3. Walking down the Sharpe-sorted list, for each candidate we check: is this signal sufficiently uncorrelated (|corr| ≤ 0.5) with everything in the wall so far?
   - If yes → it joins the quality set, and from now on it's also part of the wall.
   - If no → we skip it and move to the next candidate.
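For anyone who prefers code, here is a minimal Python sketch of the greedy pass described above. It is only an illustration of the steps as stated in this post, not the actual selection code: the function name `select_quality_set`, the `corr` helper, and the toy signal names and numbers are all made up.

```python
def select_quality_set(candidates, wall, corr, max_abs_corr=0.5):
    """Greedy pass over candidates sorted by Sharpe, highest first.

    candidates: list of (name, sharpe) pairs
    wall: names of signals that already exist (the "background wall")
    corr: hypothetical helper, (name_a, name_b) -> correlation
    """
    wall = list(wall)  # copy so the caller's list is not modified
    quality = []
    for name, _sharpe in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Accept only if |corr| <= threshold against everything in the wall so far.
        if all(abs(corr(name, w)) <= max_abs_corr for w in wall):
            quality.append(name)
            wall.append(name)  # accepted signals become part of the wall
    return quality


# Toy correlations, invented for illustration; unlisted pairs count as 0.0.
toy_corr = {
    frozenset(["A", "W0"]): 0.7,  # A duplicates something already in the wall
    frozenset(["C", "B"]): 0.6,   # C overlaps with B, which has a higher Sharpe
}


def corr(a, b):
    return toy_corr.get(frozenset([a, b]), 0.0)


candidates = [("A", 0.030), ("B", 0.020), ("C", 0.010), ("D", 0.005)]
quality = select_quality_set(candidates, ["W0"], corr)
print(quality)  # ['B', 'D']
```

In this toy run, A has the best Sharpe but is blocked by the wall, and C would pass on its own but is blocked by B, which was accepted earlier.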

So Sharpe gets you in line, but decorrelation is what gets you through the door. A modestly-Sharpe signal that captures a genuinely new pattern can absolutely make it in, while a high-Sharpe signal that re-discovers something already represented will get filtered out. The intuition is: we don't want to pay twice for the same alpha, no matter how strong it looks individually.

One side effect worth knowing: order matters. Because the wall grows as we accept signals, a candidate that would be accepted on its own can be blocked by an earlier-accepted signal it happens to correlate with. So if you're trying to crack into the quality set, finding a direction that genuinely isn't in the existing set is often more valuable than squeezing out a slightly higher Sharpe along a crowded direction.

Prizes will be allocated on that basis, by the way. Interestingly, the top of the leaderboard purely by Sharpe is a signal from distinguishingtremor. Sharpe-wise it looks pretty awesome, but it's more than 50% correlated with a signal we already have. I'll definitely send him a note suggesting he tweak his signal to keep a good Sharpe while becoming less correlated with the signals we have. Each of you can make up to 10 submissions!

Hope that helps, and happy to go deeper on any of it. It's also described in the COMPETITION.md file you would have received in the zip download, and in the Overview panel.

Current rankings of the quality (high Sharpe and low correlation) signals. Maybe we should have a separate column for quality on the leaderboard and rank by that.

#	Username	Sharpe
1	mathurin	0.035624
2	mmunoz	0.031170
3	Halim	0.022746
4	Halim	0.022555
5	Halim	0.022154
6	mathurin	0.021740
7	Halim	0.021584
8	Halim	0.019952
9	Halim	0.019434
10	Halim	0.019185
11	reprehensiblegrandeur	0.016573
12	metalhead	0.008921

desirablebusboy 1mo ago

What is the difference between low correlation and City Novelty?

Nonius 1mo ago

Probably best if I go through how a city is constructed. A signal in this competition is a 20-dimensional vector time series. The target is also a 20-dimensional vector time series. Both are demeaned, which means they can be represented as 19-dimensional vector time series.

For each timestamp, we rotate the target to be the north pole, and all signals are rotated by the same rotation matrix for that timestamp. That preserves the angles between all vectors, and hence preserves the cross-sectional correlation/cosine similarity at that timestamp between the signals and the target. We do this for every timestamp, so every signal is transformed in a way that preserves correlations. After that transformation, we define a signal's city as the normalized time average of the transformed signal over all timestamps. That gives one point on an 18-dimensional sphere.

City novelty measures how far your city is from the cities associated with other signals. It's expressed in degrees, but you can think of it in terms of the cosine similarity of a city with its nearest neighbour: if the angle is above 60 degrees, the cosine similarity is less than 0.5. If a signal's city has high novelty (high degrees/low cosine similarity), it will, with good probability, have low correlation with other signals, where correlation is measured as the time-averaged cross-sectional correlation of the signal with another signal.
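A rough NumPy sketch of the city construction may help make this concrete. It is a sketch under assumptions, not the actual pipeline: the helper names (`zero_sum_basis`, `rotation_to_north`, `city`) are invented, the particular rotation chosen here (the one acting only in the plane spanned by the target and the pole) is just one rotation that sends the target to the north pole, and the sketch averages the rotated vectors without normalizing each timestamp first, since the post does not say which is done.

```python
import numpy as np


def zero_sum_basis(n):
    """Orthonormal basis, shape (n, n-1), of the vectors in R^n that sum to zero."""
    m = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    q, _ = np.linalg.qr(m)  # first column is parallel to the all-ones vector
    return q[:, 1:]         # remaining columns span the zero-sum subspace


def rotation_to_north(u):
    """Rotation matrix sending the unit vector u to the north pole (last axis)."""
    d = u.shape[0]
    north = np.zeros(d)
    north[-1] = 1.0
    c = float(u @ north)
    if np.isclose(c, -1.0):
        # u is the south pole: rotate by 180 degrees in the (first, last) plane.
        r = np.eye(d)
        r[0, 0] = -1.0
        r[-1, -1] = -1.0
        return r
    k = np.outer(north, u) - np.outer(u, north)
    return np.eye(d) + k + (k @ k) / (1.0 + c)


def city(signal, target):
    """signal, target: arrays of shape (T, n). Returns a unit vector of length n-1."""
    n = signal.shape[1]
    basis = zero_sum_basis(n)
    # Demean cross-sectionally, then express in n-1 coordinates.
    s = (signal - signal.mean(axis=1, keepdims=True)) @ basis
    t = (target - target.mean(axis=1, keepdims=True)) @ basis
    rotated = np.empty_like(s)
    for i in range(s.shape[0]):
        r = rotation_to_north(t[i] / np.linalg.norm(t[i]))
        rotated[i] = r @ s[i]  # same rotation applied to the signal
    mean = rotated.mean(axis=0)
    return mean / np.linalg.norm(mean)


rng = np.random.default_rng(0)
toy_signal = rng.normal(size=(50, 20))  # made-up data, 50 timestamps
toy_target = rng.normal(size=(50, 20))
toy_city = city(toy_signal, toy_target)
print(toy_city.shape)  # (19,)
```

With 20 assets the city is a unit vector in 19 coordinates, i.e. a point on the 18-dimensional sphere. As a sanity check, a signal identical to the target lands exactly on the north pole.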

desirablebusboy 1mo ago

It has been suggested that changing the way positions are displayed in the leaderboard might be better, so that they are shown based on this criterion. Why are the positions in the leaderboard not displayed according to this standard?

Nonius 1mo ago

Thanks for the suggestion, it makes sense. We'll look into it; it might take a bit of time.

Nonius 1mo ago

Hey, just to let you know, we implemented your suggestion.

desirablebusboy 1mo ago

Noted the adjustment, thanks for following through.

Nonius 1mo ago

An important point on this: "We start with a "background wall" of signals that already exist (internal signals + previously-accepted ones)." By previously-accepted ones I mean signals accepted from a previous contest, not the current one.

rudiyantoamdkom 1mo ago

The leaderboard displays the Sharpe ratio because it is an indicator of absolute success against the market, whereas the correlation clustering criterion is merely a business viability indicator for AlphaNova's internal portfolio.

desirablebusboy 1mo ago

How can we address the discrepancy between the results I obtain in my local tests and the results shown on the leaderboard? What methods or best practices can help ensure that the outcomes are consistent or at least reliably aligned?

Nonius 1mo ago

Hi desirablebusboy,

You have some good questions. Here's the thing: we gave you a snapshot of data from our existing signals. The data we gave you was the citification of our signals at a snapshot that was cut 3 weeks ago. Those cities were computed from tens of thousands of testing-period observations. So, like plate tectonics on Planet Earth, the cities will move around a bit, but over the next 5 months they won't move much.

Now, the other factor you have to contend with is that there are all these awesome scientists founding their own signals and associated cities in this contest, and that makes it hard. The data we gave you was a frozen snapshot from 3 weeks ago, but since then some data scientists have found other relatively uncorrelated signals and, correspondingly, novel cities. On top of that, time marches on: we update market data on a frequent basis and recompute stats on all signals on a frequent basis too. So it's a movable feast. It's not trivial; in fact, I'd guess it's very hard. I've tried to participate myself, unsuccessfully.

Since you've had some very good questions, I'm happy to hop on a call or maybe organise an AMA hosted on Discord or some other venue. This contest IS hard.

Nonius 1mo ago

Lemme try to take the analogy further. Imagine you're an explorer hundreds of years ago, maybe from Europe, Asia or wherever. You have a crude map of things, then you explore, and you land on some awesome islands, super awesome for starting a new civilisation. But as you were sailing to these islands, some other people had the same maps and got there before you, and so, boom, they win in some sense. It's kinda like that. Add on top of that the fact that continents, islands etc. move around because of plate tectonics, but that latter concept is secondary.

Nonius 1mo ago

Well, I'll be honest with you, it's a bit more cut-throat than that. It's not an analogy I'd like to spell out, but it's like this: you get to the unexplored islands, some other people get there too, the islands are far from civilisation, and, well, in our contest the people with the best Sharpe ratios "win" those islands. It doesn't matter when they arrive.

desirablebusboy 1mo ago

I think it would be valuable to have more discussions on whether there are engineering or methodological approaches to close the gap between local test results and real-world leaderboard outcomes. Exploring systematic ways to reduce this discrepancy could help us better align testing frameworks with actual performance. This was the core of my original question. And can we understand the limits of traditional methods in reducing the gap, and whether there could be more efficient engineering approaches in this area.

Nonius 1mo ago

Good feedback. Let me have a think on this and get back to you.

One thing to keep in mind: your local results are on a validation set. We hold out the testing set, so you'd normally see a drop in Sharpe ratio on testing. As for how correlated your signal is on the leaderboard with a) our legacy signals and b) the other signals on the leaderboard, there are some things you can do with the city data locally.
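One example of a local check, sketched in Python under assumptions: suppose the city snapshot can be loaded as an array with one unit vector per existing signal (the actual file format is not described in this thread, and `city_novelty_degrees` is a made-up name). The novelty of a candidate city is then the angle to its nearest neighbour. Keep in mind the snapshot is frozen, so this only approximates what the leaderboard sees.

```python
import numpy as np


def city_novelty_degrees(candidate_city, reference_cities):
    """Angle in degrees between a city and its nearest neighbour.

    candidate_city: array of shape (d,)
    reference_cities: array of shape (num_signals, d), hypothetical layout
    """
    c = candidate_city / np.linalg.norm(candidate_city)
    refs = reference_cities / np.linalg.norm(reference_cities, axis=1, keepdims=True)
    cosines = np.clip(refs @ c, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    # Nearest neighbour = largest cosine similarity = smallest angle.
    return float(np.degrees(np.arccos(cosines.max())))


# Toy 3-dimensional example, invented for illustration.
toy_refs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
toy_candidate = np.array([0.5, 0.0, np.sqrt(3.0) / 2.0])
toy_novelty = city_novelty_degrees(toy_candidate, toy_refs)
print(round(toy_novelty, 6))  # 60.0
```

The toy candidate has cosine similarity 0.5 with its nearest reference city, which is exactly the 60-degree boundary mentioned earlier in the thread.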

Will revert with more thoughts on this