Problem Summary

In the previous use case we considered implementation shortfall (IS) as a measure of algo performance. It is an “out-of-sample” measure, in the sense that the benchmark price cannot be affected by broker actions. This contrasts with “in-sample” benchmarks such as TWAP, which are always open to manipulation because benchmark calculation and broker actions occur at the same time. A good example is the infamous 4pm fix.

IS is a good benchmark, but it is heavily affected by market direction after the inception price. It is therefore difficult to compare long-term and short-term algos, because the algo user would need a well-defined tradeoff function between risk and return.

In this note, we consider other algo benchmarks that are also implemented on the Tradefeedr platform. They use “in-sample” data, but they have the following appealing characteristics:

  • They are horizon independent. We call this the Execution Score (ES). ES ranges from 0 to 100, with 100 the best possible and 0 the worst possible score.
  • They score not only the execution itself but also the market action after the execution. This captures reversal risk, and we call the resulting measure the Reversal Score (RES).
  • They give some measure of signaling risk, or information leakage.
  • For passive algos, these benchmarks provide some measure of adverse selection. For example, passive orders placed in a venue with “sharp” traders are likely to be executed at unfortunate times (for example, a BUY just before a market collapse) and hence be adversely selected.

Like all “in-sample” benchmarks, they can be manipulated. However, they are quite complex to manipulate and offer several very useful insights, so they should be considered part of the algo-ranking toolkit.

Trading Opportunity Set

Algo executions take place over a certain time period. If we achieve a certain price, how do we know whether it is good or bad? One way is to compare it with what was available in the market, that is, with the prices prevailing in the market during the execution.

We can argue that every tick is a trading (or execution) opportunity. An alternative is to consider a TWAP distribution: for example, if price 1 was available for twice as long as price 2, then price 1 should have twice the weight in the distribution, as we are more likely to trade on it. We call the set of available prices the Execution Opportunity Set (EOS), analogous to the Investment Opportunity Set, the set of assets available to an investor.
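The two weighting schemes can be made concrete with a minimal sketch, assuming the mid-quotes arrive as plain arrays of timestamps (in seconds) and prices. The function names `tick_weights`, `time_weights` and `eos_distribution` are hypothetical illustrations, not part of any Tradefeedr API.

```python
import numpy as np

def tick_weights(n_ticks):
    """Every tick is one trading opportunity: equal weights."""
    return np.full(n_ticks, 1.0 / n_ticks)

def time_weights(timestamps, end_time):
    """TWAP-style weights: each mid counts for as long as it was in force,
    i.e. until the next tick (or end_time for the last tick)."""
    t = np.asarray(timestamps, dtype=float)
    durations = np.diff(np.append(t, end_time))
    return durations / durations.sum()

def eos_distribution(mids, weights):
    """Collapse (mid, weight) pairs into a distribution over distinct prices."""
    prices, idx = np.unique(np.asarray(mids, dtype=float), return_inverse=True)
    return prices, np.bincount(idx, weights=weights)

# Price 1.0 stays in force from t=0 to t=2, price 2.0 from t=2 to t=3.
prices, w = eos_distribution([1.0, 2.0], time_weights([0, 2], end_time=3))
print(prices, w)
```

Here price 1.0 was available twice as long as price 2.0, so the time weights come out as 2/3 and 1/3, whereas `tick_weights(2)` would give each price 1/2. In an active market, with many short-lived ticks, the two distributions tend to converge.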

The relevant literature (Kissell, 2013) treats all executed prices as the EOS (he does not use this terminology, but the logic is the same). However, the FX market is decentralized and a reliable “tape” is not available (although a number of projects aim to bridge this gap). Therefore, we use all realized mid-quotes in the market as the EOS.

The TWAP and every-tick distributions sound very different in theory but tend to be quite close in practice, especially in active markets. Figure 1 below shows the EOS for a full day using both the all-ticks and the TWAP methods.

Figure 1: Execution Opportunity Sets (EOS)

Score and Reversal Score – Execution Scenarios

Once the EOS is defined, we need to rank an algo execution against it. To this end we follow EDHEC (2008) and split the analysis window into two periods:

  • Execution Period
  • Post-Execution Period (also called the Reversal Period)

Every algo fill is ranked against all quotes in the Execution Period. The rank is between 0 and 100: 100 (the best possible score) means buying at the lowest (selling at the highest) price of the period, and 0 (the worst possible score) means buying at the highest (selling at the lowest). The algo score for the execution period is the size-weighted average of the scores of the individual fills. We call this the Execution Score, or ES.

Every algo fill can also be ranked against all quotes in the Reversal Period. The post-execution, or reversal, score is the size-weighted average of the reversal scores of the individual fills. We call this the Reversal Execution Score, or RES.
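The scoring rule above can be sketched as a weighted percentile rank. This is a minimal illustration under stated assumptions: the EOS is passed as prices plus weights (for example tick or time weights), and a fill that ties with an EOS price gets half credit, a mid-rank convention chosen for the sketch rather than one the note specifies. The function names are hypothetical.

```python
import numpy as np

def fill_score(fill_price, eos_prices, eos_weights, side):
    """Score one fill (0..100) against an execution opportunity set.

    BUY: weighted share of EOS prices above the fill (prices we beat).
    SELL: weighted share of EOS prices below the fill.
    Ties count half (assumed mid-rank convention).
    """
    p = np.asarray(eos_prices, dtype=float)
    w = np.asarray(eos_weights, dtype=float)
    w = w / w.sum()
    beaten = (p > fill_price) if side == "BUY" else (p < fill_price)
    ties = p == fill_price
    return 100.0 * (w[beaten].sum() + 0.5 * w[ties].sum())

def size_weighted_score(fill_prices, fill_sizes, eos_prices, eos_weights, side):
    """Size-weighted average of the per-fill scores."""
    scores = [fill_score(px, eos_prices, eos_weights, side) for px in fill_prices]
    return float(np.average(scores, weights=fill_sizes))

# Toy BUY order: 3 units at the period low, 1 unit at the period high.
exec_mids = [1.10, 1.11, 1.12, 1.13, 1.14]
post_mids = [1.15, 1.16]
es = size_weighted_score([1.10, 1.14], [3, 1], exec_mids, [1] * 5, "BUY")
res = size_weighted_score([1.10, 1.14], [3, 1], post_mids, [1] * 2, "BUY")
print(es, res)
```

ES is `size_weighted_score` applied to the execution-period quotes, and RES is the same call on the post-execution quotes. With only five equally weighted mids, buying at the low scores 90 rather than 100 because of the half-credit tie; on a dense, realistic EOS the extremes approach 100 and 0.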

The following scenarios are possible:

  • High ES and High RES
  • High ES and Low RES
  • Low ES and High RES
  • Low ES and Low RES

In what follows, we consider each of these possibilities, present execution scenarios leading to those scores, and discuss what could have been done better in each situation. We always present a BUY order; the SELL order analysis is its mirror image.

Figure 2, first chart, illustrates the first scenario: good execution with few opportunities to improve. The algo is buying on a rising market and does the bulk of the execution early, either because of alpha or just good luck (more runs would be required to tell the two apart). The market continues to go up after the algo is done, suggesting that the algo execution was not pushing the market much. Each execution on the chart is labelled with its score against the execution period (ES) and against the reversal period (RES). For example, the last execution has an ES of 1 (it bought almost at the top) but an RES of 75, as the market continued to go up.

Figure 2, second chart, considers a similar scenario but a much less successful algo. The executions are skewed toward the end of the period, and hence the ES is quite low. The RES is quite high because the market was trending upwards. That would be a typical scenario for an algo that was passive early on, trying to collect the spread. However, limit buy orders are likely to go unfilled early if the market is moving away, so the algo likely switched to a more aggressive style late in the game. The market ran away from it, hence the low ES. However, there was no opportunity to buy at a better level after the algo finished. Therefore, the algo would have benefited from being more aggressive, either in terms of its time schedule or in terms of not trying to collect the spread early on.

Figure 2: Algo Scoring Scenarios.

Figure 3 presents the remaining two scenarios. The top chart presents what should be a relatively easy case: buying on a falling market. The algo does a pretty good job of buying late in the execution process. If the time horizon was fixed, the algo did a good job. If the time horizon was not fixed, the algo would have been better off adopting a less aggressive (more patient) execution style.

Figure 3, bottom chart, presents a scenario of relatively poor execution. The market is going up, but the algo chooses to buy late, thus getting unfavorable execution prices. Worse, immediately after the algo stops executing, the market drops. On the surface, therefore, the algo would have been better off being either more aggressive (buying at lower prices at the beginning) or more patient (buying after the reversal). However, in this specific case it is simply a bad execution style: early activity of the algo (without fills, probably just order placement) was read by other market participants. This is called signaling risk. A sharp trader read the information leakage from the algo and likely stepped in first, driving the market upswing. The algo was most likely buying from this sharp trader, and the market went up sharply as the information leakage was read by other players as well. As soon as the algo stopped buying, the net flow turned to selling (as sharp traders took profit) and the market collapsed. So in this scenario it is unlikely that simply waiting longer (or less) would have changed much. Signaling risk is a big problem with algos.

Figure 3: Algo Scoring Scenarios, Part 2
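The four scenarios and the remedies discussed for Figures 2 and 3 can be summarized as a simple lookup. This is a hypothetical helper for a BUY order; the cut-off of 50 between “high” and “low” scores is an assumption for illustration, not a Tradefeedr threshold.

```python
def diagnose(es, res, threshold=50.0):
    """Map an order's (ES, RES) pair to one of the four scenarios.

    threshold is a hypothetical cut-off between "high" and "low" scores.
    Messages paraphrase the discussion of Figures 2 and 3 for a BUY order.
    """
    high_es, high_res = es >= threshold, res >= threshold
    if high_es and high_res:
        return "good execution: little room to improve"
    if high_res:  # low ES, high RES: the market ran away during execution
        return "too passive or slow: be more aggressive earlier"
    if high_es:  # high ES, low RES: cheaper prices came after the algo finished
        return "good within the horizon: be more patient if the horizon is flexible"
    return "poor execution: check for signaling risk / information leakage"

print(diagnose(es=20.0, res=80.0))
```

A real scorecard would probably look at the distribution of scores over many runs rather than a single order, since one run cannot separate alpha from luck.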

Algo Scorecard – Cross-Sectional Analysis

The four-quadrant map in Figure 4 is the natural extension of the analysis in the previous sections. Each algo has two scores (ES and RES) and hence can be mapped into a two-dimensional space. This visualization is instrumental in identifying good and problematic patterns in algo performance. This particular view is by algo type, although the Tradefeedr API can be used to aggregate the results by currency pair, broker or any other variable of interest. From the visualization below it can be noted that:

  • All Sweep algos have poor reversal scores, suggesting (naturally) strong market impact.
  • Algos in good execution quadrant are mostly “patient” algos such as Liquidity Seeker or Passive Splitter.
Figure 4: Scores and Reversal Scores

It seems that, despite the claim that ES is horizon independent, it actually favors longer-term, patient algos.
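A cross-sectional map of this kind reduces to a group-by and average. The sketch below is a hypothetical, dependency-free version: each order is a dict with an `es`, a `res` and a grouping field (`algo_type` by default; a currency-pair or broker field works the same way), and the quadrant cut-off of 50 is an assumption. It does not reproduce the Tradefeedr API.

```python
from collections import defaultdict
from statistics import mean

def scorecard(orders, key="algo_type", threshold=50.0):
    """Average ES and RES per group and assign each group a quadrant.

    orders: iterable of dicts holding "es", "res" and the grouping field.
    threshold: hypothetical cut-off between high and low scores.
    """
    groups = defaultdict(list)
    for order in orders:
        groups[order[key]].append(order)

    card = {}
    for name, rows in groups.items():
        es = mean(r["es"] for r in rows)
        res = mean(r["res"] for r in rows)
        es_label = "high ES" if es >= threshold else "low ES"
        res_label = "high RES" if res >= threshold else "low RES"
        card[name] = {"es": es, "res": res, "n": len(rows),
                      "quadrant": f"{es_label} / {res_label}"}
    return card

# Toy input for illustration only (not real results).
orders = [
    {"algo_type": "Sweep", "es": 55.0, "res": 20.0},
    {"algo_type": "Sweep", "es": 48.0, "res": 25.0},
    {"algo_type": "Liquidity Seeker", "es": 70.0, "res": 60.0},
]
for name, row in scorecard(orders).items():
    print(name, row["quadrant"])
```

Keeping the run count `n` next to each average matters: as the statistical section below argues, a group with few runs can land in a quadrant by chance.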

As implementation shortfall is the classical algo performance measure, it is natural to ask how the execution score compares with it. Figure 5 does exactly that. ES is a completely different measure, so we would not expect much correlation, but it is useful to check. One might expect algos with negative implementation shortfall (such as the lucky ones buying in a falling market) to be bound to have a good execution score. However, this is not the case. As we can see below, algos with negative shortfall can have a very poor execution score. This happens, for example, when an algo buying on a falling market executes too quickly, not fully exploiting its alpha (or luck).

Figure 5: Score vs Shortfall
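The point that a negative shortfall can coexist with a poor ES is easy to reproduce on toy data. The sketch below, with hypothetical function names and made-up prices, computes the size-weighted implementation shortfall in bps against the arrival mid, and a simple ES (equally weighted mids, ties counting half) for a BUY that completes in the first two ticks of a falling market.

```python
import numpy as np

def shortfall_bps(fill_prices, fill_sizes, arrival_mid, side):
    """Implementation shortfall in bps vs the arrival (inception) mid.
    Positive = cost; negative = the order beat its arrival price."""
    avg_px = np.average(fill_prices, weights=fill_sizes)
    sign = 1.0 if side == "BUY" else -1.0
    return float(sign * (avg_px - arrival_mid) / arrival_mid * 1e4)

def execution_score_buy(fill_prices, fill_sizes, mids):
    """Size-weighted ES of a BUY against equally weighted mids (ties count half)."""
    m = np.asarray(mids, dtype=float)
    scores = [100.0 * ((m > px).mean() + 0.5 * (m == px).mean()) for px in fill_prices]
    return float(np.average(scores, weights=fill_sizes))

# Toy falling market: the algo buys everything in the first two ticks.
mids = [100.0, 99.5, 99.0, 98.5, 98.0]
fills, sizes = [100.0, 99.5], [1, 1]
print(shortfall_bps(fills, sizes, arrival_mid=100.0, side="BUY"))  # about -25 bps (a gain)
print(execution_score_buy(fills, sizes, mids))                      # about 20 (poor)
```

The shortfall rewards the order for the market falling after arrival, while ES penalizes it for leaving the cheaper prices later in the period on the table.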

Local Scores and Adverse Selection

Put simply (perhaps a little simplistically), adverse selection is like trading against a smarter trader with better alpha. Algo orders sent to brokers can end up being executed across different venues. This is by design, as the trader supposedly uses (and pays for) the broker's expertise in placing the order. However, sometimes orders are placed in pools that are subject to adverse selection. Figure 6 shows an example. The BUY order is placed into a mid pool to save on spread-crossing costs. However, many sharp traders ping the pool for the presence of such BUY orders. Once they identify that the order is present, they use short-term alpha to “prey” on it: they submit sell orders into the mid pool each time the price moves up. They can also use order-placement techniques on the reference exchanges (the exchanges used to construct the mid for this pool) to move the mid upward. The exact mechanics are not essential for this note; the end result is that the algo buys local tops over the duration of the order. This is a common and well-known scenario, so it is important to track it.

Figure 6: Adverse Selection in Mid Pool

As the picture above shows, adverse selection is best identified by an adverse price move immediately after the execution. If the resting order is a buy, the price moves down after the fill, delivering immediate regret to the trader. A simple and natural measure is therefore the price move after the fill. It is also sometimes useful to measure the local score AROUND the fill time, that is, both before and after the fill. A combination of the two local scores (around the fill and after the fill) allows adverse selection to be identified more reliably.
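The two local scores and the post-fill move can be sketched as follows. This is an illustration, not the Tradefeedr definition: the window half-width `window`, the time units and the half credit for ties are assumptions, and the function names are hypothetical.

```python
import numpy as np

def pct_score(fill_price, prices, side):
    """Share of prices the fill beat (0..100); ties count half."""
    p = np.asarray(prices, dtype=float)
    if p.size == 0:
        return float("nan")  # no quotes in the window
    beaten = (p > fill_price) if side == "BUY" else (p < fill_price)
    return float(100.0 * (beaten.mean() + 0.5 * (p == fill_price).mean()))

def local_scores(fill_time, fill_price, side, times, mids, window):
    """Local scores of one fill: against mids in [t - window, t + window]
    ("around") and against mids in (t, t + window] ("after")."""
    t = np.asarray(times, dtype=float)
    m = np.asarray(mids, dtype=float)
    around = m[(t >= fill_time - window) & (t <= fill_time + window)]
    after = m[(t > fill_time) & (t <= fill_time + window)]
    return pct_score(fill_price, around, side), pct_score(fill_price, after, side)

def markout_bps(fill_price, later_mid, side):
    """Post-fill price move in bps, signed so that negative = regret."""
    sign = 1.0 if side == "BUY" else -1.0
    return float(sign * (later_mid - fill_price) / fill_price * 1e4)

# Toy path peaking at t=5; a passive BUY gets filled right at the peak.
times = range(11)
mids = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
print(local_scores(5, 6.0, "BUY", times, mids, window=2))
print(markout_bps(6.0, mids[7], "BUY"))
```

On this path, the fill at the peak scores 10 around the fill and 0 after it, with a negative markout: the local-top, immediate-regret pattern described above.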

As can be seen from Figure 6, all fills happen at local tops if the “local” window is wide enough. However, some fills actually happen at local bottoms, delivering very good local performance. Overall, a bad execution score (ES, from the previous section) would suggest the presence of adverse selection, especially if the algo is passive.

The results for different algos are presented in Figure 7. Sweep algos have good local ranks but poor execution scores. This is because these algos aggress (take prices) and hence do not allow other traders to adversely select them. Normally, a combination of a poor execution score and a poor local rank suggests adverse selection, so all algos in the bottom-left quadrant are suspect.

Figure 7: Local Ranks & Execution Scores
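Flagging the bottom-left quadrant is then a simple filter. A minimal sketch, assuming per-algo (or per-venue) averages of ES and local rank are already available; the dictionary layout, the algo names and the cut-off of 50 are hypothetical.

```python
def adverse_selection_suspects(card, threshold=50.0):
    """Names in the bottom-left quadrant: poor ES and poor local rank.

    card maps an algo (or venue) name to its average "es" and "local_rank";
    threshold is a hypothetical cut-off between good and poor scores.
    """
    return sorted(
        name
        for name, s in card.items()
        if s["es"] < threshold and s["local_rank"] < threshold
    )

# Toy averages for illustration only; "Mid Pegger" is a made-up algo name.
card = {
    "Sweep": {"es": 40.0, "local_rank": 70.0},       # aggressive: good local rank
    "Mid Pegger": {"es": 35.0, "local_rank": 30.0},  # passive: bottom-left
}
print(adverse_selection_suspects(card))  # ['Mid Pegger']
```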

Finally, local ranks can be tracked for an individual order over time. It is next to impossible to adversely select an algo straight away: sharp traders first have to ping the mid pool for the presence of a large buy order before attempting to manipulate the mid and take advantage of the order over time. Hence, one would expect local scores (the indicators of adverse selection) to become progressively worse over time, or to show a visible regime change where local ranks simply become worse. Figure 8 shows a typical evolution of local scores.

They start around the 50 handle. Around 8:30, 8:35 and 8:40 there are three very bad fills. That would be suspect, but there are also very good fills counterbalancing the bad ones. Therefore, in this case it is unlikely that the order was adversely selected.

Figure 8: Quality of Execution (Local Rank) over Time
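One way to turn this visual check into a rule is to smooth the local ranks of an order's fills with a trailing average and flag the order when the average drops below a threshold. This is a swapped-in heuristic, not a method from the note; the window of 10 fills and the threshold of 40 are arbitrary assumptions, and the names are hypothetical.

```python
import numpy as np

def trailing_mean(local_ranks, window=10):
    """Trailing mean of per-fill local ranks (fills in time order)."""
    x = np.asarray(local_ranks, dtype=float)
    if x.size < window:
        return np.array([])
    return np.convolve(x, np.ones(window) / window, mode="valid")

def looks_adversely_selected(local_ranks, window=10, threshold=40.0):
    """Flag the order if its trailing-average local rank ever drops below
    threshold (window and threshold are arbitrary assumptions)."""
    smoothed = trailing_mean(local_ranks, window)
    return bool(smoothed.size > 0 and smoothed.min() < threshold)

# Isolated bad fills offset by good ones vs a persistent regime change (toy data).
offset = [50.0] * 30
for i in (10, 15, 20):
    offset[i], offset[i + 1] = 0.0, 100.0
regime_change = [50.0] * 15 + [20.0] * 15
print(looks_adversely_selected(offset), looks_adversely_selected(regime_change))  # False True
```

The trailing average absorbs isolated bad fills that are counterbalanced by good ones, as in the example above, while a persistent deterioration pulls it down and triggers the flag.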

Statistical Analysis

As the Execution Score is bounded between 0 and 100, it is very convenient to apply standard statistical techniques to test whether the scores from two different brokers differ significantly. As with most statistical tests, the number of observations is a key determinant.

The top chart of Figure 9 shows 200 algo runs (100 per broker) for two brokers. While the results look similar, the statistical evidence suggests (with 97% probability) that broker 2 delivers better scores than broker 1.

The bottom chart shows a different picture. While Broker 2 appears to deliver much better results, in reality we do not have enough observations to draw a firm conclusion. Continuous reweighting techniques (such as multi-armed bandits) can be applied to optimize algo placement, but this is beyond the scope of this note.

Figure 9: Statistical Tests
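As one standard option for comparing two brokers' scores, a one-sided permutation test on the mean score difference needs nothing beyond NumPy. The sketch below is that swapped-in technique, not necessarily the test behind Figure 9; the names are hypothetical, the seed is fixed for reproducibility, and the alternative hypothesis is that broker B scores higher than broker A.

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_perm=5000, seed=0):
    """One-sided permutation test of H1: mean score of B > mean score of A.

    Returns the observed mean difference (B - A) and a p-value.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    observed = b.mean() - a.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # random relabelling of the runs
        diff = shuffled[a.size:].mean() - shuffled[:a.size].mean()
        if diff >= observed:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return float(observed), (hits + 1) / (n_perm + 1)

print(permutation_test([40, 60], [55, 65]))
```

With only two runs per broker, scores `[40, 60]` versus `[55, 65]` give a p-value around one third even though the means differ by 10 points, which mirrors the bottom chart: an apparent gap without enough observations to support it.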


This note presents a number of algo-scoring techniques. While they are quite data-intensive, they are available on the Tradefeedr platform and via the Tradefeedr API for proprietary investigations. They are likely to give insights into algo signaling risk and adverse selection in order placement. These metrics are orthogonal to implementation shortfall: they contain additional information. Even if a trader's orders are benchmarked using implementation shortfall, these additional metrics allow for better optimization of algo selection. They help identify potential information leakage and whether an order has been placed in the wrong pool. Such scenarios are guaranteed to detract from the performance of even the smartest algo.

As benchmarks, these scores are useful, but they have drawbacks as well. As they are scale independent, they can give the same score to an algo losing 100bps and an algo losing 1bps (if the second traded in a slow market and the first in a runaway market). If this is not desirable, they need to be complemented by a basis-point cost measure.
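The scale independence is easy to demonstrate: shrink the same price path by a factor of 100 and the percentile-based ES does not move, while the basis-point cost shrinks by the same factor of 100. Toy data and hypothetical function names; the ES here uses equally weighted mids with ties counting half.

```python
import numpy as np

def es_buy(fill_prices, fill_sizes, mids):
    """Size-weighted ES of a BUY against equally weighted mids (ties count half)."""
    m = np.asarray(mids, dtype=float)
    scores = [100.0 * ((m > px).mean() + 0.5 * (m == px).mean()) for px in fill_prices]
    return float(np.average(scores, weights=fill_sizes))

def cost_bps(fill_prices, fill_sizes, arrival_mid):
    """BUY cost in bps vs the arrival mid (positive = cost)."""
    avg_px = np.average(fill_prices, weights=fill_sizes)
    return float((avg_px - arrival_mid) / arrival_mid * 1e4)

# Same shape of path, scaled 100x: a runaway market vs a slow one (toy data).
path_bps = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
fast = 100.0 * (1 + path_bps * 1e-4)
slow = 100.0 * (1 + path_bps * 1e-6)

for mids in (fast, slow):
    fill = [mids[2]]  # buy at the middle of the path
    print(es_buy(fill, [1], mids), cost_bps(fill, [1], mids[0]))
```

Both orders score an ES of 50, yet one costs about 50 bps and the other about 0.5 bps, which is why a basis-point cost measure is a useful complement.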


Kissell, 2013, The Science of Algorithmic Trading and Portfolio Management.

EDHEC, 2008, Transaction Cost Analysis A-Z, An EDHEC Risk and Asset Management Research Centre Publication.