Problem Setup

We assume the FX market participant (FXMP hereafter, a client of Tradefeedr) is trading with several liquidity providers (LPs) via an FX aggregator setup (see Definitions). The FXMP is a price taker: he observes FX quotes in real time from all LPs and selects the best quote to trade (highest bid to sell, lowest offer to buy). We assume for now that his trade size is smaller than the size attached to the quotes, so he does not have to hit several LPs at once (a so-called sweep). As a result of his order, the FXMP can be filled, partially filled or rejected.

FXMP is looking for a way to identify latency in his stack and the costs associated with it.

The study assumes that the FXMP's market data and trade messages (orders, fills, rejects) are loaded into the Tradefeedr Platform.

The prices and quantities observed by the FXMP are shown schematically in Figure 1.

Figure 1: Liquidity Stack


Liquidity Stack Optimization

We have considered venue analysis before: a comparison of spread paid and other execution metrics across LPs. We noted that simply comparing spread paid across LPs is not very constructive, because every time you trade with an LP it is, by construction, showing the best price at that moment. At the same time, LP (or venue) analysis can be a very useful first step towards constructing an alternative version of the LP stack for A/B testing.

Improving the overall liquidity stack should be the end goal of liquidity optimization. The typical approach is similar to the one used in other businesses: create two alternative models (liquidity stacks) and apply them in a randomized fashion to discover which model is best.

For example, an alternative stack can be constructed by excluding LPs with the following characteristics:

  • If an LP demonstrates a high reject ratio and a high reject cost, an alternative liquidity stack without this LP can be set up.
  • High market impact LPs can also be candidates for removal, since having your market impact amplified by awkward execution is costly.

An LP which takes too long to come back with an order confirmation (latency) should also be considered a candidate for removal, as it locks up resources and potentially blocks alternative executions.

A/B setup and objective function

The A/B setup in FX liquidity optimization differs from a standard setup in which two alternative models are considered (a standard example would be two versions of a website, presented to clients in a randomized fashion). In an LP stack we normally add or remove one LP (say, the one which we think is doing a bad job handling our flow because, for example, rejections are too frequent).

Figure 2: Experiment Design


Then, in the spirit of standard A/B testing, the FX trader sends each order to either Pool A or Pool B in a randomized way. Randomization is crucial to ensure fair testing. There are several ways to design the experiment.

  • Blind testing. In this design LP1-4 do not know about the existence of Pools A and B. Therefore the pricing in A and B is going to be exactly the same apart from LP4. Unless LP4 has the best price, the execution in the two streams is exactly the same (ignoring the case of sweeps for the time being and assuming the best bid/offer quantity is enough to accommodate our order). Therefore the only difference between the two LP stacks is the comparison of LP4's execution versus the execution of the best price of LP1-3.
  • Testing with communication. In this scenario LP1-4 are informed that Pool A has 3 LPs and Pool B has 4 LPs. They may or may not adjust their pricing. In this case it may or may not be true that we are testing the best price of LP4 against the prices of LP1-3.

What we are comparing in the above is how effectively LP1-3 replace LP4's price at the times when LP4's price is the best of the bunch. There are two ways for LP1-3 to deliver a better service. First, if LP4 rejects orders frequently and the market moves against the trade. Second, if the very presence of LP4 increases the market impact of the trader and makes him look more toxic to other LPs. The latter is only possible in sweep-style execution, where an order is allocated across several LPs.
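The randomized routing described above is simple to implement. A minimal sketch (the pool labels, seed and order count are our own illustration, not a Tradefeedr API):

```python
import random

def route_order(rng: random.Random) -> str:
    """Send the next order to Pool A or Pool B with equal probability."""
    return "A" if rng.random() < 0.5 else "B"

# Fixed seed so the experiment log is reproducible
rng = random.Random(42)
assignments = [route_order(rng) for _ in range(1000)]
```

Over a long experiment the two pools receive roughly equal numbers of orders, so differences in measured execution quality can be attributed to the pools rather than to when the orders were sent.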

In this note we consider the rejection case and hence we optimize the Effective Spread, defined as

Effective Spread = Spread Paid + Reject Ratio*Reject Cost
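The formula is straightforward to compute per LP or per pool. A minimal sketch (the function name and sample numbers are our own, chosen only for illustration):

```python
def effective_spread(spread_paid: float, reject_ratio: float, reject_cost: float) -> float:
    """Effective Spread = Spread Paid + Reject Ratio * Reject Cost (all in $/m)."""
    return spread_paid + reject_ratio * reject_cost

# e.g. 3 $/m spread paid, 10% of orders rejected at a cost of 8 $/m each
effective_spread(3.0, 0.10, 8.0)  # → 3.8
```

Note that a pool with a tighter quoted spread can still have a worse effective spread if its rejects are frequent and costly.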

We therefore calculate the Effective Spread metric for both pools and test whether the results are statistically different before making a decision. We can approach the experiment in two ways, broadly described in the A/B testing literature as frequentist and Bayesian.

Full Sample Approach

In this approach we fix the number of tries (sent orders, in our case) before the experiment. Once the given number of orders/trades has been reached, we look at the average effective spread achieved by each pool and conclude whether the results are statistically different. We will use the following definitions, which broadly follow any standard statistics textbook.

  • Null hypothesis: the hypothesis that there is no difference in performance between the two pools, i.e. that the effective spreads achieved by the two pools are the same in a statistical sense.

  • Confidence: the probability of concluding that the pools are the same when they really are the same. In other words, this is the probability of correctly identifying that the pools are the same.
  • Power: the probability of concluding that the pools are different (either A better than B or vice versa) when they really are different. This is the probability of correctly identifying a difference between the pools.

The natural question is how many observations we need before we can drop LP4, or alternatively conclude that LP4 is needed. The figure below gives a broad answer for a fixed confidence and power.

  • If spread volatility is 10$/m, telling apart a measured difference of 5$/m requires about 70 trades.
  • If spread volatility is 30$/m, the same exercise would require over 500 trades.
  • However, if the measured difference is 10$/m, we only need approximately 170 trades to tell the two pools apart (with 30$/m spread volatility).
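These rules of thumb can be approximated with the standard two-sample size formula n ≈ 2(z₁₋ₐ/₂ + z₁₋ᵦ)²(σ/δ)². The sketch below is our own illustration (the exact counts depend on the confidence and power chosen, which are assumptions here, so it will not reproduce the figure's numbers exactly):

```python
import math
from statistics import NormalDist

def trades_needed(spread_vol: float, min_difference: float,
                  confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate number of orders per pool needed to detect a
    `min_difference` ($/m) gap in mean effective spread, given a
    spread volatility of `spread_vol` ($/m). Two-sided normal test."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (spread_vol / min_difference) ** 2
    return math.ceil(n)

trades_needed(10, 5)   # a few dozen trades per pool
trades_needed(30, 5)   # several hundred trades per pool
```

The key driver is the ratio σ/δ: halving the detectable difference quadruples the number of trades required, which is why noisy (high-volatility) spreads make small LP differences expensive to verify.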

Dynamic Updating

An alternative way to conduct the experiment is Bayesian updating. In this scenario we start with the prior (original view) that the pools are the same, but we place no confidence in this view (a so-called non-informative prior). Alternatively, we can build pre-existing beliefs into the prior if there is reason to believe that one pool is better than the other, or if we are more confident about the expected spread in one pool. As we accumulate observations we can tell the performance difference apart with increasing confidence (the Bayesian posterior probability that one mean exceeds the other, or "confidence level"). The process is shown in the figure below. In this example we can fairly quickly conclude that Pool A is inferior to Pool B (its spread paid is higher).
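The updating step can be sketched with a conjugate normal-normal model, a deliberate simplification of what a production system would do. All priors and sample figures below are hypothetical:

```python
import math
from statistics import NormalDist

def posterior(prior_mean: float, prior_sd: float,
              sample_mean: float, n: int, obs_sd: float):
    """Normal-normal conjugate update of a pool's mean effective spread,
    assuming known per-trade observation noise `obs_sd`."""
    prec = 1 / prior_sd**2 + n / obs_sd**2          # posterior precision
    mean = (prior_mean / prior_sd**2 + n * sample_mean / obs_sd**2) / prec
    return mean, math.sqrt(1 / prec)

# Same weak prior for both pools; hypothetical running averages after 50 orders
m_a, s_a = posterior(10, 5, sample_mean=14, n=50, obs_sd=10)
m_b, s_b = posterior(10, 5, sample_mean=9, n=50, obs_sd=10)

# Posterior probability that Pool A pays a higher effective spread than Pool B
p_a_worse = NormalDist().cdf((m_a - m_b) / math.hypot(s_a, s_b))
```

Recomputing `p_a_worse` after every batch of fills gives exactly the kind of running "confidence level" curve described above: the experiment can be stopped as soon as the probability crosses a pre-agreed threshold.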

Different Objective Functions: Comparing Sweeps

Optimizing Effective Spread is just one application; A/B testing can also be applied to market impact. A natural application is the analysis of sweep trades. By a sweep trade we mean a trade done against a number of LPs simultaneously. The objective is to save on the spread paid. For example, a trader who has 20 million to execute can ask four LPs for a price in 5 million and hit them simultaneously.

LPs may have a problem with sweep trading. As per the example above, they price the trade on the basis that they would have to hedge 5 million, and hence expect a market move corresponding to a 5 million trade. If instead a 20 million trade hits the market, the adverse market move can be much bigger. If all LPs in the stack internalize (use their balance sheet to absorb the transaction), the market impact may be small: for some LPs the transaction might actually offset existing positions, or sit within balance sheet risk limits, so no market hedging is necessary. However, if one of the LPs hedges his 5 million directly in the market (externalizing), this can generate adverse market impact. An externalizing LP can also be very fast, pushing the market before the remaining LPs in the stack have a chance to react. In this scenario the remaining LPs may react by widening the spread for all of the trader's future transactions, assuming each of them is a sweep. Alternatively, they may delay their response and reject if they see the market moving against them. Both scenarios are bad for the trader.

Normally it is not obvious whether an LP is externalizing or internalizing. Consider Figure 1 again and assume we have reason to believe that LP4 is externalizing. We can then consider two pools and run the A/B test as above, but with market impact rather than effective spread as the objective function. The result might look like the chart below.
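The decision rule for the market-impact version of the test follows the same pattern: collect per-trade impact measurements for each pool and compare the means. A sketch using Welch's t statistic with a normal approximation for the p-value (adequate for large samples; the impact series below are made-up illustrative data):

```python
import math
from statistics import NormalDist, mean, variance

def welch_test(a: list[float], b: list[float]):
    """Welch's two-sample t statistic, with a normal approximation
    for the two-sided p-value (reasonable for large samples)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical market impact ($/m) per trade: Pool A includes the
# suspected externalizing LP4, Pool B does not
impact_a = [5 + 0.1 * i for i in range(30)]
impact_b = [4 + 0.1 * i for i in range(30)]
t, p = welch_test(impact_a, impact_b)  # small p => reject "no difference"
```

A small p-value is evidence that the pool containing the suspected externalizer genuinely carries higher market impact, supporting its removal from the stack.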

Conclusion

FX market participants trade in an aggregator setup, which routes each order to the best visible price. Therefore a direct comparison across LPs is not directly interpretable: the measured LPs show the best price under different conditions. Consequently, the fact that the Spread Paid of LP1 is greater than that of LP2 does not mean that LP1 provides a bad service.

Therefore, we have to look at the liquidity stack (the set of LPs) as a whole. Individual LP analysis (also known as venue analysis) can be instrumental in forming the liquidity stack properly. Once the liquidity stack is formed, standard statistical tools can be applied to test alternative versions. Standard statistical techniques exist to test whether there is enough evidence that two liquidity stacks perform differently. Depending on the outcome, certain LPs can be dropped or added.

LPs' internal risk and order management systems change over time. Therefore, constant testing is required, just as in other industries. Constant tracking and testing is only possible when an analytics platform allows a near real-time feedback loop. This is what the Tradefeedr Platform is built for.