Dean Markwick

The Joys of Free Cloudflare

2026-05-18T00:00:00+00:00

I’ve been tinkering around with the free tier on Cloudflare and have managed to churn out a couple of side projects. Of course, I had a little help from various AI systems, but it was assisted rather than vibe-coded.

Enjoy these types of posts? Then sign up for my newsletter.

Most of my work life is spent in Jupyter notebooks looking at data. Most of my blogging is looking at data using Julia or Python. Every now and then I branch out and build something other people can use like Crypto Liquidity Metrics. It’s a simple Netlify-hosted website with pretty simple HTML and JavaScript, but it doesn’t really use ‘the cloud’ in any meaningful way. I want to start expanding my horizons and using more of what’s available to build things.

I’m not sure how I ended up on Cloudflare but the fact you can use things without entering any payment details reassures me that I won’t lose my house if one of these things gets popular. Although once you read what I’ve built you’ll see there is very little chance they take off!

Cloudflare Pages

My first little project is a personal train timetable. I live in a place where there are fast trains that skip out lots of stops. However, they are only at specific times throughout the day. I would normally use the National Rail app and have to scroll through the regular slow trains and keep my eye out for a fast one. What I needed was a way to filter the trains by platform as the fast ones left from their own platform. For this I needed a way of getting train data.

Rail departure information is made available for free through the Darwin Data Feeds. You apply for a key and off you go. However, it’s a bit complicated to query as it’s not a REST API. Thankfully, someone has done the hard work and built a REST API for the same data. This API is called huxley2 and is an open-source, self-hosted version. You still need a token from National Rail but a REST API is much easier to work with. I query the API from the browser, filter based on the platform and return the resulting trains. So all in, pretty easy. I let Claude style the front end and it’s all done.
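The filtering step is small enough to sketch. The site itself does this in the browser with JavaScript; the version below is Python purely for illustration. It assumes the departures response carries a `trainServices` list whose items have `platform`, `std` (scheduled time) and `etd` (expected time) fields. Those field names, the `fast_trains` helper and the sample payload are assumptions for the example, not the code running on the site.

```python
def fast_trains(departures, platform):
    """Return (scheduled, expected) times for services leaving from `platform`."""
    # The list may be missing or null when there are no services.
    services = departures.get("trainServices") or []
    return [
        (s.get("std"), s.get("etd"))
        for s in services
        if s.get("platform") == platform
    ]


# Made-up payload in the assumed shape, not real departure data.
sample = {
    "trainServices": [
        {"std": "17:02", "etd": "On time", "platform": "1"},
        {"std": "17:10", "etd": "17:14", "platform": "4"},
        {"std": "17:25", "etd": "On time", "platform": None},
        {"std": "17:40", "etd": "On time", "platform": "4"},
    ]
}

fast = fast_trains(sample, "4")
print(fast)
```

Services without a platform yet are simply dropped, which is the behaviour you want when only one platform matters.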

I now needed a place to host this single page app. This is where Cloudflare Pages comes in. You can upload the HTML and JavaScript files and it’s done. Of course, because I’m a professional programmer, I didn’t do that: I connected it to my GitHub, and every time I push a commit it rebuilds the website.

I save this webpage to my phone’s homepage and job done. I open the link and it tells me the next fast train home. Now obviously, this is only useful for me and a few family members. So no chance of it taking off! Now I could add cookies, let you choose the filters for the trains you are interested in, and make it more applicable to a wider audience, but there’s not really much upside in building that out. This stays personal for now.

Building out a one-page website with some HTML and JavaScript is simple. I wanted to ramp it up a little more and see what else Cloudflare can deliver for free.

Cloudflare Workers and D1 SQL Database

CBOE publishes their daily FX volumes as a JSON file on their website. It only contains the last 30 days, so I wanted to save this down each day to build out my own personal history. It’s trivial to write the JSON parsing. This is a problem around automation—I don’t want the code on my laptop where I have to make sure the script runs manually. Cloudflare provides Workers, which gives you a short burst of compute to do something interesting. For me, I use this to run through the JSON data and get it ready for saving down.

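export default {
    // Cron-triggered entry point: Workers calls scheduled(controller, env, ctx).
    async scheduled(controller, env, ctx) {
        const spotURL = "https://cdn.cboe.com/fx/spotInstrumentVolume.json";
        const ndfURL = "https://cdn.cboe.com/fx/sefInstrumentVolume.json";

        // Fetch, parse and store each feed in turn.
        await saveData(spotURL, env);
        await saveData(ndfURL, env);

        // A scheduled handler has no HTTP response to return, so just log.
        console.log("Data saved successfully");
    }
};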

Now, to save it down I want a database. I don’t want to be saving it to a CSV; I want something better. A database gives us a better way to query the data immediately. D1 is Cloudflare’s serverless SQL database built on SQLite, and it’s trivial to bind the worker to the database, which means the worker can access and use the database as needed. Of course, you have to define the tables and set the keys, but for something as simple as this data (date, sym, volume), it’s trivial.

async function saveData(url, env) {
    const dailyData = await getDailyData(url);
    const volumes = dailyData.map(x => parseData(x));
    const statements = volumes.map(item => bindStatement(item, env)).flat();
    await env.DB.batch(statements);
}
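Since D1 speaks SQLite, the table behind this can be tried out locally with Python’s built-in `sqlite3` module. This is a minimal sketch under assumptions: the table name `volumes`, the column types and the `save_rows` helper are made up for illustration, and only the three columns (date, sym, volume) come from the description above. Because the feed always carries the last 30 days, every run re-sends days that are already stored, so the sketch uses a composite primary key with `INSERT OR REPLACE` to keep the job idempotent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE volumes (
        date   TEXT NOT NULL,
        sym    TEXT NOT NULL,
        volume REAL NOT NULL,
        PRIMARY KEY (date, sym)
    )
    """
)


def save_rows(conn, rows):
    # A repeated (date, sym) pair overwrites the stored row instead of duplicating it.
    conn.executemany(
        "INSERT OR REPLACE INTO volumes (date, sym, volume) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# Made-up rows, not real CBOE volumes.
save_rows(conn, [("2026-05-15", "EURUSD", 100.0), ("2026-05-15", "USDJPY", 80.0)])
# The next run overlaps with the previous one on 2026-05-15.
save_rows(conn, [("2026-05-15", "EURUSD", 110.0), ("2026-05-16", "EURUSD", 90.0)])

row_count = conn.execute("SELECT COUNT(*) FROM volumes").fetchone()[0]
latest_eur = conn.execute(
    "SELECT volume FROM volumes WHERE date = ? AND sym = ?",
    ("2026-05-15", "EURUSD"),
).fetchone()[0]
print(row_count, latest_eur)
```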

This gives us all the data into a database nicely. We set the schedule to run at midnight every night and it can do its thing. I check it each morning, and sure enough, the new data is there.

We can now think about building a small dashboard for this data. More HTML and JavaScript! For the frontend, I wanted to be more self-reliant and in control. After some conversations with Gemini and Claude, I settled on Pico CSS, which provides a clean style straight out of the box. For the charts, I used Chart.js, the most popular charting library according to the AI tools. For the table, I used Grid.js.

The flow is very simple. When the dashboard is loaded, it pulls the data into memory via a simple SQL query to the D1 database. I then plot the total volumes of the day (i.e., sum across the currency pairs), plot an individual currency chosen from a dropdown, and build a table that shows the top 10 highest volumes for yesterday and their volume relative to the 30-day average.
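The two calculations behind the dashboard are simple aggregations. The sketch below does them in plain Python over `(date, sym, volume)` rows; the dashboard itself does this in JavaScript, so the function names and sample rows are hypothetical. Note that in this version the average includes the latest day.

```python
from collections import defaultdict


def total_by_day(rows):
    """Sum volume across all currency pairs for each date."""
    totals = defaultdict(float)
    for date, _sym, volume in rows:
        totals[date] += volume
    return dict(sorted(totals.items()))


def top_vs_average(rows, top_n=10, window=30):
    """Top `top_n` symbols on the latest date, with volume relative to
    each symbol's average over the last `window` dates."""
    dates = sorted({date for date, _, _ in rows})  # ISO dates sort correctly as text
    latest = dates[-1]
    recent = set(dates[-window:])

    history = defaultdict(list)
    latest_vol = {}
    for date, sym, volume in rows:
        if date in recent:
            history[sym].append(volume)
        if date == latest:
            latest_vol[sym] = volume

    ranked = sorted(latest_vol.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return [
        (sym, vol, vol / (sum(history[sym]) / len(history[sym])))
        for sym, vol in ranked
    ]


# Made-up rows, not real volumes.
rows = [
    ("2026-05-14", "EURUSD", 100.0),
    ("2026-05-14", "USDJPY", 50.0),
    ("2026-05-15", "EURUSD", 300.0),
    ("2026-05-15", "USDJPY", 50.0),
]
totals = total_by_day(rows)
table = top_vs_average(rows)
print(totals)
print(table)
```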

Only the first graph is shown as I don’t want a massive screenshot! But overall it looks very smart, and I’ve learned some new web development skills.

It should all stay under the free tier. If the data starts to get too big, I’ll have to make better use of caching and no longer just dump everything into memory immediately. But for now, it’s another job done. Overall, I’m pretty proud. I’m building up a nice dataset in the cloud and a slick frontend in front of the data. All in a weekend’s work.

If you haven’t already, sign up and start tinkering yourself. Start talking to the AI tools (Claude/Gemini/ChatGPT) to sketch out how to approach something, and just start small. The Cloudflare Docs are great and their command line tool Wrangler also makes things very easy to set up locally.

A Fundamental FX Factor Model

2026-04-19T00:00:00+00:00

I’ve been reading The Elements of Quantitative Investing to branch out from my usual high-frequency finance to something slower or mid-frequency. Factor models are a big part of this quant topic, and I’m trying to get a deeper understanding by following the book and applying the process to FX data.


Factor models provide a mechanism for explaining returns. They are multivariate models that break down the features that drive an asset’s performance. The key assumption is that each individual asset’s return is not independent of the others, but there are common factors that drive returns and an asset’s sensitivity to those factors drives its returns. In equities, you’ll hear of value and momentum factors, and there are even ETFs that you can invest in for exposure to those factors. We want to come up with something similar in the FX space.

A factor model attempts to explain asset-universe return behaviour. From this you can start to build portfolios, decompose risk across the different factors, and even look at returns not explained by the factors, which in turn becomes alpha research. These types of models are the foundation of many other quant topics, so it’s good to get a handle on them.

I will start by getting the data and the features into place. Part of that is using the DXY functions from my previous post (Making Sense of the DXY) and adding some new ETF data. I’ll then run two models: one to explain price moves over time and another to explain price moves between currencies themselves. Using the models in tandem forms the FX factor model. We will then explore the specific factors and how you can build factor portfolios from different currency pairs.

FX vs Equities

Typically, factor models and most academic research in this field use equity data. However, I am an FX man at heart (for my sins?) and so I want to use currency data. This restricts the universe to about 30 assets rather than the 2,000 US stocks. Therefore, to overcome the small sample size, we will use weekly data rather than monthly.

Monthly data will remove as much “trading” noise as possible. You want the price moves to reflect the underlying performance of the asset and not the day-to-day flows and execution noise. Daily data isn’t an option as FX trades 24 hours a day but the ETFs only trade during the regular market hours. This presents a synchronisation problem. A currency move could happen overnight based on some headlines hours before the ETF is even open for trading. So we will split the difference and use weekly data. This should give us enough data while keeping the overall price movements based on the same time period and information.

Another problem with FX data is a lack of descriptive features. In equities, you have a company’s financial reports and things like price-to-book and market capitalisation, but these have no equivalent in FX, so we need a different way of coming up with characteristics. For this I’ll be using ETFs to try and see what macro features might move the currency pairs.

The Data Pipeline

We are bringing together ETF, currency, and DXY data. This is all simple to pull from twelvedata.

Downloading and Preparing the ETF Data

I’ll be using different macro ETFs as general factors. These four ETFs proxy the major macro drivers:

VTI (Risk Appetite): When stocks rally, investors move from cash to risk assets, weakening the dollar (capital flows out of US). When stocks fall, the reverse.
BND (Interest Rates): Bond prices move inversely to rates. Rising US rates strengthen the dollar; falling rates weaken it.
GLD (Inflation/Uncertainty): Gold rallies when inflation forecasts rise or geopolitical risk spikes, often correlated with currency volatility.
USO (Commodity Risk): Oil is priced in dollars. Oil rallies often reflect emerging market demand, shifting currency flows.

Each of these ETFs forms a standard macro-economic indicator that I suspect currencies might respond to. You could go further and break down the stocks into different regions or sizes (small-cap, large-cap etc.) and likewise for the bonds, which could be broken down by country. But for now, these are a good high-level weather vane for how the global economy is moving.

We will be using the same functions from my previous post, just updated to save at a weekly frequency. Then we load those files, combine everything, and calculate the log returns.

etfs = ["GLD", "BND", "VTI", "USO"]
etfDF = [load_data(etf) for etf in etfs]
etfDF = pl.concat(etfDF)
etfDF = etfDF.sort("datetime")

etfDF = etfDF.with_columns(
    pl.col("close").log().diff().over("ccy").alias("log_return"))

We need to normalise the log returns by rolling volatility. To calculate the volatility, we take the standard deviation of the returns in a 52-week period.

etfDF = etfDF.with_columns(pl.col("log_return").rolling_std(window_size=52).over("ccy").alias("vol_52"))
etfDF = etfDF.with_columns((pl.col("log_return")/pl.col("vol_52")).alias("log_return_scaled"))


etfDF = etfDF.select(pl.col("datetime"), pl.col("ccy"), pl.col("log_return_scaled"))
etfDF = etfDF.pivot(on="ccy", index="datetime", values="log_return_scaled")

When we plot these normalised ETF returns, they line up with what we expect.

We need to ensure the different ETFs aren’t overly correlated. Highly correlated ETF returns would indicate redundant information, and multicollinearity in our regression analysis would lead to unstable coefficient estimates. Ideally, the ETF returns should capture distinct dimensions of macro risk.

Polars makes it easy to calculate the correlations over time with the rolling_corr function.

etfDFCorr = etfDF.with_columns(
    pl.rolling_corr("VTI", "USO", window_size=52).alias("VTI_USO_corr"),
    pl.rolling_corr("VTI", "BND", window_size=52).alias("VTI_BND_corr"),
    pl.rolling_corr("VTI", "GLD", window_size=52).alias("VTI_GLD_corr"),
    pl.rolling_corr("USO", "GLD", window_size=52).alias("USO_GLD_corr")
    ).drop_nulls()

Plotting these correlations gives us confidence that everything is reasonable.

At worst, we see a 0.6 correlation, which is just about acceptable as it only occurs for a brief period.

Sidenote: it’s interesting how stock–bond correlation hasn’t been negative since 2020. Thinking out loud, but that must have some big consequences for the risk profile of the 60/40 allocation. Another post for another day perhaps.

Now, onto the FX data.

Getting the FX + DXY Data

Again, following my last post I’m now just pulling the weekly data instead of daily. I’ve also wrapped the DXY calculations from my previous post (Making Sense of the DXY) into a nice function.

We load across the 33 currencies available.

dfs = [load_data(ccy) for ccy in ccys]
df = pl.concat(dfs)
df = df.sort("datetime")
df = df.drop("open", "high", "low")

Then join the DXY and ETF data.

dxy = load_dxy()
df = df.join(dxy.select(pl.col("datetime"), pl.col("dxy_close")), on="datetime", how="left")
df = df.join(etfDF, on="datetime", how="left")

We then calculate the returns and the 1-month, 6-month and 1-year momentum factors.

df = df.with_columns(
    pl.col("close").log().diff().over("ccy").alias("log_return"),
    pl.col("dxy_close").log().diff().over("ccy").alias("dxy_log_return"),
    pl.col("close").log().diff(n=4).shift(1).over("ccy").alias("log_return_4"),
    pl.col("close").log().diff(n=26).shift(1).over("ccy").alias("log_return_26"),
    pl.col("close").log().diff(n=52).shift(1).over("ccy").alias("log_return_52")
)

Like the ETF returns we also want to normalise the currency returns and DXY returns by their rolling volatility.

df = df.with_columns(pl.col("log_return").rolling_std(window_size=52).over("ccy").alias("vol_52"))
df = df.with_columns((pl.col("log_return")/pl.col("vol_52")).alias("log_return_scaled"))

df = df.with_columns(pl.col("dxy_log_return").rolling_std(window_size=52).over("ccy").alias("dxy_vol_52"))
df = df.with_columns((pl.col("dxy_log_return")/pl.col("dxy_vol_52")).alias("dxy_log_return_scaled"))

We normalise the momentum features in the same way.

# Rolling 52-week volatility of each momentum feature, then scale by it
for n in [4, 26, 52]:
    df = df.with_columns(pl.col(f"log_return_{n}").rolling_std(window_size=52).over("ccy").alias(f"log_return_{n}_vol_52"))
    df = df.with_columns((pl.col(f"log_return_{n}")/pl.col(f"log_return_{n}_vol_52")).alias(f"log_return_{n}_scaled"))

It is also recommended that you winsorise the return data. This involves clipping the extreme values to the 5% and 95% quantiles, which is a single call to polars’ clip function. This reduces the influence of outliers in the models and just keeps the data a bit cleaner.

cols = ["log_return_scaled", "dxy_log_return_scaled", "log_return_4_scaled", "log_return_26_scaled", "log_return_52_scaled",
       "GLD", "BND", "VTI", "USO"]

df = df.with_columns([
    pl.col(c).clip(
        pl.col(c).quantile(0.05),
        pl.col(c).quantile(0.95)
    ).alias(f"{c}_clipped")
    for c in cols
])

With the data collected we can now move on to some modelling.

FX Return Characteristics

We need to build a dataset of characteristics per currency pair. These are potential features that will explain an individual currency’s return over time.

Mathematically

\[R = \beta X,\]

where $R$ is the currency return, $X$ are the returns from other assets and we want to estimate $\beta$. If a currency is sensitive to oil, then it will have some element of dependence on the oil ETF USO and $\beta _\text{USO}$ will capture that effect.

$X$ contains the weekly values of

Weekly DXY return
Global stocks (VTI)
Global bonds (BND)
Oil (USO)
Gold (GLD)
The currency’s momentum at 1, 6 and 12 month intervals.

The model is fitted per currency individually as a rolling one-year regression. We use volatility-normalised returns so the $\beta$s are more stable over time.

import statsmodels.formula.api as smf
from statsmodels.regression.rolling import RollingOLS

allParams = []
# sort the subdata by datetime to ensure the rolling regression works correctly
for ccy in ccys:
    subDF = df.filter(pl.col("ccy") == ccy).drop_nulls().sort("datetime")
    mod = RollingOLS.from_formula("log_return_scaled_clipped ~ dxy_log_return_scaled_clipped + GLD_clipped + BND_clipped + VTI_clipped + USO_clipped + log_return_4_scaled_clipped + log_return_26_scaled_clipped + log_return_52_scaled_clipped", 
                window = 52,
                data=subDF).fit()
    
    paramDF = pl.from_pandas(mod.params)
    paramDF = paramDF.with_columns(ccy=pl.lit(ccy), 
                                   datetime = subDF["datetime"],
                                   log_return = subDF["log_return"],
                                   log_return_prev = subDF["log_return"].shift(1), 
                                   r2 = mod.rsquared_adj.values,
                                   vol_52 = subDF["vol_52"])
    
    allParams.append(paramDF)


allParams = pl.concat(allParams).drop_nulls().sort("datetime")

We save the $\beta$ time series, the $R^2$ values, and the volatility.

To make sure the regression is doing a good job for all the time periods we plot the $R^2$ for a few currencies.

They are all the right order of magnitude with CNH and MXN being the worst but still manageable.

If we average over currency pairs and time, we get a rough understanding of the $\beta$ values.

betaSummary = (
    allParams
    .unpivot(index=["datetime", "ccy"])
    .group_by("variable")
    .agg(
        pl.col("value").mean().alias("mean"),
        pl.col("value").std().alias("std"),
        pl.col("value").min().alias("min"),
        pl.col("value").max().alias("max")
    )
    .sort("mean", descending=True)
)
betaSummary.filter(pl.col("variable").str.contains("dxy_log_return|log_return_4|log_return_26|log_return_52|VTI|BND|GLD|USO|Intercept"))

Variable	Mean	Std	Min	Max
dxy_log_return_scaled	0.485	0.327	-0.452	1.34
Intercept	0.0853	0.385	-2.26	25.1
USO	-0.0222	0.166	-0.942	0.730
BND	-0.0267	0.176	-0.970	0.788
log_return_4_scaled	-0.0341	0.146	-0.864	1.69
log_return_52_scaled	-0.0525	0.192	-1.92	0.932
log_return_26_scaled	-0.0591	0.180	-1.50	0.895
GLD	-0.0636	0.179	-1.06	0.599
VTI	-0.149	0.206	-0.982	0.844

DXY is the main driver, with a negative dependence on VTI. This makes sense and lines up with our beliefs: if stocks are doing badly, it’s likely people sold them for cash, and likewise when stocks are doing well people are moving from cash into equities. This helps confirm VTI as a general risk-on/risk-off factor.

It’s frustrating that the intercept has a large average $\beta$ value, as it means we are missing drivers of currency returns. An obvious omission is the carry factor and how interest rates across countries drive currency returns. Annoyingly, it’s hard to get free data for that, so we will have to make do for now.

We’ve now got a picture of how much each currency depends on macro factors but this tells us about individual currencies in isolation. We now need to know if differences in these sensitivities explain why some pairs outperform others.

To answer that, we regress across currency pairs at each point in time. This is known as cross sectional regression.

Cross Sectional Regression for Currency Returns

From the first regression we have currency characteristics over time. For the cross-sectional regression, we now use all the currencies per week and then run the regression to see if the sensitivity to the factors (the $\beta$s) explains the returns.

We also add in a currency group factor as an additional characteristic that classifies broad groups of currency pairs.

allParams = allParams.with_columns(
    pl.col("ccy").map_elements(ccy_group_map).alias("ccyGroup")
    )  

We normalise the $\beta$’s across the currency pairs which helps keep everything comparable.

This time mathematically,

\[R = \lambda B,\]

where $R$ are the currency returns for a given week and $B$ is the matrix of normalised $\beta$ values and the currency group indicator. We are using simple weighted regression to estimate $\lambda$. The weights are the inverse of the variance (the squared 52-week volatility), which reduces the impact of high-volatility pairs.

allParams2 = []

factor_cols = ["dxy_log_return_scaled_clipped", "GLD_clipped", "BND_clipped", "VTI_clipped", "USO_clipped", "log_return_4_scaled_clipped", "log_return_26_scaled_clipped", "log_return_52_scaled_clipped"]

for (i, dt) in enumerate(allParams["datetime"].unique()):
    subDF = allParams.filter(pl.col("datetime") == dt)

    subDF = subDF.with_columns([
    ((pl.col(c) - pl.col(c).mean().over("datetime")) / 
      pl.col(c).std().over("datetime")).alias(f"{c}_scaled")
    for c in factor_cols
    ])

    csr = smf.wls("log_return_prev ~ ccyGroup + dxy_log_return_scaled_clipped_scaled + GLD_clipped_scaled + BND_clipped_scaled + VTI_clipped_scaled + USO_clipped_scaled + log_return_4_scaled_clipped_scaled + log_return_26_scaled_clipped_scaled + log_return_52_scaled_clipped_scaled", 
                  data=subDF, weights=1/(subDF["vol_52"]**2)).fit()

    paramsRes = pl.DataFrame(data = [[x] for x in csr.params.values], 
             schema=list(csr.params.index.values))

    paramsRes = paramsRes.with_columns(datetime=pl.lit(dt))
    allParams2.append(paramsRes)

allParams2 = pl.concat(allParams2).drop_nulls().sort("datetime")
allParams2 = allParams2.filter(pl.col("datetime") != pl.date(2009,4, 14))
allParams2 = allParams2.filter(pl.col("datetime") != pl.date(2009,4, 15))
allParams2 = allParams2.filter(pl.col("datetime") != pl.date(2009,4, 16))

To check the performance of the regression, we plot the $R^2$ over time.

Again noisy, but the rolling average moves around 0.4, which is a respectable value.

We then calculate the t-stats by taking the average fitted parameters and dividing by the standard error.

variable	avg	std	N	std_error	t_stat
ccyGroup[T.EM]	0.00113	0.00920	797	0.000326	3.46
log_return_4_scaled_clipped_scaled	0.000266	0.00240	797	0.000085	3.13
log_return_26_scaled_clipped_scaled	0.000324	0.00295	797	0.000105	3.10
ccyGroup[T.SCANDI]	0.000586	0.00982	797	0.000348	1.68
ccyGroup[T.G7]	0.000452	0.00776	797	0.000275	1.64
ccyGroup[T.CEMA]	0.000584	0.0101	797	0.000358	1.63
log_return_52_scaled_clipped_scaled	0.000135	0.00291	797	0.000103	1.31
ccyGroup[T.LATAM]	0.000404	0.00949	797	0.000336	1.20
VTI_clipped_scaled	0.000102	0.00319	797	0.000113	0.906
GLD_clipped_scaled	0.0000670	0.00284	797	0.000101	0.668
Intercept	0.0000620	0.00520	797	0.000184	0.335
USO_clipped_scaled	-0.00000200	0.00285	797	0.000101	-0.0175
BND_clipped_scaled	-0.0000840	0.00289	797	0.000102	-0.822
dxy_log_return_scaled_clipped_scaled	-0.000340	0.00398	797	0.000141	-2.42

Anything with an absolute value over 2 is deemed significant, which gives us:

EM pairs
1-month momentum
6-month momentum
DXY return
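The t-stat calculation behind the table above can be sketched in a few lines. Each factor has one fitted coefficient per week, and the t-stat is the mean of that series divided by its standard error (the sample standard deviation over the square root of the number of weeks), which is the Fama-MacBeth style of inference. The function names are hypothetical, the toy series is made up, and the EM check uses the rounded values from the table, so it only matches the reported figure approximately.

```python
import math
import statistics


def factor_t_stat(factor_returns):
    """Mean of the weekly coefficient estimates divided by its standard error."""
    n = len(factor_returns)
    avg = statistics.fmean(factor_returns)
    std = statistics.stdev(factor_returns)  # sample standard deviation
    std_error = std / math.sqrt(n)
    return avg / std_error


def t_stat_from_summary(avg, std, n):
    """Same calculation starting from the summary columns of the table."""
    return avg / (std / math.sqrt(n))


# Toy series, not real factor returns.
toy_t = factor_t_stat([1.0, 2.0, 3.0, 4.0, 5.0])

# Rounded avg, std and N from the EM row of the table.
em_t = t_stat_from_summary(0.00113, 0.00920, 797)
print(toy_t, em_t)
```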

We can look at the returns of these factors (just the significant ones).

The EM factor has the best return, and all the factor returns are positive except the DXY factor. For the EM factor, the coefficients are significant and positive; therefore we interpret this as investors demanding a premium return for holding EM pairs — at least the ones I’ve tagged as EM. Similarly, the two momentum factors command a similar premium.

But what currency pairs do you need to buy and sell to get these factor returns?

How to Build the Factor Portfolios

After fitting the cross-sectional regression model we arrive at $\hat{\lambda}$, which are the factor returns. What we now want are the currency weights that will get us to the factor returns

\[\hat{\lambda} = w R,\]

after some maths you arrive at

\[w = (B^TB)^{-1}B^T.\]

Easy enough to translate into Python.

import numpy as np

ccyWeights = []

for dt in allParams["datetime"].unique():
    betas = allParams.select(pl.exclude("log_return", "log_return_prev", "ccyGroup", "r2", "vol_52"))
    betas = betas.filter(pl.col("datetime") == dt)
    B = betas.select(pl.exclude("datetime", "ccy")).to_numpy()

    W = np.linalg.solve(B.T @ B, B.T)

    res = pl.DataFrame(W, schema=betas["ccy"].to_list()).with_columns(
        pl.Series(name="factor", values=betas.select(pl.exclude("datetime", "ccy")).columns),
        datetime = betas["datetime"][0]
        ).unpivot(index=["datetime","factor"])

    ccyWeights.append(res)

ccyWeights = pl.concat(ccyWeights)
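A quick numerical check helps show what these weights are doing. The sketch below uses made-up exposures and returns (random numbers, not the fitted $\beta$s) and confirms two things: applying the weights to the returns reproduces the ordinary least-squares estimate, and each factor portfolio has an exposure of one to its own factor and zero to the others. This is the unweighted form used in the loop above; for the weighted regression the equivalent would be $(B^TVB)^{-1}B^TV$ with $V$ the diagonal matrix of regression weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ccy, n_factors = 8, 3

B = rng.normal(size=(n_ccy, n_factors))  # made-up exposures, one row per currency
R = rng.normal(size=n_ccy)               # made-up returns for a single week

# (B^T B)^{-1} B^T, one row of currency weights per factor
W = np.linalg.solve(B.T @ B, B.T)

lam_from_weights = W @ R                               # factor returns via the portfolios
lam_from_lstsq = np.linalg.lstsq(B, R, rcond=None)[0]  # factor returns via least squares
exposures = W @ B                                      # exposure of each portfolio to each factor

print(np.allclose(lam_from_weights, lam_from_lstsq))
print(np.round(exposures, 6))
```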

As we are using the $\beta$ matrix, we get a time series of weights. The currencies’ underlying sensitivities to the different features change over time, meaning that they will undergo different weighting in the factor portfolios over time too.

After running that calculation we get the currency weights.

For the momentum factor, EUR hugs zero more than the other selected currencies.

If we look at the DXY factor and the currency weights for 2026 to have a more realistic view of how they are changing, we can see much more stability.

Small changes around EUR; CNH has hovered around zero; TWD has gone long since February; and AUD has picked up a short position. Given these are weekly weights, it’s good that there aren’t any wild swings, since big changes in positioning would lead to larger transaction costs.

Conclusion

Done. We’ve built a fundamental FX factor model. It’s involved, with lots of different ways to fall over, but we made it. Four factors were significant: 1-month momentum, 6-month momentum, DXY, and the EM factor. The smaller size of the FX universe compared to equities means there is less data through time and across assets. Also, the underlying $\beta$s are noisy given the tighter return ranges compared to equities. There is also a case that regime changes average things out to zero, but it’s hard to see that in the data. Even so, this model can help in hedging and explaining risk, though not as a source of expected returns.

If you’ve looked at FX factor models before, you’ll realise I’ve missed a pretty significant factor — carry. It’s very hard to get free data to calculate the carry factor across the full universe of currencies. I’m saving it for another day for a smaller set of pairs where there is data.

So I hope this has been a good walkthrough and explainer on how to approach these factor models.

Making Sense of the DXY

2026-03-10T00:00:00+00:00

My day job is in quant trading, but there’s another fascinating world: quantitative investing. While I focus on latencies and execution, quant investors are busy building the most efficient portfolios and ensuring they extract pure alpha. Not one to stay in my lane, I’m using this blog post as an opportunity to dive into the world of quant investing and level up my knowledge.


Now most quant investing examples use equities as the underlying asset class, but I am an FX man, so I will be replacing Apple and Microsoft with Euros and Yen. In some ways, this is easier; I just have to worry about 30-odd currencies as my investible universe compared to the thousands, if not hundreds of thousands, of different stocks. But in many ways it’s harder. What drives FX returns is at a much higher macro-level compared to an individual stock, and things like central banks changing interest rates and government policy changes are difficult to translate to a dataset compared to the price-to-book ratio of a stock. Still, we will give it a go.

In short, we want to better understand what can influence a currency’s return and produce a systematic model. This post is going to start with the basics, pulling in the right data, building a proxy to the overall FX market and ending with some basic regressions.

Twelve Data

For any quant investing model, we need to start with data. I’m always on the hunt for new sources, and twelvedata is the latest one to come across my radar. It has a generous free tier and, more importantly, has FX data across all the main pairs. Plus, it has a Python API that is dead simple to use. This makes it ideal for this string of posts.

from twelvedata import TDClient
 
td = TDClient(apikey=API_KEY)

td.time_series(
        symbol="USD/JPY",
        interval="1day",
        start_date="2025-01-01",
        end_date="2026-03-01",
        outputsize=5000).as_json()

This returns the daily time series of USDJPY from January 2025 to March 2026, formatted as JSON. It’s pretty simple to then go from that to a dataframe or however you want to deal with the data.

I don’t want to get blocked by the API limits, so I’m going to save the JSON objects locally.

import json
import os
import time

def download_data(td, ccy, start_date, end_date):
    return td.time_series(
        symbol=f"USD/{ccy}",
        interval="1day",
        start_date=start_date,
        end_date=end_date,
        outputsize=5000
    )

def save_data(data, ccy):
    with open(f"data/{ccy}.json", "w") as f:
        json.dump(data.as_json(), f)

def download_and_save_data(td, ccy, start_date, end_date):
    file_path = f"data/{ccy}.json"
    if os.path.exists(file_path):
        print(f"File for {ccy} already exists. Skipping download.")
        return False
    print(f"Downloading data for {ccy}...")
    data = download_data(td, ccy, start_date, end_date)
    print(f"Saving data for {ccy}...")
    save_data(data, ccy)
    print(f"Data for {ccy} downloaded and saved successfully.")
    print("Sleeping for 8 seconds to avoid hitting API rate limits...")
    time.sleep(8)
    return True

Then, to load the data for a particular currency, we have a separate function.

import polars as pl

def load_data(ccy):
    df = pl.read_json(f'data/{ccy}.json')
    df = df.with_columns(
        pl.col("datetime").cast(pl.Date),
        ccy=pl.lit(ccy),
        open=pl.col("open").cast(pl.Float64),
        high=pl.col("high").cast(pl.Float64),
        low=pl.col("low").cast(pl.Float64),
        close=pl.col("close").cast(pl.Float64))
    return df

To make sure everything is working nicely, let’s load and plot JPY.

import plotly.graph_objects as go

df = load_data("JPY")

fig = go.Figure(data=go.Ohlc(x=df['datetime'],
                    open=df['open'],
                    high=df['high'],
                    low=df['low'],
                    close=df['close']))

fig.show()

All looks good, so now we can download whatever pair our heart desires. Which leads us to the next part.

What is the DXY?

In my mind, the DXY is the FX equivalent of the S&P500. It gives a general indication of how the dollar’s value is changing by using the exchange rate of EUR, JPY, CHF, GBP, CAD and SEK vs the dollar. It’s calculated as a geometric weighted average of these six currencies, and given the dollar’s dominance in the FX market, it works as a reasonable proxy of how the overall FX market is moving.

If we cast our mind back to the Capital Asset Pricing Model, an asset’s expected return can be broken down to its $\alpha$ active return and its sensitivity to the market, $r_m$. The strength of this sensitivity is $\beta$.

\[r_i = \alpha_i + \beta_i r_m\]

In equities, $r_i$ is a single stock and $r_m$ is some measure of the overall market return (S&P500, FTSE100, etc.). In FX, $r_i$ is an individual currency and $r_m$ is the DXY. This gives us an easy quantitative model to judge how a currency’s return is driven by the overall movement in the dollar.

Now you can either read the DXY from a market data source (expensive) or you can calculate it yourself.

Calculating the DXY

The formula for the DXY is in a pdf here: U.S. Dollar Index Contracts. It’s a simple weighted geometric average, so we just need the individual currency prices, and we can implement the calculation.

dfs = [load_data(ccy) for ccy in ["EUR", "JPY", "GBP", "CAD", "SEK", "CHF"]]
combined_df = pl.concat(dfs)
combined_df = combined_df.sort("datetime")

The more eagle-eyed readers might have noticed that I’m saving down some of the pairs the ‘wrong’ way round. USDEUR instead of EURUSD, USDGBP instead of GBPUSD, etc. This is because the DXY needs everything in USD base terms. The official formula quotes EURUSD and GBPUSD with negative exponents, so flipping those pairs to USDEUR and USDGBP turns the negative weights positive and every currency can be treated the same way.

dxyWeightings = {
    "EUR": 0.576,
    "JPY": 0.136,
    "GBP": 0.119,
    "CAD": 0.091,
    "SEK": 0.042,
    "CHF": 0.036,
    "const": 50.14348112}

weights_df = pl.DataFrame(list(dxyWeightings.items()), schema=["ccy","weight"], orient="row")
combined_df = combined_df.join(weights_df, on="ccy", how="left")

So now we have a dataframe of the relevant prices joined by the weightings.

Step 1: raise each of the four prices to the power of its weight.

combined_df = combined_df.with_columns(
    (pl.col("open") ** pl.col("weight")).alias("open_weighted"),
    (pl.col("high") ** pl.col("weight")).alias("high_weighted"),
    (pl.col("low") ** pl.col("weight")).alias("low_weighted"),
    (pl.col("close") ** pl.col("weight")).alias("close_weighted")
)

Step 2: For each day, take the product and multiply it by the constant.

dxy = combined_df.group_by("datetime").agg(
    pl.col("open_weighted").product().alias("dxy_open"),
    pl.col("high_weighted").product().alias("dxy_high"),
    pl.col("low_weighted").product().alias("dxy_low"),
    pl.col("close_weighted").product().alias("dxy_close")
).with_columns(
    pl.col('dxy_open')*dxyWeightings["const"],
    pl.col('dxy_high')*dxyWeightings["const"],
    pl.col('dxy_low')*dxyWeightings["const"],
    pl.col('dxy_close')*dxyWeightings["const"])
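As a sanity check on the two steps, here is a minimal plain-Python sketch of the same weighted geometric average for a single day. The weights and constant are the ones used above; the six rates are made-up USD-base quotes purely for illustration, and the function name is my own.

```python
import math

# Weights and constant as used in the post.
DXY_CONST = 50.14348112
DXY_WEIGHTS = {"EUR": 0.576, "JPY": 0.136, "GBP": 0.119,
               "CAD": 0.091, "SEK": 0.042, "CHF": 0.036}

def dxy_from_usd_base(rates):
    """Weighted geometric average of USD-base rates (USDEUR, USDJPY, ...)."""
    # Summing weighted logs is the same as multiplying rate ** weight.
    log_sum = sum(w * math.log(rates[ccy]) for ccy, w in DXY_WEIGHTS.items())
    return DXY_CONST * math.exp(log_sum)

# Hypothetical USD-base closes, for illustration only (not market data).
example_rates = {"EUR": 0.92, "JPY": 150.0, "GBP": 0.79,
                 "CAD": 1.36, "SEK": 10.5, "CHF": 0.88}
example_dxy = dxy_from_usd_base(example_rates)
print(round(example_dxy, 2))
```

Working in logs gives the same number as the product of powers, it just avoids multiplying lots of small factors together.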

[Alt text: Line chart depicting daily DXY values. The x-axis shows time, and the y-axis shows the DXY value. The chart provides a clear view of the daily movement of the DXY.]

If you compare it to the Yahoo Finance DXY plot, it looks very similar, so I’m fairly confident this is all correct.

Individual Currency $\beta$’s

Now we can go on to measuring each currency’s $\beta$ value. This is a simple linear regression of the log returns of an individual currency vs the log returns of the DXY.
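To make the regression concrete, here is a small standard-library sketch of log returns and a one-factor OLS fit, where the slope is the covariance over the variance. The function names and the toy price series are assumptions for illustration, not part of the pipeline in this post.

```python
import math

def log_returns(prices):
    """Daily log returns: log(p_t) - log(p_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def ols_alpha_beta(r_i, r_m):
    """Intercept and slope of r_i regressed on r_m."""
    n = len(r_m)
    mean_i, mean_m = sum(r_i) / n, sum(r_m) / n
    cov = sum((a - mean_i) * (b - mean_m) for a, b in zip(r_i, r_m))
    var = sum((b - mean_m) ** 2 for b in r_m)
    beta = cov / var
    return mean_i - beta * mean_m, beta

# Toy check: a currency built to move exactly 1.5x the market in log terms.
market = [100.0, 101.0, 100.5, 102.0, 101.2]
r_m = log_returns(market)
r_i = [1.5 * r for r in r_m]
alpha, beta = ols_alpha_beta(r_i, r_m)
```

With this construction the fit should recover a $\beta$ of 1.5 and an $\alpha$ of zero, which is a handy way to check the maths before pointing it at real data.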

We need to load in more currency pairs.

dfs = [load_data(ccy) for ccy in all_pairs]
combined_df = pl.concat(dfs)
combined_df = combined_df.sort("datetime")

For the regression, we need the individual currency returns and also the DXY returns. Simple log return calculation, and then join the DXY frame onto the individual currencies.

combined_df = combined_df.with_columns(
    pl.col("close").log().diff().over("ccy").alias("log_return")
)

dxy = dxy.with_columns(
    pl.col("dxy_close").log().diff().alias("dxy_log_return")
)

combined_df = combined_df.join(dxy, on="datetime", how="left")

We will do a rolling regression using a 252-day look back, which is roughly the number of trading days in a year.

from statsmodels.regression.rolling import RollingOLS

allParams = []

for ccy in ["EUR", "SEK", "CNH", "TWD", "TRY"]:

    subDF = combined_df.filter(pl.col("ccy") == ccy)
    # hand statsmodels a pandas frame; the params are converted back to Polars below
    mod = RollingOLS.from_formula("log_return ~ dxy_log_return", data=subDF.to_pandas(), window=252)
    rres = mod.fit()

    paramDF = pl.from_pandas(rres.params)
    paramDF = paramDF.with_columns(ccy=pl.lit(ccy), Date = subDF["datetime"])
    allParams.append(paramDF)

allParams = pl.concat(allParams)

To examine the results, we plot the $\beta_i$ value over time for some different currencies.

EUR (green) is close to 1, which aligns with intuition as it has the largest weight in the DXY calculation. TRY has the lowest $\beta$ out of these pairs, which suggests its returns are not driven by the overall dollar returns. Again, this makes sense given TRY’s movements mostly reflect Turkey’s own macroeconomics. SEK has a consistent $\beta > 1$, which suggests it’s very susceptible to general dollar moves. It’s not pictured, but HKD comes out with the lowest $\beta$ overall, which is reassuring as it is pegged to the dollar.

Overall, do these $\beta$’s tell us much? Not really, but it is interesting to measure, and this is the foundation needed before we start looking at other factors that might influence the daily currency movements. These can be things like momentum, oil/gold sensitivity, etc.

Conclusion

From this, we have built up a new dataset of daily currency prices and now have daily DXY values too. This has given the underpinnings of an FX factor model, and next time we can start looking at other components that could explain currency movements.

Premier League Survival – How Many Points Are Enough?

2025-10-31T00:00:00+00:00

It’s been an interesting start to the Premier League season. All of the promoted teams (Sunderland, Leeds and Burnley) are outside the relegation zone, with Wolves and West Ham struggling at the bottom. So I want to look back at previous seasons and answer two questions: how many points does a relegated team typically have at each stage of the season, and how many points do you need to avoid relegation?

This is also a post where I dive into Python. I’ve been meaning to learn both Polars and Plotly, and given the relative simplicity of this post, it feels like the opportune time. It has also been a while since I’ve written about football, and given my reduced output recently, it feels like a quick win.

Downloading the Data

The gold standard for free and easy football data is football-data, where they have a CSV of every season for many years. This makes it easy to download it directly and merge the seasons together.

Reading a CSV with Polars is no different to Pandas, but adding in a new column is slightly different: you use the with_columns function and give the new column an alias.

import polars as pl

s = range(2009, 2027)
seasons = [str((x-1))[2:4] + str((x))[2:4] for x in s]

rawDataList = []

for season in seasons:
    url = f"https://www.football-data.co.uk/mmz4281/{season}/E0.csv"
    rawData = pl.read_csv(url, truncate_ragged_lines=True)
    rawData = rawData.with_columns(pl.lit(season).alias("Season"))
    rawDataList.append(rawData)

rawData = pl.concat(rawDataList, how = "diagonal")

We diagonally concatenate the dataframes because not every season has the same columns, and this will null-fill any missing columns.
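If the diagonal concatenation feels a bit magic, this is roughly what it does, sketched in plain Python with lists of dictionaries standing in for dataframes. The helper name and the two tiny ‘seasons’ are made up for illustration.

```python
def diagonal_concat(tables):
    """Stack lists of row-dicts, filling columns a table lacks with None."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [{name: row.get(name) for name in columns}
            for table in tables for row in table]

# Hypothetical seasons: the later one carries an extra column.
season_a = [{"HomeTeam": "Leeds", "FTR": "H"}]
season_b = [{"HomeTeam": "Burnley", "FTR": "D", "HS": 12}]
combined = diagonal_concat([season_a, season_b])
```

The row from the first season ends up with HS set to None, which is the null-fill behaviour described above.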

We then add a column of row indices and add the points scored by the home and away team based on the outcome of the match.

rawData = rawData.with_row_index("MatchID")
rawData = rawData.with_columns(pl.when(pl.col("FTR") == "H").then(3).when(pl.col("FTR") == "A").then(0).otherwise(1).alias("PTH"))
rawData = rawData.with_columns(pl.when(pl.col("FTR") == "A").then(3).when(pl.col("FTR") == "H").then(0).otherwise(1).alias("PTA"))
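The when/then/otherwise chain is just the usual three points for a win, one for a draw. A plain-Python version of the same rule, with a function name of my own choosing, looks like this:

```python
def match_points(ftr):
    """Return (home_points, away_points) for a full-time result code."""
    if ftr == "H":
        return 3, 0
    if ftr == "A":
        return 0, 3
    # Anything else is treated as a draw, mirroring the otherwise(1) branch.
    return 1, 1
```

Note that, like the Polars version, anything that isn’t "H" or "A" falls through to the draw branch, so a missing result would also score a point for each side.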

Formatting the Data

Currently, the data is in a ‘per match’ format with a home and away team. We need to rearrange this so that each team gets its own row per match, so if we filter for a specific team, we get all their matches rather than having to filter both the home and away columns.

The current columns refer to stats in terms of home (H) and away (A). We will replace those names with 1 and 2.

matchDetailsCols = ["MatchID", "Season", "Div", "Date", "HomeTeam", "AwayTeam"]
matchDetailsMap = dict(zip(matchDetailsCols, ["MatchID", "Season", "Div", "Date", "Team1", "Team2"]))

matchStatsCols = ["FTHG", "FTAG", "HS", "AS", "HST", "AST", "PSCD", "PSCH", "PSCA", "PTH", "PTA"]
matchStatsMap = dict(zip(matchStatsCols, [x.replace("H", "1").replace("A", "2") for x in matchStatsCols]))

allCols = matchDetailsCols + matchStatsCols
colsMap = matchDetailsMap | matchStatsMap
matchData = rawData[allCols]

So we create a frame with all the matches relabelled as Team1 and add a dummy indicator for a Home match.

team1Data = matchData.rename(colsMap)
team1Data = team1Data.with_columns(pl.lit(1).alias("Home"))

Likewise for Team2.

team2Data = matchData.rename(colsMap)
team2Map = dict(zip(team2Data.columns, [x.replace("1", "2") if "1" in x else x.replace("2", "1") for x in team2Data.columns]))
team2Data = team2Data.rename(team2Map)
team2Data = team2Data.with_columns(pl.lit(0).alias("Home"))

Then rejoin and sort by the matchID.

teamData = pl.concat([team1Data, team2Data], how = "diagonal")
teamData = teamData.sort("MatchID")

Now we want to add the cumulative sum of points, goals, and goals conceded to get a view of each team’s league position on a match by match basis.

teamData = teamData.select(pl.all(), pl.col("PT1").cum_sum().over(["Season", "Team1"]).alias("TotalPoints1"))
teamData = teamData.select(pl.all(), pl.col("FT1G").cum_sum().over(["Season", "Team1"]).alias("TotalGoals1"))
teamData = teamData.select(pl.all(), pl.col("FT2G").cum_sum().over(["Season", "Team1"]).alias("TotalGoalsC1"))
teamData = teamData.select(pl.all(), pl.int_range(pl.len()).over(["Season", "Team1"]).alias("N"))

This is a bit different to the usual groupby and aggregate, but it makes sense: you define the function on the column and then use over to specify the grouping columns.
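For anyone who thinks better in loops, the windowed cumulative sum amounts to keeping a running total per (season, team) pair. A minimal sketch, assuming the rows are already in match order and using made-up results:

```python
from collections import defaultdict

def running_totals(rows):
    """Add a per-(season, team) running points total and 0-based game index."""
    points = defaultdict(int)
    games = defaultdict(int)
    out = []
    for row in rows:  # rows are assumed to already be in match order
        key = (row["Season"], row["Team1"])
        points[key] += row["PT1"]
        out.append({**row, "TotalPoints1": points[key], "N": games[key]})
        games[key] += 1
    return out

# Made-up results, for illustration only.
rows = [
    {"Season": "2526", "Team1": "A", "PT1": 3},
    {"Season": "2526", "Team1": "B", "PT1": 0},
    {"Season": "2526", "Team1": "A", "PT1": 1},
    {"Season": "2526", "Team1": "B", "PT1": 3},
]
result = running_totals(rows)
```

Each row keeps its place in the original order, which is the key difference from a groupby that collapses everything down to one row per group.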

Finally, we are going to create a league table dataframe by taking the last points/goals/goals conceded by each team per season and use that to work out who got relegated each year.

leagueTable = teamData.group_by(["Season", "Div", "Team1"]).agg(pl.col("N", "TotalPoints1", "TotalGoals1", "TotalGoalsC1").last())
leagueTable = leagueTable.sort("TotalPoints1", descending=True)
leagueTable = leagueTable.select(pl.all(), pl.int_range(pl.len()).over(["Season", "Div"]).alias("FinalPosition"))
leagueTable = leagueTable.with_columns((pl.when(pl.col("FinalPosition") >= 17).then(1)).otherwise(0).alias('Relegated'))
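The position index here starts at zero, so in a 20-team league positions 17, 18 and 19 are the bottom three. A small sketch of the same ranking logic, with hypothetical team names and points, and ignoring goal-difference tie-breaks just like the code above:

```python
def final_table(points_by_team, n_relegated=3):
    """Rank teams by points (0-based) and flag the bottom n_relegated."""
    ranked = sorted(points_by_team.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = len(ranked) - n_relegated  # 17 for a 20-team league
    return [
        {"Team": team, "Points": pts, "FinalPosition": pos,
         "Relegated": int(pos >= cutoff)}
        for pos, (team, pts) in enumerate(ranked)
    ]

# Hypothetical 20-team league with distinct points totals.
league = {f"Team{i:02d}": 90 - 3 * i for i in range(20)}
table = final_table(league)
```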

We can then join this to the teamData, and this will form the basis of our stats.

teamData = teamData.join(leagueTable[["Season", "Div", "Team1", "FinalPosition", "Relegated"]], on = ["Season", "Div", "Team1"])

Relegation Statistics

The data is in a nice format, and we can manipulate it and see where this season is lining up. This is where Plotly comes in. I’ve always been a matplotlib user and enjoyed building up the plots layer by layer with a decent amount of control. Plotly was always missing from my arsenal, so if I’m dipping my toes into Python, I might as well plug that gap. I’ve neglected some of the final graph formatting points to keep the code chunks manageable.

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

First, we calculate the relegation stats. We want to calculate the average number of points, goals scored, and goals conceded after each game week for the teams that were eventually relegated.

relegated = (teamData.filter(pl.col("Season") != "2526")
                     .group_by(["N", "Relegated"])
                     .agg(pl.col("TotalPoints1").mean(), 
                          pl.col("TotalGoals1").mean(), 
                          pl.col("TotalGoalsC1").mean())
                     .sort("N").filter(pl.col("Relegated") == 1))

We then want to plot this and compare it to the currently promoted teams, plus Wolves and West Ham, who are in the most trouble. Also, shout out to https://teamcolours.netlify.app/ to get the actual colours of the teams for the plot.

fig = go.Figure()
fig.add_trace(go.Scatter(x=relegated["N"], y=relegated["TotalPoints1"],
                    mode='lines+markers',
                    name='Avg Points Of A Relegated Team'))

for team in ["West Ham", "Wolves", "Sunderland", "Leeds", "Burnley"]:
    latestTeam = teamData.filter(pl.col("Team1") == team, pl.col("Season") == "2526")

    fig.add_trace(go.Scatter(x=latestTeam["N"], y=latestTeam["TotalPoints1"],
                    mode='lines+markers',
                    name=team))


fig.update_layout(height=500, width=700,
                  title_text="Relegation Stats")

fig.show()

Wolves and West Ham are currently in trouble. They are below the average line at this point in the season, whereas Sunderland are storming it, Leeds are also quite safe, and Burnley’s recent performances have kept them above the fated line.

However, looking at the average points of a relegated team isn’t the best way of looking at this. It can get dragged down by a very poor team at the bottom of the league. Instead we need to look at the minimum and average number of points to stay safe every season.

This is the same calculation as above, but aggregating on the final position of each team and then filtering on position 16 (zero-indexed, so 17th place), one above the relegation zone.

safe = (teamData.filter(pl.col("Season") != "2526")
                .group_by(["N", "FinalPosition"])
                .agg(pl.col("TotalPoints1").mean(), 
                     pl.col("TotalGoals1").mean(), 
                    pl.col("TotalGoalsC1").mean(),
                    pl.col("TotalPoints1").min().alias("Min"))
                .sort("N").filter(pl.col("FinalPosition") == 16)
       )

Again, plotting this with the same teams.

fig = go.Figure()
fig.add_trace(go.Scatter(x=safe["N"], y=safe["TotalPoints1"],
                    mode='lines+markers',
                    name='Avg Points of a Safe Team'))

fig.add_trace(go.Scatter(x=safe["N"], y=safe["Min"],
                    mode='lines+markers',
                    name='Min Points of a Safe Team'))

for team in ["West Ham", "Wolves", "Sunderland", "Leeds", "Burnley"]:
    latestTeam = teamData.filter(pl.col("Team1") == team, pl.col("Season") == "2526")

    fig.add_trace(go.Scatter(x=latestTeam["N"], y=latestTeam["TotalPoints1"],
                    mode='lines+markers',
                    name=team))

fig.update_layout(height=500, width=700,
                  title_text="Safety Stats")

fig.show()

Again, Wolves and West Ham are well below the average line (blue), and Wolves are even below the minimum line (red). Burnley and Leeds are in touching distance. Sunderland is well above. From this, Sunderland should be happy and confident that they can stay up; Leeds are at the bare minimum. Wolves are in big danger, but with a new manager, they might be able to get going again. West Ham have already had their new manager bounce, and it’s still looking precarious.

This also shows that, on average, you need 37.23 points to survive in the Premier League, with 35 as the bare minimum. So the fabled 40-point mark is actually a slight overestimate.

It’s not just points, though. What about the number of goals each team has scored and how many they are conceding? Let’s look at these stats and also format up the graph so it’s a bit less default, and focus just on the games so far.

No real change to the conclusion. Sunderland are doing well on both points and goals scored, and their conceded goals are below the average in the 16th position. Wolves and West Ham are underperforming across the board. Leeds and Burnley are scraping by.

Conclusion

Based on these early-season trajectories, it’s not looking good for West Ham or Wolves. By contrast, Sunderland should be getting excited about the prospect of another season in the Premier League. Leeds and Burnley - not quite out of the woods. As another cliche goes, relegation is about hoping you are better than 3 other teams and at the minute Wolves and West Ham are struggling to find three other worse teams!

Easy Neural Nets and Finance - Part 1

2025-07-23T00:00:00+00:00

I’m fortunate enough to be participating in a lecture series at work that covers deep learning and its applications in finance. This will be a series of posts documenting what I learn and implementing the ‘homework’ (I’m 32, how am I still getting homework?) using Julia and Flux.

Enjoy these types of posts? Then sign up for my newsletter.

The phrase ‘deep learning’ already feels outdated, and the current hotness is more about AI and LLMs, so the lectures and topics might feel a bit out of date. But given LLMs wouldn’t be here without deep learning, it feels like going back to basics.

Plus, I’ve never really jumped in and explored neural nets, so this gives me a chance to do some deep learning in an applied way.

After reading this, you will be able to build your own neural net with different layers and compare it to a simpler linear model.

Predicting a Stock’s Daily Volume

If you Google neural nets and finance, you will find an infinite amount of copy-pasted quant finance Python examples of people using PyTorch/TensorFlow/JAX to predict the closing price of some stock. Kudos to these tutorials for putting something out there, but you will struggle to learn anything meaningful about either finance, modelling or neural nets.

This is my attempt to be different.

Instead of predicting prices or returns and showing that neural nets can make money, we will model the total number of shares traded per day. For starters, this is much easier as the data is a bit more signal and less noise. Plus, if I managed to build something that could predict prices, why would I share it?

So, we will be using deep learning to build a model of the total trading volume per day of the SPY ETF. A basic time series prediction task that can be approached both with linear models and deep learning.

You know the drill, fire up your Julia notebook and follow along.

using Dates, AlpacaMarkets, Plots, StatsBase
using DataFramesMeta, ShiftedArrays
using Flux

Getting the Data

We are using similar data to my Cyclical Embedding post, except for this time, we will be using the SPY ETF instead of Apple.

spyRaw, npt = AlpacaMarkets.stock_bars("SPY", "1Day"; 
  startTime=Date("2000-01-01"), 
  endTime = today() - Day(1) ,
  adjustment = "all", limit = 10000)

From the raw data, we parse the timestamp and scale the volumes by a million.

spy = spyRaw[:, [:t, :v, :c]]
spy[!, "t"] = DateTime.(chop.(spy[!, "t"]));
spy[:, :vNorm] = spy[:, :v] .* 1e-6;

We also add in the returns with a lag because we are using the close-to-close return as a feature.

spy[!, "r"] = log.(spy.c) .- ShiftedArrays.lag(log.(spy.c))
spy[!, "prev_r"] = ShiftedArrays.lag(spy.r);

In this data, the daily volume isn’t stationary and it is also heavy-tailed.

plot(
  plot(spy.t, spy.vNorm, label = "IEX Daily Volume"),
  histogram(spy.vNorm, label = "Daily Volume Distribution")
  )

Looking at the autocorrelation, we can see a long-range dependence on the daily volumes, but when we take the daily difference in daily volume, we see a strong effect at lag 1, and the rest are much smaller.

A negative value at lag 1 indicates a mean reversion-like process, but more importantly, means modelling the difference in daily volume will be easier than just directly modelling the daily volumes.

Predicting the daily change in volume does reduce how far out we can forecast volumes, though, as it relies on using the known previous volume to produce the next day’s volume. If you estimate multiple days, then you will be compounding the error.

We lag the volume variables as required.

spy[:, :prev_vNorm] = ShiftedArrays.lag(spy[:, :vNorm])
spy[:, :delta_vNorm] = spy[:, :vNorm] .- spy[:, :prev_vNorm]
spy[:, :prev_delta_vNorm] = ShiftedArrays.lag(spy[:, :delta_vNorm])

spy = dropmissing(spy)

We add in the time-based variables and cyclically encode them.

spy[:, :DayOfMonth] = dayofmonth.(spy.t) .- 1
spy[:, :DayOfWeek] = dayofweek.(spy.t) .- 1
spy[:, :DayOfQtr] = dayofquarter.(spy.t) .- 1
spy[:, :MonthOfYear] = month.(spy.t) .- 1

spy = cyclical_encode(spy, "DayOfWeek")
spy = cyclical_encode(spy, "DayOfMonth")
spy = cyclical_encode(spy, "DayOfQtr")
spy = cyclical_encode(spy, "MonthOfYear");
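The cyclical_encode helper comes from the Cyclical Embedding post and isn’t repeated here. As a rough reminder of the idea, here is a plain-Python sketch that maps a value onto a circle with sine and cosine. Passing the period explicitly is my assumption for the sketch and may not match how the Julia helper works it out.

```python
import math

def cyclical_encode(value, period):
    """Map a value in [0, period) to (sin, cos) coordinates on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def encoded_distance(a, b, period):
    """Straight-line distance between two encoded values."""
    sa, ca = cyclical_encode(a, period)
    sb, cb = cyclical_encode(b, period)
    return math.hypot(sa - sb, ca - cb)

# With days numbered 0 (Monday) to 6 (Sunday), Sunday and Monday end up
# as close together as Monday and Tuesday.
sunday_to_monday = encoded_distance(6, 0, 7)
monday_to_tuesday = encoded_distance(0, 1, 7)
```

That is the whole point of the encoding: the end of the cycle sits next to the start, rather than six units away.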

We also add in if the date was the end of the month.

spy[:, :month] = floor.(spy.t, Dates.Month)
spy = @transform(groupby(spy, :month), 
                 :MonthEnd = (:t .== maximum(:t)))

Finally, train/test split.

spyTrain = spy[1:2000, :];
spyTest = spy[2001:end, :];

With the data prepared, we move on to building out the models.

The Baseline Model

We always want to make sure the neural nets are adding value, so we need something simple to compare to. In regular statistical modelling, this might be an intercept-only model, but in this case, we want the best linear model.

It’s a simple linear regression of all the available variables.

using GLM

linearModel = lm(@formula(delta_vNorm ~ prev_delta_vNorm + prev_vNorm + 
                                        MonthEnd + prev_r +
                                        DayOfWeek_sin + DayOfWeek_cos + 
                                        DayOfMonth_sin + DayOfMonth_cos +
                                        DayOfQtr_sin + DayOfQtr_cos +
                                        MonthOfYear_sin + MonthOfYear_cos
                        ), spyTrain)

This fits instantly and we get an in-sample $R^2$ of 23% and an out-of-sample MSE of 380.

To add the predicted volume to the test set, we need to add the prediction of the model to the previous volume.

spyTest[!, "linearPred"] = spyTest.prev_vNorm .+ predict(linearModel, spyTest);
sort!(spyTest, :t);

Everything lines up quite nicely. There are a couple of periods where the volume spikes and the model can’t keep up, but other than that, it looks decent.

Also interesting to look at the shape of the cyclically encoded variables.

Plenty going on here!

Day of the Week - Wednesdays and Thursdays have a larger positive effect than Mondays and Tuesdays.
Day of the Month - The middle of the month (10-15) has the higher positive effect.
Day of the Quarter - Larger positive effects towards the end of the quarter.
Month of the Year - Summer months have the most negative effect.

A positive effect here means a larger positive change in the daily volume compared to the previous day, and a negative effect means a larger negative change.

So, an intuitive model to begin with that has produced a strong foundation to improve upon with the neural net models.

Neural Nets in Julia

Let’s increase the model complexity and introduce the neural nets. We are still using the same variables, but we expand them to include even more lags of the change in volumes.

Preparing the Data for a Neural Network

We start with the dataframe, but iterate through and add the 30 lags of the previous volume changes.

rawData = spy[:, [:t, :delta_vNorm, :prev_vNorm, :MonthEnd, :prev_r,
                      :DayOfWeek_sin, :DayOfWeek_cos,
                      :DayOfMonth_sin, :DayOfMonth_cos,
                      :DayOfQtr_sin, :DayOfQtr_cos,
                      :MonthOfYear_sin, :MonthOfYear_cos]]

maxLag = 30
for i in 1:maxLag
    rawData[:, Symbol("lag_$(i)_delta_vNorm")] = ShiftedArrays.lag(rawData.delta_vNorm, i)
end

dropmissing!(rawData)
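The lagging step is easier to see on a toy series. This sketch builds the target and its lagged values and drops the early rows that don’t have a full history, which is the role dropmissing! plays above. The function name and the five-point series are purely illustrative.

```python
def add_lags(series, max_lag):
    """Pair each x_t with [x_{t-1}, ..., x_{t-max_lag}], dropping incomplete rows."""
    rows = []
    for t in range(max_lag, len(series)):
        lags = [series[t - k] for k in range(1, max_lag + 1)]
        rows.append((series[t], lags))
    return rows

# Toy series with two lags: the first two points have no full history.
lagged = add_lags([1, 2, 3, 4, 5], max_lag=2)
```

With 30 lags the first 30 rows disappear in the same way, which is a small price on a dataset of this size.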

We then need to go from dataframes to matrices and flip the dimensions so each column is an observation rather than each row.

y = permutedims(rawData.delta_vNorm)
ts = rawData.t
x = @select(rawData, Not(:delta_vNorm, :t))
x = permutedims(Matrix(x));

Again, train/test split too.

xTrain = x[:, 1:2000]
yTrain = y[:, 1:2000]
tsTrain = ts[1:2000]

xTest = x[:, 2001:end]
yTest = y[:, 2001:end]
tsTest = ts[2001:end];

Flux.jl is Julia’s neural network library and the go-to for deep learning in Julia. It provides all the tools to build and train these types of models. One such tool is the DataLoader, which enables batch training for models. Batch training uses random subsets of the full data to train the model, which is very useful if you have too much data to fit into memory. You get to train the model on all your data by breaking it down into chunks.

Now, in this specific case, it isn’t needed as our data is small, but it’s always good to understand the techniques, and Flux makes it very simple. Pass in the x and y matrices, define the batch size and whether you want to randomise the samples or not.

Here we build random batches of 5.

train_loader = Flux.DataLoader((x, y), batchsize=5, shuffle=true);
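Conceptually, a shuffled loader just shuffles the observation indices and slices them into chunks. A minimal sketch of that idea in plain Python (the function name and seed argument are my own, and this says nothing about how Flux implements it internally):

```python
import random

def minibatches(n_obs, batch_size, seed=0):
    """Shuffle observation indices and cut them into batches."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n_obs, batch_size)]

# 12 observations in batches of 5: the last batch is smaller.
batches = minibatches(12, 5)
```

Every observation appears exactly once per pass, so one full loop over the batches is one epoch.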

Next, we need to build the model. In Flux, each layer of the basic net needs the number of input nodes and output nodes.

flux_model = Dense(size(x, 1), 1)

We simply take the number of rows of the x matrix as the number of inputs, and we output 1 number - the expected change in volume for that day.

We also need to define a loss function for the model. We will use the mean square error (MSE). We predict the values from the model and calculate the MSE compared to the true values.

function flux_loss(flux_model, x, y)
    yhat = flux_model(x)
    Flux.mse(yhat, y)
end

A neural net has several parameters that we need to optimise using the training data. With each batch of data, we evaluate the loss function and use the gradient of the loss function to push the parameters in the right direction to minimise the loss. The mechanics of moving around the loss function are controlled by the optimiser. In this case, we will use regular gradient descent, but there are many different optimisers out there that Flux provides - Optimiser Reference.

Again, Flux makes this easy to do out of the box without really needing to understand what’s happening behind the scenes. We set up a gradient descent optimiser with Flux.setup(Descent(eta), flux_model) (with eta ($\eta$) being the learning rate) and update the parameters after each batch of data.

l, gs = Flux.withgradient(m -> flux_loss(m, x, y),flux_model)
Flux.update!(opt_state, flux_model, gs[1])
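If you do want a peek behind the scenes, the update is small enough to write by hand for a one-feature linear model. This is a hand-rolled sketch of plain gradient descent on the MSE, with toy data and an arbitrary learning rate. It is not what Flux does internally, just the same idea spelled out.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def descent_step(w, b, xs, ys, eta):
    """One gradient descent update on the MSE."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - eta * grad_w, b - eta * grad_b

# Toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
losses = [mse(w, b, xs, ys)]
for _ in range(500):
    w, b = descent_step(w, b, xs, ys, eta=0.05)
    losses.append(mse(w, b, xs, ys))
```

The parameters drift towards a slope of 2 and an intercept of 1. Make eta too large and the same loop blows up instead, which is why the learning rate is the first thing to tune.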

After all that, we throw everything into one function to easily iterate around the models. We are batch training with gradient descent and returning the trained model plus the loss history on both the full training set and the test set.

function train(train, test, flux_model, flux_loss; batchSize=1024, epochs=10, eta=0.01)
    (xTrain, yTrain) = train
    (xTest, yTest) = test
    
    train_loader = Flux.DataLoader((xTrain, yTrain), batchsize=batchSize, shuffle=true);
    opt_state = Flux.setup(Descent(eta), flux_model);
        
    allTrainLoss = zeros(epochs)
    allTestLoss = zeros(epochs)
    
    for epoch in 1:epochs
        loss = 0.0
        for (x, y) in train_loader
            l, gs = Flux.withgradient(m -> flux_loss(m, x, y), flux_model)
            Flux.update!(opt_state, flux_model, gs[1])
            loss += l / length(train_loader)
        end
        train_loss = flux_loss(flux_model, xTrain, yTrain)
        test_loss = flux_loss(flux_model, xTest, yTest)
        allTrainLoss[epoch] = train_loss
        allTestLoss[epoch] = test_loss
        
    end
    return (flux_model, allTrainLoss, allTestLoss)
end

We can now train the models, so let’s build some models!

A 1 Layer Neural Net

The simplest neural net is 1 layer with the features as an input and 1 value as the output. Nothing else!

flux_model = Dense(size(x, 1), 1)
flux_model, allTrainLoss, allTestLoss = train((xTrain, yTrain), (xTest, yTest), flux_model, flux_loss; epochs = 1000, eta=1e-6);

You might notice something strange here: the test loss is smaller than the training loss. This is a quirk of this data set; the test set has a tighter distribution than the training data, which is easy to see in a histogram.

Like I said, it’s a quirk of the dataset, but something to bear in mind for the rest of the examples.

Let’s look at the predicted values of this first neural net and how they line up with reality. Plus, we can compare it to the linear model. For the linear model, you just need to run predict and pass in the test dataset. Similarly, with the neural net, we evaluate the trained model on the testing matrix.

nnTest = DataFrame(t=tsTest, delta_vNorm_nn = vec(flux_model(xTest)'))
spyTest.delta_vNorm_lin = predict(linearModel, spyTest)
spyTest = leftjoin(spyTest, nnTest, on = :t);

As we are predicting the change in the daily volume, we need to add back in the previous value to get our predicted daily volume.

spyTest = @transform(spyTest, :v_nn = :prev_vNorm .+ :delta_vNorm_nn, :v_lin = :prev_vNorm .+ :delta_vNorm_lin);
sort!(spyTest, :t);

And then plotting

p = plot(spyTest.t, spyTest.vNorm, label = "True",  dpi=300, background_color = :transparent)
p = plot!(p, spyTest.t, spyTest.v_nn, label = "NN")
p = plot!(p, spyTest.t, spyTest.v_lin, label = "Linear")
p

Things line up quite well, nothing outrageous.

In terms of performance, we calculate the MSE from the dataframe.

@combine(dropmissing(spyTest), 
          :NN = mean((:vNorm .- :v_nn).^2), 
          :Lin = mean((:vNorm .- :v_lin).^2))

NN	Lin
405.55	370.57

The linear model is doing better so far.

2 Layer Neural Nets

We are now in the realm of multi-layer perceptrons (MLPs) and have introduced many more parameters into the model. We can also now build more complicated interactions with each layer.

In Flux, building out more layers is simple; you are chaining different dense layers together. We are choosing to have a fully connected MLP with 2 layers, with all the variables passed through.

flux_model2 = Flux.Chain(Dense(size(x, 1), size(x, 1)), Dense(size(x, 1), 1))

flux_model2, allTrainLoss, allTestLoss = train((xTrain, yTrain), (xTest, yTest), flux_model2, flux_loss; epochs = 1000, eta = 1e-6);

This trains in the same amount of time with the same train/test loss pattern. Again, assessing the MSE of this bigger model.

nnhTest = DataFrame(t=tsTest, delta_vNorm_nnh = vec(flux_model2(xTest)'))
spyTest = leftjoin(spyTest, nnhTest, on = :t);

spyTest = @transform(spyTest, :v_nnh = :prev_vNorm .+ :delta_vNorm_nnh)
@combine(dropmissing(spyTest), :NN = mean((:vNorm .- :v_nn).^2), 
                               :Lin = mean((:vNorm .- :v_lin).^2),
                               :NNH = mean((:vNorm .- :v_nnh).^2))

NN	Lin	NNH
405.55	370.57	401.424

This has improved on the 1-layer neural net, but still no better than the linear model.
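One possible reason the extra layer doesn’t buy much: no activation function is passed to the Dense layers above, and Flux’s Dense defaults to the identity. Two stacked linear layers can always be rewritten as one linear layer, which this small sketch checks with made-up weights.

```python
def matvec(W, x):
    """Matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dense(W, b, x):
    """A linear layer with no activation: W x + b."""
    return [v + bi for v, bi in zip(matvec(W, x), b)]

def matmul(A, B):
    """Matrix-matrix product for lists of lists."""
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]

# Made-up weights for a 2 -> 2 -> 1 network with identity activations.
W1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.1, -0.2]
W2, b2 = [[3.0, -1.0]], [0.5]
x = [2.0, -3.0]

stacked = dense(W2, b2, dense(W1, b1, x))

# The equivalent single layer: W = W2 W1 and b = W2 b1 + b2.
W, b = matmul(W2, W1), dense(W2, b2, b1)
collapsed = dense(W, b, x)
```

Both routes give the same output, so any difference from the 1-layer net would come from the training path rather than extra expressive power. A non-linear activation such as relu in the hidden layer is what stops this collapse.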

Neural Net Regularisation

The linear model has 13 parameters, the 1-layer neural net has 42 parameters, and the 2-layer net has 1,764 parameters. This is a rapid growth in complexity which raises the likelihood that the model starts to overfit. How do we make sure the neural net models only pick out the key parameters and regularise themselves?
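Those counts can be checked by hand. A dense layer has one weight per input-output pair plus one bias per output, and the net has 41 inputs (11 base features plus 30 lags, read off the code above).

```python
def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

n_features = 11 + 30  # base inputs plus lagged volume changes

linear_model = 12 + 1  # 12 regressors plus the intercept
one_layer = dense_params(n_features, 1)
two_layer = dense_params(n_features, n_features) + dense_params(n_features, 1)

print(linear_model, one_layer, two_layer)  # 13 42 1764
```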

We have two options: add a penalisation score in the loss function that bounds the total size of the coefficients or introduce something called a dropout layer.

Penalising the Loss Function

You can extend regularisation into neural networks the same way you do linear models. You add an additional term to the loss function that penalises the total combined size of the coefficients.

function flux_loss_reg(flux_model, x, y)
    flux_loss(flux_model, x, y) + sum(w -> sum(abs2, w), Flux.trainables(flux_model))
end

Therefore, if the model wants to allocate more weight to 1 parameter, it needs to take some weight from another. This acts as a balancing mechanism and should reduce the chance of overfitting.
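In plain terms the penalty is just the sum of squared parameters added on to the MSE. A tiny sketch with made-up numbers follows. The lam scaling factor is hypothetical and not in the loss above, which effectively uses a value of 1.

```python
def l2_penalty(params):
    """Sum of squared values across all parameter groups."""
    return sum(v * v for group in params for v in group)

def penalised_loss(mse, params, lam=1.0):
    """Data-fit term plus a scaled L2 penalty (lam is a hypothetical knob)."""
    return mse + lam * l2_penalty(params)

# Made-up weights and bias, for illustration only.
params = [[1.0, -2.0], [3.0]]
total = penalised_loss(0.5, params)
```

In practice you would tune the scaling so the penalty doesn’t swamp the fit term.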

We use this new loss function with the 2-layer net.

flux_model = Flux.Chain(Dense(size(x, 1), size(x, 1)), Dense(size(x, 1), 1))
flux_model, allTrainLoss, allTestLoss = train((xTrain, yTrain), (xTest, yTest), flux_model, flux_loss_reg; epochs = 1000, eta = 1e-6);

NN	Lin	NNH	NNHR
405.55	370.57	401.424	388.548

So slightly better than the unregularised version.

Neural Net Dropout Layers

An alternative way of regularising a network is to introduce a dropout layer. Dropout randomly sets the output of a node to zero during the training phase, which means the net has fewer parameters to optimise over and reduces the possibility of overfitting. When it comes to inference, all of the nodes are included but rescaled by the dropout probability. The original dropout paper is an engaging read - Dropout: A Simple Way to Prevent Neural Networks from Overfitting.

Again, very simple to use dropout in Julia and Flux; it is just another type of layer.

flux_model3 = Flux.Chain(Dense(size(x, 1), size(x, 1)), Dropout(0.5), Dense(size(x, 1), 1))

flux_model3, allTrainLoss, allTestLoss = train((xTrain, yTrain), (xTest, yTest), flux_model3, flux_loss; epochs = 250, eta = 1e-6);
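For intuition, here is a sketch of the common ‘inverted’ form of dropout, which is how the Flux docs describe their Dropout layer: during training each value is either zeroed or scaled up by 1/(1-p), and at inference the layer does nothing. The function and its arguments are my own for illustration.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: zero with probability p, else scale by 1 / (1 - p)."""
    if not training:
        return list(values)  # no effect at inference time
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
train_out = dropout([1.0] * 10000, 0.5, True, rng)
test_out = dropout([1.0, 2.0], 0.5, False, rng)
```

Scaling the survivors keeps the expected output the same in training and inference, so the layers after it see inputs on a similar scale either way.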

For the final time, let’s evaluate this model on the test set and calculate the MSE.

nndTest = DataFrame(t=tsTest, delta_vNorm_nnd = vec(flux_model3(xTest)'))
spyTest = leftjoin(spyTest, nndTest, on = :t);

spyTest = @transform(spyTest, :v_nnd = :prev_vNorm .+ :delta_vNorm_nnd)
@combine(dropmissing(spyTest), :NN = mean((:vNorm .- :v_nn).^2), 
                               :Lin = mean((:vNorm .- :v_lin).^2),
                               :NNH = mean((:vNorm .- :v_nnh).^2),
                               :NND = mean((:vNorm .- :v_nnd).^2))

NN	Lin	NNH	NNHR	NNHD
405.55	370.57	401.424	388.548	411.105

The worst model so far!

Conclusion

So the linear model is still winning. The neural net and various iterations haven’t improved on this simple model, and the best neural net was the 2-layer with regularisation.

It must be noted that this problem isn’t exactly hard, and the amount of data is relatively small, so it is unsurprising that the added complexity of the neural nets hasn’t added anything. It’s hardly a ‘deep learning’ problem!

I’ve also not gone crazy with the neural net optimisations. You can include more layers, change the number of nodes in the layers, change the activation functions, and change the loss function - all sorts of things that could be tweaked and improve the model.

Hopefully I’ve not just added to the slop of neural net finance tutorials and you’ve found something useful. Unfortunately, the neural nets haven’t beaten the linear model, which shows you can’t just jump into the fancy tools without looking at the simpler models.

Other Julia/Finance Posts

For more quant finance tutorials check out some of my older posts.

Cyclical Embedding

2025-06-16T00:00:00+00:00

Cyclical embedding (or encoding) is a basic transformation for numerical variables that follow a cycle. Let’s explore how they work.

I am currently attending a Deep Learning in Finance lecture series (lectured by Stefan Zohren in preparation for his new book). The ongoing homework is taking a basic time series model and applying the various deep learning techniques. In the process of doing this homework, I’ve come across Cyclical Embeddings and how they are used to transform variables that move in a cycle into something a model can understand.

Consider this blog post me reading this Kaggle notebook: Encoding Cyclical Features for Deep Learning, converting it to Julia and using some examples to convince myself Cyclical Embeddings work and are useful.

Cyclical variables are especially pertinent in finance. Take the day of the week: in a model you could use either a factor (the label directly) or a number (Mon=1, Tue=2 etc.). Using a factor, your model now includes 5 additional parameters. If you use the number you’ll have to specify the form of the relationship (linear or using a GAM). Each has its ups and downs, but there is also a key piece of information missing: the days of the week form a cycle where 1 follows from 5. How can we translate this into something the model will understand?

As the name suggests, cyclical embeddings lead to a cycle and the natural choices are the trigonometric functions sin and cos. We take the one-dimensional variable and transform it into two dimensions:

\[\begin{align*} x & = \sin \left( \frac{2 \pi t}{\text{max} (t)} \right), \\ y & = \cos \left( \frac{2 \pi t}{\text{max} (t)} \right). \end{align*}\]

If we apply this transformation to our day of the week we go from $t \in [0, 4]$ to a circle in $x$ and $y$.

I am reminded of polar coordinates and we can now see that Monday is the same distance from Friday as it is from Tuesday. Crucially, the new variables are nicely bounded between -1 and 1, which is always helpful when building models. All in, this looks like a sensible transformation; now to see if it makes a noticeable difference in modelling performance.
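One detail worth checking is the divisor in the transformation. The sketch here (with hypothetical helpers `encode` and `dist`) compares dividing by the number of distinct days, 5, against dividing by max(t) = 4 as in the formula above. It is an illustration of the geometry only, not a change to the models fitted in this post.

```julia
# Map a cyclical value t onto the unit circle for a given period.
encode(t, period) = (sin(2π * t / period), cos(2π * t / period))

# Euclidean distance between two encoded points.
dist(a, b) = hypot(a[1] - b[1], a[2] - b[2])

mon, tue, fri = 0, 1, 4

# Dividing by the number of distinct days (5) keeps Monday and Friday apart.
d_mon_tue = dist(encode(mon, 5), encode(tue, 5))
d_mon_fri = dist(encode(mon, 5), encode(fri, 5))

# Dividing by max(t) = 4 maps Friday onto the same point as Monday.
d_mon_fri_max = dist(encode(mon, 4), encode(fri, 4))
```

With a period of 5, Monday sits equally far from Tuesday and Friday. With a divisor of 4, Monday and Friday land on the same point up to floating point error, so the model cannot tell them apart.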

Practical Cyclical Embeddings - Daily Volumes

Let’s model the daily trading volume of a stock. It feels logical that the day of the week (Mon-Fri), day of the month (1-31) and month (1-12) would affect the amount traded. The summer months might be quieter, the end of the month might be busier (month-end rebalancing) and Fridays might be quieter. All three of these time variables are cyclical so the cyclical embeddings should help.

We have 3 separate choices:

Everything as a number (3 free parameters)
Days of the week and months as factors (5 + 12 + 1 free parameters)
Cyclically embed the three variables (3x2=6 parameters)

So a balance between the number of parameters and the flexibility of the model.

We will use a simple linear model, nothing fancy.

As always we will be in Julia.

using Dates, AlpacaMarkets, Plots, StatsBase, GLM
using DataFramesMeta, CategoricalArrays, ShiftedArrays

To load the data in we will use my AlpacaMarkets.jl API and pull in as much daily data as possible.

aaplRaw, npt = AlpacaMarkets.stock_bars("AAPL", "1Day"; startTime=Date("2000-01-01"), endTime = today() - Day(2), adjustment = "all", limit = 10000)

Some basic cleaning and formatting.

aapl = aaplRaw[:, [:t, :v]]
aapl[!, "t"] = DateTime.(chop.(aapl[!, "t"]))

Julia makes it easy to add the factor variables and the numeric versions. As the numeric values all start at 1 we subtract one so they begin at 0.

aapl[:, :DayName] = CategoricalArray(dayname.(aapl.t))
aapl[:, :MonthName] = CategoricalArray(monthname.(aapl.t))

aapl[:, :DayOfMonth] = dayofmonth.(aapl.t) .- 1
aapl[:, :DayOfWeek] = dayofweek.(aapl.t) .- 1
aapl[:, :MonthOfYear] = month.(aapl.t) .- 1;

We normalise the volume to millions of shares and take the difference.

aapl[:, :vNorm] = aapl[:, :v] .* 1e-6;
aapl[:, :delta_vNorm] = aapl[:, :vNorm] .- ShiftedArrays.lag(aapl[:, :vNorm]);

The regular volumes (vNorm) aren’t stationary, with a clear trend that changes over time, so it’s better to model the difference in volumes each day.

plot(plot(aapl.t, aapl.vNorm, title = "Volume", label = :none), 
     plot(aapl.t, aapl.delta_vNorm, title = "Volume Difference", label = :none), layout=(2,1))

To apply the cyclical encoding we need to take one column and turn it into two.

function cyclical_encode(df, col, max)
    df[:, Symbol("$(col)_sin")] = sin.(2 .* pi .* df[:, Symbol(col)]/max)
    df[:, Symbol("$(col)_cos")] = cos.(2 .* pi .* df[:, Symbol(col)]/max)
    df
end

for col in ["DayOfWeek", "DayOfMonth", "MonthOfYear"]
    aapl = cyclical_encode(aapl, col, maximum(aapl[:, col]))
end

If you’ve not seen it before, the $ is like Python f-strings and lets you use a variable in the string.

We do the normal test/train split.

aaplTrain = aapl[1:2000,:]
aaplTest = aapl[2001:end,:];

Now to build the three models.

The numerical model takes in the numbers directly.

numModel = lm(@formula(delta_vNorm ~ DayOfWeek + MonthOfYear + DayOfMonth), aaplTrain)

The factor model represents the day of the week and the month as categories so each level gets a separate parameter.

factorModel = lm(@formula(delta_vNorm ~ DayName + MonthName + DayOfMonth + 0), aaplTrain)

The embedding model takes in the sin/cos transformation of each of the variables.

embeddingModel = lm(@formula(delta_vNorm ~ DayOfWeek_sin + DayOfWeek_cos + DayOfMonth_sin + DayOfMonth_cos + MonthOfYear_sin + MonthOfYear_cos), aaplTrain);

To assess how well the models perform we look at the RMSE (in sample and out of sample), AIC (in sample) and $R^2$ (in sample and out of sample).

Model	NumCoefs	RMSE	RMSEOOS	AIC	R2	R2OOS
Numeric	4	31.1041	50.2975	21346.9	0.0336539	0.0396665
Factor	17	31.2978	50.0453	21352.8	0.0433269	0.0276647
Embedding	7	31.7484	51.1591	21420.8	0.0002655	-0.000531

Interestingly, the embedding model performs the worst both in sample and out of sample.
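The post doesn’t show how the out-of-sample numbers are computed, so here is a small sketch of one reasonable definition. The helper names `rmse_oos` and `r2_oos` are assumptions for this example; in practice `yhat` would come from something like `predict(model, aaplTest)`.

```julia
using Statistics

# Root mean squared error between observations and predictions.
rmse_oos(y, yhat) = sqrt(mean(abs2, y .- yhat))

# Out-of-sample R²: 1 - SSE / SST, measured against the mean of y.
# It goes negative when the model is worse than predicting that mean.
function r2_oos(y, yhat)
    sse = sum(abs2, y .- yhat)
    sst = sum(abs2, y .- mean(y))
    return 1 - sse / sst
end

y    = [1.0, 2.0, 3.0, 4.0]     # toy observations
good = [1.1, 1.9, 3.2, 3.8]     # predictions close to y
bad  = [4.0, 3.0, 2.0, 1.0]     # predictions pointing the wrong way

(rmse_oos(y, good), r2_oos(y, good), r2_oos(y, bad))
```

A negative out-of-sample R², like the one the embedding model gets, just means the predictions did worse than a constant.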

When we pull out the Day of the Week effect it’s easy to see what the model has learnt.

params = Dict(zip(coefnames(embeddingModel), coef(embeddingModel)))

x = 0:0.1:4
ySin = params["DayOfWeek_sin"] * sin.(2 .* pi .* x ./ maximum(x))
yCos = params["DayOfWeek_cos"] * cos.(2 .* pi .* x ./ maximum(x))


p = plot(x, ySin, label = "Sin")
plot!(p, x, yCos, label = "Cos")
plot!(p, x, yCos .+ ySin, label = "Combined")

This indicates the lower volume changes are on Tuesday and the higher volume changes are on Thursday.

Based on the model performance it’s not a great showing for the embedding transformation. Let’s move on to another example where the cyclical nature might be more obvious.

Practical Cyclical Embeddings - Intraday Volumes

Another example would be the flow of trades over the day. In this case, the hour is the variable we will cyclically embed. For this, we use BTCUSD trades from AlpacaMarkets.jl and aggregate them over the day.

btcRaw, token = AlpacaMarkets.crypto_bars("BTC/USD", "1H"; startTime=Date("2025-01-01"), limit = 10000)

res = [btcRaw]
while !(isnothing(token) || isempty(token))
    println(token)
    newtrades, token = AlpacaMarkets.crypto_bars("BTC/USD", "1H"; startTime=Date("2025-01-01"), limit = 10000, page_token = token)
    println((minimum(newtrades.t), maximum(newtrades.t)))
    append!(res, [newtrades])
    sleep(AlpacaMarkets.SLEEP_TIME[])
end
res = vcat(res...);

Sidenote, I do need to wrap this functionality into the package itself.

We get the raw data into a suitable state.

btc = res[:, [:t, :v]]
btc[!, "t"] = DateTime.(chop.(btc[!, "t"]));

btc = @transform(btc, :Date = Date.(:t), :Time = Time.(:t), :DayOfWeek = dayofweek.(:t), :Hour = hour.(:t))
trainDates = unique(btc.Date)[1:140]
testDates = setdiff(unique(btc.Date), trainDates)

trainDataRaw = btc[findall(in(trainDates), btc.Date), :];
testDataRaw = btc[findall(in(testDates), btc.Date), :];

trainData = @combine(groupby(trainDataRaw, [:Hour]), :v = sum(:v))
trainData = @transform(trainData, :total_v = sum(:v), :frac = :v./sum(:v))

testData = @combine(groupby(testDataRaw, [:Hour]), :v = sum(:v))
testData = @transform(testData, :total_v = sum(:v), :frac = :v./sum(:v))

sort!(trainData, :Hour);
sort!(testData, :Hour);

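Again, using a linear model we fit the embedded hour variables to the fraction of the volume traded per hour. The Hour_sin and Hour_cos columns need creating first; here that is assumed to reuse cyclical_encode with the same maximum-based scaling as before.

foreach(df -> cyclical_encode(df, "Hour", maximum(trainData.Hour)), (trainData, testData))
embedModelIntra = lm(@formula(frac ~ Hour_sin + Hour_cos), trainData)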

When comparing the results, we are now just looking at the intraday profile of the trades for both the train set and test set overlaid with the model.

The model has done well to pick up the peak in the afternoon but has missed the peak in the early morning. The RMSE of this model is 0.029 vs 0.026 from using the training fractions directly, so again the encoded model has done worse. This is the limiting factor with this embedding: we have a single frequency of sin/cos when in reality this problem needs more degrees of freedom, i.e. multiple components:

\[\sum _i \left[ c^1_i \sin \left(\frac{2 \pi \omega _i x}{\max (x)}\right) + c^2_i \cos \left(\frac{2 \pi \omega _i x}{\max (x)}\right) \right].\]

This is now a GAM with trigonometric splines so we can view the cyclical encoding as a 1-spline GAM.
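As a sketch of what adding components looks like, the snippet here builds sin/cos columns for several harmonics and fits them by least squares. The helper `fourier_features`, the period of 24 and the synthetic two-peak profile are all assumptions for illustration; none of this is fitted to the BTC data.

```julia
# Build a design matrix with an intercept plus sin/cos columns for
# harmonics 1..K of a variable x with the given period.
function fourier_features(x, period, K)
    cols = [ones(length(x))]
    for k in 1:K
        push!(cols, sin.(2π .* k .* x ./ period))
        push!(cols, cos.(2π .* k .* x ./ period))
    end
    return reduce(hcat, cols)
end

hours = collect(0:23)
# Synthetic profile with two peaks per day (purely illustrative data).
profile = 1.0 .+ 0.5 .* sin.(2π .* hours ./ 24) .+ 0.3 .* cos.(2π .* 2 .* hours ./ 24)

X1 = fourier_features(hours, 24, 1)   # single frequency, as in the post
X2 = fourier_features(hours, 24, 2)   # adds the second harmonic

# Ordinary least squares via the backslash operator.
err1 = sqrt(sum(abs2, profile .- X1 * (X1 \ profile)) / length(hours))
err2 = sqrt(sum(abs2, profile .- X2 * (X2 \ profile)) / length(hours))
```

The single-frequency fit cannot capture the second peak, so `err1` stays well above zero while `err2` is essentially zero. Each extra harmonic costs two more parameters.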

Conclusion

It’s an interesting transformation of time-like variables and gives you a route to smoothing out the beginning and ending of the cycles.

In these toy models, the embedding hasn’t improved performance but it’s possible that it’s more relevant in deep learning architectures where there are more parameters and more interactions. In all the above models there’s much more groundwork to do before we start eking out performance gains from the time variables.

Fitting Price Impact Models

2025-03-14T00:00:00+00:00

A big part of market microstructure is price impact and understanding how you move the market every time you trade. In the simplest sense, every trade upends the supply and demand of an asset even for a tiny amount of time. The market responds to this change, then responds to the response, then responds to that response, etc. You get the idea. It’s a cascading effect of interactions between all the people in the market.

Price impact is happening both at the micro and macro level. At the micro level each trade moves the market a little bit based on the instantaneous market conditions commonly called ‘liquidity’. At the macro level, continuous trades in one direction have a compounding and overlapping effect. In reality, you can’t separate out either effect so the market impact models need to work for both small and large scales.

This post is inspired by two sources:

Both cover very similar models but one is a fairly expensive book and the other is on SSRN for free. The same author is involved in both of them too.

In terms of data, there are two routes you can go down.

You have your own, private, execution data and can build out a data set for the models.
You use publicly available trades and adjust the models to account for the anonymous data.

In the first case, you will know when an execution started and stopped so can record how the price changed. In the second case, the data will be made up of lots of trades and less obvious when some parent execution started and stopped.

We will take the 2nd route and use Bitcoin data to look at different price impact models.

As ever I will be using Julia with some of the standard packages.

using LibPQ
using DataFrames, DataFramesMeta
using Dates
using Plots
using GLM, Statistics, Optim

Bitcoin Price Impact Data

We will use my old trusty Bitcoin data set that I collected in 2021. It’s just over a day’s worth of Bitcoin trades and L1 prices that I piped into QuestDB. Full detail in Using QuestDB to Build a Crypto Trade Database in Julia.

First, we connect to the database.

conn = LibPQ.Connection("""
             dbname=qdb
             host=127.0.0.1
             password=quest
             port=8812
             user=admin""");

For each trade recorded in the database, we also want to join the best bid and offer immediately before it. This is where an ASOF join is useful. It joins two tables on their timestamps, matching each row of the first table with the most recent row of the second table at or before it. Sounds more complicated than it really is. In short, it takes the trade table and adds in the prices using the price just before the trade.

trades = execute(conn, 
    "WITH
trades AS ( 
   SELECT * FROM coinbase_trades
   ),
prices as (
  select * from coinbase_bbo
)
select * from trades ASOF JOIN prices") |> DataFrame
dropmissing!(trades);
trades = @transform(trades, :mid = 0.5*(:ask .+ :bid))

For these small tables, it calculates pretty much instantly and we are able to return a Julia data frame. Plus we calculate the mid-price for each row.
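To make the ASOF join semantics concrete, here is a sketch in plain Julia on toy timestamps. The function `asof_index` and the data are hypothetical; QuestDB does the real work in the query above.

```julia
# For each trade time, find the last quote at or before it.
# Assumes quote_times is sorted ascending. A trade that arrives before
# the first quote gets index 0, which would need handling separately.
function asof_index(trade_times, quote_times)
    return [searchsortedlast(quote_times, t) for t in trade_times]
end

quote_times = [1, 5, 9]              # toy quote timestamps
quote_mids  = [100.0, 101.0, 102.0]  # mid-price at each quote
trade_times = [2, 5, 8, 12]          # toy trade timestamps

idx = asof_index(trade_times, quote_times)
matched_mids = quote_mids[idx]
```

Each trade picks up the latest quote that was available when it happened, so no future prices leak into the row.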

In all the price impact models we are aggregating this data:

Group the data by some time bucket (seconds or minutes etc.)
Calculate the net amount, total absolute amount and open and close prices of the bucket.
Calculate the price return using the close-to-close prices.

function aggregate_data(trades, smp)
    tradesAgg = @combine(groupby(@transform(trades, :ts = floor.(:timestamp, smp)), :ts), 
             :q = sum(:size .* :side), 
             :absq = sum(:size), 
             :o = first(:mid), 
             :c = last(:mid));
    tradesAgg[!, "price_return"] .= [NaN; (tradesAgg.c[2:end]./ tradesAgg.c[1:(end-1)]) .- 1]
    tradesAgg[!, "ofi"] .= tradesAgg.q ./ tradesAgg.absq

    tradesAgg
end

We are going to bucket the data by 10 seconds.

aggData  = aggregate_data(trades, Dates.Second(10))

As ever, let’s split this data into a training and test set.

aggDataTrain = aggData[1:7500, :]
aggDataTest = aggData[7501:end, :];

It’s just a simple split on time.

plot(aggDataTrain.ts, aggDataTrain.c, label = "Train")
plot!(aggDataTest.ts, aggDataTest.c, label = "Test")

Calculating the Volatility and ADV

All the models require a volatility and ADV calculation. My data runs just over a day, so we need to adjust for that.

For the ADV we take the sum of the total volume traded and divide by the length of time converted to days.

deltaT = maximum(trades.timestamp) - minimum(trades.timestamp)
deltaTDays = (deltaT.value * 1e-3)/(24*60*60)
adv = sum(trades.size)/deltaTDays
aggDataTrain[!, "ADV"] .= adv
aggDataTest[!, "ADV"] .= adv;

For the volatility, we take the square root of the sum of the squared 5-minute returns. It should probably be annualised if we were comparing the parameters across different assets.

min5Agg = aggregate_data(trades, Dates.Minute(5))
volatility = sqrt(sum(min5Agg.price_return[2:end] .* min5Agg.price_return[2:end]))
aggDataTrain[!, "Vol"] .= volatility;
aggDataTest[!, "Vol"] .= volatility;

The ADV and volatility have a normalising effect across assets. So if we had multiple coins, we could use the same model even if one was a highly traded coin like BTC or ETH vs a lower volume coin (the rest of them?!). This would give us comparable model parameters to judge the impact effect.

As our data sample is so small we are only calculating 1 volatility and 1 ADV. In reality, you calculate the volatility/ADV on a rolling basis and then do the train/test split.

Models of Market Impact

The paper and book describe different market impact models that all follow a similar functional form. I’ve chosen four of them to illustrate the model fitting process.

The Order Flow Imbalance model (OFI)
The Obizhaeva-Wang (OW) model
The Concave Propagator model
The Reduced Form model

For all the models we will state the form of the market impact $\Delta I$ and use the price returns over the same period to find the best parameters of the model.

The overarching idea is that the return in each bucket is proportional to the amount of volume traded in that bucket plus some contribution from the previous volumes earlier - suitably decayed.

Order Flow Imbalance

This is the simplest model as it just uses the imbalance over the bucket to predict return. For the OFI we are just using the trade imbalance, the net volume divided by the total volume in the bucket.

\[\Delta I = \lambda \sigma \frac{q_t}{| q_t | \text{ADV}}\]

As there is no dependence on the previous returns, we can use simple linear regression to estimate $\lambda$.

aggDataTrain[!, "x_ofi"] = aggDataTrain.Vol .* (aggDataTrain.ofi ./ aggDataTrain.ADV)
aggDataTest[!, "x_ofi"] = aggDataTest.Vol .* (aggDataTest.ofi ./ aggDataTest.ADV)

ofiModel = lm(@formula(price_return ~ x_ofi + 0), aggDataTrain[2:end, :])

The model has returned a significant value of $\lambda = 59$ and has an in-sample $R^2$ of 11% and out-of-sample RMSE of 0.0003. Encouraging and off to a good start!

Side note, I’ve written about Order Flow Imbalance before in Order Flow Imbalance - A High Frequency Trading Signal.

The Obizhaeva-Wang (OW) Model

The OW model is a foundational model of market impact and you will see this model frequently across different microstructure papers. It suggests a linear dependence between the signed order flow and price impact but again normalising against the ADV and volatility.

\[\Delta I = -\beta I_t + \lambda \sigma \frac{q_t}{ADV}\]

Again, we create the $x$ variable in the data frame specific for this model but this will need special attention to fit.

aggDataTrain[!, "x_ow"] = aggDataTrain.Vol .* (aggDataTrain.q ./ aggDataTrain.ADV);
aggDataTest[!, "x_ow"] = aggDataTest.Vol .* (aggDataTest.q ./ aggDataTest.ADV);

From the market impact formula, we can see that the relationship is recursive. The impact at time $t$ depends on the impact at time $t-1$. How much of the previous impact is carried over is controlled by $\beta$ and in the paper they fix this at $\frac{\log 2}{\beta} = 60 \text{ Minutes}$. This means we have to fit the model as:

Calculate the $I$ given an estimate of $\lambda$
Adjust the price returns by this impact
Regress the adjusted price returns against the $x$ variable.
Repeat with the new estimate of $\lambda$ until converged.

This is a simple 1 parameter optimisation where we minimise the RMSE.

function calcImpact(x, beta, lambda)
    impact = zeros(length(x))
    impact[1] = x[1]
    for i in 2:length(impact)
        impact[i] = (1-beta)*impact[i-1] + lambda*x[i]
    end
    impact
end
	
function fitLambda(x, y, beta, lambda)
    I = calcImpact(x, beta, lambda)
    y2 = y .+ (beta .* I)
    model = lm(reshape(x, (length(x), 1))[2:end, :], y2[2:end])
    model
end

rmse(x) = sqrt(mean(residuals(x) .^2))

We start with $\lambda = 1$ and let the optimiser do the work.

res = optimize(x -> rmse(fitLambda(aggDataTrain[!, "x_ow"], aggDataTrain[!, "price_return"], 0.01, x[1])), [1.0])

It’s converged! We plot the different values of the objective function and show that this process can find the minimum.

lambdaRes = rmse.(fitLambda.([aggDataTrain[!, "x_ow"]], [aggDataTrain[!, "price_return"]], 0.01, 0:1:20))
plot(0:1:20, lambdaRes, label = :none, xlabel = L"\lambda", ylabel = "RMSE", title = "OW Model")
vline!(Optim.minimizer(res), label = "Optimised Value")

We have a nice convex relationship, which is always a good sign. We then pull out the best-fitting model and estimate the $R^2$.

owModel = fitLambda(aggDataTrain[!, "x_ow"], aggDataTrain[!, "price_return"], 0.01, first(Optim.minimizer(res)))

Which gives $R^2 = 11\%$. So roughly the same as the OFI model. For the out-of-sample RMSE we get 0.0006.
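The fits above pass a decay of 0.01 per 10-second bucket. As a sketch of how that relates to a half-life, assuming the discrete recursion in calcImpact is what defines the decay, the two hypothetical helpers here convert between a per-bucket $\beta$ and the number of buckets it takes the impact to halve.

```julia
# Number of buckets for the impact to halve under I[t] = (1 - beta) * I[t-1].
half_life_buckets(beta) = log(0.5) / log(1 - beta)

# Per-bucket decay that gives a chosen half-life measured in buckets.
beta_for_half_life(n_buckets) = 1 - 0.5^(1 / n_buckets)

bucket_seconds = 10
hl_used = half_life_buckets(0.01) * bucket_seconds / 60      # minutes
beta_60min = beta_for_half_life(60 * 60 / bucket_seconds)    # 360 buckets
```

Under that reading, 0.01 per bucket is a half-life of between 11 and 12 minutes, and a 60-minute half-life on 10-second buckets would need a $\beta$ a little under 0.002. That is worth bearing in mind when comparing these parameters to the paper.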

Concave Propagator Model

This model follows the belief that market impact is a power law and that power is close to 0.5. Using the square root of the total amount traded and the net direction gives us the $x$ variable.

\[\Delta I = -\beta I_t + \lambda \sigma \text{sign} (q_t) \sqrt {\frac{| q_t |}{\text{ADV}}}\]

aggDataTrain[!, "x_cp"] = aggDataTrain.Vol .* sign.(aggDataTrain.q) .* sqrt.((aggDataTrain.absq ./ aggDataTrain.ADV));
aggDataTest[!, "x_cp"] = aggDataTest.Vol .* sign.(aggDataTest.q) .* sqrt.((aggDataTest.absq ./ aggDataTest.ADV));

Again, we optimise using the same methodology as above.

res = optimize(x -> rmse(fitLambda(aggDataTrain[!, "x_cp"], aggDataTrain[!, "price_return"], 0.01, x[1])), [1.0])
lambdaRes = rmse.(fitLambda.([aggDataTrain[!, "x_cp"]], [aggDataTrain[!, "price_return"]], 0.01, 0:0.1:1))
plot(0:0.1:1, lambdaRes, label = :none, xlabel = L"\lambda", ylabel = "RMSE", title = "Concave Propagator Model")
vline!(Optim.minimizer(res), label = "Optimised Value")

Another success! This time the $R^2$ is 17%, so an improvement on the other two models. Its out-of-sample RMSE is 0.0008.

Reduced Form Model

The paper suggests that as the number of trades and the time increment increase, the market impact function converges to a linear form with a dependence on the stochastic volatility of the order flow.

\[\Delta I = -\beta I_t + \lambda \sigma \frac{q_t}{\sqrt{v_t \cdot \text{ADV}}}\]

For this, we need to calculate the stochastic liquidity parameter, $v_t$, which is simply an exponentially decaying running sum of the absolute market volumes.

function calcLiquidity(absq, beta)
    v = zeros(length(absq))
    v[1] = absq[1]
    for i in 2:length(v)
        v[i] = (1-beta)*v[i-1] + absq[i]
    end
    return v
end

v = calcLiquidity(aggDataTrain[!, "absq"], 0.01)
vTest = calcLiquidity(aggDataTest[!, "absq"], 0.01)

plot(aggDataTrain.ts, v, label = "Stochastic Liquidity")
plot!(aggDataTest.ts, vTest, label = "Test Set")

Adding this into our data frame and calculating the $x$ variable is simple.

aggDataTrain[!, "v"] = v
aggDataTest[!, "v"] = vTest

aggDataTrain[!, "x_rf"] = aggDataTrain.Vol .* aggDataTrain.q ./ sqrt.((aggDataTrain.ADV .* aggDataTrain[!, "v"]));
aggDataTest[!, "x_rf"] = aggDataTest.Vol .* aggDataTest.q ./ sqrt.((aggDataTest.ADV .* aggDataTest[!, "v"]));

And again, we repeat the fitting process.

lambdaVals = 0:0.1:5
res = optimize(x -> rmse(fitLambda(aggDataTrain[!, "x_rf"], aggDataTrain[!, "price_return"], 0.01, x[1])), [1.0])
lambdaRes = rmse.(fitLambda.([aggDataTrain[!, "x_rf"]], [aggDataTrain[!, "price_return"]], 0.01, lambdaVals))
plot(lambdaVals, lambdaRes, label = :none, xlabel = L"\lambda", ylabel = "RMSE", title = "Reduced Form Model")
vline!(Optim.minimizer(res), label = "Optimised Value")

This model gives an $R^2=10\%$ and out-of-sample RMSE of 0.0009.

With all four models fitted, we can now look at the differences statistically and how the impact state evolves over the course of the day.

Model	$\lambda$	$R^2$	OOS RMSE
OFI	43	0.11	0.0003
OW	14	0.11	0.0006
Concave Propagator	0.34	0.17	0.0008
Reduced Form	1.7	0.10	0.0009

So, the concave propagator model has the highest $R^2$, followed by the OFI and OW models, with the reduced form model slightly lower. But, looking at the RMSE values from the out-of-sample performance it’s clear that the OFI model seems to be the best.

When we plot the resulting impacts from the 4 models we generally see they agree, with the OFI model being the most different. This difference comes from the lack of time decay from the previous volumes.

Conclusion

Overall, I don’t think these results are that informative as my data set is tiny compared to the paper (1 day vs months). Instead, use this as more of an instructional guide on how to fit these models. We didn’t even explore optimising the time decay ($\beta$ values) for Bitcoin which could be substantially different from the paper dataset on equities. So there is plenty more to do!

Importance Sampling, Reinforcement Learning and Getting More From The Data You Have

2024-12-17T00:00:00+00:00

A new paper hit my feed: Choosing trading strategies in electronic execution using importance sampling. I’ve only encountered sampling in a statistical computing course during my PhD, and I had never strayed away from Monte Carlo sampling, but this practical example provided an intuitive understanding of its importance and utility.

The key tenet of the paper is to use the data you have to evaluate a strategy you are considering without actually running the new strategy in production. In real life, changing something like these strategies can take a long time, with limited upside but unlimited downside if it all goes wrong.

This blog post will run through the paper and replicate the main themes in Julia. I believe the author is a Julia user too; I remember enjoying their JuliaCon talk about high-frequency covariance matrices - HighFrequencyCovariance: Estimating Covariance Matrices in Julia and the associated Julia package HighFrequencyCovariance.jl.

The Execution Trader’s Problem

You are an execution trader with access to 4 different broker algorithms (algos) to execute your trade. With each trade you need to choose an algo and measure the trade’s overall slippage - the price you paid vs the price at the start of the order. You want to choose the best algo to ensure each of your trades gets the best price.

How do you choose which one to use? Do you have enough data to decide which one is the best? Is any one algo better than the others? These are all difficult questions to answer, but with some data on how the algos perform you should be able to make a more informed decision.

We are trying to maximise the performance of each trade by choosing the correct algo. Our trade is described by a variable $x$ and each algo performs differently depending on $x$. The paper calls the performance ‘slippage’ but then tries to maximise the slippage which sounds weird to me - I always talk about minimising slippage! But that’s splitting hairs.

The performance of algo $i$ is described by an analytical function with parameters $\alpha _i, \beta _i$ plus some noise that depends on the duration of the trade $d$ and the volatility $\sigma$.

using Distributions, StatsBase, DataFramesMeta, Plots  # packages the snippets in this post rely on

function expSlippage(x, alpha, beta)
   @. -alpha*(x - beta)^2 
end

function slippage(x, alpha, beta, d, sigma)
    expSlippage(x, alpha, beta) + rand(Normal(0, d*sigma/2))
end

The $\alpha$’s and $\beta$’s are simple constants set in the paper.

alphas = [5,10,15,20]
betas = [0.2, 0.4, 0.6, 0.8]

x = collect(0:0.01:1)
p = plot(xlabel = "x", ylabel = "Expected Slippage")
for i in eachindex(alphas)
   plot!(p, x, expSlippage(x, alphas[i], betas[i]), label = "Algo " * string(i), lw = 2) 
end
p

Here we can see where each algo is better for each $x$. In reality, this is impossible to know or it might not even exist.

We are going to devise a rule of when we will select each trading algo:

If $x<0.5$ then we will randomly select Algo 1 62.5% of the time and each of the others 12.5% of the time.
If $x \geq 0.5$ then Algo 3 62.5% of the time and each of the others 12.5%.

function tradingRule(x)
    if x < 0.5
        return [0.625, 0.125, 0.125, 0.125]
    else 
        return [0.125, 0.125, 0.625, 0.125]
    end
end

Julia’s vectorisation makes it easy to simulate going through multiple trades.

x = rand(Uniform(), 100)
d = rand(Uniform(), 100)
stratProbs = tradingRule.(x)
strat = rand.(Categorical.(stratProbs))
stratProb = getindex.(stratProbs, strat)
slippageVal = slippage.(x, alphas[strat], betas[strat], d, 5)

res = DataFrame(x=x, d=d, strat=strat, stratProb=stratProb, prob=stratProb, slippage=slippageVal)
first(res, 3)

x	d	strat	stratProb	prob	slippage
0.0192748	0.95432	1	0.625	0.625	1.29969
0.0700494	0.930581	1	0.625	0.625	0.855019
0.925858	0.90087	3	0.625	0.625	-2.62943

This is our ‘production data’ for 100 random trades. The aim of the game is to understand how good our trading rules are rather than trying to estimate how good the individual algos are.

Does our rule above do better than just randomly choosing an algo? This is where we can use importance sampling to take the 100 trades and specially weight them to assess a new trading rule.

Importance Sampling

Importance sampling estimates an expectation under one set of probabilities $p$ using observations that were generated with a different set of probabilities $q$, reweighting each observation by the ratio $p/q$. In our case we want to calculate the expected slippage of a trading strategy given the observations we have of the current strategy.

\[\mathbb{E} [\text{Slippage}] = \frac{1}{N} \sum _i \text{Slippage}_i \frac{p_i(\text{New Strategy})}{q_i(\text{Current Strategy})}\]

$q_i(\text{Current Strategy})$ is equal to the stratProb column in the dataframe and $p_i$ is the probability we would have chosen the given algo under the new strategy.

For the importance sampling, we calculate the likelihood ratio using equal probabilities and then take the weighted average of the slippages.

res = @transform(res, :EqProb = 0.25)
res = @transform(res, :ratio = :EqProb ./ :stratProb)
@combine(res, :StratSlippage = mean(:slippage), :EqStratSlippage = mean(:slippage, Weights(:ratio)))

StratSlippage	EqStratSlippage
-1.02243	-1.8774

The estimated average slippage under the equal-probability rule is worse (more negative) than under the current strategy. This suggests that randomly choosing an algo would perform worse.
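One subtlety: the formula above divides by $N$, whereas a weighted mean such as `mean(x, Weights(w))` divides by the sum of the weights, which is the self-normalised version of the estimator. The sketch here, with made-up slippages and hypothetical function names, shows both side by side.

```julia
# Plain importance sampling estimate: average of slippage * p/q.
is_estimate(slip, p, q) = sum(slip .* p ./ q) / length(slip)

# Self-normalised version: divide by the sum of the ratios instead of N.
snis_estimate(slip, p, q) = sum(slip .* p ./ q) / sum(p ./ q)

slip = [-1.0, -2.0, 0.5, -0.5]        # toy slippages
q    = [0.625, 0.125, 0.625, 0.125]   # probability used in production
p    = fill(0.25, 4)                  # equal-probability rule

plain  = is_estimate(slip, p, q)
normed = snis_estimate(slip, p, q)
```

The two differ whenever the ratios don’t sum to exactly $N$, which is the usual case in a finite sample. The self-normalised form is slightly biased but tends to be more stable when a few ratios are large.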

Then plotting the average slippage across the orders.

res = @transform(res, :StratSlipapgeRolling = cumsum(:slippage) ./collect(1:length(:slippage)))
res = @transform(res, :EqSlipapgeRolling = cumsum(:slippage .* :ratio) ./cumsum(:ratio))

plot(res.StratSlipapgeRolling, label = "Production", lw =2)
plot!(res.EqSlipapgeRolling, label = "Equal Weighted", lw =2)

The timeseries of the slippage shows that the equally weighted strategy is worse, so gives us confidence in the current strategy. When we observe a bad outcome the likelihood ratio weights that outcome based on how different the probability is from the production strategy.

How can we use importance sampling to build better strategies?

Easy Reinforcement Learning and Expected Slippage

Each trade is described by $x$. In this toy model that is just a number but in real life this could correspond to the size of the order, the asset, the time of day and any combination of variables. In the original paper they use the spread, volatility, order size relative to the ADV and duration as descriptive variables of a random dataset. I’m going to keep it simple and stick to $x$ being just a single number.

We want to understand if a particular $x$ means we should use algo $i$. For this, we need to build an ‘expected slippage’ model where we use the historical $x$ values and outcomes of using algo $i$.

For the modelling part, we will use xgboost through MLJ.jl.

using MLJ
xgboostModel = @load XGBoostRegressor pkg=XGBoost verbosity = 0
xgboostmodel = xgboostModel(eval_metric=["rmse"]);

The inputs are $x$ and an indicator of the chosen algo.

res2 = coerce(res[:,[:x, :strat, :slippage]], :strat=>Multiclass);

y, X = unpack(res2, ==(:slippage); rng=123);

encoder = ContinuousEncoder()
encMach = machine(encoder, X) |> fit!
X_encoded = MLJ.transform(encMach, X);

xgbMachine = machine(xgboostmodel, X_encoded, y)

evaluate!(xgbMachine,
          resampling=CV(nfolds = 6, shuffle=true),
          measures=[rmse, rsq],
          verbosity=0)

The overall regression gets an $R^2$ of 0.5 on our 100 trade dataset - a decent model.

In this new simulation, we will fit the xgboost model on the trades to build up an expected slippage model with all the data we have so far. prepareData and fitSlippage transform the data and fit the model.

We will then use this model to predict the expected slippage (predictSlippage) for each algo and use that to select which algo to use for a given trade.

function prepareData(x, strat, slippage)
    res = coerce(DataFrame(x=x, strat=strat, slippage=slippage), :strat=>Multiclass);
    y, X = unpack(res, ==(:slippage); rng=123);
    encoder = ContinuousEncoder()
    encMach = machine(encoder, X) |> fit!
    X_encoded = MLJ.transform(encMach, X);
    # Return the fitted encoder too so the same encoding is reused at prediction time
    return X_encoded, y, encMach
end

function fitSlippage(x, strat, slippage, xgboostmodel)
    X_encoded, y, encMach = prepareData(x, strat, slippage)
    xgbMachine = machine(xgboostmodel, X_encoded, y)

    evaluate!(xgbMachine,
          resampling=CV(nfolds = 6, shuffle=true),
          measures=[rmse, rsq],
          verbosity=0)
    return (xgbMachine, encMach)
end

function predictSlippage(x, xgbMachine, encMachine)
    # One row per algo so we get a predicted slippage for each of the four strategies
    X_pred = DataFrame(x=x, strat = [1,2,3,4], slippage = NaN)
    X_pred = coerce(X_pred[:,[:x, :strat, :slippage]], :strat=>Multiclass)
    # Use the encoder passed in rather than relying on a global
    X_pred = MLJ.transform(encMachine, X_pred)
    preds = MLJ.predict(xgbMachine, X_pred)
    return(preds)
end

function slippageToProb(preds)
    scores = exp.(preds) ./ sum(exp.(preds))
    p = ((0.9 .* scores) .+ 0.025) ./ sum((0.9 .* scores) .+ 0.025) 
    return p
end

The predicted slippage is then transformed into a probability using the softmax function (slippageToProb) which gives us a mapping of the real-valued estimated slippage onto a probability. We then sample which strategy to use from this probability. By adding an element of randomness into the algo selection we are making sure we can use the importance sampling framework to either change the model (xgboost to something else) or change how we build the probabilities (softmax to something else).
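As a rough sketch of what slippageToProb is doing, the two steps can be separated out: a softmax, then a blend with a uniform floor so every algo keeps a minimum probability. The function names and predicted slippages below are hypothetical; the 0.025 floor matches the constants used above.

```julia
# Numerically stable softmax: subtracting the maximum leaves the result unchanged
function softmax(preds)
    z = exp.(preds .- maximum(preds))
    return z ./ sum(z)
end

# Blend with a uniform floor so no algo has probability below `minProb`
function withExploration(scores; minProb = 0.025)
    k = length(scores)
    return (1 - k * minProb) .* scores .+ minProb
end

preds = [-1.0, -2.0, -0.5, -3.0]   # hypothetical predicted slippage per algo
p = withExploration(softmax(preds))
```

With four algos the blend is 0.9 times the softmax plus 0.025, which already sums to one. The floor matters for importance sampling: the production probability sits in the denominator of the likelihood ratio, so it must never be zero.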

To simulate the problem we will start by randomly choosing a strategy for the first 200 runs. After this we will start using the xgboost regression model to predict the expected slippage of each strategy and use this to decide what strategy to use.

epsilon = 0.05
volatility = 5
N = 1000

x = zeros(N)
strat = zeros(N)
slippages = zeros(N)
d = zeros(N)
stratProb = zeros(N)

for i in 1:N
    xVal = rand(Uniform())
    dVal = rand(Uniform())

    if i > 200
        # Only trades 1..i-1 have been filled in at this point, so fit on those
        xgbMachine, encMachine = fitSlippage(x[1:(i-1)], strat[1:(i-1)], slippages[1:(i-1)], xgboostmodel)
        predCost = predictSlippage(xVal, xgbMachine, encMachine)
        stratProbs = slippageToProb(predCost)
    else
        stratProbs = [0.25, 0.25, 0.25, 0.25]
    end

    stratVal = rand(Categorical(stratProbs))
    slippageVal = slippage(xVal, alphas[stratVal], betas[stratVal], dVal, volatility)
    
    x[i] = xVal
    strat[i] = stratVal
    stratProb[i] = stratProbs[stratVal]
    slippages[i] = slippageVal
    d[i] = dVal
end

res = DataFrame(x=x, d=d, strat=strat, stratProb=stratProb, slippage=slippages)

Again, we output each strategy and the probability the strategy was used. We use the importance sampling approach to estimate the slippage for choosing an algo randomly to give us a comparison to the xgboost method.

res = @transform(res, :EqProb = 0.25)
res = @transform(res, :EqRatio = :EqProb ./ :stratProb)
res = @transform(res, :StratSlipapgeRolling = cumsum(:slippage) ./collect(1:length(:slippage)))
res = @transform(res, :EqSlipapgeRolling = cumsum(:slippage .* :EqRatio) ./cumsum(:EqRatio));

plot(res.StratSlipapgeRolling[50:end], label = "Production")
plot!(res.EqSlipapgeRolling[50:end], label = "Equal Weighting")

For the first 200 trades we are just selecting randomly, so no difference in performance. Then afterwards we can see the XGBoost model starts to outperform as it learns what algo is better for each $x$. So whilst we have only run the XGBoost model in production it has shown it is doing better than random by using the importance sampling method.

Testing a New Model Without Running it in Production

The XGBoost model is doing well and out-performing an equal weighted model, but what if you wanted to change from XGBoost to something else? How can you build the case that this is something worth doing?

By constructing new probabilities of whether the strategy would be selected (new $p_i$’s) and with the current strategy probabilities ($q_i$’s) we can estimate the slippage of the new model without having to run any more trades.
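Written out, with $s_i$ as the realised slippage of trade $i$ (my notation, not the paper's), the estimate is the weighted average

\[\hat{s}_{\text{new}} = \frac{\sum_i \frac{p_i}{q_i} s_i}{\sum_i \frac{p_i}{q_i}},\]

where $q_i$ is the probability production gave to the algo that was actually used and $p_i$ is the probability the new model would have given to that same algo.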

With MLJ.jl we can create a new model and pass it into the functions to replicate running the strategy in production. This time we use a simple linear regression model with the same features. We run through the trades in the same order so there is no information leakage.

@load LinearRegressor pkg=MLJLinearModels

linreg = MLJLinearModels.LinearRegressor()

newProb = ones(N) * 0.25

for i in 1:(N-1)

    if i >= 200
        # Fit on trades 1..i, then score trade i+1 so no future information leaks in
        linMachine, enchMachine = fitSlippage(res.x[1:i], res.strat[1:i], res.slippage[1:i], linreg)
        predSlippage = predictSlippage(res.x[i+1], linMachine, enchMachine)
        stratProbs = slippageToProb(predSlippage)
        newProbVal = stratProbs[Int(res.strat[i+1])]
        # Store the probability against trade i+1, the trade it was computed for
        newProb[i+1] = newProbVal
    end
    
end

res[:, :LinearProb] = newProb

res = @transform(res, :LinearRatio = :LinearProb ./ :stratProb)
res = @transform(res, :LinearSlipapgeRolling = cumsum(:slippage .* :LinearRatio) ./cumsum(:LinearRatio))
plot(res.StratSlipapgeRolling[50:end], label = "Production")
plot!(res.EqSlipapgeRolling[50:end], label = "Equal Weighting")
plot!(res.LinearSlipapgeRolling[50:end], label = "Linear Model")

Adding the linear regression decision rule to the data gives us a way of assessing this new model without having to run it directly in production. We can see that the linear model is better than XGBoost and also better than the equal weighting.

A simple bootstrap provides the simplest performance measure: resample the trades after the first 200 with replacement, take the average slippage for each strategy, and repeat 1,000 times.

bs = mapreduce(x-> @combine(res[sample(201:nrow(res), nrow(res)-200), :], 
              :StratSlippage = mean(:slippage), 
              :EqStratSlippage = mean(:slippage, Weights(:EqRatio)),
              :LinearStratSlippage = mean(:slippage, Weights(:LinearRatio))),
              vcat, 1:1000);

@combine(groupby(stack(bs), :variable), :avg = mean(:value), :sd = std(:value))

variable	avg	sd
StratSlippage	-1.55385	0.0967389
EqStratSlippage	-1.59169	0.119028
LinearStratSlippage	-1.52706	0.133231

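As it’s a toy problem, there is nothing of significance between the models, as the gaps between the averages are smaller than the bootstrap standard deviations, but both models do better on average than the random allocation.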
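For anyone who wants the bootstrap without the DataFrames machinery, here is a minimal sketch using only the standard library. The data is synthetic and the function name is made up for illustration, so the numbers will not match the table above.

```julia
using Random, Statistics

# Resample the trades with replacement and recompute the weighted mean each time
function bootstrapWeightedMean(rng, slippage, weights; nBoot = 1000)
    n = length(slippage)
    estimates = zeros(nBoot)
    for b in 1:nBoot
        idx = rand(rng, 1:n, n)   # sample indices with replacement
        estimates[b] = sum(slippage[idx] .* weights[idx]) / sum(weights[idx])
    end
    return mean(estimates), std(estimates)
end

rng = MersenneTwister(42)
slippage = -1.5 .+ randn(rng, 500)   # synthetic slippages, not the post's data
weights  = ones(500)                 # a ratio of 1 recovers the plain mean
avg, sd = bootstrapWeightedMean(rng, slippage, weights)
```

Passing the likelihood ratios as weights instead of ones gives the bootstrap distribution of the importance-weighted estimate.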

Conclusion

Importance sampling gives you a way of getting more out of the current data and strategy you are using. By weighting the observations in a new way you can get an idea whether a new strategy is worth it or not. By rethinking your current setup you can easily add a bit of randomness into decisions and use the importance sampling framework going forward.

Alpha Capture and Acquired

2024-09-19T00:00:00+00:00

People are never short of a trade idea. There is a whole industry of researchers, salespeople and amateurs coming up with trading ideas and making big calls on what stock will go up, what country will cut interest rates and what the price of gold will do next. Alpha capture is about systematically assessing ideas and working out who has alpha and generates profitable ideas and who is just making it up as they are going along.

Alpha capture started as a way of profiling a broker’s stock recommendations. If you have 50 people recommending you 50 different ideas, how do you know who is good? You’ll quickly run out of money if you blindly follow all the recommendations that hit your inbox. Instead, you need to profile each person’s ideas and see who on average can make good recommendations. Whoever is good at picking stocks probably deserves more of your business.

It has since expanded, and some hedge funds now have internal desks doing a similar analysis on their portfolio managers (PMs) to double down on profitable bets and mitigate the risk of all the PMs picking the same stock. Picking stocks and managing a portfolio across many PMs are two different skills and different departments at your modern hedge fund.

A simple way to measure the alpha of a PM or broker recommendation is to see if the price of a stock they buy (or recommend) goes up after the day they suggest it. Those with alpha would see their picks move higher on a large enough sample and those without alpha would average out to zero: some ideas would go higher, some ideas lower, the net result being 0 alpha. If a PM has the opposite effect and every stock they buy goes down, they are a contrarian indicator, so take their idea and do the opposite!

Alpha Capture Systems: Past, Present, and Future Directions goes through the history of alpha capture and is a good short read that inspired this blog post.

Basic Alpha Capture

What if we wanted to try our own Alpha Capture? We need some stock recommendations and a way of calculating what happens to the price after the recommendation. This is where the Acquired podcast comes in.

Acquired tells the stories and strategies of great companies (taken from their website). It’s a pretty popular podcast and each episode gets close to a million listeners. So this makes it an ideal Alpha Capture study - when they release an episode about a company does the stock price of that company go higher or lower on average? If it were to go higher then each time an episode is released call your broker and go long the stock!

They aren’t explicitly recommending a stock by talking about it, as they say in their intro. So it’s just a toy exercise to see if there is any correlation between the stock price and the release date of an episode.

To systematically test this we need to get a list of the episodes and calculate a ‘markout’ from each episode.

Collecting Podcast Data

The internet is a wonderful thing and each episode of Acquired is available as an XML feed from transistor.fm. So with some fun XML parsing I can get the full history of the podcast with each date and title.

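# Packages assumed by the parsing code below (XML.jl is inferred from the parse(Node, ...) call)
using HTTP, XML, CSV, DataFrames, Dates, TimeZones

# Pull the publication date and title out of the children of one <item> in the feed
function parseEpisode(x)
  rawDate = first(simplevalue.(x[tag.(x) .== "pubDate"]))
  date = ZonedDateTime(rawDate, dateformat"eee, dd uuu yyyy HH:MM:ss z")

  Dict("title" => first(simplevalue.(x[tag.(x) .== "title"])),
       "date" =>date)
end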

function parse_date(t)
   Date(string(split(t, "T")[1]))
end

url = "https://feeds.transistor.fm/acquired"

data = parse(Node, String(HTTP.get(url).body))

episodes = children(data[3][1])
filter!(x -> tag(x) == "item", episodes)
episodes = children.(episodes)

episodeData = parseEpisode.(episodes)

episodeFrame = vcat(DataFrame.(episodeData)...)
CSV.write("episodeRaw.csv", episodeFrame)

After writing the data to a CSV I need to somehow parse the episode title into a stock ticker. This is a tricky task as the episode names are human friendly not computer friendly. So time for our LLM overlords to lend a hand and do the heavy lifting. I drop the CSV into Perplexity and prompt it to add the relevant stock ticker to the file. I then reread the CSV into my notebook.

episodeFrame = CSV.read("episodeTicker.csv", DataFrame)
episodeFrame.date = ZonedDateTime.(String.(episodeFrame.date), dateformat"yyyy-mm-ddTHH:MM:SS.sss-z")

vcat(first(@subset(episodeFrame, :stock_ticker .!= "-"), 4),
        last(@subset(episodeFrame, :stock_ticker .!= "-"), 4))

date `ZonedDateTime`	title `String`	stock_ticker `String15`	sector_etf `String7`
2024-03-17T17:54:00.400+07:00	Renaissance Technologies	RNR	PSI
2024-02-19T17:56:00.410+08:00	Hermès	RMS.PA	GXLU
2024-01-21T17:59:00.450+08:00	Novo Nordisk (Ozempic)	NOVO-B.CO	IHE
2023-11-26T16:24:00.250+08:00	Visa	V	IPAY
2018-09-23T18:28:00.550+07:00	Season 3, Episode 5: Alibaba	BABA	KWEB
2018-08-20T09:20:00.370+07:00	Season 3, Episode 3: The Sonos IPO	SONO	GAMR
2018-08-05T18:15:00.030+07:00	Season 3, Episode 2: The Xiaomi IPO	XIACF	KWEB
2018-07-16T21:40:00.560+07:00	Season 3, Episode 1: Tesla	TSLA	TSLA

It’s done an ok job. Most of the episodes seem to correspond to the right ticker but we can see it has hallucinated the RenTech stock ticker as RNR. RenTech is a private company with no stock ticker, but Perplexity has decided that RNR (a reinsurance company) is the correct one. So not 100% accurate. Still, it has saved me a good chunk of time and we can move on to getting the stock price data.

We want to measure the average price move of a stock after an episode is released. If Acquired had stock-picking skill, you would expect the price to increase after the release of an episode as they are generally speaking positively about the various companies.

So using AlpacaMarkets.jl we get the stock price for the days before and the days after the episode. As AlpacaMarkets only has US stock data, only some of the episodes end up with a full dataset.

What is a Markout?

We calculate the percentage change relative to the episode date and then aggregate all the stock tickers together.

\[\text{Markout} = \frac{p - p_{\text{episode released}}}{p_{\text{episode released}}}\]

Acquired is about great companies so they choose to speak favourably about a company, therefore I think it’s a reasonable assumption that we expect the stock price to increase after everyone gets round to listening to it. So once we aggregate all the episodes we should hopefully have enough data to decide if this is true.
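As a quick worked example of the formula, here is the markout for a handful of made-up prices, scaled by 1e4 so the result is in basis points. The prices and the function name are hypothetical.

```julia
# Markout in basis points relative to the price on the episode release date
markoutBps(prices, arrivalPrice) = 1e4 .* (prices .- arrivalPrice) ./ arrivalPrice

prices = [99.0, 100.0, 101.5, 98.0]   # hypothetical daily VWAPs; the second is the release day
arrivalPrice = prices[2]
markouts = markoutBps(prices, arrivalPrice)
```

This gives markouts of -100, 0, 150 and -200 basis points. The release day itself is always zero, so every episode is anchored at the same point before we average across them.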

function getStockData(stock, startDate)
  prices = AlpacaMarkets.stock_bars(stock, "1Day", startTime=startDate - Month(1), limit=10000)[1]
  prices.date .= startDate
  prices.t = parse_date.(prices.t)
  prices[:, [:t, :symbol, :vw, :date]]
end

function calcMarkout(data)
   arrivalInd = findlast(data.t .<= data.date)
   arrivalPrice = data[arrivalInd, :vw]
   data.arrivalPrice .= arrivalPrice
   data.ts = [x.value for x in (data.t .- data.date)]
   data.markout = 1e4*(data.vw .- data.arrivalPrice) ./ data.arrivalPrice
   data
end

res = []

for row in eachrow(episodeFrame)
    
    try 
        stockData = getStockData(row.stock_ticker, Date(row.date))
        stockData = calcMarkout(stockData)
        append!(res, [stockData])
    catch e
        println(row.stock_ticker)
    end
end

res = vcat(res...)

With the data pulled we now aggregate by each day before and after the episode.

markoutRes = @combine(groupby(res, :ts), :n = length(:markout), 
                                         :avgMarkout = mean(:markout),
                                         :devMarkout = std(:markout))
markoutRes = @transform(markoutRes, :errMarkout = :devMarkout ./sqrt.(:n))

Always need error bars as this data gets noisy.

markoutResSub = @subset(markoutRes, :ts .<= 60, :n .>= 10)
plot(markoutResSub.ts, markoutResSub.avgMarkout, yerr=markoutResSub.errMarkout, 
     xlabel = "Days", ylabel = "Markout", title = "Acquired Alpha Capture", label = :none)
hline!([0], ls = :dash, color = "grey", label = :none)
vline!([0], ls = :dash, color = "grey", label = :none)

Not really a pattern. Most of the error bars overlap zero after the podcast is released. If you squint a little bit there seems to be a bit of a downward trend post-episode which would suggest they talk about a company at the peak of the stock price.

Beforehand there is a bit of positive momentum, again suggesting that they release the podcast at the peak of the stock price. Now this is even more of a stretch given there is only 1 podcast a month and it takes more than 20 days to prepare an episode (I imagine!), so more noise than signal.

markoutIndRes = @combine(groupby(res, [:symbol, :ts]), :n = length(:markout), 
                                         :avgMarkout = mean(:markout),
                                         :devMarkout = std(:markout))
markoutIndRes = @transform(markoutIndRes, :errMarkout = :devMarkout ./sqrt.(:n))

p = plot()
hline!(p, [0], ls = :dash, color = "grey", label = :none)
vline!(p, [0], ls = :dash, color = "grey", label = :none)
for sym in ["TSLA", "V", "META"]
    markoutResSub = sort(@subset(markoutIndRes, :symbol .== sym, :ts .<= 60, :n .>= 1), :ts)
    plot!(p, markoutResSub.ts, markoutResSub.avgMarkout, yerr=markoutResSub.errMarkout, 
          xlabel = "Days", ylabel = "Markout", title = "Acquired Alpha Capture", label = sym, lw =2) 
end
p

When we pull out 3 examples of episodes we can see the randomness and specifically the volatility of TSLA here.

Conclusion

From this, we would not put any specific weight on the stock performance after an episode is released. There doesn’t appear to be any statistical pattern to exploit. No alpha means no alpha capture. It is a nice exercise though and has hopefully explained the concept of a markout.

Solving the Almgren Chriss Model

2024-06-06T00:00:00+00:00

The Almgren Chriss model from Optimal Execution of Portfolio Transactions is the best-known optimal execution model and provides the foundational maths for thinking about how to trade some quantity of an asset. This blog post goes through the maths: how we set the problem up and arrive at the various solutions.

I first encountered the Almgren Chriss model in my initial PhD year through a Microstructure and Machine Learning course. It was for 2 hours at 18:00 on a Friday night and on the other side of London from where I lived, so a bit of a pain for me to attend. This post in essence is inspired by these notes as I’ve always wanted to summarise them into a digital version. So this is a maths-heavy post that will act as a springboard for some more future content.

The Trading Problem

We have $X$ amount of something to trade over some time window from $0$ to $T$ such that $X_T = 0$. How should we slice and dice our trades to minimise the execution cost?

We need a model of

How the price moves
How our trading affects prices

then we can build a trading cost function that we then optimise in different ways.

Price Dynamics

The price evolves like $S_t = \bar{S} _t + \eta v_t + \theta (X_0 - X_t),$

$\bar{S} _t$ is the unperturbed stock price
$\eta \cdot v_t$ is the temporary market impact that scales with the trading speed $v_t$
$\theta \cdot (X_0 - X_t)$ is the permanent market impact, which scales with the amount traded so far

The unperturbed price is a simple Gaussian random walk with no drift: $\mathrm{d} \bar{S} _t = \sigma S_0 \mathrm{d} W_t$

The trading rate $v_t = - \frac{\mathrm{d} X_t}{\mathrm{d}t} = - \dot{X} _t$ so simply the speed at which we are executing the trades.

So the fundamental price ($\bar{S}$) evolves as a random walk but our trading means that the observed price is higher by an amount proportional to our trading speed, plus a permanent shift proportional to how much we have already traded. The signs of the components are set up such that we are buying - so the faster we trade the more we distort the price from the true price by pushing it higher.

Trading Costs

The final cost of the execution is the sum of the amount we traded multiplied by the price of all the trades. In continuous time this is simply the integral of this observed stock price multiplied by the trading speed over the execution window:

\[C_{0, T} = \int _0 ^T S_t v_t \mathrm{d} t,\]

which after inserting the equation for the asset price gives us three different components

\[C_{0,T} = \underbrace {\int _0 ^T \bar{S}_t v_t \mathrm{d} t}_\text{(1)} + \underbrace{\int_0 ^T \eta v_t ^2 \mathrm{d} t}_\text{(2)} + \underbrace{\int _0 ^T \theta (X_0 - X_t) v_t \mathrm{d}t}_\text{(3)}\]

For term (1) we use integration by parts, substitute the random walk for $\mathrm{d} \bar{S}_t$ and use the boundary condition $X_T = 0$:

\[\begin{align*} \int _0 ^T \bar{S}_t v_t \mathrm{d} t & =- \int _0 ^T \bar{S}_t \mathrm{d}X_t \\ & = - \left[\bar{S}_t X_t \right]_0^T + \int _0 ^T X_t \mathrm{d} \bar{S}_t \\ & = -(\bar{S}_TX_T - \bar{S}_0X_0) + \int _0 ^T X_t \sigma S_0 \mathrm{d} W_t \\ & = \bar{S}_0 X_0 + \int _0 ^T X_t \sigma S_0 \mathrm{d} W_t \end{align*}\]

For term (3), substituting $u = X_0 - X_t$ (so $\mathrm{d}u = -\mathrm{d}X_t$, with $u$ running from $0$ to $X_0$ because $X_T = 0$)

\[\theta \int _0 ^T (X_0 - X_t) v_t \mathrm{d} t= -\theta \int _0 ^T (X_0 - X_t) \mathrm{d} X_t = \theta \int _0 ^{X_0} u \, \mathrm{d} u = \frac{\theta X_0^2}{2}\]

which gives us a formula for $C_{0, T}$

\[C_{0, T} = X_0 S_0 + \int _0 ^T X_t \sigma S_0 \mathrm{d} W_t + \eta \int _0 ^T v_t ^2 \mathrm{d}t + \frac{\theta X_0^2}{2}.\]

This is our cost function. It is random because of the $\mathrm{d} W_t$ term, so we want to find the $v_t$ that minimises its expectation.

Minimising the Expected Cost

If we take expectations (we want to minimise the average execution path - each path will be different as it is a stochastic problem) we end up with just one term through which we can influence the expected cost:

\[\mathbb{E}[C] = \underbrace{X_0 S_0 + \frac{\theta X_0^2}{2}}_{\text{Constant}} + \underbrace{\mathbb{E} \left[\int _0 ^T X_t \sigma S_0 \mathrm{d} W_t \right]}_{ \mathbb{E}[ \mathrm{d}W_t] = 0} + \mathbb{E} \left[ \eta \int _0 ^T v_t ^2 \mathrm{d}t \right]\]

So we minimise the expected cost by finding the trading speed that minimises this term

\[\min _{v_t} \eta \int _0 ^T v^2_t \mathrm{d} t.\]

To solve this we apply the Euler-Lagrange equation to minimise the action. The action is the integral and the Lagrangian $f$ is the term inside the integral.

\[\frac{\partial f}{\partial X} = \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial f}{\partial v}\]

And from the above

\[\begin{align*} f & = v^2_t \\ \frac{\partial f}{\partial X} & = 0 \\ \frac{\partial f}{\partial v} & = 2 v_t, \end{align*}\]

\[\frac{\mathrm{d}}{\mathrm{d} t} v_t = 0,\]

which means the speed of the execution must be constant. Writing $\dot{X}_t = B$ (so $v_t = -B$) and integrating gives

\[X_t = A + B t.\]

We have the boundary conditions

\[X_0 = A,\] \[X_T = X_0 + BT = 0,\] \[B = \frac{-X_0}{T},\] \[X_t = X_0 - \frac{X_0}{T} t.\]

Putting this trading schedule back into the expected cost formula gives us an overall result

\[\int _0 ^T v_t^2\mathrm{d} t = \frac{X^2_0}{T^2} (T - 0) = \frac{X_0^2}{T}.\]

When we plot this schedule we can see that the speed is constant and we are simply running a TWAP (time-weighted average price).

The maths is telling us:

To minimise cost for an amount $X_0$ then you should run your TWAP for an infinite amount of time.

This neglects the price risk, so sure, run a very long TWAP but don’t complain when the market trends against you!
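A small numerical check of both claims, as a sketch under made-up parameters: the order size, impact coefficient and slice sizes are all hypothetical, and the schedule is discretised into four equal time slices rather than being continuous.

```julia
# Impact cost of a piecewise-constant schedule: eta * sum of v_k^2 * dt
function impactCost(sizes, T, eta)
    dt = T / length(sizes)     # length of each time slice
    speeds = sizes ./ dt       # trading speed within each slice
    return eta * sum(speeds .^ 2) * dt
end

X0, T, eta = 1000.0, 2.0, 0.01   # hypothetical order size, horizon and impact coefficient

twap        = impactCost(fill(X0 / 4, 4), T, eta)                  # four equal slices
frontLoaded = impactCost([400.0, 300.0, 200.0, 100.0], T, eta)     # same total, uneven slices
longerTwap  = impactCost(fill(X0 / 4, 4), 2T, eta)                 # same TWAP over twice the time
```

The TWAP cost matches $\eta X_0^2 / T$, the uneven schedule costs more for the same quantity, and doubling the horizon halves the impact cost, which is why the risk-neutral answer is to trade as slowly as you are allowed.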

How can we account for this price risk?

Mean-Variance Optimisation of the Almgren Chriss Model

We now need to minimise both the expected cost and the variance of the cost with our trading schedule. This means we will now be sensitive to cases where the price moves far away from the starting value.

We introduce a new parameter, $\lambda$, that controls our risk aversion. So now we are worried about the price potentially running away from us if we take too long to finish the trade

\[\min _ {v_t} \left( \mathbb{E} [C] + \lambda \text{Var} [C] \right ),\]

so now we want to minimise the average and the variation of the trading cost and see what schedule that produces.

When we took the expectation, only the deterministic bits remained. When we calculate the variance only the random bit remains, and by the Itô isometry

\[\text{Var} [C] = \mathbb{E} \left[ \left( \sigma \bar{S}_0 \int _0 ^T X_t \mathrm{d} W_t \right) ^2 \right] = \sigma ^2 \bar{S}_0^2 \int _0 ^T X_t ^2 \mathrm{d} t,\]

which means our minimisation problem can be written as:

\[\min _{v_t} \eta \int _0 ^T v_t ^2 \mathrm{d} t + \lambda \sigma ^2 \bar{S}_0^2 \int _0 ^T X_t ^2 \mathrm{d} t.\]

Using the Euler-Lagrange equations again, with $A = \eta$ and $B = \lambda \sigma ^2 \bar{S}_0^2$, and writing the Lagrangian in terms of $\dot{X}_t$ (since $v_t^2 = \dot{X}_t^2$)

\[\begin{align*} f & = A \dot{X}_t^2 + B X_t^2 \\ \frac{\partial f}{\partial X} & = 2B X_t \\ \frac{\partial f}{\partial \dot{X}} & = 2A \dot{X}_t \\ B X_t & = A\frac{\mathrm{d} }{\mathrm{d} t} \dot{X}_t \\ & = A \frac{\mathrm{d}^2}{\mathrm{d} t^2} X_t. \end{align*}\]

This is a second-order linear ordinary differential equation with solution

\[X_t = c_1 e^{\kappa t} + c_2 e ^{- \kappa t}, \qquad \kappa = \sqrt{\frac{B}{A}} = \sqrt{\frac{\lambda \sigma ^2 \bar{S}_0^2}{\eta}}.\]

Again, applying boundary conditions

\[X_0 = c_1 + c_2,\] \[X_T = 0 = c_1 e^{\kappa T} + c_2 e^{-\kappa T},\] \[X_t = X_0 \frac{\sinh \left( \kappa (T-t) \right)}{\sinh \left( \kappa T \right)}.\]

Which is a funny expression, but underneath it is just a combination of exponentials.

We now have the additional $\lambda$ parameter and so plot the execution schedule for different risk aversions.

A higher $\lambda$ means a higher risk aversion, so more of the trading is pushed to the start of the window. As $\lambda \to 0$, $\kappa \to 0$ and the schedule becomes closer to the TWAP. In general, we can see that the Almgren Chriss solution is front-loaded - most of the trading is done early on in the time window.
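To see the effect of $\lambda$ in numbers, here is a minimal sketch of the schedule using the standard Almgren Chriss decay rate $\kappa = \sqrt{\lambda \sigma^2 \bar{S}_0^2 / \eta}$. The parameter values are invented for illustration and the function names are mine.

```julia
# Almgren Chriss holdings: X_t = X0 * sinh(kappa * (T - t)) / sinh(kappa * T)
function acSchedule(t, X0, T, lambda, sigma, S0, eta)
    kappa = sqrt(lambda * sigma^2 * S0^2 / eta)
    return X0 * sinh(kappa * (T - t)) / sinh(kappa * T)
end

# TWAP holdings for comparison: a straight line from X0 down to zero
twapSchedule(t, X0, T) = X0 * (1 - t / T)

X0, T, sigma, S0, eta = 1000.0, 1.0, 0.02, 100.0, 0.05   # hypothetical parameters

# Amount still left to trade halfway through the window for three risk aversions
halfway = [acSchedule(T / 2, X0, T, lambda, sigma, S0, eta) for lambda in (1e-6, 0.01, 1.0)]
```

With a tiny $\lambda$ roughly half the order is left at the midpoint, just like the TWAP. As $\lambda$ grows, less and less is left at the midpoint because the schedule trades more of the order up front.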

Summary

Ok maths over, put down your pencils and breathe. We’ve gone through the full problem set-up and shown how the TWAP minimises expected costs for a risk-neutral investor and how an exponential execution schedule minimises cost for a risk-sensitive investor.

Now we know the maths we can go on to do some interesting things.
