## Slippage and Root Mean Squared Error in Model Performance

REAL traders all know how important slippage is.  You wanted to buy at \$20.05 but got filled at \$20.07.  Then you wanted to exit at \$20.55 but got filled at \$20.52.  The \$0.02 on the way in and the \$0.03 on the way out are what is referred to as slippage or skid. Total slippage: \$0.05.
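Here is a minimal sketch of that arithmetic for a long trade. The function name `round_trip_slippage` is made up for illustration, and `Decimal` is used only so the cents come out exact.

```python
from decimal import Decimal

def round_trip_slippage(intended_entry, fill_entry, intended_exit, fill_exit):
    """Per-share slippage on a long trade: (entry slip, exit slip, total)."""
    entry_slip = fill_entry - intended_entry   # paid more than intended on the way in
    exit_slip = intended_exit - fill_exit      # received less than intended on the way out
    return entry_slip, exit_slip, entry_slip + exit_slip

entry_slip, exit_slip, total = round_trip_slippage(
    Decimal("20.05"), Decimal("20.07"), Decimal("20.55"), Decimal("20.52")
)
print(entry_slip, exit_slip, total)  # 0.02 0.03 0.05
```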

When building mixed models I always incorporate slippage, but where to incorporate it has become an increasingly important question in my work.  It’s similar to when people run a regression analysis on data that has no linear time factor built in.  You have no idea if Y preceded X or X preceded Y.  Running a linear regression doesn’t make this so.

Root Mean Squared Error, or RMSE, is one of many ways to compare regression models.  The formula is SQRT(mean((Observed_Value - Predicted_Value)^2)).
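As a rough sketch, the formula translates to a few lines of plain Python. The observed and predicted numbers below are invented purely to exercise the function; they are not from my models.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical max profits (observed) versus model predictions.
observed = [0.70, 0.40, 0.10]
predicted = [0.60, 0.45, 0.30]
print(rmse(observed, predicted))
```

Because the errors are squared before averaging, one large miss moves RMSE far more than several small ones.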

When building models I often have a maximum profit from the trigger point (trade entry).  Back to our example: if I bought in at \$20.07 and the highest print on the chart that day was \$20.77, Max Profit would be \$0.70.  My question to answer is typically what features will allow me to predict that \$0.70 with the least amount of error and risk.
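A small sketch of how Max Profit could be computed from the prints after entry, with an optional slippage deduction. The `max_profit` helper and its `exit_slippage` parameter are assumptions for illustration, and every print except the \$20.77 high is made up.

```python
from decimal import Decimal

def max_profit(entry_fill, prints_after_entry, exit_slippage=Decimal("0")):
    """Best-case per-share profit on a long entry, minus any assumed exit slippage."""
    if not prints_after_entry:
        raise ValueError("need at least one print after entry")
    return max(prints_after_entry) - entry_fill - exit_slippage

# Hypothetical prints after the fill; only the 20.77 high comes from the example.
prints = [Decimal("20.10"), Decimal("20.45"), Decimal("20.77"), Decimal("20.52")]

print(max_profit(Decimal("20.07"), prints))                   # 0.70
print(max_profit(Decimal("20.07"), prints, Decimal("0.03")))  # 0.67
```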

SLIPPAGE MATTERS! BIG TIME! I ran a Monte Carlo simulation with 10,000 iterations, running the same model twice: once where slippage was not subtracted from Max Profit and once where it was.  These were the results.

There are MANY interpretations of this.  I’ll offer a few.

• RMSE is sensitive to large numbers, which creates the flaccid bi-nomial distribution in the bottom graph.  Those large numbers are potentially produced by small max profits and large spreads. For example, CMG’s spread is about \$0.20 where Max Profit could be \$0.05.
• The top graph’s RMSE, for max profit, is normally distributed in this model because individual stock differences in max profit are not enough to disrupt the distribution.  This suggests that predicting a stock’s max profit is potentially easier.
• SPREADS MATTER! The difference between the two graphs is caused by individual stock characteristics in the spread.  Like two people, no two spreads are ever really the same.  The error for my prediction goes up as we account for individual differences.

I think this illustrates something we experience every day in life. We have general predictions about what people will do in a given context (features), but everyone often does something unique which cannot be accounted for.  I believe the bottom graph illustrates this.  It also illustrates why your real-life trading profits may not reflect model performance if slippage is not included.  Because what happens? When I thought I was going to make \$50.00, I only made \$35.00... Slippage!

## Integral Theory – States

Integral theory holds that we enter different states of awareness, of consciousness, etc. throughout any given day.   Have you ever been unable to break an angry mood? Your state was angry.  Have you ever been highly irrational for a long time? Your state was irrational.  Have you ever fallen in love and been unable to fall out of it for a long time? Your state was in love.

States greatly affect the way we trade every single day.  Author Denise Shull’s book Market Mind Games is essentially a book about emotional / psychological states.  Her book can be summarized as follows: be aware of your emotional context (state), because it will greatly affect your trading decisions.  She’s correct.

This is a very basic overview of how states can affect you during trading.  Denise suggests writing down your emotional state as you are trading, such as “I’m afraid to take this trade” or “If I lose on this trade I’m going to be very angry with myself.”  I added to this methodology by taking it a step further: after the bell has rung, I go back and ask, what makes me afraid? Or, if I become angry with myself, what would happen?

## Earnings Season Q4: My Love-Hate Relationship with Gap Data

It was 2 years ago that I created a profitable fade strategy. However, as my risk tolerance shrank, so did my use of the strategy. The strategy was simple: find stocks that were up a huge percentage relative to themselves and the market. Most importantly, they had to have no news, no catalyst that drove them up. These stocks were more likely than not to fill their gap. However, in some cases it took up to 3 months with 100% drawdown of investment capital to receive a 20% gain. It was not worth it.

I took the next best thing: earnings season gap data. These couple of months out of the year are packed with gaps that fill, run, or simply stagnate. Version 1.0 of the Gap Bot grabbed data from Yahoo and analyzed the gap of a given stock. By the end of 2010 Q4 I had 1000+ data points. Because I was using this technique to analyze earnings gaps, my data set lacked individual stock robustness. In other words, I only had two data points for Amazon, Apple, and the like. This was not sufficient (though profitable) to achieve a robust algorithmic procedure for trading earnings gaps. After 1k+ lines of new code and scraping the internet, I now have 50,000+ points of data dating back to 1998. With all this new data, more challenges arose.

Below you will see 6 graphs. It’s obvious they provide very little value at all! Big data sets from unknown distributions will produce data on which you cannot use most statistical tools (ANOVA, regression, etc.) because they are based on assumptions of a normal distribution. Looking at the histograms you can see how the outliers distort the picture, but you can’t just take them out! Take MAA on 5/6/2010 for example. It created a \$0.05 downward gap (it opened below the previous day’s close). It ran \$0.94 from that open, closing the gap by a huge margin. That margin is 1880% and an extremely important piece of information.
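The 1880% figure is just the run divided by the gap. Here is a hedged sketch of that calculation. The `gap_fill_percent` helper is hypothetical, and so are the actual price levels, since only the \$0.05 gap and the \$0.94 run are given above.

```python
from decimal import Decimal

def gap_fill_percent(prev_close, open_price, extreme_after_open):
    """Move back toward the previous close, as a percentage of the gap size.

    For a gap down pass the high after the open; for a gap up pass the low.
    100 means the gap exactly filled; more than 100 means price ran through it.
    """
    gap = open_price - prev_close            # negative for a downward gap
    if gap == 0:
        raise ValueError("no gap: open equals previous close")
    move = extreme_after_open - open_price   # positive when price rallies
    return -move / gap * 100

# Hypothetical prices chosen only to reproduce a 0.05 gap down and a 0.94 run.
pct = gap_fill_percent(Decimal("50.00"), Decimal("49.95"), Decimal("50.89"))
print(pct)  # 1880.0
```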

Lucky for me, data can be transformed. By transforming the data you can make non-normally distributed data normally distributed. The glory of this is that now we can go back to our arsenal of statistical tools and begin to play 🙂 Below is the data after it has been transformed. As Earnings Season 2011 Q4 approaches I will be posting more data and more results. Happy Trading.
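I haven’t said which transformation was used, so treat the following as one possible sketch only: a sign-preserving log transform, with skewness as a rough check on how lopsided the data is before and after. The sample values are hypothetical, with one extreme point like the MAA example.

```python
import math

def signed_log(x):
    """Sign-preserving log transform: shrinks large magnitudes, keeps direction."""
    return math.copysign(math.log1p(abs(x)), x)

def skewness(values):
    """Population skewness: near 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical gap-fill percentages with one extreme outlier.
gap_fill_pct = [10.0, 25.0, 40.0, 60.0, 80.0, 120.0, 300.0, 1880.0]
transformed = [signed_log(v) for v in gap_fill_pct]

print(round(skewness(gap_fill_pct), 2), round(skewness(transformed), 2))
```

A transform like this keeps the outlier in the data set while pulling it closer to the pack. It reduces skew but does not guarantee normality, so the transformed data should still be checked before running ANOVA or regression.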

## Leaders & Laggers S&P 500 Update

In my last post I talked about a program that would find a stock that would lead or lag a stock you selected.  Let’s take \$AAPL for example.  Below are the top 5 positively and negatively related intra-day stocks for Apple Computers, calculated on 9/18/2010 with 5 minute bars.

However, there is a special attribute to this information.   \$AAPL at time1 is not correlated with \$DTV at time1 and then squared (that is how you calculate R^2. Read about it here).  It is correlated with \$DTV at time3.  The illustration below should help.  (Please note the prices below are for example purposes only.)

On the left is the traditional stock-by-stock correlation.  On the right is a leaders/laggers correlation.  What this does is allow you to have an understanding of what \$DTV will do with greater confidence than if it were correlated with \$AAPL at time1.  The reason is this: when a stock is trending, that trend can change suddenly, and if you’re pairs trading or using correlated stocks, the paired stock will change immediately with it.  When you lead/lag, the R^2 tells you that while \$AAPL is rising, 10 minutes later \$DTV should be rising as well (within a degree of confidence, of course). Also, if \$AAPL is reaching a bottom, \$DTV should reach a bottom 10 minutes later.  The chart below should illustrate this concept.

As you can see, as \$AAPL hits a bottom at \$243.62 at 11:45am, \$DTV hits its bottom at \$37.70 11 minutes later at 11:56am.  This is the power of lead/lag.  There are many strategies one can implement knowing this information.

I am working on posting popular lead/lag coefficients such as the ES_F, EURUSD, and \$AAPL within this upcoming month.

Hope you enjoyed this post.

## Leaders & Laggers of the S&P 500 Index

Some time ago I began coding a program that would iterate through a list of stocks (that list being the Standard & Poor’s 500 (S&P 500)). I use intra-day 5 minute bars going 20 days back. The information tells me how to best hedge my current intra-day positions. For example, let’s say \$AAPL has a negative correlation with stock XYZ with an R^2 above .90. When holding \$AAPL for longer periods of time, I know stock XYZ will follow suit in the opposite direction. If \$AAPL spikes against me and my stop loss is hit, my hedge would have covered the loss from \$AAPL. I will be posting a more detailed example as well as data this upcoming week.