Earnings Season Q4: My Love-Hate Relationship with Gap Data

Two years ago I created a profitable fade strategy. However, as my risk tolerance shrank, so did my use of the strategy. The strategy was simple: find stocks that were up a huge percentage relative to their own history and the market. Most importantly, they had to have no news, no catalyst that drove them up. These stocks were more likely than not to fill their gap. However, in some cases it took up to 3 months, with a 100% drawdown of investment capital along the way, to receive a 20% gain. It was not worth it.

So I took the next best thing: earnings season gap data. These couple of months out of the year are packed with gaps that fill, run, or simply stagnate. Version 1.0 of the Gap Bot grabbed data from Yahoo and analyzed the gap of a given stock. By the end of 2010 Q4 I had 1,000+ data points. Because I was using this technique to analyze earnings gaps, my data set lacked depth for individual stocks. In other words, I only had two data points for Amazon, Apple, and the like. This was not sufficient (though profitable) to build a robust algorithmic procedure for trading earnings gaps. After 1,000+ lines of new code and scraping the internet, I now have 50,000+ data points dating back to 1998. With all this new data, more challenges arose.

Below you will see 6 graphs. It’s obvious they provide very little value at all! Big data sets from unknown distributions rule out most standard statistical tools (ANOVA, regression, etc.) because those tools assume normally distributed data. Looking at the histograms you can see how the outliers distort the picture, but you can’t just take them out! Take MAA on 5/6/2010 for example. It created a $0.05 downward gap (it opened below the previous day’s close). It ran $0.94 from that open, closing the gap by a huge margin. That margin is 1880% ($0.94 / $0.05) and an extremely important piece of information.
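To make the MAA arithmetic concrete, here is a minimal sketch of how a gap-fill percentage could be computed. The function name and the three prices are my own illustration, not the Gap Bot's actual code; the prices are hypothetical and were picked only so the gap is $0.05 and the run is $0.94.

```python
def gap_fill_percent(prev_close, open_price, extreme_price):
    """Return how far price moved back toward the prior close, as a percent of the gap.

    100 means the gap was exactly filled; more than 100 means price ran past
    the prior close; a negative value means price ran away from the gap.
    """
    gap = open_price - prev_close          # negative for a downward gap
    if gap == 0:
        raise ValueError("no gap: open equals previous close")
    move = extreme_price - open_price      # how far price ran from the open
    # A fill is a move in the opposite direction of the gap, hence the minus sign.
    return -move / gap * 100.0


# Hypothetical prices chosen only to reproduce the $0.05 gap and $0.94 run.
maa = gap_fill_percent(prev_close=50.00, open_price=49.95, extreme_price=50.89)
print(round(maa, 2))
```

With this sign convention a gap that keeps running shows up as a negative number, so fills and runs land on the same axis.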

Lucky for me, data can be transformed. By transforming the data you can often make non-normally distributed data approximately normal. The glory of this is that now we can go back to our arsenal of statistical tools and begin to play 🙂 Below is the data after it has been transformed. As Earnings Season 2011 Q4 approaches I will be posting more data and more results. Happy Trading.
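The post does not say which transformation was used, so the following is only a sketch of one common option, a signed log transform, which compresses outliers like the 1880% fill without dropping them and also handles negative values. The sample numbers are made up for illustration (only 1880 comes from the MAA example), and a lower skew is a rough sanity check, not proof of normality.

```python
import math


def signed_log(x):
    """Compress large magnitudes while keeping the sign: sign(x) * log(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up gap-fill percentages; only the 1880 value comes from the MAA example.
raw = [-120, -40, -10, 5, 20, 35, 60, 90, 150, 1880]
transformed = [signed_log(x) for x in raw]

print("skew before:", round(skewness(raw), 2))
print("skew after: ", round(skewness(transformed), 2))
```

Because the transform is monotonic, the ordering of the data points is preserved, so the outlier is still the largest value; it just no longer dominates the histogram.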


Eastern Psychological Association 2010 Conference

Sayeed, Dr. Gorman, and I gave a successful presentation on March 5th at the Eastern Psychological Association 2010 Conference, presenting “An Exploratory Qualitative Analysis of the 2008 Presidential Campaign.” You can read about my current research, including the work on Presidential Leadership, in the research section of my website. Below is the short abstract along with the Scribd version of our paper. In addition, all graphs are included for your viewing pleasure.

An Exploratory Qualitative Analysis of the 2008 Presidential Campaign.
Short Abstract: A content analysis using Hart’s DICTION program was performed on the 2008 Obama vs. McCain presidential campaign speeches. It was found that the content of the speeches varied over time on the DICTION dimensions of certainty, activity, optimism, realism, and commonality. Obama consistently demonstrated higher levels of commonality throughout the campaign. Implications for dynamic, time series content analyses are discussed.

Scribd link to our paper entitled: An Exploratory Qualitative Analysis of the 2008 Presidential Campaign.

Realism Scores – Graph
Activity Scores – Graph
Certainty Scores – Graph
Commonality Scores – Graph
Optimism Scores – Graph

Where are we taking this research next? We are exploring machine learning techniques to evaluate factors of leadership.