Quantitative Analysis and Multiple Linear Regression Modeling
using StockCalc Valuation data
September 2021
Executive Summary:
StockCalc is a fundamental valuation platform. We generate valuations each night for the 8000 equities trading on NYS, NAS, ASE, TSX and TSX-V using an algorithmic approach that has been historically correct 78% of the time.
To build upon our existing product, we created a series of multiple linear regression models. We included the 6 fundamental valuation data points we generate each night (described below) along with technical indicators (Moving Averages, RSI, MACD) and commodity prices.
Using the R programming language we built linear regression models for each stock each night to enable us to test the dependent variables of recent close price, price 13, 26 or 52 weeks later against the series of independent variables (our fundamental valuations, technical indicators and commodity prices)
Our goal was to determine if we could improve upon our current approach to calculating a weighted average valuation (“The Algo”) by using this MLR approach (“The MLR”). Our current approach is correct 78% of the time going out 12 months. That means our valuation is reached 78% of the time over the next 12 months (we refer to as %success) with an average 10.2% difference between close price and our valuation (we refer to as %difference).
From the Analysis …
Near dated regression models (using most recent Close price) were able to outperform The Algo. We looked at both % success rate and % difference (price versus valuation) when making this determination.
Of note: the % difference we refer to in this document shows what the results would be if we assume returns are locked in when price reaches our valuation. Part of the analysis then looked at additional upside if we used a trailing stop protocol to let winning stocks continue to run out for 52 weeks. Those results exceed 40% annual returns once we netted winners against losers (winners up 77%, losers down 20 %)
Stocks on average reach our valuation in 13 weeks implying we can be running these models multiple times each year to magnify the returns. We will be doing more work on this to better quantify results for multiple successful models per year.
Contents
Executive Summary:
Focused on valuation; empowered by analytical tools.
Background:
“The Algo”
Methodology aka “The MLR”
SQL
Back-testing:
Initial Results:
Improving the Results:
The difference between markets:
The finer details:
There is more upside here:
How can you use/access this data?
About the author
Appendix 1: Description of the 5 fundamental valuation models used in Stockcalc
Appendix 2: Independent variables available for the regression models
Figures and Tables
Table 1 Initial regression results – various indexes
Table 2 Extended Results – Capturing 10% or more difference
Table 3 Comparing pre/including Covid results
Figure 1 Regression set-up for 13, 26 and 52 week analysis
Figure 2 sample regression logic using ADBE:NAS
Focused on valuation; empowered by analytical tools.
StockCalc is a company focused on providing the platform and data to enable its clients to undertake Fundamental Valuation of publicly traded companies. The modules that have been created help those looking to analyze companies arrive at a more thorough understanding of the value of the equity in the company they are examining.
Background:
The Stockcalc database contains 8000 equities that are run through 5 fundamental valuation models each night. We also examine Analyst consensus data and will use it if it out-performs our analysis (This occurs 5-15% of the time) Our data and analysis is available for advisors and analysts on TD Ameritrade’s Veo open Access platform as well as Schwab’s Advisor platform. StockCalc is also available in the Globe and Mail platform in Canada.
We generate up to 6 valuation points for each company in the database each night.
DCF: Discounted Cash Flow
PComps: Price Based (PE, PB, PCF) comparables
OComps: Other comparables (for companies that are not profitable)
Mul: Multiples (A comparables using the companies’ historic data)
ABV: Adjusted Book value
Analyst: Consensus Analyst Target Value
A detailed description of each of these is available in Appendix 1
Each night once all the models have been run for the 8000 stocks in the database we generate a Weighted Average Valuation (WAV) for each stock using an algorithm that selects the various models based on a series of criteria (industry, historic accuracy, range) and weighting.
From 2013 to mid 2020 our WAV values have been correct 78% of the time for the S&P 500 list of companies looking out over the next 12 months (ie Close price moved beyond our WAV over the next 12 months 78% of the time). We wanted to see if we could improve on that by generating an MLR model for each company each day. Using an MLR model approach also allowed us to incorporate other independent variables such as technical indicators and commodity prices.
“The Algo”
Looking at the S&P 500 list The Algo modeling has been correct 78% of the time: We define that as the price reached our valuation (either to the up or down side) 78% of the time over the next 12 months. By comparison, analyst consensus data is correct on average 50% of the time (various studies)
From 2013-01-01 to 2020-06-30 (ie 52 weeks ago so we can test against next 52 weeks)
Our Dataset has 368,221 records of daily weighted average valuations (we call those our WAV’s)
# records where our WAV was > ClosePrice and next 52 Week high > WAV: 140,897
# records where our WAV was <= ClosePrice and next 52 Week low <= WAV: 146,845
(140897 + 146845) / 368221 = 78%
Notes: models were initially constructed first of the month until Aug 2018 when we started running them daily for the 8000 stocks in the database.
Methodology aka “The MLR”
The regression set-up:
Each night we run the fundamental models against the 8000 stocks in our database. We constructed code using the R programming language (https://www.r-project.org/about.html) to enable us to select the data and build the regression models. We used the same daily valuation data as mentioned in The Algo to ensure consistency. We then constructed a series of linear regression models using R. Each stock was regressed each day after 50 days of data was available (generally the models started early 2017 as we had monthly data until 2018)
We tested four (4) time frames with our modeling:
Recent close price: We programmatically built the regression models using all data from 2013-01-01 to the run date less 1 day then plugged the current days model values (Fundamentals like DCF, Comparables etc along with technical or commodity prices if selected by the regression equation) to generate what the recent close price should be based on the regression equation.
Price 13, 26 and 52 Weeks later: For the other 3 time frames we needed to go back 13, 26 or 52 weeks into the data to construct the regression models as our dependent variable was the actual price 13, 26 or 52 weeks later so we could not use their present day values. We then used the recent day fundamental, technical or commodity prices (as selected by the regression equation) to generate a next 13, 26 and 52 week price estimate.
It is very important to re-iterate for these three time frames we needed to go back 13, 26 or 52 weeks into the data to construct the regression models as our dependent variable was the actual price 13, 26 or 52 weeks later. This meant we would be using data from 2013-01-01 to a given date to generate the regression equation then plugging in the data from the date we were calculating which was 13, 26 or 52 weeks later. You will see the impact in the results tables later on. The following figure depicts the set-up:
Figure 1 Regression set-up for 13, 26 and 52 week analysis
SQL:
The initial results from the r programs were written to excel then imported into our SQL database for easier processing. Our SQL database contains all the analysis we undertake each night along with the financial, ratio and price data used to generate the valuations.
Determining Model Success:
To determine if our estimate calculations were correct we compared our estimate price to most recent close to first determine if we show the stock as under or over-valued. We then looked out 13, 26 or 52 weeks to see if the High (if undervalued) or Low (if Overvalued) went beyond our estimate price. If they did we considered the model successful for that stock, date and time frame.
In the table below we show these details for Adobe (ADBE :NAS) as of Dec 31, 2018 close.
There are four reference points: Close, 13, 26 and 52 which refer to when (in the past) the regression equation was constructed (using last nights close data back to 52 weeks ago) The data from the date we are running was then run though each of those regression equations to determine what we calculate the price would be Today, 13 , 26 and 52 weeks from now.
Figure 2 Sample regression logic using ADBE:NAS
Our valuation database contains analyses going back to January 1st, 2013. For each company in the S&P 500 we generated an MLR for each date up to June 30th 2020 (52 Weeks), Dec 30th (26 week), (Mar 31st 2021 (13 Week) and June 30th 2021 (Recent close)
Initial Results:
We started by selecting all stocks in a given ETF to proxy the index lists. We used the DIA (Dow 30), SPY (S&P 500), QQQ (Nasdaq 100), IWM (Russell 2000) and XIU (Canada – SP/TSX 60). In the r code we ran each stock each day once we had 50 days of valuation available (we had monthly valuations 2013-mid-2018 then daily thereafter).
From the table below you see the initial results: For example our Algo (WAV columns) for the SPY was correct 76.93 % of the time with an average spread of 7.62% between the close price and what our valuation calculation was. Looking across the columns you see the close price had the highest % success due to the smallest time frame. It also had the least spread between price and value. For the DIA and SPY The Algo outperformed the regressions built with 52 and 26 week old data but the 13 week old data was more correct than our Algo with higher % difference between price and value.
The r2 column shows the minimum value selected based on each equations r2 value. For each model each day we captured the r2 value and used it to help in the selection. We did not track r2 for WAV.
This was the base case run to understand how The Algo compared to the MLR.
Table 1 Initial regression results – various indexes
Improving the Results:
In the next step we started to select sub-groups of data to run though the regression models. In this example we selected all stocks that showed a 10-50% difference between price and our valuation (for the Close, 13, 26 and 52 week analysis).
In the table below you can see there are 2 things happening compared to the previous table. Our average % difference between price and value has jumped to between 12 and 25% and our % success has been reduced due to the higher spread between price and value.
Note how well the close price columns performed in this table. We see a 75 to almost 90% success rate using 1 day old data to generate the regression equations with the % difference ranging from 12 to 17%. (Again noting results shown assume returns are locked in when price reaches our valuation – more on this below)
The Algo was able to outperform on the XIU data here (S&P/TSX 60 stocks) is also of note.
Table 2 Extended Results – Capturing 10% or more difference
The difference between markets:
As you can also see from the tables above we show smaller % difference between price and value for the DOW (DIA) than the SPY and IWM. This inherently makes sense given the coverage and ownership the DOW 30 stocks have compared to for example the Russell 2000 list.
The finer details:
To eliminate the impact of Covid on the markets we also tested data from 2017-2019. You can see the numbers remained consistent with the 2018 markets dragging down overall results slightly.
Table 3 Comparing pre/including Covid results
The real value starts here:
The results shown above assume returns are locked in when price reaches our valuation. To fine tune this further we selected the S&P 500 list of Undervalued stocks to Dec 31 2019 (to eliminate Covid impact again) where our models were 20% or higher than close price. We had 40 stocks during that time period meeting this criteria (64 records including Stocks that met the criteria more than once).
Thirty-five of these stocks were successful (our valuation target was exceeded over next 12 months) for an average return of 77% when we let these stocks run out for 52 weeks. The 5 stocks that were not successful had an average return of -20% 52 weeks later for a combined return of 42.5%. A trailing stop program would easily capture this additional return.
In our database, our WAV valuation models on average are being reached within 13 weeks. This implies the returns we presented in Table 2 could be achieved four times each year. The net 13 week return column in Table 2 ranges from 10 to 15% therefore 40% to 60% returns less negative returns for those stocks not achieving valuation.
The next steps in this process will be analyzing the data further to construct a protocol that can select the superior model (WAV, Close, 13) as well as reassessing when price reaches our valuation to see if we would continue to hold the equity.
How can you use/access this data?
The ability to slice and separate the data is inherent in the r code and resulting SQL database. We are well positioned to assist you with this data and analysis. StockCalc has a powerful API that provides connections for funds and firms looking to access this data. For more information please reach out to Brian to discuss
About the author
Brian Donovan is the President of StockCalc and is a Chartered Business Valuator (CBV – CICBV Website). He obtained in Finance MBA in 1999 from Wilfrid Laurier University in Canada and worked in a variety of capacities from MyPlant.com in the dot com days to founder of StockCalc in 2013. Brian can be reached by email at brian.donovan@stockcalc.com
Appendix 1: Description of the 5 fundamental valuation models used in Stockcalc
Discounted Cash Flow (DCF)
Discounted Cash Flow (DCF) valuation is a cash flow model where cash flow projections are discounted back to the present to calculate value per share. DCF is a common valuation technique especially for companies undergoing irregular cash flows such as resource companies (mining, forestry, oil and gas) going though price cycles or smaller companies about to generate cash flow (junior exploration companies, junior pharma, technology firms…).
Price Comparables
The Price Comparables valuation is the result of valuing the company we are looking at on the basis of ratios from selected comparable companies: Price to Earnings, Price to Book, Price to Sales, Price to Cash Flow, Enterprise Value (EV) to EBITDA. Each of these ratios for the selected comparable companies are averaged and multiplied by the values for the company we are interested in to calculate a value per share for our selected company.
Other Comparables
We have included the Other Comparables as a way to value companies that cannot be valued using Earnings based ratios. This technique is very useful for companies still experiencing negative cash flows such as mining exploration firms. We use Cash/Share, Book Value/Share, MarketCap, 1 Year Return, NetPPE as the ratios here. Each of these ratios for the selected comparable companies are averaged and multiplied by the values for the company we are interested in to calculate a value per share for our selected company.
Multiples
Multiples are similar to Price comparables where we look at current or historic ratios for the company in question to assess what it should be worth today based on those historic ratios. We use the same 5 ratios as in the price comparables and value the company with its historic averages.
Adjusted Book Value
With Adjusted Book Value (ABV) we calculate the book value per share for the company based on its balance sheet and multiply that book value per share by its historical price to book ratio to calculate a value per share.
Analyst Consensus
If we have Analyst coverage for the company we use the consensus target price here.
Appendix 2: Independent variables available for the regression models
Valuation:
DCF, P_Comps, O_Comps, Multiples, Adjusted Book, Analyst Consensus
Technical:
30 and 200 Day moving averages, MACD, RSI
Commodity:
Natural Gas, WTI, Brent, Gasoline, Propane
Aluminum, Copper, Nickel, Steel, Zinc, Iron Ore, Coal, Cobalt, Lead, Moly, Tin
Gold, Silver, Platinum, Palladium,
Lumber, Corn