Is ESG Research Unreliable?

by Larry Swedroe, 8/24/20

Assessing a company’s environmental, social and governance (ESG) behavior is a qualitative, subjective undertaking. New studies show that the major firms that issue ESG “ratings” use criteria different enough that research relying on their databases can produce unreliable findings.

Given that an estimated $30 trillion in assets is invested based on ESG ratings, the rating providers are influential institutions that inform a wide range of business and financial decisions. The trend toward ESG investing has also produced a large body of academic research on its impact, and these studies often rely on ESG ratings for their empirical analyses. The problem is that while corporate bond credit ratings from different agencies are highly correlated (ratings from Moody’s and Standard & Poor’s are correlated at 0.994), ESG ratings from the various providers show nothing like that consistency. This can lead to inconsistent research findings.

Florian Berg, Julian Koelbel and Roberto Rigobon, authors of the August 2019 study “Aggregate Confusion: The Divergence of ESG Ratings,” contribute to the literature by investigating why ESG ratings diverge. Their data come from six prominent rating agencies: KLD Research & Analytics (MSCI Stats), Sustainalytics, Vigeo Eiris (Moody’s), RobecoSAM (S&P Global), ASSET4 (Refinitiv) and MSCI IVA. They began by mapping all the indicators provided by the different raters into a common taxonomy of 64 categories and 641 indicators. They then calculated category scores for each rating by taking simple averages of the indicators that belong to the same category. Next, to obtain comparable aggregation rules, they re-estimated each agency’s original ratings from these category scores, using a simple non-negative linear regression to estimate the weight of each category. Finally, they decomposed the divergence in ratings into three sources:

Different scope of categories, denoting all the elements that together constitute the overall concept of ESG performance. Attributes such as greenhouse gas emissions, employee turnover, human rights and lobbying may be included in the scope of one rating but not another.

Different measurement of categories – indicators that represent numerical measures of the attributes. For example, if two raters want to measure discrimination against women, one rater could look at the gender pay-gap, while the other rater would use the percentage of women on the board and/or in the workforce. The two measures may be correlated but likely deliver somewhat different results.

Different weights of categories – an aggregation rule that combines the set of indicators representing numerical measures of the attributes into a single rating. Rating agencies take different views on the relative importance of attributes and whether performance in one attribute compensates for another. For example, a rating agency that is more concerned with carbon emissions than electromagnetic fields will assign different weights than a rating agency that cares equally about both issues. Different industries might also have different weights, as some attributes are judged more important to some industries than others.
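The estimation steps described above (averaging indicators into category scores, then fitting non-negative category weights) can be sketched in a few lines. This is a minimal illustration, not the authors’ code: the indicator and category names, the synthetic data and the functions `category_scores`, `nnls` and `r_squared` are all hypothetical, and the non-negative least squares fit uses simple projected coordinate descent rather than whatever solver the study used. The R² it reports is the kind of goodness-of-fit measure behind a statement that a linear weighted average reproduces a rating with high accuracy.

```python
import random


def category_scores(indicators, taxonomy):
    """Average the indicator values that map to the same category (simple mean)."""
    sums, counts = {}, {}
    for name, value in indicators.items():
        category = taxonomy[name]
        sums[category] = sums.get(category, 0.0) + value
        counts[category] = counts.get(category, 0) + 1
    return {category: sums[category] / counts[category] for category in sums}


def nnls(X, y, sweeps=500):
    """Minimize ||y - Xw||^2 subject to w >= 0 by cyclic projected coordinate descent."""
    n, k = len(X), len(X[0])
    w = [0.0] * k
    resid = list(y)  # residual y - Xw for the starting point w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(k)]
    for _ in range(sweeps):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue
            # Exact 1-D minimizer for w_j with the other weights fixed, clipped at zero.
            step = sum(X[i][j] * resid[i] for i in range(n)) / col_sq[j]
            new_wj = max(0.0, w[j] + step)
            delta = new_wj - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_wj
    return w


def r_squared(X, y, w):
    """Share of the variance in y reproduced by the weighted sum Xw."""
    fitted = [sum(x * wj for x, wj in zip(row, w)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Step 1: hypothetical indicators for one firm, mapped to hypothetical categories.
taxonomy = {"ghg_intensity": "emissions", "emissions_target": "emissions",
            "board_independence": "governance"}
print(category_scores({"ghg_intensity": 40.0, "emissions_target": 60.0,
                       "board_independence": 75.0}, taxonomy))
# {'emissions': 50.0, 'governance': 75.0}

# Step 2: synthetic category scores for 60 firms and a rater whose (assumed) true rule
# is 0.5 * cat1 + 0.3 * cat2 + 0.0 * cat3 plus noise; recover the weights from the data.
rng = random.Random(42)
true_w = [0.5, 0.3, 0.0]
X = [[rng.uniform(0, 100) for _ in true_w] for _ in range(60)]
y = [sum(x * wt for x, wt in zip(row, true_w)) + rng.gauss(0, 2) for row in X]
w = nnls(X, y)
print([round(v, 2) for v in w], round(r_squared(X, y, w), 3))
```

The non-negativity constraint matters because a negative weight would mean a rater rewards worse performance in a category, which no agency intends.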

They noted: “Divergence between ratings can arise from each of these three elements, whereas differences regarding scope and aggregation rule represent different views about the definition of ESG performance, and differences regarding indicators represent disagreement about appropriate ways of measuring.” Following is a summary of their findings:

It is possible to estimate the implied aggregation rule used by the rating agencies with an accuracy north of 90% on the basis of a common taxonomy. This demonstrates that although rating agencies take very different approaches, it is possible to approximate their aggregation rule with a simple linear weighted average. They also estimated the ratings using different methodologies, e.g., neural networks and random forests. The results were virtually identical.

Measurement (53%) and scope (44%) divergence are the main drivers, while weight divergence (3%) is far less important. In other words, 53% of the discrepancy arises because the rating agencies measure the same categories differently, and the remaining 47% (scope plus weights) stems from aggregating common data using different rules. The results allow investors, companies and researchers to understand why ESG ratings differ.

Ratings from different providers disagree dramatically: the correlations between the six providers’ ratings average just 0.61 and range from 0.42 to 0.73. The environmental ratings are slightly more correlated than the overall ratings, with an average of 0.65. The social and governance ratings have the lowest correlations, averaging 0.49 and 0.38, respectively. Thus, the information that decision-makers receive from rating agencies is noisy.
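To make the scope, measurement and weights split concrete, consider a single firm rated by two raters that each use a weighted sum of category scores. The sketch below uses a simplified arithmetic identity that is our own illustrative assumption, not the study’s exact decomposition (the 44%/53%/3% shares come from the authors’ own procedure across many firms). The rater weights, categories and scores are hypothetical. Here rater A scores the firm 58 and rater B 57; scope contributes +8, measurement +4.5 and weights −11.5, which sum to the 1-point gap.

```python
def rating(weights, scores):
    """Weighted-sum rating over the categories a rater covers."""
    return sum(w * scores[c] for c, w in weights.items())


def decompose_gap(w_a, x_a, w_b, x_b):
    """Split rating(A) - rating(B) into scope, measurement and weight parts.

    Simplified identity (an illustrative assumption, not the study's exact method):
      scope       = contribution of categories covered by only one rater
      measurement = sum over shared categories of w_a * (x_a - x_b)
      weights     = sum over shared categories of (w_a - w_b) * x_b
    """
    shared = set(w_a) & set(w_b)
    scope = (sum(w_a[c] * x_a[c] for c in set(w_a) - shared)
             - sum(w_b[c] * x_b[c] for c in set(w_b) - shared))
    measurement = sum(w_a[c] * (x_a[c] - x_b[c]) for c in shared)
    weight_part = sum((w_a[c] - w_b[c]) * x_b[c] for c in shared)
    return {"scope": scope, "measurement": measurement, "weights": weight_part}


# Hypothetical raters and category scores for one firm.
w_a = {"emissions": 0.5, "labor": 0.3, "lobbying": 0.2}
x_a = {"emissions": 70.0, "labor": 50.0, "lobbying": 40.0}
w_b = {"emissions": 0.6, "labor": 0.4}   # rater B does not cover lobbying
x_b = {"emissions": 55.0, "labor": 60.0}

parts = decompose_gap(w_a, x_a, w_b, x_b)
gap = rating(w_a, x_a) - rating(w_b, x_b)
print(parts, round(gap, 6))
```

Because the three parts add up exactly to the rating gap, any disagreement can be attributed without a residual; the example also shows how offsetting sources can make two ratings look similar even when the raters disagree on nearly everything.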

Their findings led Berg, Koelbel and Rigobon to conclude:

ESG performance is unlikely to be properly reflected in corporate stock and bond prices, as investors face a challenge when trying to identify outperformers and laggards – investor tastes can influence asset prices, but only when a large enough fraction of the market holds and implements a uniform nonfinancial preference. Therefore, even if a large fraction of investors have a preference for ESG performance, the divergence of the ratings disperses the effect of these preferences on asset prices.

The divergence frustrates the ambition of companies to improve their ESG performance because they receive mixed signals from rating agencies about which actions are expected and will be valued by the market.

A significant portion of the measurement divergence is rater-specific and not category-specific, suggesting the presence of a “rater effect” – a firm that performs well (poorly) in one category for one rater is more likely to perform well (poorly) in all other categories for that same rater.

The divergence of ratings poses a challenge for empirical research, as using one rater versus another may alter a study’s results and conclusions.

Summarizing their views, the authors noted: “These results also suggest that different sustainability ratings cannot be made congruent simply by taking into account scope and weight differences. Therefore, standardizations of the measurement procedures are required.” They added: “Ambiguity around ESG ratings is an impediment to prudent decision-making that would contribute to an environmentally sustainable and socially just economy.” And finally, they stated: “To change the situation, companies should work with rating agencies to establish open and transparent disclosure standards and ensure that the data is publicly accessible.”

Their findings are consistent with those of Monica Billio, Michele Costola, Iva Hristova, Carmelo Latino and Loriana Pelizzon in the June 2020 study “Inside the ESG Ratings: (Dis)agreement and Performance.” The authors examined how much ESG rating agencies disagree in their scores, how that disagreement affects which companies are selected as constituents of ESG indexes (measured by the overlap of constituents, which forms what they call the ESG agreement portfolio), and how the ESG agreement portfolio performed relative to non-ESG portfolios. They analyzed the rating criteria used by nine prominent agencies and found:

There is a lack of common characteristics, attributes and standards in defining the E, S and G components.

The lack of consensus in the ESG ratings industry can lead the agencies to have opposite opinions on the evaluated companies – agreement among rating agencies is relatively low.

The low overlap of the ESG indexes disperses the effect of preferences of ESG investors on asset prices.

The non-ESG portfolios had superior Sharpe ratios over the period 2000-19 (1.09 versus 0.69). However, factor-adjusted alphas showed no statistically significant differences – the differences in Sharpe ratios are explained by differences in exposure to common factors (beta, size, value and momentum).
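Why can Sharpe ratios differ while factor-adjusted alphas do not? A portfolio’s Sharpe ratio reflects all of its return, including compensation for factor exposures, while alpha is only the intercept left after regressing returns on those factors. The sketch below is a hypothetical illustration with simulated returns and only two factors (a market factor and a value factor) instead of the four the study used; `sharpe_ratio` and `ols_with_intercept` are our own helper names. Both portfolios have zero alpha by construction, yet their different value tilts give them different Sharpe ratios.

```python
import random
import statistics


def sharpe_ratio(excess_returns):
    """Mean excess return divided by its sample standard deviation (per period, not annualized)."""
    return statistics.mean(excess_returns) / statistics.stdev(excess_returns)


def ols_with_intercept(y, factors):
    """Regress y on an intercept plus factor series; returns [alpha, beta_1, ..., beta_k]."""
    rows = [[1.0] + [f[t] for f in factors] for t in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for row in range(col + 1, k):
            scale = a[row][col] / a[col][col]
            for j in range(col, k):
                a[row][j] -= scale * a[col][j]
            c[row] -= scale * c[col]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


# Hypothetical monthly factor excess returns (simulated, not real data).
rng = random.Random(7)
months = 240
market = [rng.gauss(0.006, 0.045) for _ in range(months)]
value = [rng.gauss(0.003, 0.030) for _ in range(months)]

# Two portfolios with zero alpha by construction but different factor tilts.
tilted = [1.0 * m + 0.5 * v for m, v in zip(market, value)]
screened = [1.0 * m - 0.2 * v for m, v in zip(market, value)]

for name, series in [("tilted", tilted), ("screened", screened)]:
    alpha, beta_mkt, beta_val = ols_with_intercept(series, [market, value])
    print(f"{name}: Sharpe={sharpe_ratio(series):.3f} alpha={alpha:.2e} "
          f"beta_mkt={beta_mkt:.2f} beta_value={beta_val:.2f}")
```

In real data the portfolios also carry idiosyncratic noise, so the question becomes whether the estimated alphas differ by a statistically significant amount, which is the test the study reports.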

The lack of a globally accepted standard methodology creates two major issues. Investors face considerable difficulty in selecting ESG targets for investment, and companies face significant difficulty in identifying the characteristics they must have to be included in ESG indexes. These findings help explain why academic research on the performance of ESG investing has sometimes reached disparate conclusions.

Economic theory

Economic theory suggests that if a large enough proportion of investors choose to favor companies with high sustainability ratings and avoid those with low sustainability ratings (“sin” businesses), the favored companies’ share prices will be elevated and the sin stocks’ share prices will be depressed. Specifically, in equilibrium, screening out certain assets based on investors’ tastes should lead to a return premium on the screened assets. The favored companies will have a lower cost of capital because they will trade at a higher price-to-earnings (P/E) ratio, and the flip side of a lower cost of capital is a lower expected return to the providers of that capital (shareholders). Conversely, the sin companies will have a higher cost of capital because they will trade at a lower P/E ratio, which means a higher expected return to the providers of that capital. The hypothesis is that higher expected returns (a premium above the market’s required return) are required as compensation for the emotional cost of exposure to offensive companies. On the other hand, investors in companies with higher sustainability ratings are willing to accept lower returns as the “cost” of expressing their values.
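The link between cost of capital, P/E ratio and expected return can be made concrete with the constant-growth (Gordon) dividend discount model. Using that model is our illustrative assumption; the argument above does not depend on it. Under it, the P/E on next year’s earnings equals the payout ratio divided by the gap between the required return r and the growth rate g, so a lower required return means a higher P/E, and inverting the model shows that buying at a higher P/E implies a lower expected return. The firms, payout ratio and growth rate below are hypothetical.

```python
def forward_pe(payout_ratio, cost_of_equity, growth):
    """P/E on next year's earnings under the constant-growth model: payout / (r - g)."""
    if cost_of_equity <= growth:
        raise ValueError("cost of equity must exceed the growth rate")
    return payout_ratio / (cost_of_equity - growth)


def implied_return(payout_ratio, pe, growth):
    """Invert the same model: expected return r = payout / (P/E) + g."""
    return payout_ratio / pe + growth


# Hypothetical firms identical except for the return investors require.
payout, g = 0.5, 0.03
favored_pe = forward_pe(payout, 0.07, g)   # lower cost of capital
sin_pe = forward_pe(payout, 0.08, g)       # higher cost of capital
print(round(favored_pe, 2), round(sin_pe, 2))  # 12.5 10.0
print(round(implied_return(payout, favored_pe, g), 4),
      round(implied_return(payout, sin_pe, g), 4))  # 0.07 0.08
```

The one-percentage-point difference in required return is the premium sin-stock investors earn, and it shows up directly as a 12.5 versus 10 P/E.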

Most research, such as the 2019 study “The Contributions of Betas versus Characteristics to the ESG Premium,” finds that the evidence agrees with the theory. However, this is not always the case. Some studies, such as the 2019 study “Foundations of ESG Investing: How ESG Affects Equity Valuation, Risk, and Performance,” show that ESG investing, while not improving raw returns, may improve risk-adjusted returns, because higher ESG-rated companies have less tail risk. Other studies show that ESG returns can be enhanced by tilting portfolios toward factors with higher expected returns (such as momentum, size, value, investment and profitability). Finally, the findings on returns to ESG investing can be affected by the dramatic increase in cash flows into ESG funds. The heightened demand for ESG investments has raised the valuations of stocks with high ESG scores relative to stocks with low ESG scores, producing short-term capital gains and obscuring the expected negative premium. But those short-term gains come at the expense of lower long-term expected returns. Since the trend favoring ESG investing is likely to continue, the “price” ESG investors pay for expressing their social views through their investments, in the form of lower expected returns, might be offset (at least to some degree) by continued rising valuations. However, what cannot continue forever eventually stops.
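The trade-off between short-term gains from rising valuations and lower long-term expected returns is simple arithmetic. The sketch below uses hypothetical numbers, ignores dividends, and treats the earnings yield (E/P) as a rough proxy for long-run expected return, which is a common rule of thumb and our assumption here, not a claim from the studies cited. A P/E expanding from 20 to 25 while earnings grow 3% delivers a price return of about 28.75%, but the earnings yield falls from 5% to 4%.

```python
def price_return(pe_start, pe_end, earnings_growth):
    """Price return when earnings grow and the P/E multiple changes (dividends ignored)."""
    return (pe_end / pe_start) * (1 + earnings_growth) - 1


def earnings_yield(pe):
    """E/P, used here only as a rough proxy for long-run expected return."""
    return 1 / pe


# Hypothetical: inflows push a high-ESG stock's P/E from 20 to 25 in one year
# while its earnings grow 3%.
print(round(price_return(20, 25, 0.03), 4))    # 0.2875
print(earnings_yield(20), earnings_yield(25))  # 0.05 0.04
```

Once the multiple stops expanding, the return reverts to earnings growth alone (3% in this example, from a now-lower yield), which is why the valuation tailwind cannot be extrapolated.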

Larry Swedroe is the chief research officer for Buckingham Strategic Wealth and Buckingham Strategic Partners.
