Stats and Data Science
Creating a hybrid measurement model by incorporating return path data (RPD) in TAM Audience Ratings.
Chris Mundy
June 9, 2020

In this blog we’ll look at how return path data (for example, data collected from a Virgin or Sky set top box) could be used to enhance viewing data from a TV audience measurement (TAM) panel such as BARB. For clarity we’ll call the enhanced system a hybrid model.

Why might you want to incorporate RPD? Whilst TAM meter panels do a good job of measuring viewing overall, sampling error increases as the content being analysed becomes more marginal and granular; for a small channel on a narrow target audience at the programme or spot level, the sampling error on published audiences may be large. Whilst the panel may show zero ratings for a channel at a particular time, there are likely to be some people in the population as a whole watching it. The argument goes that incorporating RPD, either in its entirety or by way of a large sample of RPD homes, would increase the precision of the published data and certainly address the problem of zero ratings where audiences exist.

RSMB exists to find statistical solutions to media measurement challenges and, as you would expect, we have undertaken a lot of theoretical statistical analysis on this.

To successfully incorporate RPD into a measurement currency you need to crack three problems:

  1. RPD tells you what a set top box (or connected device) was doing but it doesn’t tell you if anyone was watching. This is relatively easily addressed with a capping algorithm to truncate long viewing sessions.
  2. Assuming viewing was taking place, RPD data does not tell you who was viewing so you need to model that.
  3. Having cracked problems (1) and (2) you need to be able to produce data files for analysis that are consistent. In the case of the UK, analysis produced from Database 1 (panel respondent level data to support applications like reach and frequency) needs to be consistent with Database 2 (processed data of minute by minute, programme and commercial ratings). This topic isn’t covered here.
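Problem (1), capping, can be sketched in a few lines. The sketch below is illustrative only: it assumes minute-level session records and a purely hypothetical 240-minute cap; a real capping rule, and the cap itself, would be derived empirically from observed viewing behaviour.

```python
from dataclasses import dataclass

@dataclass
class Session:
    channel: str
    start_min: int  # minutes since midnight
    end_min: int

# Hypothetical cap: assume nobody is still watching after this many
# uninterrupted minutes on one channel, and truncate the session there.
MAX_SESSION_MIN = 240

def cap_sessions(sessions):
    """Truncate implausibly long STB sessions to MAX_SESSION_MIN."""
    capped = []
    for s in sessions:
        if s.end_min - s.start_min > MAX_SESSION_MIN:
            capped.append(Session(s.channel, s.start_min,
                                  s.start_min + MAX_SESSION_MIN))
        else:
            capped.append(s)
    return capped
```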

There are various ways of tackling problem (2) and our work shows that the approach taken is the biggest determinant of the extent to which a hybrid model reduces sampling error compared to a meter panel alone.

It goes without saying that there will be assumptions of statistical independence somewhere in the model which may compromise accuracy; this must be traded-off against the gains in precision required for a usable currency. Everyone would agree that a hybrid model is not worth doing if we don’t actually get any gains in precision from incorporating RPD data.

Until it is ubiquitously available, RPD covers a subset of all viewing: one or more platforms (eg Sky) and/or devices (eg Samsung TVs). For this subset there are two components to the variance: the sampling variance in homes viewing and the sampling variance in people per home viewing – and the two effects may counter each other.
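Under an independence assumption, these two components combine roughly additively on the relative-variance scale, which is why reducing one can be swamped by an increase in the other. A minimal sketch (delta-method approximation; the function name and inputs are illustrative, not part of any published model):

```python
def relvar_audience(relvar_homes, relvar_people_per_home):
    """Approximate relative variance of an audience estimate, where
    audience = (homes viewing) x (people viewing per home).

    By the delta method, if the two factors are estimated independently,
    the relative variance of their product is approximately the sum of
    their relative variances. So a gain in homes precision can be
    cancelled out by a loss in people-per-home precision.
    """
    return relvar_homes + relvar_people_per_home
```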

The simplest approach to converting STB data to demographic data is to use the meter panel to create “Viewers per View” for each reporting demographic for a viewing event and apply that to the set top box (STB) homes audience. However, this may result in greater sampling variability than the meter data alone. Whilst a large RPD sample will reduce the variance in homes viewing, the absence of person demographics actually increases the variance in people per home, and overall sampling error can increase. The more targeted the demographic, the larger the sampling error.
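The Viewers per View calculation described above can be sketched as follows. All names and units here are illustrative, not BARB’s actual processing:

```python
def vpv_audience(panel_persons_viewing, panel_homes_viewing,
                 stb_homes_audience):
    """Estimate a demographic audience for a viewing event by applying
    a panel-derived 'Viewers per View' factor to the RPD homes audience.

    panel_persons_viewing: panel estimate of target-demographic persons
        viewing the event (thousands)
    panel_homes_viewing: panel estimate of homes viewing (thousands)
    stb_homes_audience: homes audience measured from STB/RPD data
        (thousands)
    """
    if panel_homes_viewing == 0:
        return 0.0
    viewers_per_view = panel_persons_viewing / panel_homes_viewing
    return viewers_per_view * stb_homes_audience
```

Note that the Viewers per View factor still comes from the panel, so its sampling error is carried straight into the hybrid estimate.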

That variability may be reduced by stratifying the sample. For example, if the reporting demographic is 16-24s, sampling error may be reduced by excluding STB homes that don’t include 16-24 year olds. To do that, access to household composition data is needed. If using all available STB (census) data then this could be from sign-up data, but more likely the STB data will be from a large recruited sample of STB homes and demographics will be collected as part of recruitment.
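Stratification of this kind amounts to discarding homes that cannot contribute to the target demographic. A minimal sketch, using hypothetical household-composition records for a recruited RPD sample:

```python
# Hypothetical household-composition records collected at recruitment.
homes = [
    {"home_id": 1, "ages": [45, 17, 19], "homes_viewing_weight": 1.0},
    {"home_id": 2, "ages": [62, 60],     "homes_viewing_weight": 1.0},
    {"home_id": 3, "ages": [22],         "homes_viewing_weight": 1.0},
]

def demo_stratum(homes, lo_age, hi_age):
    """Keep only homes containing at least one person in [lo_age, hi_age].

    Homes outside the stratum cannot contribute viewers in the target
    demographic, so excluding them removes a source of noise from the
    people-per-home estimate for that demographic.
    """
    return [h for h in homes
            if any(lo_age <= a <= hi_age for a in h["ages"])]
```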

Another factor to consider is the effect of STB data, which covers only a subset of platforms and/or devices, on the variability of measured audiences as a whole. Perhaps counterintuitively, it’s not the case that measuring some platforms more accurately necessarily reduces sampling error across the whole hybrid model: it can actually make it worse. For the single source TAM measurement panel, the sum of viewing across all platforms is more robust than the separate platforms. At an extreme, people can have similar overall viewing levels but very different platform shares. Then if we correct platform A up or down but don’t make a compensating correction for platform B, the sum of platform A plus platform B is destabilised.

Steve Wilcox presented our findings from a hybrid model experiment using BARB data at the 2018 ASI conference. This compared the benefits of a 25,000-home RPD “boost” with person demographics against the benefits of a census RPD boost with no person demographics. For each there were alternative models: one where the RPD included all platforms and one where it was for a single platform covering 40% of all viewing. Three types of channel were considered: a “large” channel with an average TVR of 2.81, a “medium” sized channel with an average TVR of 0.17 and a “small” channel with an average TVR of 0.03. The large and medium channels were cross-platform; the small channel was predominantly viewed on the single platform. There is too much detail to go into here, but in summary:

– the 25,000-home boost with demographics covering all platforms increased effective sample size for All Adults significantly for small, medium and large channels. The effective sample size was at least 3 times higher than without the RPD boost. As we know the demographics of people living in RPD homes, these increases in effective sample size are reasonably consistent across all demographic target audiences.

– However, where the 25,000-home boost with demographics covered only the single platform, there were only marginal improvements to the effective sample sizes of the large and medium sized channels. This is because the sampling error for these channels is still dominated by the component of the audience on platforms only measured by the TAM panel. The small channel did see a significant increase in its effective sample size (by nearly 4 times, which again would be similar for subdemographic audiences) because its viewing is almost exclusive to the RPD boosted platform.

– the model that included census RPD data for all platforms (ie not a sample of RPD homes, but the entire dataset across all platforms) saw substantial increases in effective sample size for the All Adults demographic: multiples of 9 times for the large and medium sized channels and 11 times for the small one. However, when you looked at subdemographics a very different picture emerged. For ABC1s, effective sample sizes didn’t improve for the large and medium sized channels, although there was a slight improvement for the small channel. For Men 16-34 the effective sample size actually fell compared to the TAM panel alone, because of the loss of the ability to stratify the hybrid model by household demographic composition.

– finally, the model using census RPD data for the single platform worsened the effective sample size for All Adults for the large and medium sized channels. It did improve it for the small channel, by a multiple of 3, again because the channel is predominantly distributed on this platform.
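The effective sample sizes quoted above can be read as variance ratios: the hybrid estimator is assigned the size of a simple panel whose estimator would have the same variance. Since the variance of a mean scales as 1/n, the conversion is a one-liner. The sketch below is a generic illustration of that relationship, not the experiment’s actual computation:

```python
def effective_sample_size(n_panel, var_panel, var_hybrid):
    """Effective sample size of a hybrid estimator.

    Returns the size of a simple panel whose estimator would match the
    hybrid's variance. Variance scales as 1/n, so halving the variance
    doubles the effective sample size.
    """
    return n_panel * var_panel / var_hybrid
```

For example, a hybrid model that cuts an audience estimate’s variance to a quarter of the panel-only variance has an effective sample size 4 times the panel size; a variance increase pushes the effective sample size below the panel size, as seen for Men 16-34 under the census model.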

So, perhaps against initial expectations, a large sample boost with demographics actually produces more precise results than a boost using census RPD data where demographics aren’t available.

It’s also worth noting that RPD data would not necessarily improve demographic precision for the very small channels that the TAM panel struggles to measure.

So does RSMB think RPD has a place in a hybrid model? Yes we do, but the model needs to be carefully constructed. We believe there are strong arguments for a well-managed sample boost of RPD homes with person demographics, certainly were there ever to be RPD data that covers all platforms. Such a boost may enhance sample sizes for all channels and subdemographics, though if the boost is limited to a single platform the benefits will largely be confined to small channels. That is not to say that census data can’t have value: demographic data may be collected on sign-up, or alternatively census data may be used for small-share platforms in a hybrid cross-media measurement model where its impact on overall sampling error will be limited.

As more and more RPD data becomes available, not just from set top boxes but also from connected televisions and mobile devices, RPD is likely to become an integral part of future measurement solutions.