Four Big Questions About Gen AI

With AI advancements proliferating and its role in media growing, media companies should step back and ask some key questions about the data fueling those advancements.

Mary M. Collins

News coming out of the Consumer Electronics Show (CES) and from other sources indicates that AI is going to play a major role in industries, media among them, in 2024. And you can’t really talk about AI without talking about data.

The best explanation I’ve heard for how generative AI tools such as Microsoft’s Copilot (formerly Bing Chat), Google’s Bard or OpenAI’s ChatGPT work is that each is a big math problem processing vast amounts of data. At its essence, generative AI analyzes the data to come up with a prediction for the next word in a sequence. Anyone who’s ever used one of these programs knows that you often must ask a series of questions (known as “prompts”) to get an answer to a complex question.
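
To make the “predict the next word” idea concrete, here is a toy sketch in Python. It is not how Copilot, Bard or ChatGPT are built; those rely on large neural networks trained on far more data. The sample text and the function names (train_bigram_model, predict_next_word) are invented for illustration. The sketch simply counts which word most often follows another in a tiny sample and uses that count as its prediction.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next_word(model, word):
    """Return the most frequently observed follower of `word`, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text, standing in for the "vast amounts of data."
corpus = (
    "the weather is sunny today and the weather is mild tonight "
    "and the traffic is heavy"
)
model = train_bigram_model(corpus)

print(predict_next_word(model, "weather"))  # the only word seen after "weather"
print(predict_next_word(model, "the"))      # "weather" follows "the" more often than "traffic"
```

Even this toy version shows why the underlying data matters: the prediction can only reflect whatever text the model was given.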

AI is still in its infancy and is expected to improve dramatically in 2024. A recent article in The New York Times reports that the industry predicts advancements will build upon each other, creating programs that can mimic human reasoning. Expected improvements range from instant videos — the ability to produce a short video instead of a still photo — to smarter robots, which will be able to predict the right actions to handle objects of various sizes and shapes.

Media itself is poised to take advantage of AI advancements. Use cases that immediately come to mind include real-time closed captioning and translation services for video, meeting and other transcriptions, preparation of content descriptions and metadata, analytics for advertisers or the creation of formulaic content such as traffic, weather or sports reports. (Sports Illustrated recently fired employees after it was outed for using AI-produced stories attributed to faked author profiles.)

For me, there are four questions to be asked about the data being used to develop and improve AI programs.


  • Who owns the data?
  • How is it accessed?
  • How is it validated?
  • What ensures that it’s being used responsibly?

Who Owns The Data Being Used To Train AI?

The question of data ownership really must be considered in terms of personal or private data and data made public by the owner.

In terms of personal or private data, I think everyone would agree that the use of a company’s trade secrets or other data such as employee information or internal communications is prohibited. In fact, there are regulations about how companies must respond when they uncover a data breach. A little murkier, although there are some rules around it, is the collection of individuals’ data.

A recent PCH Consumer Insights study on data privacy and ethics prepared in conjunction with Publishers Clearing House examines consumers’ thoughts concerning the use of their individual data. In a nutshell, we consumers consider most of our data to be personal. Eighty-nine percent of the 45,231 Americans aged 25 and older who were surveyed said their banking information is personal. On the lowest end of the scale, 47% categorized their workout data as personal. More than half (52%) consider what they watch on televisions in their homes to be personal. Further, 38% responded that they are “not comfortable at all” with having their TV viewing data used to train AI models. Another 14% were “somewhat uncomfortable,” which means that a total of 52% do not like the idea of their data being used in this way. (Presumably none of these are Nielsen households.) When asked to value their TV viewership data, 26% concluded that it was worth more than $500. Adding $500 per viewer to the expense line for advertising viewership analytics would certainly upend the profitability of current and proposed models.

These findings should not be dismissed as an interesting academic exercise. When the same consumers were asked about their probable response to what they considered inappropriate use of personal data, 58% “agreed with the statement ‘I am willing to stop interacting with companies who have a bad reputation around data,’ and 52% agreed they are ‘willing to stop using companies who don’t provide an option not to track personal data.’”

It’s wise to consider this in the light of recent news that smart television manufacturer Vizio will be sharing data from TV tuners through its data division, Inscape. The company says that 22 million TVs are opted in, and that data will be used by stations and measurement companies, including Comscore and VideoAmp, to provide more detailed viewing information from a larger number of viewers. Is anyone other than me skeptical about how those opt-ins were obtained? If this displeases consumers, will they blame their television manufacturer, the manufacturer’s data division, the measurement company or the station providing the programming and using the data to support its ad sales efforts?

Questions about the ownership of data shared in public are getting the most press now. At the end of 2023, The New York Times sued OpenAI and Microsoft for copyright infringement based on the two companies’ use of Times content in their models. If decided in the Times’ favor, such a case could set a precedent about, and a possible value for, content used to train AI models.

Interestingly, Fox Corp. just announced a blockchain-based platform called Verify that will allow its subscribers to track online uses of their content. The objectives include giving media businesses a way to prove that their content has been scraped by generative AI companies.

Other media companies, including The Associated Press and Axel Springer, have signed content licensing deals with OpenAI.

How Is Data Accessed?

That these arrangements are news is a sign that contracts for data access are few and far between. Broadcasters certainly have signed deals with third-party providers such as Nielsen, Comscore, VideoAmp and iSpot. However, while Nielsen has separate agreements with panel participants, agreements with primary sources seem to be the exception. Yes, I’ve probably consented to my smart TV’s collection and use of my data; there’s no reasonable way to opt out.

Litigation is expensive. The Times is presumably spending a lot of time and money to press its case against the two generative AI giants. I expect that a decision, if there is no out-of-court settlement, is years away. Short of a class action lawsuit, consumers don’t really have the resources to tackle this.

Data Validation

Media businesses have already seen the financial fallout from station data that don’t agree with advertiser data. In most cases, stations are compelled to accept advertiser data and discount projected revenue. Deepfakes and misinformation present a greater and ongoing concern for media companies. Local news continues to enjoy significant consumer trust. As I said last month, local news overperforms on FAST platforms. Regrettably, it may only take a misstep or two to erode that confidence.

There is a more direct threat on the horizon in 2024: AI-generated content in political advertising. Unfortunately, such content can be subject to a number of different state and federal laws. Pillsbury attorneys Lee G. Petro and Adam J. Sandler explain: “Under federal law, broadcast stations are not liable for the content of political ads that qualify as a ‘use’ by a legally qualified candidate.”

That’s the good news. The bad news is that such protection doesn’t apply to ads from third-party organizations such as political action committees.

Further complicating the situation is that, while there is no current federal law about AI in political advertising, several states have such laws or are working to address the question. Political advertising dollars are projected to significantly increase station ad revenues in 2024. The downside is that bad data and deepfakes could cause reputational and financial harm.

Ensuring Data Is Used Responsibly

This really is the overarching question for media businesses. It’s one that may take years to answer. Studios already had access to data that can be used to synthesize actors’ voices and images or writers’ work, but not their permission. The resolutions to the recent writers’ and actors’ strikes now provide guidance.

Consumers have strong feelings about both being served irrelevant commercial content and what they consider unauthorized use of their data. The second part of that concern is being addressed with state-by-state regulation; the sunsetting of third-party cookies means the first part will get worse before it gets better. Misuse of data poses additional risks.

The use of data and AI extends well beyond media and entertainment businesses. However, media companies may have the most to lose (and gain) from their use.

Former president and CEO of the Media Financial Management Association and its BCCA subsidiary, Mary M. Collins is a change agent, entrepreneur and senior management executive. She can be reached at [email protected].
