The Associated Press is more than three years into using automation to produce thousands of earnings report stories each quarter and is now expanding into sports. And, says the AP’s Lisa Gibbs, that’s just the start of what automation and AI will do to speed the production of text and video stories and make its reporters better.
AI Promising More, Better Reporting From AP
There was much hue and cry in 2015 when the Associated Press said that “robots” would begin writing the lion’s share of its quarterly earnings report coverage.
Flash forward more than three years and automation has yet to take over the AP’s newsroom, but it is covering 12-fold more companies’ earnings and staffers have been freed to pursue more complex stories, says Lisa Gibbs, the AP’s director of news partnerships and newsroom lead on automation and its artificial intelligence strategy group.
Meanwhile, the AP is turning its attention to more machine learning experiments on its video production side. Those include AI-powered “versioning” of stories and real-time transcribing of videos in multiple languages, along with speeding up verification of social media posts and using image recognition software to better keyword photos.
In this interview with Michael Depp, TVNewsCheck‘s special projects editor, Gibbs says “robots” aren’t threatening reporters. Rather, she says, they are promising to empower them by seeking out and summarizing stories, statistics and social media from across the internet that are relevant to their beats or whatever they are working on.
An edited transcript:
The AP has one of the longest track records on the automated front in journalism. Can you frame out the areas where you’re employing it for stories now?
When we talk about using automation to turn data into text stories, we are using that in business news and in sports. We’re generating approximately 3,700 corporate earnings stories per quarter, and then we’re producing hundreds of game stories for minor league baseball and getting into college basketball this year.
Do you see other sports into which you might expand?
We’ve talked about it, and we’re always looking for those opportunities. When trying to create stories from data, obviously having clean, consistent data is critical, so automation of text content really isn’t possible without a good data source. We tend to work with data partners. In the case of corporate earnings, we’re working with Zacks Investment Research. In the case of sports, we’re working with the leagues themselves.
What sort of editing resources do you still need to allocate to make sure that what’s going out passes basic AP muster?
Most of the hard work that journalists do around these stories is done up front. We write the templates that form what the stories are. That’s why the idea of “robot journalism” is a bit of a misnomer. The robot isn’t writing any story. Journalists write templates, which then pull data into them.
When we did our earnings project, we spent months not only working in the templates but then months of quality assurance testing. Once you’ve got everything up and running, it’s a question of spot checks and monitoring. The real work on the part of the journalist is the data maintenance.
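The template approach Gibbs describes can be sketched in miniature. This is a toy illustration only, with hypothetical field names; AP’s production system is far more elaborate, but the principle is the same: journalists write the template, and the software merely fills in the data.

```python
# Toy sketch of template-driven story generation. No AI is involved:
# the template is written by a journalist, and data is pulled into it.
TEMPLATE = (
    "{company} on {day} reported {quarter} profit of ${profit} million. "
    "Earnings, adjusted for one-time items, were ${eps} per share."
)

def earnings_story(record: dict) -> str:
    """Fill the fixed template with one company's earnings data."""
    return TEMPLATE.format(**record)

story = earnings_story({
    "company": "Example Corp",
    "day": "Tuesday",
    "quarter": "first-quarter",
    "profit": 120,
    "eps": "1.05",
})
print(story)
```

Because the template, not the software, carries the journalism, the quality-assurance work Gibbs mentions amounts to testing that every data field maps correctly for thousands of companies.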
These are obviously more earnings stories than you were covering when journalists were writing the complete report.
By a lot. We were [previously] able to write approximately 300 earnings stories per quarter. Automation gives us about a 12-fold increase in the volume of companies that get some level of earnings story. There are still about 100 companies on which our beat reporters do thorough reporting and stories.
Are there common threads with errors or problems with AI-produced content?
I want to make a distinction first. A lot of people interchange automation and AI, but automated stories have zero artificial intelligence. There’s nothing smart about that. We wrote the templates, data files come in, the software spits out the story based on the data. There’s no process of learning that goes on.
That’s opposed to what we’re experimenting with now in the world of natural language processing, which is where we feed a text story into a summarization engine that spits out a two- to three-sentence summary and learns over time as thousands of versions of those summaries are run through it.
Our error rate for earnings stories is lower for automation than for our human stories, and that’s primarily because our robots do not make math errors and typos. In general, we believe that very little will be truly automated in the sense that it will have no human intervention. That’s why you’re seeing the idea of the “augmented journalist.” Most of these tools are really around improving efficiency.
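The summarization Gibbs describes is a learned model, but the extractive end of that idea can be shown with a toy version that simply scores sentences by word frequency. This is a sketch for illustration only, not AP’s engine:

```python
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    """Toy extractive summarizer: score each sentence by the frequency
    of its words across the whole text, keep the top-scoring sentences,
    and return them in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
    )
    keep = sorted(scored[:n_sentences])
    return " ".join(sentences[i] for i in keep)
```

A trained model replaces the crude frequency score with learned judgments, which is why, as Gibbs notes, it improves only after seeing thousands of examples.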
AP also produces a lot of daily video content. How is AI being employed there to assist in production, editing or metadata?
In general, with all of these projects beyond our earnings and sports stories, we are at very early stages of testing, and there’s no automated video that we’re producing that’s going out to customers.
We’re testing a lot of tools and use cases. There’s the idea of real-time, machine learning-based transcription and translation, which can help save our video editors lots of time on the manual transcription of video.
AP has invested in Wibbitz, and we are experimenting with it and Wochit on everything from semi-automated editing to fully automated video creation. We could use those tools to make our video production hubs more efficient and potentially be able to generate higher volumes of video.
We talk about a general theme called versioning. We have all kinds of customers: digital, broadcast. They all need different things. Some want a 90-second video, some want two and a half minutes, some want vertical video. How can we create an operation that doesn’t have a video editor sitting there creating five versions of the same video? Can we use these tools to speed up that versioning process?
So, what are you learning?
What we’re finding with these tools is generally what we find with all of these projects: When you first start doing them, they’re terrible. That gets back to the need to spend a lot of time training the robot. These systems don’t get built overnight with 10 examples. You need to feed it hundreds and hundreds of samples in order to get it smart enough to know how to do it and handle a bunch of different situations.
With automated video and Wibbitz, you take a text story and feed it into the system. It analyzes the story for keywords and then matches that up against photos and videos that have that same keyword, then generates a little mini-script from the text story, pulls together the photos and videos and produces a 90-second video.
Let’s say the script really needs work or a photo was improperly keyworded. The editor needs to go in and do some things to it, but we got you 60% of the way there.
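The keyword-matching step in that pipeline can be sketched as ranking a keyworded asset library against a story’s keywords. All names and data here are hypothetical; the real systems add scripting and editing on top:

```python
def match_assets(story_keywords, asset_library):
    """Toy version of the keyword-matching step: rank library assets
    (photos/videos) by how many of the story's keywords appear in each
    asset's metadata, dropping assets with no overlap at all."""
    ranked = sorted(
        asset_library,
        key=lambda asset: -len(set(asset["keywords"]) & set(story_keywords)),
    )
    return [a for a in ranked if set(a["keywords"]) & set(story_keywords)]

assets = [
    {"id": "vid-001", "keywords": ["election", "rally", "ohio"]},
    {"id": "img-042", "keywords": ["weather", "storm"]},
    {"id": "vid-007", "keywords": ["election", "debate"]},
]
matches = match_assets(["election", "debate"], assets)
```

This also shows why a mis-keyworded photo, as Gibbs notes, forces the editor back in: the match is only as good as the metadata.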
Can it produce a summary text from video content?
We have a project underway to start automating summaries. If we can get all of this to work properly, we could use automation and AI to generate a story summary and/or script, and then another tool of automation or AI takes that summary or script and matches it up with photos and videos in your system and creates a 60- or 90-second video out of that. If all of that works properly, you can imagine how content creation becomes way more efficient than it is now.
Are you using AI in any kind of social media or messaging context right now?
When it comes to social media, we are creating a system that attempts to use machine learning to automate some of the processes that we use now for verification. We are building a tool called AP Verify that breaks down all of the steps of user-generated content verification and then examines how we can use technology to speed up each one of those.
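The step-by-step breakdown Gibbs describes could be modeled as a pipeline of independent checks, so that any single step can be swapped for an automated one as the technology matures. The check names and fields below are hypothetical; this interview does not detail AP Verify’s actual checks:

```python
# Hypothetical sketch of a step-wise verification pipeline: each check
# is an independent function returning pass/fail, and a human reviews
# whatever fails.
def check_account_age(post):
    """Is the poster's account reasonably established?"""
    return post.get("account_age_days", 0) > 30

def check_geolocation(post):
    """Does the claimed location match the media's metadata?"""
    return post.get("claimed_location") == post.get("exif_location")

def check_reverse_image(post):
    """Placeholder for a reverse-image search: has this media appeared before?"""
    return not post.get("seen_before", False)

CHECKS = [check_account_age, check_geolocation, check_reverse_image]

def verify(post: dict) -> dict:
    """Run every check and report which passed."""
    return {check.__name__: check(post) for check in CHECKS}
```

Structuring verification this way matches the stated goal: examine each step of user-generated content verification separately and speed up whichever ones technology can handle.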
The other thing we’re doing right now with Cortico, a division of MIT’s Media Lab, is to build a platform that pulls in all kinds of public commentary — whether it’s Twitter posts or public Facebook comments or what people say on talk radio — and brings it into a massive system that we can then query. The goal is better insight into what voters are really talking about and care about, and how that varies from city to city, which is important to helping cover elections more authoritatively.
AP framed this initiative at the outset as a way to better position its human reporting resources in places where its actual journalists might be better deployed. How is that quantifiably happening?
We had estimated when we first launched our corporate earnings project that we saved the equivalent of three full-time staffers, whom we then could assign to other beat teams or have do other kinds of work. As the former business news editor, I think that actually underestimates the impact. It really changed our whole department’s thinking and enabled us to think much more creatively about the kind of storytelling we were doing.
That’s the kind of assessment we go through when evaluating any automation and AI project. We look at time saved [and] content generated. Does it help support an existing AP product? Does it potentially help create a new AP product?
AP was also emphatic early on that no human jobs were going to be lost in implementing this technology. Is that still the case?
Is there anything else in AP’s roadmap for AI and automated content?
We’re working on an image recognition project to see if we can better keyword our images using automated software — things that are very much here-and-now and not futuristic.
There’s a lot of discussion about using machine learning and AI to help reporters do their jobs better in the sense of helping them gain insights. That can be as simple as searching the world of news articles, social media and events and summarizing them in the form of daily reports for beat reporters. Machine readers can find and read hundreds of sources far faster than a journalist could. We’re definitely going to explore those kinds of newsgathering tools and data analysis. We’re only just beginning.