What News-Writing Bots Mean for the Future of Journalism
WHEN REPUBLICAN STEVE King beat back Democratic challenger Kim Weaver in the race for Iowa’s 4th congressional district seat in November, The Washington Post snapped into action, covering both the win and the wider electoral trend. “Republicans retained control of the House and lost only a handful of seats from their commanding majority,” the article read, “a stunning reversal of fortune after many GOP leaders feared double-digit losses.” The dispatch came with the clarity and verve for which Post reporters are known, with one key difference: It was generated by Heliograf, a bot that made its debut on the Post’s website last year and marked the most sophisticated use of artificial intelligence in journalism to date.
When Jeff Bezos bought the Post back in 2013, AI-powered journalism was in its infancy. A handful of companies with automated content-generating systems, like Narrative Science and Automated Insights, were capable of producing the bare-bones, data-heavy news items familiar to sports fans and stock analysts. But strategists at the Post saw the potential for an AI system that could generate explanatory, insightful articles. What’s more, they wanted a system that could foster “a seamless interaction” between human and machine, says Jeremy Gilbert, who joined the Post as director of strategic initiatives in 2014. “What we were interested in doing is looking at whether we can evolve stories over time,” he says.
After a few months of development, Heliograf debuted last year. An early version autopublished stories on the Rio Olympics; a more advanced version, with a stronger editorial voice, was soon introduced to cover the election. It works like this: Editors create narrative templates for the stories, including key phrases that account for a variety of potential outcomes (from “Republicans retained control of the House” to “Democrats regained control of the House”), and then they hook Heliograf up to any source of structured data—in the case of the election, the data clearinghouse VoteSmart.org. The Heliograf software identifies the relevant data, matches it with the corresponding phrases in the template, merges them, and then publishes different versions across different platforms. The system can also alert reporters via Slack of any anomalies it finds in the data—for instance, wider margins than predicted—so they can investigate. “It’s just one more way to get a tip” on a potential scoop, Gilbert says.
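The workflow described above (editor-written templates, structured data merged in, an alert on anomalies) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Post's actual code: the `RaceResult` fields, the `render_story` and `find_anomaly` helpers, the 5-point threshold, and the placeholder candidates and numbers are all hypothetical. Only the two House-control phrases come from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaceResult:
    """One row of structured election data (hypothetical schema)."""
    district: str
    winner: str
    winner_party: str
    loser: str
    loser_party: str
    margin_pct: float            # actual margin of victory, in points
    predicted_margin_pct: float  # margin forecast before the vote


# Editor-written phrases keyed by outcome (wording quoted in the article).
CHAMBER_PHRASES = {
    "Republican": "Republicans retained control of the House",
    "Democratic": "Democrats regained control of the House",
}

# Hypothetical narrative template with slots for the data fields.
STORY_TEMPLATE = (
    "{winner_party} {winner} beat {loser_party} challenger {loser} "
    "in the race for {district}. {chamber_phrase}."
)


def render_story(result: RaceResult, house_control_party: str) -> str:
    """Pick the phrase that matches the outcome and merge it with the data."""
    return STORY_TEMPLATE.format(
        winner_party=result.winner_party,
        winner=result.winner,
        loser_party=result.loser_party,
        loser=result.loser,
        district=result.district,
        chamber_phrase=CHAMBER_PHRASES[house_control_party],
    )


def find_anomaly(result: RaceResult, threshold_pct: float = 5.0) -> Optional[str]:
    """Return a tip for reporters if the margin is much wider than predicted."""
    gap = result.margin_pct - result.predicted_margin_pct
    if gap > threshold_pct:
        return f"Margin in {result.district} was {gap:.1f} points wider than predicted"
    return None


# Placeholder data for illustration only; these are not real results.
result = RaceResult("District 1", "Candidate A", "Republican",
                    "Candidate B", "Democratic", 20.0, 10.0)
story = render_story(result, house_control_party="Republican")
alert = find_anomaly(result)
print(story)
print(alert)
```

In a real newsroom the alert string would presumably be sent to a chat channel rather than printed, and the same result could be rendered through several templates, one per platform.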
The Post’s main goal with the project at this point is twofold. First: Grow its audience. Instead of targeting a big audience with a small number of labor-intensive human-written stories, Heliograf can target many small audiences with a huge number of automated stories about niche or local topics. There may not be a wide audience for stories about the race for the Iowa 4th, but there is some audience, and, with local news outlets floundering, the Post can tap it. “It’s the Bezos concept of the Everything Store,” says Shailesh Prakash, CIO and VP of digital product development at the Post. “But growing is where you need a machine to help you, because we can’t have that many humans. We’d go bankrupt.”
Prakash and Gilbert take pains to stress that the system is not here to usher reporters into obsolescence. And that brings them to the second objective of Heliograf: Make the newsroom more efficient. By removing tasks like incessant poll coverage and real-time election results from reporters’ plates, Heliograf frees them up to focus on the stories that actually require human thought. “If we took someone like Dan Balz, who’s been covering politics for the Post for more than 30 years, and had him write a story that a template could write, that’s a crime,” Gilbert says. “It’s a huge waste of his time.”
So far, response from the Post newsroom has been positive. “We’re naturally wary about any technology that could replace human beings,” says Fredrick Kunkle, a Post reporter and cochair of the Washington-Baltimore News Guild, which represents the Post’s newsroom. “But this technology seems to have taken over only some of the grunt work.” Consider the election returns: In November 2012, it took four employees 25 hours to compile and post just a fraction of the election results manually. In November 2016, Heliograf created more than 500 articles, with little human intervention, that drew more than 500,000 clicks. (A drop in the bucket for the Post’s 1.1 billion pageviews that month, but it’s early days.)
Gilbert says the next step is to use Heliograf to keep the data in both machine- and human-written stories up-to-date. For instance, if someone shares a Tuesday story on Thursday, and the facts change in the meantime, Heliograf will automatically update the story with the most recent facts. Gilbert sees Heliograf developing the potential to function like a rewrite desk, in which “the reporters who gather information write more discrete chunks—here’s some facts, here’s some analysis—and let the system assemble them.”
With the rapid advances in AI technology driven by cheap computing power, Prakash sees Heliograf moving beyond mere grunt work. In time, he believes, it could do things like search the web to see what people are talking about, check the Post to see if that story is being covered, and, if not, alert editors or just write the piece itself. Of course, that’s where things could get sticky—when Facebook fired the human editors of its Trending module last year and let an algorithm curate the news, the world soon learned (falsely) that Megyn Kelly had been fired from Fox News. “Will there be controversy when the bot thinks this is important, and humans say this is important, and they’re the exact opposite thing?” Prakash asks. “It’s going to get interesting.”
The Post, like every other major news organization, is looking to tap new revenue streams, and it’s reportedly in talks to license out its CMS to clients like Tronc, a consortium that includes the Chicago Tribune, the Los Angeles Times, and dozens of other regional papers. As those newsrooms struggle with dwindling resources, it’s not hard to imagine a future in which AI plays a larger and larger role in creating journalism. Whether that’s good news for journalists and readers is another story.
In the Future, Robots Will Write News That’s All About You
HERE COME THE robot reporters. This week the AP announced it will use software to automatically generate news stories about college sports that it didn’t previously cover. Specifically, it’s turning to a content generation tool called Wordsmith, created by a Durham, North Carolina-based company called Automated Insights.
It’s the latest case of big news organizations turning to algorithms to create content. The AP, which is an investor in Automated Insights, already uses Wordsmith to generate stories on corporate quarterly earnings reports. Meanwhile, automated-content competitor Narrative Science provides similar services to publications such as Fortune and Big Ten Network. And a Los Angeles Times journalist used custom software to auto-generate a story minutes after an earthquake hit Los Angeles last year.
But is anyone actually reading any of this machine-generated content? Automated Insights CEO Robbie Allen says that’s the wrong question to ask. Although the company generated over one billion pieces of content in 2014 alone, most of this verbiage isn’t meant for a mass audience. Rather, Wordsmith is acting as a sort of personal data scientist, sifting through reams of data that might otherwise go unanalyzed and creating custom reports that often have an audience of one.
For example, the company generates Fantasy Football game summaries for millions of Yahoo users each day during the Fantasy Football season, and it helps companies turn confusing spreadsheets into short, human-readable reports. One day you might even have your own personal robot journalist, filing daily stories just for you on your fitness tracking data and your personal finances.
“We sort of flip the traditional content creation model on its head,” Allen says. “Instead of one story with a million page views, we’ll have a million stories with one page view each.”
Wordsmith essentially does two things. First, it ingests a bunch of structured data and analyzes it to find the interesting points, such as which players didn’t do as well as expected in a particular game. Then it weaves those insights into a human-readable chunk of text. You can think of it as a highly complex form of Mad Libs, one that takes an understanding of both data and writing to create.
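Those two steps, analysis and then text generation, can be illustrated with a toy Python sketch. It is not Wordsmith's API or algorithm: the `find_underperformers` and `write_recap` functions, the 5-point shortfall cutoff, and the placeholder team and player data are all assumptions made for the example.

```python
def find_underperformers(players, shortfall=5.0):
    """Step 1: analyze the data for players who missed their projection.

    The 5-point cutoff is an arbitrary assumption for this sketch.
    """
    return [p for p in players if p["projected"] - p["actual"] >= shortfall]


def write_recap(team, players):
    """Step 2: weave the findings into a short, readable recap."""
    total = sum(p["actual"] for p in players)
    sentences = [f"{team} scored {total:.1f} points this week."]
    for p in find_underperformers(players):
        gap = p["projected"] - p["actual"]
        sentences.append(
            f"{p['name']} fell {gap:.1f} points short of a projected "
            f"{p['projected']:.1f}."
        )
    if len(sentences) == 1:
        sentences.append("Every player met or came close to projections.")
    return " ".join(sentences)


# Placeholder fantasy data for illustration only.
players = [
    {"name": "Player A", "projected": 18.0, "actual": 9.5},
    {"name": "Player B", "projected": 12.0, "actual": 14.0},
]
recap = write_recap("Team X", players)
print(recap)
```

Because each recap is built from one user's own roster, running the same code over millions of rosters yields millions of distinct stories, each aimed at a single reader.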
Allen came up with the idea eight years ago, back when he was working as an engineer for Cisco. Allen, who has written ten books, wanted to create something new, so he decided to combine his passion for computer science, writing, and sports analysis into a company called StatSheet.
“The traditional approach of hiring a lot of writers wasn’t attractive to me,” he says. “What’s exciting about sports recaps is that 90 percent of what you do is write about the numbers.”
Soon, however, Allen realized that the idea could be applied to any quantitative data — not just sports. So the company changed its name to Automated Insights to bring its technology to a wide range of industries, including finance, health care and, of course, journalism.
Today Wordsmith can work only with structured, quantitative data, the sort of thing you find in well-formatted spreadsheets and databases. Allen says there’s certainly potential for other companies to create software that can go further in automating research or writing by summarizing lengthy texts, rewriting press releases, or sifting through unstructured documents for insights. But he doubts that Automated Insights will stray from its roots in quantitative data in the foreseeable future.
Last month the company was acquired by private equity firm Vista Equity Partners, which also owns the sports data company STATS and business intelligence company TIBCO. Allen says that partnering with Vista’s other companies will give Automated Insights more than enough work to keep it busy. “It’s kind of a no-brainer for us,” he says. “We have so much opportunity ahead of us in structured data, why take on a space that people have struggled with for years?”
In the meantime, expect to see more stories written for a very particular audience: you, and you alone.