The next war for the U.S. military will involve destroying #fakenews. DARPA’s plan is a new program called Semantic Forensics (SemaFor), which would use technology to automatically find and label fake text, audio, images, and video in hopes of keeping “large-scale, automated disinformation attacks” from ever happening. DARPA’s Broad Agency Announcement (BAA) documents include this table showing how the agency believes the program will work.

Screenshot from DARPA's Broad Agency Announcement of the SemaFor project

It appears the government will be using four Technical Areas (TAs) to develop the SemaFor program.

The first TA puts together the algorithms used to detect, attribute, and characterize falsified news. DARPA’s BAA asks for algorithms to “analyze the content of media assets with respect to a purported source to determine if the purported source is correct.” The main goal is to examine media content and decide whether it was falsified with malicious intent “to significantly alter its tone, polarization, content, or real-world impact.”

The second TA will take TA1’s algorithms and develop a single method or score to detect falsified news. The goal for TA2’s algorithms is to automatically determine which media reports need to be examined by an analyst, who then decides whether they’re real or fake.

“This work will support TA2’s development of algorithms that leverage scores and evidence from the TA1 performers to prioritize falsified media for human review,” the author of DARPA’s BAA stresses while also noting TA2 will work with hackers to make sure the system is as secure as possible. “Such prioritization is critical for scaling up to real-world volumes of media.”
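As a rough illustration of the prioritization idea, here is a minimal Python sketch. Everything in it is hypothetical: the `MediaAsset` type, the detector names, the take-the-maximum fusion rule, and the review budget are assumptions made for illustration, not anything specified in DARPA’s BAA.

```python
from dataclasses import dataclass


@dataclass
class MediaAsset:
    """Hypothetical record: an asset ID plus per-detector scores in [0, 1]."""
    asset_id: str
    detector_scores: dict[str, float]


def fused_score(asset: MediaAsset) -> float:
    """Fuse detector scores into one number (assumed rule: take the maximum)."""
    return max(asset.detector_scores.values(), default=0.0)


def prioritize(assets: list[MediaAsset], budget: int) -> list[MediaAsset]:
    """Return the `budget` highest-scoring assets for human review."""
    ranked = sorted(assets, key=fused_score, reverse=True)
    return ranked[:budget]


# Made-up scores, purely to show the ranking step.
queue = [
    MediaAsset("article-1", {"text": 0.10, "image": 0.20}),
    MediaAsset("post-2", {"text": 0.85, "image": 0.40}),
    MediaAsset("video-3", {"audio": 0.55, "video": 0.30}),
]
for asset in prioritize(queue, budget=2):
    print(asset.asset_id, fused_score(asset))
```

With a budget of two, only the two highest-scoring assets reach an analyst. That cutoff is what would let a system like this cope with “real-world volumes of media,” and it is also where the choice of scoring rule decides what a human ever sees.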

TA3’s work is the most interesting. This group appears to be the primary beta testers and evaluators of the SemaFor project, testing the algorithms with humans, fake news, and social media posts. They’ll collect news and social media posts, then falsify a portion of the collected information. TA3 will create the data used to test against humans and will look at their responses.

“News articles should span a range of local, national, and international events with a particular focus on stories where falsification could have significant real-world impact,” DARPA’s description of the project states while also asking for context including URLs, author, and media outlet. “Social media assets should also focus on local, national, and international events where falsification could have a significant real-world impact…All collected or falsified assets should be multi-modal, containing at least two media modalities. TA3 proposers should describe the content their collection and falsification strategies will focus on, and how that content will inform the evaluation design.”

TA3 will also try to prevent any knowledge of what’s been falsified from leaking to the outside.

TA4 is more about making sure the SemaFor project is ready for the present and the future, plus troubleshooting. The crews will come up with potential problems and work with TA2 and TA3 on fixing the issues.

“TA4 performers will deliver [state-of-the-art] challenges (and supporting threat models if relevant) to DARPA and TA3 starting at month 4 of the program and then at least every 6 months following for the duration of the program,” DARPA’s proposal states while noting TA4 will make sure other groups have a clear understanding of SemaFor issues. “Support for the hackathons will involve working with TA3 to curate additional generated or manipulated media for the challenge problems. If existing media is not sufficient to support the challenge, TA4 will work with TA3 to generate new media to support the challenge.”

The curious part will be TA4’s study of how humans decipher fake news along with their response to it. One would guess the work will probably involve looking at the role of confirmation bias within news reading or viewing and how people respond to learning if something is real or fake. It will also look at how a computer program could be used to help detect fake news.

The justification for the DARPA SemaFor program is to make sure humans can spot fake news more quickly. DARPA is rather confident SemaFor will work because it would force those who create fake news to be perfect. As the agency puts it, “A comprehensive suite of semantic inconsistency detectors would dramatically increase the burden on media falsifiers, requiring the creators of falsified media to get every semantic detail correct, while defenders only need to find one, or a very few, inconsistencies.”
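The asymmetry DARPA describes can be put into rough numbers. The sketch below assumes, purely for illustration, that a falsifier gets each semantic detail right independently and with the same probability. Neither that independence assumption nor the 95 percent figure comes from DARPA, and the function name is made up.

```python
def evasion_probability(p_detail_correct: float, num_details: int) -> float:
    """Chance a falsifier gets every detail right, assuming independence."""
    return p_detail_correct ** num_details


# Hypothetical falsifier who gets any single detail right 95% of the time.
for n in (1, 10, 50):
    print(n, round(evasion_probability(0.95, n), 3))
```

Under these assumptions, a falsifier who is right 95 percent of the time on any one detail slips all ten details past the detectors only about 60 percent of the time, and all fifty less than 8 percent of the time. Real detectors would not be perfect or independent, so the actual numbers would differ, but the direction of the argument is the same.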

Skepticism remains. Syracuse University professor Jennifer Grygiel seemed to tell Bloomberg the idea was sound but that Congress should provide legislative oversight. Grygiel noted social media was being used to influence elections and found it interesting DARPA was looking at the issue. Grygiel appears to favor social media regulation, saying, “Educating the public on media literacy, along with legislation, is what is important.”

Grygiel has a point on public education, but there are still other issues worth investigating. The chief question is who determines what news is fit to publish or broadcast. The common notion is news entities should determine what stories end up published. Yet different news organizations can take different angles on a story, as is their wont. What Fox News may see as an important detail in a story, CNN or MSNBC may not. The same goes for local or online entities covering stories or getting quotes from people who may have witnessed an event. Sometimes ‘facts’ end up changing due to the circumstances of the event or the information given out. Even fact-checking websites have biases, which is why so many of them exist.

The bigger question is whether a government computer program should be used to help people ‘determine’ what’s real or not. Facebook and Twitter’s algorithms are awful partially due to the agenda of those who work in their halls. It’s why alternatives exist. Doesn’t the government also have an agenda and narrative? Is it not full of individuals who may decide to follow policy or not?

This must be considered before trusting a government-created computer program to decide what’s fit to print, view, or post and what’s not.