Fake News and Benford's Law

Written by Subject: Media - "Fake News"

1.) Benford's Law Applies to Online Social Networks - https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135169

Benford's Law states that, in naturally occurring systems, the frequency of numbers' first digits is not evenly distributed. Numbers beginning with a 1 occur roughly 30% of the time, and are six times more common than numbers beginning with a 9. We show that Benford's Law applies to social and behavioral features of users in online social networks. Using social data from five major social networks (Facebook, Twitter, Google Plus, Pinterest, and LiveJournal), we show that the distribution of first significant digits of friend and follower counts for users in these systems follows Benford's Law. The same is true for the number of posts users make. We extend this to egocentric networks, showing that friend counts among the people in an individual's social network also follow the expected distribution. We discuss how this can be used to detect suspicious or fraudulent activity online and to validate datasets.

Introduction

Benford's Law states that, in naturally occurring systems, the frequency of numbers' first digits is not evenly distributed. Numbers beginning with a "1" are far more common than numbers beginning with "9", more than six times as frequent. The exact frequency P predicted for a digit d is given by this formula:

P(d) = log10(1 + 1/d)
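As a minimal sketch, the predicted frequencies for the digits 1 through 9 can be tabulated in Python from the standard base-10 logarithm form of the law. The function name benford_expected is our own label and does not come from the paper.

```python
import math

def benford_expected():
    """Return {digit: probability} using P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the predicted share of each leading digit.
for digit, p in benford_expected().items():
    print(f"{digit}: {p:.3f}")
```

Under this formula P(1) is about 0.301 and P(9) is about 0.046, a ratio of roughly 6.6, which matches the "more than six times" figure quoted above.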

Benford's Law is frequently used in forensic accounting, where a distribution of first digits that is outside the expected distribution may indicate fraud [1]. Research has also shown that it applies to genome data [2], scientific regression coefficients [3], election data [4, 5], the stock market [6], and even to JPEG compression [7].

We conducted an analysis over five of the most popular social networking websites and found that Benford's Law applies to the social network structure in all of them. Specifically, the first significant digit (FSD) of users' friend and follower counts on Facebook, Twitter, Google Plus, Pinterest, and LiveJournal all follow Benford's Law. Users' numbers of posts also conform to Benford. To our knowledge, this is the first time Benford's Law has been applied to social networks. We show that exceptions to this rule can uncover configurations within social media systems that lead to unexpected results.

We also show that, for any individual, the distribution of friend counts within his or her egocentric network also follows Benford's Law. When the expected distribution is violated, it indicates unusual behavior. A preliminary analysis of over 20,000 Twitter accounts showed that the 100 users whose egocentric networks deviated most strongly from the Benford's Law distribution were all engaged in suspicious activity.
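The excerpt does not say which statistic was used to rank accounts by deviation. The sketch below assumes a Pearson chi-square statistic as one plausible choice; the function names and the two sample lists are hypothetical and are not data from the study.

```python
import math
from collections import Counter

# Expected share of each first digit under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """First significant digit of a positive integer."""
    return int(str(n)[0])

def chi_square_vs_benford(counts):
    """Pearson chi-square statistic of first digits against Benford's Law.

    Zero counts are skipped because they have no first significant digit.
    """
    digits = [first_digit(c) for c in counts if c > 0]
    if not digits:
        raise ValueError("no positive counts")
    observed = Counter(digits)
    total = len(digits)
    return sum(
        (observed.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in BENFORD.items()
    )

# Hypothetical friend counts for two egocentric networks.
natural_like = [2 ** k for k in range(1, 200)]  # spans many orders of magnitude
suspicious = list(range(100, 1000))             # every first digit equally often
print(chi_square_vs_benford(natural_like))
print(chi_square_vs_benford(suspicious))
```

A larger statistic means a larger departure from the expected distribution. It is a flag for closer inspection, not a verdict.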

We discuss how these results lead to the possibility of Benford's Law being used to detect malicious or irregular behavior on social media. We also show that it could be used to validate the sampling in social media datasets.

2.) Combination of Natural Laws (Benford's Law and Zipf's Law) for Fake News Detection - https://link.springer.com/chapter/10.1007/978-3-030-15210-9_3

Abstract

With the increase in the number of character assassinations and fake news stories recently happening in Nigeria, we combine Zipf's law and Benford's law to analyse and detect fake news. The problem of fake news has become one of the most prominent issues in Nigeria recently. In this chapter, the challenges fake news poses to Nigeria are briefly presented. Due to these challenges, we propose the combination of Benford's law and Zipf's law in news analysis such that the hybrid of the two laws will obey the Power law for real news and deviate for fake news. We carried out various tests on different real news sources and the result shows that real news obeys the Power law. We, therefore, propose that fake news should not obey the Power law even though we could not test on fake news sources because of the lack of a verified fake news dataset.
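The abstract does not spell out how the two laws are combined, so the following is only a sketch of the Zipf half. It assumes the usual rank-frequency reading of Zipf's law, under which the log of a word's frequency falls roughly linearly with the log of its rank, with a slope near -1. The function name zipf_slope, the tokenizer and the sample sentence are our own assumptions, not the chapter's method.

```python
import math
import re
from collections import Counter

def zipf_slope(text):
    """Least-squares slope of log(frequency) against log(rank) for words."""
    words = re.findall(r"[a-z']+", text.lower())
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical sample; a real test would use a full news article.
sample = "the cat and the dog and the bird"
print(zipf_slope(sample))
```

On a text this short the slope means little. The point is only to show the kind of power-law fit the abstract refers to.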

3.) Data Science of Fake News - https://datawarrior.wordpress.com/2017/08/11/data-science-of-fake-news/

People have been upset about the prevalence of fake news since the election season last year. The election was a year ago, but fake news is still around because society is still politically charged. Some tech companies have vowed to fight fake news but, as is easy to imagine, this is a tough task.

On Aug 9, 2017, Data Science DC held an event titled "Fake News as a Data Science Challenge," given by Professor Jen Golbeck of the University of Maryland. It was an interesting talk.

Fake news itself is a big problem. It has philosophical, social, political, and psychological aspects, but Prof. Golbeck focused on the data science aspect. To make it a computational problem, a clear and succinct definition of "fake news" is needed, and that is already challenging. Some "fake news" is intended as a pun, sarcasm, or a joke (like The Onion). Some misinformation is shared through Twitter or Facebook without any intent to deceive. That makes it difficult to draw a line. What is beyond doubt is that we want to fight news with malicious intent.

To fight fake news, as Prof. Golbeck has pointed out, there are three main tasks:

detecting the content;

detecting the source; and

modifying the intent.

Statistical tools can be exploited too. She talked about Benford's law, which states that, in naturally occurring systems, the frequency of numbers' first digits is not evenly distributed. An anomaly in that distribution for some news can be used as a first step in fraud detection. (Read her paper.)

There are also efforts, the Fake News Challenge for example, to build a corpus of fake news for training machine learning models.

However, I am not sure fighting fake news is enough. Many Americans are concerned not only by the prevalence of fake news but also by how the news is narrated, because of our own ideological biases. Sometimes we are not satisfied because we think the news is not "neutral" enough, or because it does not fit our worldview.

The slides can be found here, and the video of the talk can be found here.

"Fake News as a Data Science Challenge," Data Science DC (Aug 9, 2017). [Meetup] [slides on Google Drive] [Video on Facebook]

Jennifer Golbeck. [HTML]

Benford's Law. [Wikipedia]

Jennifer Golbeck, "Benford's Law Applies to Online Social Networks," PLoS ONE 10.8: e0135169 (2015). [PLoS]

Fake News Challenge. [HTML]

Featured image taken from http://www.livingroomconversations.org/fake_news

4.) How Benford's Law Reveals Suspicious Activity on Twitter - https://www.technologyreview.com/2015/04/21/168363/how-benfords-law-reveals-suspicious-activity-on-twitter/

The counterintuitive distribution of digits in certain data sets turns out to be a powerful tool for detecting strange behavior on social networks.

by April 21, 2015

Back in the 1880s, the American astronomer Simon Newcomb noticed something strange about the book of logarithmic tables in his library: the earlier pages were much more heavily thumbed than the later ones, implying that people looked up logarithms beginning with "1" much more often than "9."

After some investigation, he concluded that in any list of data, numbers beginning with the digit "1" must be much more common than numbers beginning with other digits. He went on to formulate the mathematical rationale behind this phenomenon, which later became known as Benford's law, after the physicist Frank Benford who discovered it independently some 50 years later.

Benford's law is highly counterintuitive. After all, it is not immediately clear why numbers beginning with "1" should be more common than others. Indeed, the law predicts that in data that conform to this rule, numbers with the first digit "1" should occur about 30 percent of the time while numbers beginning with the digit "9" should make up less than 5 percent of the total.

That turns out to be generally true for a wide range of data sets and, indeed, almost any data set that spans several orders of magnitude. That includes populations of towns, stock-market prices, physical constants, numbers in an issue of Reader's Digest, and so on.

Although bizarre, Benford's law turns out to be hugely useful for detecting financial fraud. The idea is that if people make up figures, the first digits in the data should be distributed fairly uniformly. Indeed, whenever there is an external influence over people's behavior, the possibility arises of a deviation from Benford's law.

Of course, a data set that deviates from Benford's law is not proof of fraud, only an indication that further investigation is required.
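The contrast described above can be sketched in Python with two synthetic data sets. Both are assumptions made for illustration, not real data: values drawn log-uniformly across six orders of magnitude stand in for a wide-ranging natural data set, and three-digit figures picked uniformly at random stand in for invented ones. All names are hypothetical.

```python
import random
from collections import Counter

def first_digit_shares(values):
    """Share of each leading digit 1-9 among the positive values."""
    # Scientific notation puts the leading significant digit first.
    digits = [int(f"{v:.15e}"[0]) for v in values if v > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for data spanning several orders of magnitude (1 to 1,000,000).
wide_range = [10 ** rng.uniform(0, 6) for _ in range(10_000)]
# Stand-in for made-up figures: any three-digit number is equally likely.
made_up = [rng.randint(100, 999) for _ in range(10_000)]

wide_shares = first_digit_shares(wide_range)
made_up_shares = first_digit_shares(made_up)
for d in range(1, 10):
    print(d, round(wide_shares[d], 3), round(made_up_shares[d], 3))
```

In the first column the share of leading "1"s should sit near 30 percent, while in the second each digit should take roughly one ninth of the total. That gap is the kind of signal an investigator would follow up on.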

But while statisticians have looked for Benford's law in many data sets, they have never applied it to the world of social networks. Today that changes thanks to the work of Jennifer Golbeck at the University of Maryland in College Park. She shows that not only does Benford's law apply to many data sets associated with social networks, but that deviations from this law are clearly linked to suspicious activity online.

5.) Connected | Official Trailer | Netflix - https://www.youtube.com/watch?v=B-aZrftUPlk&feature=emb_logo

Science reporter and host Latif Nasser investigates the fascinating and intricate ways that we are connected to each other, the world and the universe at large.
