Friday, May 18, 2012

Identifying biased Wikipedia articles... without looking at their content!

Data is often worthless without metadata.
But sometimes metadata is even more useful than data!

I created ASE, a tool to spot unnoticed biased and low-quality Wikipedia articles.
To assess articles, ASE does not even need to look at the data!
ASE just looks at the metadata, especially article history.

Context: To see what Wikipedia articles look like, open Special:Random many times. As you can see, 99.9% of Wikipedia articles are either good or have already been tagged as needing references, copyediting, or some other particular attention (the banners at the top of articles).

ASE's goal is to spot the remaining 0.1%, which means articles that:

  • Are biased, or not of good quality (below C on the article quality scale).
  • Have not already been tagged as needing references, copyediting, or any other attention.
How does ASE do this without even looking at the article?

ASE's axiom is that article quality increases with participation.
Articles that have been written by only a single editor have a higher probability of being biased and of lower quality.
So the strategy is basically to spot articles that have been written by just one editor.
Bots are filtered out, because they cannot really be counted as participation.
Editor experience is also taken into account.
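
To make the idea concrete, here is a minimal sketch of that heuristic in Python. It is not ASE's actual code: the `Revision` type, the `is_bot` flag, the `edit_counts` lookup and the `experienced_threshold` value are all hypothetical names and numbers chosen for illustration. In particular, how ASE really weighs experience is not described here, so the sketch simply assumes that a sole author with many edits is trusted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One entry of an article's history (hypothetical shape)."""
    user: str
    is_bot: bool = False


def human_editors(history):
    """Return the set of distinct non-bot usernames in an article history."""
    return {rev.user for rev in history if not rev.is_bot}


def is_suspect(history, edit_counts, experienced_threshold=1000):
    """Flag an article written by exactly one human editor.

    Assumption: a sole author whose total edit count reaches
    `experienced_threshold` is considered experienced and is not flagged.
    """
    editors = human_editors(history)
    if len(editors) != 1:
        return False  # several humans took part (or none at all)
    (sole_editor,) = editors
    return edit_counts.get(sole_editor, 0) < experienced_threshold


# Usage: only the history metadata is read, never the article text.
history = [Revision("Alice"), Revision("CleanupBot", is_bot=True)]
print(is_suspect(history, edit_counts={"Alice": 12}))
```

Note that the function never receives the article body, which is the whole point: the history alone is enough to make a guess.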

There are false positives, but accuracy is pretty good, and many Wikipedians have used ASE to spot and fix thousands of articles.

ASE is open source, of course.
If you feel like editing Wikipedia, have a look at the list of spotted articles and tips.
Good editing!
Nicolas Raoul
@nicolas_raoul