
Improve Knowledge Integrity

We are working to extend the verifiability of content and increase resilience to misinformation.

Overview

The strategic direction of "Knowledge as a Service" envisions a world in which platforms and tools are available to allies and partners to "organize and exchange free, trusted knowledge beyond Wikimedia". Achieving this goal requires not only new infrastructure for representing, curating, linking, and disseminating knowledge, but also efficient and scalable strategies to preserve the reliability and integrity of this knowledge. Technology platforms across the web are looking to Wikipedia as the neutral arbiter of information, but as Wikimedia aspires to extend its scope and scale, the possibility that parties with special interests will manipulate content, or that bias will go undetected, becomes material.

We have been leading projects to help our communities represent, curate, and understand information provenance in Wikimedia projects more efficiently. We are conducting novel research on why editors source information and how readers access sources; developing algorithms to identify statements in need of sources and gaps in information provenance; and designing data structures to represent, annotate, and analyze source metadata in machine-readable formats, as well as tools to monitor changes made to references across the Wikimedia ecosystem in real time.

More information can be found in our white paper.

Recent updates

Graph-Linguistic Fusion: Using Language Models for Wikidata Vandalism Detection

A new vandalism detection system for Wikidata using graph-linguistic fusion (Wikidata Revert Risk).

A Comparative Study of Reference Reliability in Multiple Language Editions of Wikipedia

Quantifies the cross-lingual patterns of the perennial sources list, a collection of reliability labels for web domains identified and agreed upon by Wikipedia editors.

Fair multilingual vandalism detection system for Wikipedia

The next generation of ML tools for Knowledge Integrity, providing a fair multilingual vandalism detection system, now in production.

Templates and Trust-o-meters: Towards a widely deployable indicator of trust in Wikipedia

A study on designing widely deployable trust indicators for readers of Wikipedia.

Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia

A dataset of articles with reliability concerns on English Wikipedia for training language models to detect content reliability issues.

Tracking Knowledge Propagation Across Wikipedia Languages

A dataset of inter-language knowledge propagation in Wikipedia.

Research pages

Slides

Videos

Publications