Overview
The strategic direction of "Knowledge as a Service" envisions a world in which platforms and tools are available to allies and partners to "organize and exchange free, trusted knowledge beyond Wikimedia". Achieving this goal requires not only new infrastructure for representing, curating, linking, and disseminating knowledge, but also efficient and scalable strategies to preserve the reliability and integrity of this knowledge. Technology platforms across the web are looking to Wikipedia as the neutral arbiter of information, but as Wikimedia aspires to extend its scope and scale, the possibility that parties with special interests will manipulate content, or that bias will go undetected, becomes material.
We have been leading projects to help our communities represent, curate, and understand information provenance in Wikimedia projects more efficiently. We are conducting novel research on why editors source information and how readers access sources; developing algorithms to identify statements in need of sources and gaps in information provenance; and designing data structures to represent, annotate, and analyze source metadata in machine-readable formats, as well as tools to monitor, in real time, changes made to references across the Wikimedia ecosystem.
More information can be found in our white paper.
Resources and links
Research pages
- Characterizing Wikipedia Citation usage
- Identification of Unsourced Statements
- Sockpuppet detection in Wikimedia projects
- Understanding the context of citations in Wikipedia
Slides
- WikiCite: Wikidata as a structured repository of bibliographic data
- Unlocking references from the literature: The Initiative for Open Citations
- WikiCite: Citations needed for the sum of all human knowledge
- Connecting the sum of all human knowledge, one edit at a time
- WikiCite: The journey and the road ahead
Videos
- WikiCite: Wikidata as a structured repository of bibliographic data
- WikiCite: Citations for the sum of all human knowledge
- Wikidata: Verifiable, linked open knowledge that anyone can edit
- Wikipedia's role in the dissemination of scholarship
Publications
- Mykola Trokhymovych, Lydia Pintscher, Ricardo Baeza-Yates, and Diego Saez-Trumper. 2025. Graph-Linguistic Fusion: Using Language Models for Wikidata Vandalism Detection. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL '25 Industry).
- Aitolkyn Baigutanova, Diego Saez-Trumper, Miriam Redi, Meeyoung Cha, and Pablo Aragón. 2023. A Comparative Study of Reference Reliability in Multiple Language Editions of Wikipedia. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23). https://doi.org/10.1145/3583780.3615254
- Mykola Trokhymovych, Muniza Aslam, Ai-Jou Chou, Ricardo Baeza-Yates, and Diego Saez-Trumper. 2023. Fair multilingual vandalism detection system for Wikipedia. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23).
- Aitolkyn Baigutanova, Jaehyeon Myung, Diego Saez-Trumper, Ai-Jou Chou, Miriam Redi, Changwook Jung, and Meeyoung Cha. 2023. Longitudinal Assessment of Reference Quality on Wikipedia. In Proceedings of The Web Conference 2023 (WWW '23). https://doi.org/10.1145/3543507.3583218
- Andrew Kuznetsov, Margeigh Novotny, Jessica Klein, Diego Saez-Trumper, and Aniket Kittur. 2022. Templates and Trust-o-meters: Towards a widely deployable indicator of trust in Wikipedia. CHI '22: CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3517523
- KayYen Wong, Miriam Redi, and Diego Saez-Trumper. 2021. Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia. SIGIR '21. https://doi.org/10.1145/3404835.3463253
- Rodolfo Valentim, Giovanni Comarela, Souneil Park, and Diego Saez-Trumper. 2021. Tracking Knowledge Propagation Across Wikipedia Languages. Proceedings of the Fifteenth International AAAI Conference on Web and Social Media (ICWSM '21).
- Mykola Trokhymovych and Diego Saez-Trumper. 2021. WikiCheck: An end-to-end open source Automatic Fact-Checking API based on Wikipedia. 30th ACM International Conference on Information and Knowledge Management (CIKM '21).
- Pablo Aragón and Diego Sáez-Trumper. 2021. A preliminary approach to knowledge integrity risk assessment in Wikipedia projects. MIS2'21: Misinformation and Misbehavior Mining on the Web Workshop held in conjunction with KDD 2021.
- Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, and Robert West. 2020. Quantifying Engagement with Citations on Wikipedia. In Proceedings of The Web Conference 2020 (WWW '20). https://doi.org/10.1145/3366423.3380300
- Diego Saez-Trumper. 2019. Online Disinformation and the Role of Wikipedia.
- Miriam Redi, Besnik Fetahu, Jonathan Morgan, and Dario Taraborelli. 2019. Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. In Proceedings of The Web Conference 2019 (WWW '19). https://doi.org/10.1145/3308558.3313618
- Dario Taraborelli, Lydia Pintscher, Daniel Mietchen, and Sarah Rodlund. 2017. WikiCite 2017 Report. figshare. https://doi.org/10.6084/m9.figshare.5648233
- Dario Taraborelli, Jonathan Dugan, Lydia Pintscher, Daniel Mietchen, and Cameron Neylon. 2016. WikiCite 2016 Report. figshare. https://doi.org/10.6084/m9.figshare.4042530
