Draft Topic

This package contains a set of utilities and assets for predicting topics for new drafts, based on WikiProjects on English Wikipedia.

In this package, you’ll find the feature lists used to train models for the supported wikis. It also includes a set of command-line utilities for performing data pipeline operations specific to training and testing draft topic models.

See the API Reference for low-level details.

Changelog

All notable changes to this project will be documented in this file.

[0.4.1]

Added

  • Add GitHub Action that builds and pushes to the PyPI index

[0.4.0]

Changed

  • Rebuild to add compatibility for revscoring 2.11

[0.3.0]

Added

  • Adds feature lists for arwiki, cswiki, enwiki, kowiki, and viwiki

  • Adds fetch_draft_text, fetch_article_text, and taxo_label utilities

Changed

  • Use fasttext 100 cell vectors for enwiki

[0.2.0]

Added

  • Added extract_from_text utility.

  • Added fetch_text script to fetch text for a list of page titles.

  • Feature extraction rule using revscoring extract.

  • Gradient boosting config.

  • Added Word Vectors feature lists.

  • Added mid_level_wp to arguments.

  • Added fetch_page_wikiprojects script to label pages with all wikiprojects.

  • Parser code for generating mapping of mid-level topics to wikiprojects.

  • Exception handling for request failures.

  • Added requirement for revscoring v2.5.1

  • Added requirement for mediawiki-utilities v0.4.18.

  • Parser Tests.

  • Release Criteria.

Fixed

  • Headings regex bug.

Changed

  • Escape angle brackets.

  • Refactored logic for request processing.

  • Use Pytest for testing.

  • Dataset output file contains a date now.

Removed

  • mediawiki-utilities==0.4.18

[0.1.1] - 2017-09-05

Added

  • Bootstrap code.

  • WikiProjects Parsing Script.
