Utilities

This module implements a set of utilities for extracting topic labels from English Wikipedia using the WikiProject taxonomy.

Draft Topic CLI

drafttopic

$ drafttopic -h

This script provides access to a set of utilities for extracting features
and building draft topic predictors.

* add_central_africa -- Adds "Geography.Regions.Africa.Central Africa" to
                        the labels manually.
* balance_sample -- Generates an approximately balanced sample for each
                    label
* extract_from_text -- Extracts features from raw text
* fetch_article_text -- Gathers current article text for each labeling
                        observation from a MediaWiki API
* fetch_draft_text -- Gathers first revision article text for each labeling
                      observation from a MediaWiki API
* taxo_label -- Labels a set of observations based on their
                WikiProject templates
* write_labels -- Extracts all labels from a WikiProject-labeled dataset
                  and writes them out to config

Usage:
    drafttopic (-h | --help)
    drafttopic <utility> [-h | --help]

Options:
    -h | --help  Prints this documentation
    <utility>    The name of the utility to run
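The usage pattern "drafttopic <utility> [-h | --help]" implies a two-level dispatch: the first argument names a utility, and everything after it, including -h/--help, is handed to that utility. Below is a minimal sketch of that pattern. It assumes each utility exposes a function that takes its remaining arguments and returns an exit status. The UTILITIES registry and the stub functions are hypothetical stand-ins, not drafttopic's actual modules.

```python
import sys
from typing import Callable, Dict, List, Optional


# Hypothetical stubs standing in for the real utility modules.
def taxo_label(argv: List[str]) -> int:
    print("taxo_label called with", argv)
    return 0


def write_labels(argv: List[str]) -> int:
    print("write_labels called with", argv)
    return 0


UTILITIES: Dict[str, Callable[[List[str]], int]] = {
    "taxo_label": taxo_label,
    "write_labels": write_labels,
}

USAGE = """Usage:
    drafttopic (-h | --help)
    drafttopic <utility> [-h | --help]"""


def main(argv: Optional[List[str]] = None) -> int:
    argv = sys.argv[1:] if argv is None else argv
    if not argv or argv[0] in ("-h", "--help"):
        print(USAGE)
        return 0
    name, rest = argv[0], argv[1:]
    if name not in UTILITIES:
        print(f"Unknown utility: {name}", file=sys.stderr)
        print(USAGE, file=sys.stderr)
        return 1
    # Forward the remaining arguments (including -h/--help) so each
    # utility can parse its own options and print its own documentation.
    return UTILITIES[name](rest)


if __name__ == "__main__":
    sys.exit(main())
```

Keeping the top-level parser this thin means each utility owns its own option parsing, which matches the "<utility> [-h | --help]" form above.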

Sub-utilities

extract_from_text
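extract_from_text turns raw text into a numeric feature vector that a topic predictor can consume. One common approach is to average word embeddings over the tokens of the text. That approach is an assumption here and not a description of drafttopic's actual feature set. The toy three-dimensional EMBEDDINGS table is hypothetical. A real model would load pretrained vectors instead.

```python
import re
from typing import Dict, List

# Hypothetical toy embeddings; a real model would load pretrained vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "river": [1.0, 0.0, 0.0],
    "africa": [0.0, 1.0, 0.0],
    "football": [0.0, 0.0, 1.0],
}
DIM = 3


def tokenize(text: str) -> List[str]:
    # Lowercase, then keep runs of ASCII letters as tokens.
    return re.findall(r"[a-z]+", text.lower())


def extract_features(text: str) -> List[float]:
    """Element-wise mean of the vectors of all in-vocabulary tokens."""
    vectors = [EMBEDDINGS[t] for t in tokenize(text) if t in EMBEDDINGS]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(extract_features("The river flows through Africa."))
```

Out-of-vocabulary words such as "the" and "flows" are skipped, so the result above averages only the "river" and "africa" vectors.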

fetch_draft_text
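fetch_draft_text needs the first revision of each page rather than the current one. In the MediaWiki Action API, prop=revisions combined with rvdir=newer and rvlimit=1 returns the oldest revision. In that mode the API accepts only a single page per request. Below is a sketch of building such a request. The English Wikipedia endpoint and the rvprop fields chosen here are assumptions, and drafttopic's own client may be organized differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def first_revision_url(title: str, api_url: str = API_URL) -> str:
    """Build a query for the oldest revision of a single page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvdir": "newer",  # enumerate from the oldest revision...
        "rvlimit": 1,      # ...and stop after the first one
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    return api_url + "?" + urlencode(params)


def fetch_json(url: str, user_agent: str) -> dict:
    """Perform the request; Wikimedia asks clients to send a descriptive User-Agent."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


print(first_revision_url("Central African Republic"))
```

Because each first-revision lookup is a separate request, a real implementation would likely run these lookups concurrently or throttle them according to the API's usage guidelines.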

fetch_article_text
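For the current article text, prop=revisions without rvdir or rvlimit returns the latest revision of each page. In that mode, several titles can be batched into one request by joining them with "|". The sketch below assumes formatversion=2 JSON responses, where query.pages is a list, and maps each title to its main-slot wikitext. The sample response is hand-written to match that shape. It is not real API output.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def current_revisions_url(titles: List[str], api_url: str = API_URL) -> str:
    """Build a query for the latest revision of each page in `titles`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "|".join(titles),  # batching is allowed in this mode
        "rvprop": "ids|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    return api_url + "?" + urlencode(params)


def texts_by_title(response: dict) -> Dict[str, Optional[str]]:
    """Map each page title to its current wikitext, or None if unavailable."""
    texts: Dict[str, Optional[str]] = {}
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions") or []
        if page.get("missing") or not revisions:
            texts[page["title"]] = None  # page missing or has no revisions
        else:
            texts[page["title"]] = revisions[0]["slots"]["main"].get("content")
    return texts


# Hand-written sample shaped like a formatversion=2 response.
sample = {
    "query": {
        "pages": [
            {"pageid": 1, "title": "Example",
             "revisions": [{"revid": 10, "slots": {"main": {"content": "Some text"}}}]},
            {"title": "Nonexistent page", "missing": True},
        ]
    }
}
print(texts_by_title(sample))
```

Returning None for missing pages, instead of dropping them, lets the caller keep one output record per labeling observation.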

taxo_label
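taxo_label assigns taxonomy labels based on the WikiProject templates that appear on an article's talk page. The sketch below extracts template names with a regular expression and looks each name up in a mapping. The mapping excerpt and the function name label_from_talk_page are hypothetical. The only label taken from this documentation is "Geography.Regions.Africa.Central Africa". A real mapping would also need to handle template redirects and aliases.

```python
import re
from typing import Dict, List, Set

# Hypothetical excerpt of a WikiProject-template -> taxonomy-label mapping.
WIKIPROJECT_TO_LABELS: Dict[str, List[str]] = {
    "wikiproject africa": ["Geography.Regions.Africa"],
    "wikiproject central african republic": ["Geography.Regions.Africa.Central Africa"],
}

# Captures the template name after "{{", up to the first "|" or "}}".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def normalize(name: str) -> str:
    # Treat underscores as spaces, ignore case, and collapse whitespace.
    return " ".join(name.replace("_", " ").lower().split())


def label_from_talk_page(talk_wikitext: str) -> Set[str]:
    """Return the set of taxonomy labels implied by WikiProject templates."""
    labels: Set[str] = set()
    for name in TEMPLATE_RE.findall(talk_wikitext):
        labels.update(WIKIPROJECT_TO_LABELS.get(normalize(name), []))
    return labels


talk = """{{WikiProject banner shell|class=B|
{{WikiProject Africa|importance=low}}
{{WikiProject_Central African Republic}}
}}"""
print(sorted(label_from_talk_page(talk)))
```

Templates with no entry in the mapping, such as the banner shell here, contribute no labels.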

write_labels
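write_labels collects the full set of labels from a labeled dataset so they can be written out to config. The sketch below assumes the dataset is JSON lines and that each observation stores its labels in a list under a field named "labels". Both the field name and the JSON output format are hypothetical, and the real utility's config format may differ.

```python
import json
from typing import Iterable, List, TextIO


def collect_labels(lines: Iterable[str], field: str = "labels") -> List[str]:
    """Return the sorted, de-duplicated labels found across all observations."""
    labels = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        observation = json.loads(line)
        labels.update(observation.get(field, []))
    return sorted(labels)


def write_labels(lines: Iterable[str], out: TextIO, field: str = "labels") -> None:
    """Write the collected labels to `out` as a JSON object."""
    json.dump({"labels": collect_labels(lines, field)}, out, indent=2)
    out.write("\n")


dataset = [
    '{"title": "A", "labels": ["Geography.Regions.Africa"]}',
    "",
    '{"title": "B", "labels": ["Geography.Regions.Africa", '
    '"Geography.Regions.Africa.Central Africa"]}',
]
print(collect_labels(dataset))
```

Sorting the output keeps the written config stable across runs, which makes changes in the label set easy to review.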