Thesis - Self Assignment Matrix

11 Feb 2018

Working project title: Wikibabel

14-Week Project Summary: Develop a working prototype of a website for an alternate universe closely resembling our own. The site's content will be generated by a machine learning model trained on a data dump of the Wikipedia corpus. The project will be an online platform, intended for the general public, that is aesthetically similar to the existing Wikipedia and will be used in a similar way.

Install MediaWiki
Time: 2 hours
Form/Media: website
Concept/Question: create sandbox; research and implement spam-prevention techniques

Research MediaWiki templates
Time: 3 hours
Form/Media: online research, list of templates (notes)
Concept/Question: Which templates are required for a wiki? Which additional templates would be useful for my project?

Install MediaWiki templates
Time: 1.5 hours
Form/Media: website
Concept/Question: install the required and useful MediaWiki templates identified during research

Ontology of pages
Time: 3 hours
Form/Media: research, sketchbook
Concept/Question: create an ontology of the types of pages to create (people, places, etc.)

Pick 1 topic and create page outline
Time: 2 hours
Form/Media: online research, sketchbook
Concept/Question: pick one topic from the ontology (people), research Wikipedia to find which sections appear most commonly for that topic (e.g., B. Obama, M. Obama), and create an outline of common sections for generated pages (Early Life, Career)
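
One way to find the most common sections is to count level-2 headings across a set of articles on the same topic. The following is a minimal sketch, assuming the article wikitext has already been obtained as plain strings; the function name count_sections and the sample articles are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Matches level-2 wikitext headings such as "== Early life ==".
# Level-3 headings ("=== Senate ===") are deliberately not matched.
HEADING_RE = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def count_sections(articles):
    """Count in how many articles each level-2 heading appears."""
    counts = Counter()
    for wikitext in articles:
        # A set ensures a heading is counted once per article.
        headings = {h.strip().lower() for h in HEADING_RE.findall(wikitext)}
        counts.update(headings)
    return counts


# Hypothetical hand-written samples standing in for real article wikitext.
sample_articles = [
    "Intro.\n== Early life ==\ntext\n== Career ==\ntext\n=== Senate ===\ntext\n",
    "Intro.\n== Early life ==\ntext\n== Legacy ==\ntext\n",
]

section_counts = count_sections(sample_articles)

# Headings that appear in every sample article are outline candidates.
common_sections = [
    name for name, n in section_counts.items() if n == len(sample_articles)
]
print(common_sections)
```

Headings are lowercased before counting so that "Early Life" and "Early life" are treated as the same section; whether that normalization is appropriate is a judgment call for the outline.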

Granular corpus preparation
Time: 3 hours
Form/Media: online research, text file
Concept/Question: for each section in the outline, determine how to extract that data from the Wikipedia data dump, and create a corpus for each section
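
A possible approach to building per-section corpora is to split each article's wikitext at its level-2 headings and collect the bodies that match the outline. This is a sketch under stated assumptions: the wikitext of each article has already been pulled out of the dump as a string, templates and markup are not stripped, and the names split_sections and build_corpora are hypothetical.

```python
import re
from collections import defaultdict

# Same level-2 heading pattern as a wikitext "== Heading ==" line.
HEADING_RE = re.compile(r"^==[ \t]*([^=\n].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def split_sections(wikitext):
    """Return {heading: body} for the level-2 sections of one article."""
    # With one capture group, re.split yields
    # [lead, heading1, body1, heading2, body2, ...].
    parts = HEADING_RE.split(wikitext)
    sections = {"lead": parts[0].strip()}
    for heading, body in zip(parts[1::2], parts[2::2]):
        sections[heading.strip().lower()] = body.strip()
    return sections


def build_corpora(articles, wanted_sections):
    """Concatenate the bodies of the wanted sections across all articles."""
    collected = defaultdict(list)
    for wikitext in articles:
        sections = split_sections(wikitext)
        for name in wanted_sections:
            if sections.get(name):
                collected[name].append(sections[name])
    # One corpus string per section, articles separated by a blank line.
    return {name: "\n\n".join(bodies) for name, bodies in collected.items()}


# Hypothetical samples; real input would come from the data dump.
article_a = "Intro A.\n== Early life ==\nBorn in X.\n== Career ==\nWorked at Y.\n"
article_b = "Intro B.\n== Early life ==\nBorn in Z.\n"

corpora = build_corpora([article_a, article_b], ["early life", "career"])
print(corpora["early life"])
```

Each value in corpora could then be written to its own text file, giving one training corpus per section of the outline.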

Preliminary Training
Time: 1 week
Form/Media: HPC cluster
Concept/Question: train models on the corpora using the HPC cluster; sample the output and evaluate it for readability and believability; adjust parameters based on the results

Production Training
Time: 3 days
Form/Media: HPC cluster
Concept/Question: generate pages based on findings from preliminary training

MediaWiki Import
Time: 3 hours
Form/Media: text editor, website
Concept/Question: import generated pages into my MediaWiki installation
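
MediaWiki can import pages from an XML file through Special:Import or the importDump.php maintenance script. The sketch below shows one way the generated pages might be wrapped in that kind of XML. It is an assumption that this minimal structure (mediawiki / page / title / revision / text) is sufficient; a real import file may need the namespace and version attributes and extra elements that the wiki's own Special:Export output contains, so it is worth comparing against an export from the target installation. The function name build_import_xml and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_import_xml(pages):
    """Wrap {title: wikitext} pairs in a minimal MediaWiki-style XML document."""
    root = ET.Element("mediawiki")
    for title, wikitext in pages.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes characters such as "&" and "<" on output.
        ET.SubElement(revision, "text").text = wikitext
    return ET.tostring(root, encoding="unicode")


# Hypothetical generated pages standing in for model output.
generated_pages = {
    "Example Person": "'''Example Person''' is a person.\n== Early life ==\nText.",
    "A & B": "Page with special characters: <ref>note</ref>",
}

import_xml = build_import_xml(generated_pages)

# Writing the file is left to the caller, for example:
# with open("generated_pages.xml", "w", encoding="utf-8") as f:
#     f.write(import_xml)
```

Generating the XML with a library rather than by string concatenation avoids broken files when the generated text contains characters that are special in XML.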

