Programming A to Z - Final Proposal

19 Nov 2017

For the A2Z final, I'm planning to continue working on the alternate Wikipedia text generation project I started for last semester's Nature of Code final. The project uses the Torch-RNN machine learning library for recurrent neural networks to train on the entirety of Wikipedia (via its data dumps) and then generate an output, ideally the same length as the current Wikipedia.

My goal is to create an alternate version of Wikipedia that introduces enough randomness to create new geographies, celebrities, mathematical concepts, etc., while staying familiar enough that this new world seems entirely plausible.

Since this project is large enough in scope to become my thesis project, I will have to focus on a few parts for the final. One area is finding the optimal parameters for training on the corpus. The other is finding the best option for "publishing" the output online. I suspect MediaWiki will work best, since it's what Wikipedia itself uses and it already has that Wikipedia aesthetic, but I still need to find a good way to bulk-create pages, one that can take the output and upload it with minimal clean-up.
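
As a rough sketch of what the parameter search could look like, the Python script below builds one torch-rnn training command per combination of settings. It assumes the flags documented for torch-rnn's train.lua (-input_h5, -input_json, -rnn_size, -num_layers, -dropout, -checkpoint_name). The file names wiki.h5 and wiki.json, the helper build_commands, and the values in the search space are placeholders for illustration, not results from this project.

```python
import itertools
import shlex

# Hypothetical search space; these values are illustrative, not tuned results.
SEARCH_SPACE = {
    "rnn_size": [256, 512],
    "num_layers": [2, 3],
    "dropout": [0.0, 0.5],
}


def build_commands(h5_path, json_path, space=SEARCH_SPACE):
    """Return one torch-rnn training command per parameter combination."""
    names = sorted(space)
    commands = []
    for values in itertools.product(*(space[name] for name in names)):
        args = ["th", "train.lua", "-input_h5", h5_path, "-input_json", json_path]
        tag_parts = []
        for name, value in zip(names, values):
            args += ["-" + name, str(value)]
            tag_parts.append(f"{name}{value}")
        # Give each run its own checkpoint prefix so runs do not overwrite each other.
        args += ["-checkpoint_name", "cv/" + "_".join(tag_parts)]
        commands.append(" ".join(shlex.quote(arg) for arg in args))
    return commands


if __name__ == "__main__":
    # Assumed file names for the preprocessed Wikipedia dump.
    for command in build_commands("wiki.h5", "wiki.json"):
        print(command)
```

The script only prints the commands, so the list can be checked by hand before any training time is spent. Comparing the validation loss that torch-rnn reports for each checkpoint would then be one way to pick the best settings.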

  • Email:
    Twitter: @coblezc
  • CC-BY