25 Oct 2017
For the A2Z midterm, I continued working on my assignment from week 3: get the article links from the front pages of various news sites (left-leaning and right-leaning), grab the text of those articles, and then use named entity recognition to extract the names of places referenced in them.
I got the code working to extract the articles, and also to use spaCy to extract the locations. The code is available as a Gist.
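As a rough illustration of the location-extraction step (this is a minimal sketch, not the code from the Gist), the function below counts place names in a piece of text. It assumes a spaCy-style pipeline is passed in as `nlp`, and it assumes that "places" means entities labelled `GPE` or `LOC`; the function name `extract_places` is made up for this example.

```python
from collections import Counter

# Assumption: treat these spaCy entity labels as "places".
# GPE = countries, cities, states; LOC = other locations (mountains, seas, ...).
PLACE_LABELS = {"GPE", "LOC"}


def extract_places(text, nlp):
    """Return a Counter mapping each place name in `text` to its mention count.

    `nlp` is any spaCy-style pipeline: a callable that returns a doc whose
    `.ents` items expose `.text` and `.label_`.
    """
    doc = nlp(text)
    return Counter(
        ent.text.strip() for ent in doc.ents if ent.label_ in PLACE_LABELS
    )
```

With spaCy and an English model installed, `nlp` would typically come from something like `spacy.load("en_core_web_sm")` (the model name is an assumption here), and `extract_places(article_text, nlp).most_common(10)` would list the ten most-mentioned places in an article.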
While the midterm assignment was to create a Twitter bot (or Alexa skill), I'm not sure a Twitter bot is the best medium for this project. One, the 140-character limit meant cutting out some of the locations, especially if I also put the article title at the beginning. Two, this project screams for a map: a list of locations naturally wants to be rendered as one. Three, it would be more interesting to track changes over time. It's one thing to see which locations are being mentioned in the news at this moment, but it's more revealing to see how those locations shift from week to week.
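To make the character-limit problem concrete, here is a small sketch of how a tweet might be assembled from a title and a list of locations, dropping whatever does not fit. The function name `compose_tweet` and the "title: place, place" layout are assumptions for illustration, and Python's `len()` is only an approximation of how Twitter counts characters.

```python
TWEET_LIMIT = 140  # the limit mentioned above


def compose_tweet(title, locations, limit=TWEET_LIMIT):
    """Return (tweet, dropped): the tweet text and how many locations were cut.

    Locations are added in order until the next one would exceed `limit`.
    """
    kept = []
    for loc in locations:
        candidate = f"{title}: {', '.join(kept + [loc])}"
        if len(candidate) > limit:
            break  # this location and everything after it is dropped
        kept.append(loc)
    # If not even one location fits, fall back to the (truncated) title alone.
    tweet = f"{title}: {', '.join(kept)}" if kept else title[:limit]
    return tweet, len(locations) - len(kept)
```

The second return value shows how much is lost: with a long headline, most of the 140 characters go to the title, and only a few place names survive.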