I’ve seen Markov chain text generators of various kinds on the web every once in a while, and eventually I got the urge to make my own word generator. At the time I was talking hockey with a friend, specifically about the surnames of NHL players. The NHL draws on a wide variety of nationalities, Northern European and Slavic in particular, so there are some great names on the jerseys that are atypical for the US. That’s where I started: hockey player surnames. I made some special modifications to the basic Markov chain state machine to produce more realistic, name-like output, such as curtailing unusually short or long sequences and making sure that generated sequences are not extremely similar to the input “training” sequences. I gathered the entire history of players from nhl.com and condensed it into a plain-text list of over 6,000 names. The results are satisfactory. Some examples:
Those may not be interesting if you’re unfamiliar with the NHL, so I’ve also run a list of minerals through the machine:
- Salt Vinite
Get the code and data here.
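For a quick feel of how the pieces fit together, here is a minimal sketch in Python. It is not the code linked above, just one plausible way to build the same kind of generator under stated assumptions: a character-level chain of order 3 (each letter is picked based on the previous three), hypothetical boundary markers `^` and `$`, and hypothetical limits `min_len`, `max_len`, and `threshold`. The "not extremely similar" check uses the standard library’s `difflib.SequenceMatcher.ratio()` as a stand-in for whatever measure the original uses.

```python
import random
from collections import defaultdict
from difflib import SequenceMatcher

START, END = "^", "$"  # hypothetical boundary markers; assumed never to occur inside names


def build_chain(names, order=3):
    """Map each `order`-character context to every character seen after it."""
    chain = defaultdict(list)
    for name in names:
        padded = START * order + name.lower() + END
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return dict(chain)


def generate_one(chain, order, rng, max_len):
    """Walk the chain from the start context until END is drawn, or until the
    word is one character past max_len (it is then rejected as too long)."""
    context = START * order
    out = []
    while len(out) <= max_len:
        # Repeated entries in the list weight the choice by observed frequency.
        nxt = rng.choice(chain[context])
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)


def too_similar(candidate, known, threshold):
    """True if the candidate closely matches any training name."""
    return any(SequenceMatcher(None, candidate, k).ratio() >= threshold for k in known)


def generate(names, count=10, order=3, min_len=4, max_len=10,
             threshold=0.8, seed=None, max_tries=10_000):
    rng = random.Random(seed)
    chain = build_chain(names, order)
    known = {n.lower() for n in names}
    results = []
    for _ in range(max_tries):
        if len(results) >= count:
            break
        word = generate_one(chain, order, rng, max_len)
        # Curtail unusually short or long sequences.
        if not min_len <= len(word) <= max_len:
            continue
        # Reject exact copies and near-copies of the training names.
        if word in known or too_similar(word, known, threshold):
            continue
        name = word.capitalize()
        if name not in results:
            results.append(name)
    return results
```

With a single training name such as `["Hasek"]`, every walk reproduces that name exactly, so `generate` returns an empty list; that is the similarity filter doing its job. A higher `order` yields more name-like output but also more near-copies of the input, which is why the length and similarity filters matter more as the chain gets stricter.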