In my last post, I began to explore the inner workings of Wordle. As a friend pointed out, it has all the hallmarks of a good game, because even understanding the decision trees didn’t diminish the fun I had playing it this morning as usual.
I added a bunch of stuff to my script to allow for any solver algorithm to test every five-letter word and get the overall performance of any given strategy. Strategy is kind of an interesting thing with respect to this game, as there are a finite number of answers and they don’t shift too much over time. So there isn’t really any advantage to writing a general-purpose strategy that works on all 11 million five letter combos when there are only 8,938 words in the list.
Dictionaries
It’s probably worth going into some detail on what the potential word lists even are. Thanks to some sleuthing by Tom, we can actually see the answer list Wordle uses inside its code. The New York Times article on the Wordle phenomenon also tells us helpfully that the creator’s partner, Palak Shah, went through the full list and selected about 2,500 words familiar enough to be used as answers.
This is a similar approach to the New York Times’ own Spelling Bee, which is at least in my opinion aggravatingly arbitrary about which words it will accept. Luckily in Wordle’s case, fewer words is better, and it does accept the larger list in guesses, even though that means 80%+ of valid guesses can never be the answer. (That’s okay, since we already learned that gray letters are our friends, probably worth exploring more deeply.)
Our candidate dictionaries, therefore, are something like the following:
- Collins Scrabble Words, the official word list used by Scrabble players in most of the world (notably, not the US.) As the article describes, this word list consists of lists merged together from British and American sources. Wikipedia says this list has 12,972 five-letter words, although I haven’t counted them yet.
- OTCWL2, the shorter dictionary I chose for my solver script. This dictionary has 8,938 five-letter words, omitting the most obscure (and offensive) choices while leaving a lot of latitude to produce guesses.
- Wordle’s full list of valid guesses, which has 10,657 options. I’m not sure what the source is, but it didn’t match with any of the other word lists I had available. (Perhaps some manual removals for various reasons?)
- Palak Shah’s list, the hand-picked selection of 2,315 words (that’s six years, four months of daily games) that comprises all possible answers. WARNING: this list goes through the answers IN ORDER. Unless you want to ruin Wordle for yourself forever, don’t read it. I alphabetized the version attached to my script to avoid this horror, but not before realizing in checking the list that I’d spoiled the next several days for myself. (Appeal to Mr. Wardle — could you at least hide a hash algorithm in there to make it slightly harder to see all the answers?)
I don’t know how aligned the actual answer list is with word frequencies. Is it cheating on the solver’s part to only guess words that might be the answer, if that word list is missing some common words that might otherwise be good guesses? (I think this probably skirts the line between general- and specific-purpose?)
Solver Strategies
Well, as I’ve just learned, the best solver strategy would be to calculate the number of days since June 19, 2021, index to that line of the answer list, and guess that. Also eating my words somewhat here about the value of a general-purpose algorithm, since a specific-purpose algorithm now has perfect performance in the real world.
But let’s elevate ourselves from the actual Wordle to the platonic ideal of Wordle, where the answer could be any five-letter word from the list, and look at some actual strategies. I came up with a couple of obvious approaches to start.
- Naive — use the same strategy to pick each next guess. For example, always choose the first valid answer, the last valid answer, or the valid answer from the center of remaining possibilities. (You could also choose non-deterministically, but that won’t give consistent metrics over runs.)
- Naive with naive optimization — choose the next word without reference to anything except the list, but pick a word most or least similar to previous guesses by some criterion. For example, always choose the word with the fewest seen letters.
- See all letters as quickly as possible — some people use a strategy to ignore the information from the first word and guess five new letters for the second word. My solver isn’t designed to do anything except use all information from previous clues, so I’m not directly testing this, but it has some interesting characteristics. For example, floor barring lucky guesses is always 3, even if you could reasonably get it in 2. But on average it should perform better because it avoids the long tail of 1 unknown letter and many possibilities (i.e., 🟩🟩⬜🟩🟩: CAFES, CAGES, CAKES, CANES, CAPES, CARES, CASES…) This statement should be equivalent to the statement “solver always plays in hard mode”, unless there are bugs. I haven’t verified that.
- Letter distribution — choose the next word so as to maximize the most common letters remaining.
- Decision tree pruning — choose the next word which eliminates the most possibilities given the available information. Recursively: do this all the way to the leaf nodes and pick the most promising tree. This is fairly standard game theory.
When getting at the objective “difficulty” of Wordle, the performance of naive strategies is pretty interesting. Let’s also define performance to give ourselves some sort of baseline. These statistics will apply to a given strategy’s results across the entire list of five-letter words (
- Average, median, and range — it seems the most basic metric is just the average and median number of guesses the solver takes. It’s also useful to know the minimum and maximum number of guesses the solver may take, in case it has incredibly poor (or good) outlier stats.
- Number (or percent) of failures — since the goal of Wordle is to guess the word in six or fewer guesses, the frequency with which the strategy can’t do that is also interesting.
- Number (or percent) of successes in n=4,3,2… — it’s also good to know the distribution of successes across the three major levels. 1 guess is a fluke; 5 and 6 are successful but not especially exciting.
Total for these sequences is 8,938, the number of five-letter words in my arbitrary word list selection.
Naive Strategy Statistics
I ran three naive strategies to assess baseline performance. There’s no real calculation here — a player using this strategy would have perfect knowledge of how to play the game, perfect knowledge of the corpus, and yet select answers that are likely bad for other reasons — too obscure, poor remaining letter distribution, no educated guesses at the presence of double letters, et al. Note that despite this, all strategies will always use all information from previous clues to make their next guess, which makes them more effective than one might guess.
In the next post, I’ll go over the statistics for the naive strategies and begin to address the question “how ‘hard’ is Wordle?” As a spoiler, these naive solutions all have success rates in the 80% range, meaning that having perfect knowledge and terrible strategy still allows you to win most of the time.