Category: Wordle

Analysis on the workings of Wordle, optimal strategies, and exploring the answer tree.

The Map of Wordle

I’ve been busy doing all kinds of Wordle analysis — as I mentioned in the last post I’ve been testing various strategies to assess the “difficulty” of Wordle. This led to some interesting speculation about which word is best to guess first. Obviously, the best word to guess first is the correct word, but you can’t do that better than chance (unless you have the oracle, in which case you’re cheating.)

Another interesting question I had is, how many possible responses are there to a guess? Each of the five letters is one of { green, yellow, nothing }, so the entire set is 3^5 = 243. However, five of those possibilities are invalid, since four greens with one yellow can’t happen: if you have four of the letters right, by definition the fifth letter cannot be out of position. That leaves 238 valid responses.
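The count above is easy to check by brute force. This is a minimal sketch, not code from my script; the G/Y/X symbols and the name is_reachable are shorthand invented for the example.

```python
from itertools import product

# G = green, Y = yellow, X = nothing (gray); symbols chosen for this sketch only.
SYMBOLS = "GYX"

def is_reachable(pattern: str) -> bool:
    """Reject the patterns with four greens and exactly one yellow."""
    return not (pattern.count("G") == 4 and pattern.count("Y") == 1)

all_patterns = ["".join(p) for p in product(SYMBOLS, repeat=5)]
valid_patterns = [p for p in all_patterns if is_reachable(p)]

print(len(all_patterns))    # 243
print(len(valid_patterns))  # 238
```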

So I mapped every possible guess to every possible answer to look at how many responses each first guess gives, on average, against every answer. This is a set of 2315 × 2315, or about 5.36 million, possible guess/answer combinations. I did a bunch of analysis I’ll share later, but then I got an interesting idea. Can we visually plot this map?

The image above is exactly that: a 2315×2315 image showing every guess along the X axis and every answer along the Y axis. I simply averaged the (r,g,b) values across the letters. So five yellows becomes pure yellow, the correct answer becomes pure green, no correct letters is black, and all other combinations generate some color in the yellow/green spectrum. The idea was simple enough. I quickly ran through the file and generated a P3 PPM, then converted it to PNG on the command line.
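The color averaging can be sketched roughly like this. It is an illustration rather than the code that produced the image: the exact RGB triples, the G/Y/X pattern strings, and the function names pattern_to_rgb and to_p3_ppm are all assumptions.

```python
# Assumed RGB values per tile; the post only says yellow, green, and black.
TILE_RGB = {"G": (0, 255, 0), "Y": (255, 255, 0), "X": (0, 0, 0)}

def pattern_to_rgb(pattern):
    """Average the per-letter colors of one response into a single pixel."""
    channels = zip(*(TILE_RGB[tile] for tile in pattern))
    return tuple(sum(channel) // len(pattern) for channel in channels)

def to_p3_ppm(rows):
    """Serialize a grid of response patterns (a list of rows) as plain-text P3 PPM."""
    height, width = len(rows), len(rows[0])
    lines = ["P3", f"{width} {height}", "255"]
    for row in rows:
        pixels = (" ".join(map(str, pattern_to_rgb(p))) for p in row)
        lines.append(" ".join(pixels))
    return "\n".join(lines) + "\n"

# A 2x2 toy grid: correct answers on the diagonal, partial matches elsewhere.
grid = [["GGGGG", "XYXXX"],
        ["YXXXX", "GGGGG"]]
ppm_text = to_p3_ppm(grid)
```

Integer division keeps every channel in the 0 to 255 range that the P3 header declares; the full map would pass 2315 rows of 2315 patterns each.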

I expected to see a bright green diagonal line and sparser areas away from it. I also reasoned that guess/answer pairs would be sort of symmetrical, in that guessing A for B would give something similar to guessing B for A.

What I did not expect was this rich map of words colliding into the diagonal axes, pronounced squares coming out of the diagonal axis, and yellow and green swirls showing words that are probably related. This isn’t a map of how closely related words are to each other in any sense other than letters in their positions, and it only uses the words that can be actual Wordle answers, not the full guess set. So it really is a map of how Wordle starts, the very top of the decision tree in two dimensions.

I want to look at it more closely and talk about what generates the columns, but I was taken aback by the simple beauty of this image and wanted to share it before doing anything else. Enjoy 🙂

UPDATE: Since this image represents one word in each axis with a single pixel, I realized that the sizes of the squares probably correlate with the letters of the alphabet. I applied a hue modification to each letter combination, which produced this:

Same image — one pixel per guess, but colors have been tinted by the first letters of the words they represent.

This image is less helpful from a word perspective, but it sure is pretty!

Wordle Strategies Analysis (Part I)

In my last post, I began to explore the inner workings of Wordle. As a friend pointed out, it has all the hallmarks of a good game, because even understanding the decision trees didn’t diminish the fun I had playing it this morning as usual.

I added a bunch of stuff to my script to allow for any solver algorithm to test every five-letter word and get the overall performance of any given strategy. Strategy is kind of an interesting thing with respect to this game, as there are a finite number of answers and they don’t shift too much over time. So there isn’t really any advantage to writing a general-purpose strategy that works on all 11.9 million (26^5) five-letter combos when there are only 8,938 words in the list.

Dictionaries

It’s probably worth going into some detail on what the potential word lists even are. Thanks to some sleuthing by Tom, we can actually see the answer list Wordle uses inside its code. The New York Times article on the Wordle phenomenon also tells us helpfully that the creator’s partner, Palak Shah, went through the full list and selected about 2,500 words familiar enough to be used as answers.

This is a similar approach to the New York Times’ own Spelling Bee, which is at least in my opinion aggravatingly arbitrary about which words it will accept. Luckily in Wordle’s case, fewer words is better, and it does accept the larger list in guesses, even though that means 80%+ of valid guesses can never be the answer. (That’s okay, since we already learned that gray letters are our friends, probably worth exploring more deeply.)

Our candidate dictionaries, therefore, are something like the following:

  • Collins Scrabble Words, the official word list used by Scrabble players in most of the world (notably, not the US.) As the article describes, this word list consists of lists merged together from British and American sources. Wikipedia says this list has 12,972 five-letter words, although I haven’t counted them yet.
  • OTCWL2, the shorter dictionary I chose for my solver script. This dictionary has 8,938 five-letter words, omitting the most obscure (and offensive) choices while leaving a lot of latitude to produce guesses.
  • Wordle’s full list of valid guesses, which has 10,657 options. I’m not sure what the source is, but it didn’t match with any of the other word lists I had available. (Perhaps some manual removals for various reasons?)
  • Palak Shah’s list, the hand-picked selection of 2,315 words (that’s six years, four months of daily games) that comprises all possible answers. WARNING: this list goes through the answers IN ORDER. Unless you want to ruin Wordle for yourself forever, don’t read it. I alphabetized the version attached to my script to avoid this horror, but not before realizing in checking the list that I’d spoiled the next several days for myself. (Appeal to Mr. Wardle — could you at least hide a hash algorithm in there to make it slightly harder to see all the answers?)

I don’t know how aligned the actual answer list is with word frequencies. Is it cheating on the solver’s part to only guess words that might be the answer, if that word list is missing some common words that might otherwise be good guesses? (I think this probably skirts the line between general- and specific-purpose?)

Solver Strategies

Well, as I’ve just learned, the best solver strategy would be to calculate the number of days since June 19, 2021, index to that line of the answer list, and guess that. Also eating my words somewhat here about the value of a general-purpose algorithm, since a specific-purpose algorithm now has perfect performance in the real world.
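The arithmetic is only a date subtraction. In this sketch the start date comes from the paragraph above, while the name answer_index, the zero-based indexing, and the wrap-around at the end of the list are assumptions.

```python
from datetime import date

EPOCH = date(2021, 6, 19)  # start date mentioned in the post

def answer_index(today: date, list_length: int = 2315) -> int:
    """Days elapsed since the epoch, wrapped to the answer list length (assumed)."""
    return (today - EPOCH).days % list_length
```

For example, answer_index(date(2022, 1, 1)) gives 196, the number of days between the two dates.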

But let’s elevate ourselves from the actual Wordle to the platonic ideal of Wordle, where the answer could be any five-letter word from the list, and look at some actual strategies. I came up with a couple of obvious approaches to start.

  • Naive — use the same strategy to pick each next guess. For example, always choose the first valid answer, the last valid answer, or the valid answer from the center of remaining possibilities. (You could also choose non-deterministically, but that won’t give consistent metrics over runs.)
  • Naive with naive optimization — choose the next word without reference to anything except the list, but pick a word most or least similar to previous guesses by some criterion. For example, always choose the word with the fewest seen letters.
  • See all letters as quickly as possible: some people ignore the information from the first word and guess five new letters for the second word. My solver is designed to always use all information from previous clues, so I’m not directly testing this. (That design should be equivalent to saying the solver always plays in hard mode, unless there are bugs. I haven’t verified that.) The strategy has some interesting characteristics, though. For example, barring lucky guesses, its floor is always 3 guesses, even where you could reasonably get the word in 2. But on average it should perform better, because it avoids the long tail of one unknown letter and many possibilities (e.g., 🟩🟩⬜🟩🟩: CAFES, CAGES, CAKES, CANES, CAPES, CARES, CASES…).
  • Letter distribution — choose the next word so as to maximize the most common letters remaining.
  • Decision tree pruning — choose the next word which eliminates the most possibilities given the available information. Recursively: do this all the way to the leaf nodes and pick the most promising tree. This is fairly standard game theory.
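To make the simpler strategies concrete, each one can be written as a function that picks from the remaining candidates. This is a sketch under my own naming (pick_first, pick_by_letter_frequency, and so on are hypothetical, not the functions in my script), and it assumes the candidate list is already filtered by earlier clues.

```python
from collections import Counter

def pick_first(candidates):
    """Naive: always take the first remaining word."""
    return candidates[0]

def pick_last(candidates):
    """Naive: always take the last remaining word."""
    return candidates[-1]

def pick_middle(candidates):
    """Naive: take the word in the center of the remaining list."""
    return candidates[len(candidates) // 2]

def pick_fewest_seen_letters(candidates, seen):
    """Naive optimization: prefer the word sharing the fewest letters with past guesses."""
    return min(candidates, key=lambda word: len(set(word) & seen))

def pick_by_letter_frequency(candidates):
    """Letter distribution: prefer the word whose distinct letters are most common."""
    counts = Counter(ch for word in candidates for ch in set(word))
    return max(candidates, key=lambda word: sum(counts[ch] for ch in set(word)))
```

Ties fall to whichever word comes first in the list, so results depend on how the list is ordered. Counting each letter once per word is one possible reading of “most common letters”; counting repeats would be another.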

When getting at the objective “difficulty” of Wordle, the performance of naive strategies is pretty interesting. Let’s also define performance to give ourselves some sort of baseline. These statistics will apply to a given strategy’s results across the entire list of five-letter words:

  • Average, median, and range — it seems the most basic metric is just the average and median number of guesses the solver takes. It’s also useful to know the minimum and maximum number of guesses the solver may take, in case it has incredibly poor (or good) outlier stats.
  • Number (or percent) of failures — since the goal of Wordle is to guess the word in six or fewer guesses, the frequency with which the strategy can’t do that is also interesting.
  • Number (or percent) of successes in n=4,3,2… — it’s also good to know the distribution of successes across the three major levels. 1 guess is a fluke; 5 and 6 are successful but not especially exciting.

Total for these sequences is 8,938, the number of five-letter words in my arbitrary word list selection.
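The metrics above could be computed along these lines. This is a sketch: the function name summarize and the shape of its input (one guess count per word in the list) are assumptions made for the example.

```python
from collections import Counter
from statistics import mean, median

def summarize(guess_counts, limit=6):
    """Summarize one strategy's results; each entry is the guesses used for one word."""
    failures = sum(1 for n in guess_counts if n > limit)
    return {
        "average": mean(guess_counts),
        "median": median(guess_counts),
        "min": min(guess_counts),
        "max": max(guess_counts),
        "failures": failures,
        "failure_rate": failures / len(guess_counts),
        # How many words were solved in exactly n guesses, for each n seen.
        "distribution": dict(sorted(Counter(guess_counts).items())),
    }

stats = summarize([3, 4, 4, 5, 7, 2])
```

With the full word list, the values in "distribution" would sum to 8,938.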

Naive Strategy Statistics

I ran three naive strategies to assess baseline performance. There’s no real calculation here: a player using this strategy would have perfect knowledge of how to play the game, perfect knowledge of the corpus, and yet select answers that are likely bad for other reasons, such as being too obscure, leaving a poor remaining letter distribution, or making no educated guesses at the presence of double letters. Note that despite this, all strategies will always use all information from previous clues to make their next guess, which makes them more effective than one might guess.

In the next post, I’ll go over the statistics for the naive strategies and begin to address the question “how ‘hard’ is Wordle?” As a spoiler, these naive solutions all have success rates in the 80% range, meaning that having perfect knowledge and terrible strategy still allows you to win most of the time.

Analyzing (and Solving) Wordle

Good morning, empty Wordle grid.

I published some proof-of-concept Python code today while thinking about the inner workings of Wordle. Plenty has already been written about the game itself, its origin and appeal, and some solvers have been written as well. While my verbal brain super appreciates that, I’ve been doing a lot of Sudoku lately and the similarities in problem solving were fascinating. I got thinking really hard about how Wordle is put together from a math and probability standpoint.

What are the building blocks?

Wordle combines elements of a lot of games I enjoy — Mastermind, Wheel of Fortune, crosswords, and so on. But it’s really none of these, and that’s what makes it so interesting.

Superficially, it seems most similar to Mastermind. But with Mastermind, all potential combinations are equally likely, in theory, anyway. When playing against a human opponent, they may choose patterns they expect the player to fail to think of, and there’s some Roshambo-like strategy in faking someone out. Unlike Wordle, though, no one agrees ahead of time that only certain patterns are acceptable. In fact, there’s no way (apart from Sudoku and its variants, more on that later) to achieve that ahead of time except on something like a language corpus. So there are immediately interesting constraints that everyone can understand without a complicated ruleset.

It’s also like Hangman or Wheel of Fortune in one interesting way. Obviously neither of those games has the “not in this position” mechanic, but they do have another thing that other word games are less likely to have, leading to the thing I find really interesting about this game.

Negative information

Wordle is interesting because even guessing words that don’t match at all provides useful information about what does remain. Wheel of Fortune and Hangman have this concept, in that you know some things that won’t fill the available letters. This is the source of the humor for most of the Wheel of Fortune “fail” videos: a lot of times the contestant’s guess can’t be accurate, because it contains letters that have already been guessed.

Wordle has another interesting feature that tripped me up in designing the solver — you don’t know how many of a given letter there are unless you guess that many copies. In Wheel of Fortune, naming a letter immediately places all of those letters and by definition you know there are no more. Some of the harder Wordle solves result from needing to guess a double letter — and this is also why you could still run out of turns while guessing entirely on negative information.
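The duplicate-letter behavior is easiest to see in a scoring function. What follows is a sketch of the commonly described two-pass approach, not a copy of my solver; the names score and still_possible and the G/Y/X symbols are choices made for the example.

```python
from collections import Counter

def score(guess, answer):
    """Return feedback per tile: G = green, Y = yellow, X = gray."""
    result = ["X"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and count answer letters still unaccounted for.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: a letter turns yellow only while unmatched copies remain.
    for i, g in enumerate(guess):
        if result[i] == "X" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

def still_possible(candidates, guess, feedback):
    """Keep only the words that would have produced the same feedback."""
    return [word for word in candidates if score(guess, word) == feedback]

print(score("raise", "geese"))  # XXXGG
print(score("eerie", "geese"))  # YGXXG
```

Against GEESE, guessing RAISE confirms only one E, while EERIE lights up all three. A guess with two E’s against a one-E answer shows the reverse: score("speed", "abide") marks the first E yellow and the second gray.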

However, consider the following sequence. The answer is NYMPH, which conveniently has no always-vowels.

  1. READS ⬛⬛⬛⬛⬛ 698 possibilities
  2. FRUIT ⬛⬛⬛⬛⬛ 123 possibilities
  3. BLOCK ⬛⬛⬛⬛⬛ 2 possibilities

The only words that fit these criteria are NYMPH and PYGMY — despite us never even guessing a single letter in any position! I wouldn’t even use the latter word in casual speech, so there’s only one real answer here. We derived this entirely through negative information.
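Filtering on gray letters alone takes very little code. In this sketch the six-word list is a tiny stand-in made up for the example, so it will not reproduce the 698 and 123 counts above, and without_letters is a hypothetical helper name.

```python
def without_letters(words, excluded):
    """Keep the words that contain none of the excluded (gray) letters."""
    excluded = set(excluded)
    return [word for word in words if excluded.isdisjoint(word)]

# Stand-in word list; the real counts depend on the full dictionary.
words = ["nymph", "pygmy", "hippo", "gummy", "lynch", "whiny"]
for guess in ("reads", "fruit", "block"):
    words = without_letters(words, guess)

print(words)  # ['nymph', 'pygmy']
```

This shortcut only works because every tile came back gray; any yellow or green would need the position information as well.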

Questions

Ultimately, what drove my interest is that it seems to be totally possible to get any given word in six guesses. Is that really the case? Are there sequences that you could reasonably guess with perfect information that wouldn’t get you there? What if you just pick a random or naive strategy and apply it blindly?

Rather than solving single words on a daily basis, I want to solve all the words. I’d love to understand the nooks and crannies of the English language that create really absurd patterns, like guessing NYMPH with no correct letters. I also wonder how this extends to longer patterns — does the game get harder, and then easier at higher levels? How difficult could it be to come up with a 15 letter word when there are so few to choose from?

As with many things, the creator seems to have hit on something that works really well — 5 letters in 6 guesses. What kind of tuning could be done to “prove” this is an optimal combination of a certain difficulty? Can you make it harder or easier (aside from the perfect knowledge aspect embodied in Wordle’s “hard mode”)? As I mentioned above, are there words that are just harder or easier depending on your strategy?

I left a couple of exercises for me or the reader to continue with in the readme — reproduced here:

  • How “good” is a given guess against the current word? Against all Wordle words? Against any English word?
  • What strategies are most effective across these sets?
  • How “hard” is this game? What’s the average tree depth to get to a single word?
  • How many ambiguous words have the same results right up to the solution (e.g., LIGER-TIGER)? What’s the maximum tree depth of ambiguous words?
  • Can naive solvers win every time with 6 guesses? What’s the min/max for any given algorithm?
  • What words minimize initial guesses? (Cracking the Cryptic suggested RISEN/OCTAL to cover top letter frequencies. Tom’s solver produced RAISE as the best first guess.)

No doubt someone else will do this work before I have a chance to get to all of it! If you’re one of those people, please take my scripts with my blessing and let me know how it goes!