In [1] an optimization model is proposed for the following problem:
Using a list of 5-letter words, try to find a set of 5 words such that each letter is used at most once.
Data
The list of 5-letter words is from the wordle [2] game. Words with duplicate letters are removed. A partial list is:
---- 1035 SET wwords: large list of 5-letter words
abdom, abend, abets, abhor, abide, abies, abyes, abilo, abime, abysm, abled, abler, ables
ablet, ablow, abmho, abner, abnet, abode, abody, abohm, aboil, abord, abort, abote, about
above, abret, abrim, abrin, abris, abrus, absey, absit, abstr, abune, abuse, abush, abuts
acedy, acerb, ached, achen, acher, aches, achor, acidy, acids, acier, acies, acyls, acing
ackey, acker, aclys, acmes, acned, acnes, acoin, acold, acone, acorn, acost, acoup, acred
acres, acrid, acryl, acron, acrux, acted, actin, acton, actor, actos, actus, acute, adcon
The total number of 5-letter words in this list is: 10,175.
For our model, we need to know if character \(c\) is present in word \(w\). For this, we form a 2-dimensional set \(\color{darkblue}{\mathit{inWord}}_{w,c}\). A partial view looks like this:
---- 1053 SET inWord character c is in word w
a b c d e f g h i j
abdom YES YES YES
abend YES YES YES YES
abets YES YES YES
abhor YES YES YES
abide YES YES YES YES YES
abies YES YES YES YES
Model
A simple model to find a maximum subset of words that have unique letters can be:
Covering Model |
---|
\[\begin{align}\max& \sum_w \color{darkred}x_w \\ & \sum_{w|\color{darkblue}{\mathit{inWord}}(w,c)} \color{darkred}x_w \le 1 && \forall c \\ & \color{darkred}x_w \in \{0,1\} \end{align}\] |
Note that this model just finds one solution. This is the reported solution:
---- 1082 VARIABLE x.L select word
brick 1.000, fldxt 1.000, nymph 1.000, quags 1.000, vejoz 1.000
---- 1082 VARIABLE count.L = 5.000number of selected words
I am not familiar with all these words. E.g. I had to look up: fldxt (an abbreviation for fluid extract).
This solution is found very quickly. The total solution time is about 0.14 seconds (and needed 0 nodes).
All solutions
An obvious question is: how many sets of 5 words with unique letters do exist? We can use the solution pool for that. This is a technology present in most of the more advanced MIP solvers, to find and report more than one solution. When I do that, we see that the total number of 5-word solutions is 831. The solution time to find all these solutions is about a minute (78 seconds).
An alternative approach would be to add a so-called no-good cut to the problem to forbid the current solution and resolve the model. For a small number of solutions, this is a great strategy [3]. But in this case, the number of solutions is large. We would need to solve 832 MIP problems to conclude there are 831 solutions. The last model would be infeasible: that is the stopping criterion. Each subsequent model is a bit more expensive to solve (not uniformly so but on average).
An early attempt to solve this problem is documented in [4]:
According to Parker, executing the Parker algorithm on a laptop took about a month.
References
Appendix: GAMS model
$ontext
Using a large collection of words, form a subset of 5 words such that each letter is not used more than once.
$offtext
$set inc words.inc $set url https://raw.githubusercontent.com/aopt/OptimizationModels/main/Wordle/%inc%
*------------------------------------------------- * 5-letter Words *-------------------------------------------------
* download file $if not exist %inc% $call curl -O %url%
set w 'words: large list of 5-letter words'/ $include %inc% / ; display w;
scalar numWords 'number of words in our collection'; numWords = card(w); display numWords;
*------------------------------------------------- * Derived data * Characters in words *-------------------------------------------------
sets c 'characters'/a*z/ p 'position'/1*5/ inWord(w,c) 'character c is in word w' ;
* ord(string,n) = ascii number of string[n] inWord(w,c) = sum(p$(ord(w.tl,p.val)=ord(c.tl)), 1); display inWord;
*------------------------------------------------- * Optimization Model *-------------------------------------------------
binaryvariable x(w) 'select word'; variable count 'number of selected words';
equations counting 'number of different words with unique characters' cover(c) 'number of selected words containing char c' ;
counting.. count =e= sum(w, x(w)); cover(c).. sum(inWord(w,c), x(w)) =l= 1;
model m /all/; solve m maximizing count using mip; display x.l,count.l;
* remove if you want to see all solutions $stop
*------------------------------------------------- * How many solutions? *-------------------------------------------------
$onecho > cplex.opt SolnPoolAGap = 0.5 SolnPoolIntensity = 4 PopulateLim = 10000 SolnPoolPop = 2 solnpoolmerge solutions.gdx $offecho
m.optfile=1; option threads=0;
* not strictly needed, but seems to help count.fx = round(count.l);
Solve m using mip maximizing count ;
set index0 'superset of solution indices'/soln_m_p1*soln_m_p10000/ index(index0) 'used solution indices' ; parameter xall(index0,w) 'all solutions' ;
execute_load"solutions.gdx",index,xall=x; display xall;
scalar numSolutions 'number of solutions'; numSolutions = card(index); display numSolutions;
|