Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Hacking Scrabble (part 1) (zephod.com)
95 points by zephod on Dec 13, 2011 | hide | past | favorite | 50 comments


Note: apparently he's using the SOWPODS (combined british & north american) 2 letter words. The tournament word list used in north america only contains 101 2 letter words:

http://en.wikipedia.org/wiki/Official_Tournament_and_Club_Wo...

Here's the TWL 2 letter word list:

http://wordsolver.net/?tpl=twl2


I'd vote that memorizing a hundred 2 letter words (many of which you likely already know) is easier than memorizing 26 mnenomics. or, you can just know a bunch and guess every once in awhile, gambling that your opponents also don't know all 101 for certain and don't want to risk challenging the one you put down;)


actually, once you move to even the low-division tournament level, pretty much everyone knows all the 2 letter words.


it's called CSW ("collins scrabble words") rather than sowpods now, since collins replaced chambers as the dictionary publisher.


I wonder if you could devise a series of phrases where every consecutive pair of letters in the phrase is a valid two-letter word. Then it might be possible to mentally retrieve the entire list by remembering a handful of phrases, instead of one-per-letter.

[Edit: Turns out you can get kinda close: https://gist.github.com/1474006

This is really quickly scratched together (and there may be bugs), but there don't appear to be that many valid words left completely out when using this strategy against an american dictionary of 3k words. I'll bang on it more later, or someone else feel free to do better]

[Edit: After seeing @msluyter's comment, I adjusted to look for only the American 2-letter words. Ended up with much better results. This list of words (and possibly a subset of them that I'm not checking for yet):

  'MELINE', 'ESOP', 'HEMAD', 'SHERE', 'WER', 'UNENAMORED', 'MYOHEMATIN', 'REHONOR', 'HEMATOSIN', 'ANEMATOSIS', 'TORET', 'HEMATID'
Covers all but these valid words:

  'AA', 'OE', 'ZA', 'QI', 'UH', 'AW', 'AY', 'BY', 'XU'
]


On the topic of Scrabble, can anybody provide some insight into how they'd go about architecting the database for a game like WordSquared? http://wordsquared.com/

It's similar to Scrabble ... but it has no playing-area boundaries. I'm curious what the best way to represent this type of game-board data in a database would be and was hoping somebody smarter than me could help make me smarter.


You're pretty free in how you want to represent the database, because you're unlikely to ever want to query (quickly) based on the contents of the tile, so representation doesn't much matter. My first cut would be a 64x64 tile, indexed by x & y from an arbitrary 0, each of which contains a 64x64 = 4096 (4K) string representation of the contents of the tile. There will be lots of spaces, so my first cut at efficiency would be to simply do inline RLE and embed ASCII numbers in the stream that represent the number of spaces, ie, "4096" = empty tile, "2047a2048" represents a tile with one a in the middle, etc. (This preserves some human readability, where gzip does not, but you could use that too.) But frankly, at 4K+noise per tile, and no compelling need to ever refer to the entire table at once, even the stupid string representation will get you a long way before it bothers any serious DB server; that's enough to store about 250-ish million tiles in a single terabyte (a bit of fudge in there for overhead). (And the live set would be small enough to fit in very little ram so I'm not too worried about that.)

By the time that representation is your Biggest Problem, you'll be able to afford someone to do something more clever.


I'm sure there is a better way to do it, but I would think the simplest solution would be to start from the center and have an x and y offset for each square reaching out from there.

Center is 0,0 - Square to the right is 0,1, etc.

Then you store xoffset, yoffest, and tile value in a database as squares are filled in.


personally i would uses some sort of tree structure. Maybe a quadtree would be convenient.


I upvoted this in hopes of generating some discussion about whether there might be a better way to do this. Here's one idea:

With very few exceptions, all of these words will contain at least one vowel. This suggests that we can (for the most part) compress the data from: (1) a list of 24 lists of words, to (2) a list of 6 sets of 2 lists of words.

So now "a" looks like this: bdefhjklmnptyz -a- abdeghilmnrstwxy

Now you can apply the same mnemonic trick to each of these word lists as the author does in the article, but you wind up only having to memorize 12 lists instead of 24, plus a slight overhead for non-vowel words (there are 4).

This can perhaps be improved further by removing redundancy somehow (though maybe not, as the redundant structure might be easiest to check mentally).


Author here. To be honest I posted it in the first place to hopefully generate some discussion about whether there might be a better way to do this(!). I think memorising 26 mnemonics is possible (and i disagree with the below commenter who says 124 individual words might be easier...) but perhaps the information can be further compressed. Can you elaborate on how your method works a little? How are the two lists derived?


Oh sorry, classic case of not realizing not everyone hears your thoughts :-)

For "a", there's a list where "a" is the first letter, and a list where "a" is the second. What I wrote out above is an attempt to show the structure. The letters on the left are possible prefixes for "a", the letters on the right are possible suffixes. Then I stuck "-a-" itself in the middle because that seems like a good visualization.


There's a very good review of what it takes to build a state-of-the-art AI Scrabble player in the following paper (appears to be open-access, apologies if not):

World-championship-caliber Scrabble, Brian Sheppard, Artificial Intelligence, 134(1–2):241–275, 2002. http://www.sciencedirect.com/science/article/pii/S0004370201...

There's more to it than you might think. In particular, I was initially surprised about how much of the game is about rack management. Although I know this won't be a surprise to people who've played Scrabble seriously.


I don't really think it is worth generating these mnemonics. A lot of the words in that list are really common words so you're making it harder for yourself by re-remembering them in mnemonic form.

I think you should remove all the words you know first, as this will make it much easier to remember. After that if you want to use the mnemonic approach, then if that works for you then great.

Although, I learned most of the weird ones by just playing against a computer which used them a lot. Eventually you remember them all. Incidentally, this way you actually learn a couple interesting new words. Three-toed sloth anyone?


This is very similar to anamonics[1], which are usually used for remembering longer words.

Here is a massive list of them: http://www.poslarchive.com/math/scrabble/anamonics-twl2.html

[1] http://en.wikipedia.org/wiki/Anamonic


I think these may be easier to memorize these if the mnemonic started with the letter in question, where possible.


What about instead using a little base 2 to memorize? So 'b' can follow 'a' or 'o' and be followed by 'a', 'e', 'i' or 'o'. Take a as 16, e as 8, i as 4, o as 2 and u as 1. So if you memorized '18b30' you could decode that in your mind to the right combinations for 'b'. Obviously this doesn't work for the combinations with two consonants and the words ending in 'y', but it's a start. I think you'd pretty quickly learn common values like 30 is 'aeio'.


A while ago I put together http://twoletterscrabblewords.com/ to visualize the two letter pairs that were possible. If you click on any of the tiles you'll see the valid pairs, unless the tile is grey, in which case there are no pairs for it.

I was hoping that it would be useful as a study tool for people who are more visually inclined - I wouldn't have considered using mnemonics originally.


Looking forward to the conclusion of this! Interesting and as a side benefit might help me finally get a leg up on playing my wife in Scrabble :-).


It's not that hard to memorize the 2-letter words. Most of them do have a meaning that makes sense. So it's easier to remember

aa = lava

ab = your abs

ad = tv ads

ae = scottish "one"

ag = agricultural

ah = ah that feels good

ai = a sloth

al = a tree

am = i am happy

an = an apple

ar = arrr i'm a pirate

as = as you were

at = where you at

aw = aww that's cute

ax = the weapon

ay = ay caramba

than it is to memorize "abdeghilmnrstwxy".


I agree that memorizing the words rather than the mnemonics is both easier and more interesting.

But if you can only memorize two words, go for "za" and "qi" - they come in really handy to get rid of a q or z. (za is an absurd short form of pizza, and qi is Chinese life energy, same as chi.)

I also recommend the handy words qat (the stimulant leaf) and waqf (an Islamic charitable donation). After you're challenged unsuccessfully on waqf, you can get away with anything :-)


Comparing his approach vs yours I can't help but think a: birthdays mangle wax is easier to remember than that list.


But it is easier to check the the GP's solution:

    return definitions.get(candidate) != null;
vs

    foreach c in definitions(candidate.charAt(0)) {
        if c == candidate.charAt(1)) return true;
    }
    return val;
That leaves aside the general usefulness of knowing the definitions of words.


Agreed. You really don't have to be a savant to memorize these. Only about half of them are non-obvious. Another quarter are just spellings of letters. And the last quarter might be obscure, but just playing a few games using the list and gaining all those points and I guarantee you'll have them down.


Funny I've been thinking about just this .. .my gf is far too good ... I'll probably never beat her but getting within a reasonable percentage seems reasonable. Maybe writing some minigames would help. Not just for learning two letter words, but helping with stuff like being able to quickly spot popular prefixes and suffixes..


How about approaching the problem from the suffix point of view?

Instead of having 25 prefix having suffixes, you have 20 suffixes having prefixes.

The prefix technique has an average of 4.96 suffixes per prefix, and the suffix one has 6.2 prefixes on average for each suffix.

Maybe this would be easier to deal with?


Looking at your Python script. Might want to learn about the built-in set() functionality. For exmaple:

def uniqueletters(s): return set(c for c in s)

def intersect(word, charlist): return len(uniqueletters(word).intersection(charlist))


In fact:

max(wordlist, key=(lambda w: intersect(w, charlist))

should replace a good portion of the create_mnemonic function. But now I'm just showing off. Just letting you know that Python is much cooler than you know.

Haskell, however, could probably do this in a few lines.


I've got a pull request sitting on GitHub showing me exactly this. Awesome! I'm about eight months into Python but my knowledge isn't particularly deep.


It's unfortunate that some Scrabble players are more likely to know dozens of obscure two-letter words than how to spell "mnemonic".


A very good point. Poor quality control... I can fix that now :-)


Sorry, that probably came out more snotty than it should have.

My beef with Scrabble is that it really isn't a word game, so much as an exercise in memorizing arbitrary strings of symbols. It's unfortunate, because it doesn't really encourage people to develop their vocabulary, as in "learning new words and their meanings", which is generally useful. Rather, it encourages people to learn what arbitrary combinations are legal in a given word list.

Ad absurdum - at one point, the world Scrabble champion was a fellow who didn't actually speak English.


I'm a competitive Scrabble player. While you're absolutely correct that learning arbitrary strings of letters is a part of the game, this doesn't imply that developing one's vocabulary isn't.

As a result of the game, I've learned tons of words whose meanings are completely unknown to me (mbaqanga, eolipile, aboideau) and tons of words that have enhanced my vocabulary (prexy, glaire, oomiak).


Off topic, but I can't resist: the (a)eolipile is one of the coolest objects in history. It is literally the first steam-powered machine: http://en.wikipedia.org/wiki/Aeolipile . Its invention suggests that perhaps if things had gone a bit differently the Romans could have made steam power practical 1400 years before the Industrial Age.


Thanks, that's really sweet. Now I know this word indirectly because of Scrabble.


When you think about it, for any any Scrabble-like game to be played competitively, it'll ultimately have to have a codified word list. (Every word needs to be judged as legal or illegal.) Make that list too small and confined to "normal" vocabulary and you make the game vulnerable to being "solved" by players who can memorize everything and/or, you make the game less complex/interesting.

Make the list too large, and you open the game up precisely to the this sort of critique -- "it's all about learning arbitrary words like AA or ZEK." Personally, I'm happy with the latter, as it makes the game deep and rewarding (I actually enjoy looking up words like "zek.") But (competitive) Scrabble is its own world that needn't appeal to everyone.

If the arbitrariness of the Scrabble tournament word list bothers you, you might try house rules with your friends that limit words to those that are universally accepted. This is what I do when I play with my mom. I just know that I can't play QI or UNAU and play a less competitive, friendlier game.


While not as impressive as playing competitive Scrabble in a foreign language, I thought it was fascinating in the documentary about competitive Scrabble, Word Wars[1], that one of the finalists in an American tournament was a British champion who had to keep track of which words were allowed in the British Scrabble dictionary versus the American Scrabble dictionary used in the tournament.

[1]http://www.imdb.com/title/tt0390632/


They should have some kind of "definition" challenge - if you don't know the definition of a word, you can't use it.

The trouble is nailing it down. You can't expect people to remember the dictionary definition word-for-word, but then how close is "close enough", and who decided? OTOH, if you just require people to use the word in a sentence, even ruling out variants like "I want to play the word "quux" next", all a player would have to do would be to remember that a word is, e.g. a noun, and say "I saw a xyzzy on the way to work yesterday."


That is incredible. I'm not really advocating that way of playing - I never really play a word I can't define. But I'm trying to justify an exception for the two-letter-words because they're so useful. They're a bit of a special case - knowing them keeps a game feeling enjoyable and flowing pleasantly.

As for 'learning new words and their meanings', clearly Scrabble isn't that game. But to be fair that isn't its stated aim, either. If you know any good examples I'd love to play them.


True - they're indispensable. I have a coworker who plays Scrabble at a competitive level and has memorized every 2, 3, 4 and 5 letter word in The Word List, and will randomly shout out, on hearing words like "tandoor", that there are 3 anagrams of it - tornado, odorant, and something else I can't even remember. It's an amusing parlour trick, but seems a very arbitrary, non-transferable skill he's developed.

He also has a couple of funny anecdotes about words that aren't allowed - like "picowatt", which rather offended our collective sensibilities as electrical engineers.

You're quite right that Scrabble is not about definitions, per se. But Scrabble does describe itself as a "word game", and it really isn't - it's more of a memorization with a bit of math kind of game.


This reminds me of the documentary "Wordplay" about competitive crossword puzzle players. Without a formal proof, my guess is that scrabble is faster to solve that a hard crossword puzzle for a computer. It's amazing that the human mind can stitch together patterns like these and internalize that in a meaningful way. A cool extension of this project would be to write a scrabble tool that generates arbitrary strategies that would be useful for a human player. So instead of generating the best play for a move, the program would recognize that "when the board is like X, then try and use strategy Y"

Awesome project @zephod!


It depends what you mean by solve. Scrabble, as a two-player game, has a significant branching factor compounded by the fact that the tile drawing makes it both a stochastic game as well as one with hidden information. Maven http://en.wikipedia.org/wiki/Maven_(Scrabble) is an AI that can play Scrabble, and it's good, but it's certainly not "solved".

On the other hand, crossword puzzles are single-player games that use natural language wordplay to create a challenge. Some AI work has been done in this field, which can use a Web search engine (like Google) to try and determine the answer to each clue based on the clue (as well as any candidate words/letters already filled in). The hard part is understanding the clue. Look to systems like IBM's Watson to see how those kinds of problems can be solved.

"Solution" in a single player game, like a crossword puzzle, is solved in exactly one way: when all the rows and columns are filled with legal words (usually -- unless the author rigged it to allow clues to have multiple solutions, like the election day crossword in the NYT in 1996). A "solution" in a two-player game is when, no matter what move you make, your opponent knows perfect play. For example, checkers is solved: http://en.wikipedia.org/wiki/Chinook_(draughts_player)


quackle [http://people.csail.mit.edu/jasonkb/quackle/] is an open-source scrabble ai that can currently play on par with all but the very best human players. which is not the same thing as scrabble being "solved" of course, but it does bode well for the state of the art.


I am pretty sure Maven was playing at the same level circa 2001, so not surprised to hear other AIs are doing just as well in 2011.

I guess it's kind of like chess: it's almost impossible to beat a computer, so it's effectively solved. Except when they play each other :)


  Ad absurdum - at one point, the world Scrabble champion was a fellow who didn't actually speak English.
That's not too surprising unless he played in English. Scrabble has editions for many languages.


The guy being allied about is Panupol Sujjayakorn (http://en.wikipedia.org/wiki/Panupol_Sujjayakorn). He is Thai. In Thailand, scrabble is used to learn English.

Unlike with Chinese, it is not impossible to make a Thai scrabble, but I doubt there is one, given the size of its alphabet (http://en.wikipedia.org/wiki/Thai_alphabet)


i've been to the king's cup tournament in bangkok a couple of times. there's a children's tournament run in parallel in the same venue, and it was a truly amazing sight to see thousands of kids squaring off across scrabble boards. i truly wish school scrabble would take off in india; as it is, this year we didn't even manage to find two kids to go for the world youth scrabble championship :(


He did play in English, actually. That's what made it so ridiculous.


he (i'm assuming you're referring to panupol) knew some english; he just didn't speak it very well. but you're right, at a tournament level scrabble is a lot more about memorisation and strategy than it is about vocabulary.


this was pretty much the way i memorised the 2s, except i did the mnemonics by hand. (i've written a lot of little apps like this but from the inverse point of view - i type in a hand-generated mnemonic and it tests it for correctness, and tells me what letters are missing or incorrectly there)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: