I've been on an International Phonetic Alphabet kick lately (thanks to the linguistics class I'm taking) and have been contemplating the possibility of IPA Scrabble. As far as I know, it's never been done officially (by Hasbro or whatever). Want to help make it happen?

Update 2008-10-25: Cascadilla Press may have something like this. Still looking into whether it fits the frequency and value model of Scrabble.

Update 2008-11-02: Cascadilla Press does indeed have a magnetic IPA Scrabble set. The rules are a little different (mainly to accommodate the difference between Latin-character English and IPA English), and the tiles are not designed for a regular Scrabble board, but the principle is there. They determined the tile values through gameplay, though the values are skewed to work best for linguistics students: e.g., the less familiar symbols are worth more, and schwa is worth more to encourage multisyllabic words.

Update 2010-10-28: It has been two years, and I haven't done a lick of work towards making this happen. I hope somebody else takes this idea and runs with it!

Update 2014-02-15: Travis Feldman has a Kickstarter for Pijin, a phonetic grid spelling game. The creator has made some interesting decisions in trying to balance linguistic accuracy with playability; notably, IPA was rejected because having to use a lookup table made the game too tedious. (For most people!)

Update 2015-10-15: I see that Pijin made it -- congrats to them! I've also heard from the creator of the Soundable app, which supports IPA.

Unofficial, of course

I don't expect Hasbro to come out with an official IPA version. After all, the market is pretty small. Additionally, IPA does not provide the advantage of a unified tile set for every language. There would be a different distribution of tiles for each language, and the distribution in turn affects the point value of each symbol, which is written on the tile.

And that's where the trouble starts.

Frequencies and point values

English tile frequency-value graph

When designing the game that would become Scrabble, Alfred Mosher Butts used the front page of the New York Times to determine the frequency of letters in the English language. A similar process is needed to determine the symbol frequencies and values for IPA, but with the additional step of transliteration from Standard American English orthography into English IPA:

  1. Find a large corpus of Standard American English.
  2. Strip out all the words that are not present in TWL (or SOWPODS), the most common Scrabble dictionaries.
  3. Transliterate each word into broad phonetic transcription. (No distinguishing marks for aspiration, nasalization, vowel length, stresses, or syllabic consonants.)
  4. Count the number of times each symbol appears, deriving a distribution fraction that represents that symbol's share of the language.
  5. Normalize and round the distribution fractions to add to 100. This will be the number of tiles each symbol occupies in the 100-tile set.
  6. Take the inverse of the distribution fractions, multiply by 100, and round. These will be the point values for the symbols.
  7. Fudge the numbers around a little bit. Pay particular attention to symbols that can be added to the beginning or end of a word ([s], [z], [t], [d]) to make a new word, and consider lowering their distribution in the tile set. (Mr. Butts did this with the S tile in Scrabble.)
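Steps 4 through 6 are mechanical enough to sketch in code. This is a minimal Python sketch, assuming step 3 has already produced each word as a list of IPA symbols. The function names are made up for illustration. Step 5 uses largest-remainder rounding so the tiles add up to exactly 100.

For step 6, the raw "inverse times 100" numbers don't land on Scrabble's 1-10 scale. The sketch therefore makes a different choice: it scales the inverse so that the most common symbol is worth 1 point, then caps values at 10. Both of those choices are my assumptions, not part of the plan above, and step 7's fudging still applies on top.

```python
from collections import Counter

def symbol_counts(transcriptions):
    """Step 4: tally every symbol across a list of broad transcriptions.

    Each transcription is a list of IPA symbols, so a multi-character
    symbol such as 'tʃ' is counted as one unit.
    """
    counts = Counter()
    for word in transcriptions:
        counts.update(word)
    return counts

def tile_distribution(counts, total_tiles=100):
    """Step 5: scale each symbol's share to total_tiles.

    Uses largest-remainder rounding so the tile counts sum to exactly
    total_tiles (plain rounding can overshoot or undershoot).
    """
    total = sum(counts.values())
    exact = {sym: n * total_tiles / total for sym, n in counts.items()}
    tiles = {sym: int(x) for sym, x in exact.items()}
    leftover = total_tiles - sum(tiles.values())
    # Give the leftover tiles to the symbols whose exact share was cut the most.
    by_remainder = sorted(exact, key=lambda s: exact[s] - tiles[s], reverse=True)
    for sym in by_remainder[:leftover]:
        tiles[sym] += 1
    return tiles

def point_values(counts, cap=10):
    """Step 6 (one possible reading): value inversely proportional to frequency.

    Scaled so the most common symbol is worth 1 point and capped at `cap`
    (10 is the top English Scrabble tile value); both choices are assumptions.
    Symbols with a zero count are skipped to avoid dividing by zero.
    """
    top = max(counts.values())
    return {sym: min(cap, round(top / n)) for sym, n in counts.items() if n > 0}

# Toy input: three hand-transcribed words.
words = [["k", "æ", "t"], ["k", "æ", "t", "s"], ["d", "ɑ", "g"]]
counts = symbol_counts(words)
print(tile_distribution(counts))  # k, æ, t get 20 tiles each; s, d, ɑ, g get 10
print(point_values(counts))       # k, æ, t are worth 1; s, d, ɑ, g are worth 2
```

With a real corpus, the rare symbols would all hit the cap. That is exactly the kind of result step 7 is meant to smooth out by hand.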

That's the long way, but I'm sure you'd get good (and maybe publishable?) results.

Any volunteers?

I'd like to make this happen. Here are ways you could help:

  • Find out if the analysis work's already been done. I suspect that someone, somewhere has researched the phone or phoneme distribution of Standard American English and published a paper about it. However, I lack the l33t research skills to determine which journals & keywords would be the most likely candidates.
  • Write a program to filter a corpus of English for valid words.
  • Transliterate those words! Or find an existing English orthography -> IPA translator.
  • Argue about the proper way to fudge the numbers.
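The "filter a corpus" item (step 2 of the plan) could start out something like the sketch below. It is a minimal Python sketch, assuming the corpus is available as plain text and the word list (TWL or SOWPODS) has one word per line. `valid_word_counts` and the file names in the usage comment are hypothetical.

The sketch keeps token counts rather than just the set of valid words. That matters because the symbol frequencies in step 4 should reflect how often each word actually appears.

```python
import re
from collections import Counter

# A "word" is a run of letters, optionally with internal apostrophes, so that
# "cat's" stays one token; TWL/SOWPODS have no apostrophes, so it is then rejected.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def valid_word_counts(corpus_text, word_list):
    """Count running-text tokens that appear in a Scrabble word list."""
    valid = {w.strip().lower() for w in word_list if w.strip()}
    tokens = WORD_RE.findall(corpus_text.lower())
    return Counter(t for t in tokens if t in valid)

# Hypothetical usage; "twl.txt" and "corpus.txt" are placeholder file names:
# with open("twl.txt") as wl, open("corpus.txt") as corpus:
#     counts = valid_word_counts(corpus.read(), wl)
```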

I could certainly do all of this myself, but then it would never actually get done. Plus, it would be so much more fun as a group effort.

    http://search.cpan.org/~acg/Scrabble-Dict-0.01/lib/Scrabble/Dict.pm
    http://search.cpan.org/~bricas/WWW-Wikipedia-1.94/lib/WWW/Wikipedia.pm
    http://en.wiktionary.org/wiki/dictionary
    http://en.wikipedia.org/wiki/Template:IPA/doc

    u! ?? l?v ð?s ??di?. w?n pr?bl?m w?d bi ð?t j?d h??ft? d?sæd b?fo??h??nd ??t d?l?k ju w? g?ne juz.

    apparently also http://download.wikimedia.org/

    Huh. My first post may be caught in the spam filter.

    How would the rules and the distribution handle regional accents?

    @jkao#4: Thanks, retrieved it. Jay's comment was filtered too. :-P

    @Jay#2: I apologize on behalf of my blogging software; it seems to have eaten your IPA. :-(

    Regional accents are no problem if you just stick as close to GenAm as possible. You could also have challenges (as in Scrabble) where you get players to say the word and then nitpick how they transcribed it based on their own pronunciation. If it's wrong, they lose a turn.

    I was just thinking about the same idea. I was going to look at the Scrabble card game for the presentation, since it would be easier to produce something like that as a PDF and distribute it widely.

    It's a lot of work if you produce your own dictionary, but players could be required to bring a dictionary of their choice with IPA pronunciations.

    Fun idea. Not as much work as it sounds.

    Butts's original survey wasn't very comprehensive. So I did something which was equally idiosyncratic.

    1. Count the words in the King James Bible
    2. Use the cmudict (the Carnegie Mellon University Pronouncing Dictionary for English) to translate it into phonemes
    3. Count phonemes
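The three steps above fit in a short script. This is a rough Python sketch, assuming the dictionary is in the plain-text CMUdict layout: a word followed by its ARPAbet phonemes, with alternate pronunciations marked like `READ(1)`. It also assumes the Bible is available as a plain-text file.

Some of the choices here are simplifications on my part:

- It keeps only each word's first pronunciation.
- It tallies words missing from the dictionary separately, so you can see how much text was skipped. That may matter for archaic KJV forms.
- The function names and file names are hypothetical.

```python
import re
from collections import Counter

def load_cmudict(lines):
    """Parse CMUdict-style lines ('WORD  AH0 B ...') into {word: phonemes}."""
    pron = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing '# ...' notes
        if not line or line.startswith(";;;"):  # blank line or comment
            continue
        word, *phones = line.split()
        if word.endswith(")"):  # alternate pronunciation, e.g. READ(1)
            continue
        pron.setdefault(word.lower(), phones)
    return pron

def count_phonemes(text, pron):
    """Look up every word of the text and tally its phonemes."""
    counts, missing = Counter(), Counter()
    for word in re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()):
        phones = pron.get(word)
        if phones is None:
            missing[word] += 1  # not in the dictionary; skipped
        else:
            counts.update(phones)
    return counts, missing

# Hypothetical usage; file names are placeholders:
# with open("cmudict.txt", encoding="latin-1") as d, open("kjv.txt") as bible:
#     counts, missing = count_phonemes(bible.read(), load_cmudict(d))
```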

    This is what I got:

    AH0: 61973  N: 58396  D: 50633  T: 49903  S: 49182  R: 42111  L: 39479  Z: 26803  M: 24494  P: 22290
    K: 21960  EH1: 21774  AY1: 20764  F: 20401  IH1: 18807  IY1: 18782  B: 17873  ER0: 17851  EY1: 17808  AE1: 16990
    AH1: 14446  AO1: 13746  V: 12871  HH: 12711  OW1: 11554  W: 11402  DH: 10758  IY0: 10389  AA1: 9861  G: 9439
    NG: 8713  IH0: 8122  TH: 7992  JH: 7831  AW1: 7364  ER1: 6915  SH: 6898  UW1: 5831  Y: 3582  CH: 3311
    IH2: 2384  EH2: 2146  UH1: 2018  OY1: 1765  AY2: 1381  OW0: 1364  EH0: 1278  UW0: 1173  AO2: 819  AE0: 730
    AA2: 684  AE2: 664  ZH: 599  OW2: 491  UW2: 468  EY2: 467  IY2: 461  AA0: 324  UH0: 303  AO0: 278
    AY0: 167  AW0: 145  AH2: 130  ER2: 112  EY0: 101  UH2: 54  AW2: 8  OY2: 2  E21: 0  OY0: 0

    Someone will have to convert these into IPA (which the CMU dict doesn't use). They use this:

    Phoneme  Example  Translation
    AA       odd      AA D
    AE       at       AE T
    AH       hut      HH AH T
    AO       ought    AO T
    AW       cow      K AW
    AY       hide     HH AY D
    B        be       B IY
    CH       cheese   CH IY Z
    D        dee      D IY
    DH       thee     DH IY
    EH       Ed       EH D
    ER       hurt     HH ER T
    EY       ate      EY T
    F        fee      F IY
    G        green    G R IY N
    HH       he       HH IY
    IH       it       IH T
    IY       eat      IY T
    JH       gee      JH IY
    K        key      K IY
    L        lee      L IY
    M        me       M IY
    N        knee     N IY
    NG       ping     P IH NG
    OW       oat      OW T
    OY       toy      T OY
    P        pee      P IY
    R        read     R IY D
    S        sea      S IY
    SH       she      SH IY
    T        tea      T IY
    TH       theta    TH EY T AH
    UH       hood     HH UH D
    UW       two      T UW
    V        vee      V IY
    W        we       W IY
    Y        yield    Y IY L D
    Z        zee      Z IY
    ZH       seizure  S IY ZH ER
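Converting the counts above to IPA is mostly a table lookup. This is a minimal Python sketch using one common ARPAbet-to-IPA convention. The table is my assumption; it is not something the CMU dictionary ships with.

The sketch drops the stress digits, as step 3 of the original plan asks, with one exception. Unstressed AH0 and ER0 become [ə] and [ɚ], since that is conventionally what they represent. Keys that aren't ARPAbet symbols, such as the stray "E21", are skipped.

Some people would rather write [r] than [ɹ], or merge [ɝ] and [ɚ]. Adjust the table to taste.

```python
from collections import Counter

# One common ARPAbet -> IPA mapping for broad transcription (an assumption).
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "EH": "ɛ", "ER": "ɝ",
    "EY": "eɪ", "F": "f", "G": "g", "HH": "h", "IH": "ɪ", "IY": "i",
    "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ",
    "OW": "oʊ", "OY": "ɔɪ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ",
    "T": "t", "TH": "θ", "UH": "ʊ", "UW": "u", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "ʒ",
}
# Unstressed AH and ER are conventionally schwa and r-colored schwa.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}

def to_ipa(phone):
    """Map one ARPAbet phoneme (with or without a stress digit) to IPA, or None."""
    base = phone.rstrip("012")
    if phone.endswith("0") and base in UNSTRESSED:
        return UNSTRESSED[base]
    return ARPABET_TO_IPA.get(base)

def ipa_counts(arpabet_counts):
    """Merge per-phoneme ARPAbet counts into per-symbol IPA counts."""
    merged = Counter()
    for phone, n in arpabet_counts.items():
        symbol = to_ipa(phone)
        if symbol is not None:  # skips non-ARPAbet keys such as "E21"
            merged[symbol] += n
    return merged
```

Fed the list above, this would fold AH1 and AH2 into a single [ʌ] count while keeping AH0 as its own [ə] count.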

    So now you have the frequency data.

    I'm an Australian, and I'd like to have the options to play in a variety of dialects (British, Australian, etc).

    Also, the website boardgamegeek.com has Scrabble tiles for about 30 different languages. I've printed out the Hebrew and Greek ones, stuck them to pasteboard, and cut them apart. It takes a whole afternoon, but that's fine if you have a friend who will help, or something else to do (listen to a new CD, for example).

    Oh, and no, I have no idea what the numbers are.

