IPA Scrabble?
I've been on an International Phonetic Alphabet kick lately (thanks to the linguistics class I'm taking) and have been contemplating the possibility of IPA Scrabble. As far as I know, it's never been done officially (by Hasbro or anyone else). Want to help make it happen?
Update 2008-10-25: Cascadilla Press may have something like this. Still looking into whether it fits the frequency and value model of Scrabble.
Update 2008-11-02: Cascadilla Press does indeed have a magnetic IPA Scrabble set. The rules are a little different (mainly to accommodate the difference between Latin-character English and IPA English) and the tiles are not designed for a regular Scrabble board, but the principle is there. They determined the tile values through gameplay, though the scoring is skewed to work best for linguistics students, e.g. the less familiar symbols are worth more, and schwa is worth more to encourage multisyllabic words.
Update 2010-10-28: It has been two years, and I haven't done a lick of work towards making this happen. I hope somebody else takes this idea and runs with it!
Unofficial, of course
I don't expect Hasbro to come out with an official IPA version. After all, the market is pretty small. Additionally, IPA does not provide the advantage of a unified tile set for every language. There would be a different distribution of tiles for each language, and the distribution in turn affects the point value of each symbol, which is written on the tile.
And that's where the trouble starts.
Frequencies and point values
When designing the game that would become Scrabble, Alfred Mosher Butts used the front page of the New York Times to determine the frequency of letters in the English language. A similar process is needed to determine the symbol frequencies and values for IPA, but with the additional step of transliteration from Standard American English orthography into English IPA:
- Find a large corpus of Standard American English.
- Strip out all the words that are not present in TWL (or SOWPODS), the most common Scrabble dictionaries.
- Transliterate each word into broad phonetic transcription. (No distinguishing marks for aspiration, nasalization, vowel length, stresses, or syllabic consonants.)
- Count the number of times each symbol appears, deriving a distribution fraction that represents that symbol's share of the language.
- Normalize and round the distribution fractions to add to 100. This will be the number of tiles each symbol occupies in the 100-tile set.
- Take the inverse of the distribution fractions, multiply by 100, and round. These will be the point values for the symbols.
- Fudge the numbers around a little bit. Particularly pay attention to symbols that can be added to the beginning or end of a word ([s], [z], [t], [d]) to make a new word; consider lowering their distribution in the tile set. (Mr. Butts did this with the S tile in Scrabble.)
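The counting, normalizing, and scoring steps above could be sketched roughly like this in Python. The function names, the largest-remainder rounding, and the 10-point cap are assumptions for illustration, not part of the recipe. Note also that a literal "inverse times 100" gives values in the hundreds or more, so this sketch rescales the inverse so that the most common symbol is worth 1 point; that rescaling is an assumption too.

```python
def tile_counts(counts, total_tiles=100):
    """Turn raw symbol counts into tile counts that sum to exactly total_tiles.

    Uses largest-remainder rounding (an assumed choice): floor every share,
    then hand the leftover tiles to the largest fractional parts.
    """
    total = sum(counts.values())
    exact = {s: c * total_tiles / total for s, c in counts.items()}
    tiles = {s: int(x) for s, x in exact.items()}  # floor, since shares are >= 0
    leftover = total_tiles - sum(tiles.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - tiles[s], reverse=True)
    for s in by_remainder[:leftover]:
        tiles[s] += 1
    return tiles


def point_values(counts, max_points=10):
    """Inverse-frequency points, scaled so the most common symbol scores 1.

    Symbols with a zero count are left out; values are clamped to 1..max_points.
    """
    most = max(counts.values())
    return {
        s: min(max_points, max(1, round(most / c)))
        for s, c in counts.items()
        if c > 0
    }


# Made-up counts, purely to show the shape of the output.
sample = {"ə": 50, "n": 30, "t": 15, "ʒ": 5}
print(tile_counts(sample))   # {'ə': 50, 'n': 30, 't': 15, 'ʒ': 5}
print(point_values(sample))  # {'ə': 1, 'n': 2, 't': 3, 'ʒ': 10}
```

Rare symbols can round down to zero tiles here, which is one of the things the final fudging step would have to settle by hand.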
That's the long way, but I'm sure you'd get good (and maybe publishable?) results.
Any volunteers?
I'd like to make this happen. Here are ways you could help:
- Find out if the analysis work's already been done. I suspect that someone, somewhere has researched the phone or phoneme distribution of Standard American English and published a paper about it. However, I lack the l33t research skills to determine which journals & keywords would be the most likely candidates.
- Write a program to filter a corpus of English for valid words.
- Transliterate those words! Or find an existing English orthography -> IPA translator.
- Argue about the proper way to fudge the numbers.
I could certainly do all of this myself, but then it would never actually get done. Plus, it would be so much more fun as a group effort.
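For the corpus-filtering task, a minimal sketch might look like the following. It assumes the word list (TWL, SOWPODS, or anything else) is available as plain text with one word per line, and that a crude letters-only tokenizer is good enough; both function names are made up for illustration.

```python
import re
from collections import Counter


def load_word_list(lines):
    """Build a set of valid words from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def count_valid_words(text, valid_words):
    """Count the corpus tokens that appear in the word list; drop the rest.

    Tokens are runs of the letters a-z, so "Hasbro's" splits into
    "hasbro" and "s" (a simplification).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in valid_words)


valid = load_word_list(["CAT", "DOG", "THE"])
counts = count_valid_words("The cat saw the dog. Hasbro's cat!", valid)
print(counts["the"], counts["cat"], counts["dog"])  # 2 2 1
```

The resulting word counts are what the transliteration step would consume next.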
jkao says:
http://search.cpan.org/~acg/Scrabble-Dict-0.01/lib/Scrabble/Dict.pm
http://search.cpan.org/~bricas/WWW-Wikipedia-1.94/lib/WWW/Wikipedia.pm
http://en.wiktionary.org/wiki/dictionary
http://en.wikipedia.org/wiki/Template:IPA/doc
Jay says:
u! ?? l?v ð?s ??di?. w?n pr?bl?m w?d bi ð?t j?d h??ft? d?sæd b?fo??h??nd ??t d?l?k ju w? g?ne juz.
jkao says:
apparently also http://download.wikimedia.org/
jkao says:
Huh. My first post may be caught in spam filter.
Trevor Stone says:
How would the rules and the distribution handle regional accents?
Tim McCormack says:
@jkao#4: Thanks, retrieved it. Jay's comment was filtered too. :-P
Tim McCormack says:
@Jay#2: I apologize on behalf of my blogging software, it seems to have eaten your IPA. :-(
Phonetic Scrabble says:
[...] Haven’t found actual phonetic Scrabble yet, but did find someone speculating about IPA Scrabble. IPA is the International Phonetic Alphabet, and it’s what we used in music school to [...]
Patrick says:
regional accents no problem, if you just used as close to GenAm as possible. you could also have challenges (as in Scrabble) where you get them to say the word and then nitpick how they transcribed it based on their own pronunciation. If it's wrong, lose a turn.
Trevor G. says:
I was just thinking about the same idea. I was going to look at the scrabble card game for the presentation as it would be easier to produce something like that as a PDF and distribute it widely.
A lot of work involved if you produce your own dictionary, but players could be required to bring a dictionary of their choice with IPA pronunciation.
Fun idea. Not as much work as it sounds.
Tim Nelson says:
Butts' original survey wasn't very comprehensive. So I did something which was equally idiosyncratic.
1. Count the words in the King James Bible
2. Use the cmudict (Carnegie Mellon University English/phonetic dictionary) to translate it into phonemes
3. Count phonemes
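A rough sketch of steps 2 and 3, not the script actually used here. It assumes the dictionary is in the usual cmudict text layout (a word followed by its phoneme codes, `;;;` comment lines, alternate pronunciations written like `THE(1)`), that only the first pronunciation of each word counts, and that each word is weighted by how often it occurs in the text. The function names are hypothetical.

```python
from collections import Counter


def parse_cmudict(lines):
    """Map each word to its first listed pronunciation (a list of phoneme codes)."""
    pron = {}
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # blank line or comment
        word, *phones = line.split()
        if "(" in word:
            continue  # alternate pronunciation such as THE(1); skipped in this sketch
        pron.setdefault(word.lower(), phones)
    return pron


def count_phonemes(word_counts, pron):
    """Add up phoneme codes, weighting each word by its count in the text."""
    totals = Counter()
    for word, n in word_counts.items():
        for phone in pron.get(word, []):  # words missing from the dictionary add nothing
            totals[phone] += n
    return totals


sample_dict = [";;; sample entries", "THE  DH AH0", "THE(1)  DH IY0", "CAT  K AE1 T"]
pron = parse_cmudict(sample_dict)
print(count_phonemes({"the": 2, "cat": 1}, pron))
```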
This is what I got:
AH0: 61973
N: 58396
D: 50633
T: 49903
S: 49182
R: 42111
L: 39479
Z: 26803
M: 24494
P: 22290
K: 21960
EH1: 21774
AY1: 20764
F: 20401
IH1: 18807
IY1: 18782
B: 17873
ER0: 17851
EY1: 17808
AE1: 16990
AH1: 14446
AO1: 13746
V: 12871
HH: 12711
OW1: 11554
W: 11402
DH: 10758
IY0: 10389
AA1: 9861
G: 9439
NG: 8713
IH0: 8122
TH: 7992
JH: 7831
AW1: 7364
ER1: 6915
SH: 6898
UW1: 5831
Y: 3582
CH: 3311
IH2: 2384
EH2: 2146
UH1: 2018
OY1: 1765
AY2: 1381
OW0: 1364
EH0: 1278
UW0: 1173
AO2: 819
AE0: 730
AA2: 684
AE2: 664
ZH: 599
OW2: 491
UW2: 468
EY2: 467
IY2: 461
AA0: 324
UH0: 303
AO0: 278
AY0: 167
AW0: 145
AH2: 130
ER2: 112
EY0: 101
UH2: 54
AW2: 8
OY2: 2
E21: 0
OY0: 0
Someone will have to convert these into IPA (which the CMU dict doesn't use). They use this (each line gives the phoneme code, an example word, and that word's transcription):
AA odd AA D
AE at AE T
AH hut HH AH T
AO ought AO T
AW cow K AW
AY hide HH AY D
B be B IY
CH cheese CH IY Z
D dee D IY
DH thee DH IY
EH Ed EH D
ER hurt HH ER T
EY ate EY T
F fee F IY
G green G R IY N
HH he HH IY
IH it IH T
IY eat IY T
JH gee JH IY
K key K IY
L lee L IY
M me M IY
N knee N IY
NG ping P IH NG
OW oat OW T
OY toy T OY
P pee P IY
R read R IY D
S sea S IY
SH she SH IY
T tea T IY
TH theta TH EY T AH
UH hood HH UH D
UW two T UW
V vee V IY
W we W IY
Y yield Y IY L D
Z zee Z IY
ZH seizure S IY ZH ER
So now you have the frequency data.
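The conversion to IPA could be sketched along these lines. The mapping follows the common ARPAbet-to-IPA correspondences for General American, but several choices are assumptions: the stress digits are dropped, unstressed AH becomes schwa, unstressed ER becomes [ɚ], R is written [ɹ], and each diphthong or affricate is kept as a single unit (a real tile set might split them). Codes that are not in the table above, such as the E21 entry, are skipped.

```python
from collections import Counter

ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "EH": "ɛ", "ER": "ɝ",
    "EY": "eɪ", "F": "f", "G": "ɡ", "HH": "h", "IH": "ɪ", "IY": "i",
    "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ",
    "OW": "oʊ", "OY": "ɔɪ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ",
    "T": "t", "TH": "θ", "UH": "ʊ", "UW": "u", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "ʒ",
}


def to_ipa(code):
    """Convert one code such as 'AH0' to IPA, or return None if it is unknown."""
    base = code.rstrip("012")      # strip the trailing stress digit(s)
    stress = code[len(base):]
    if base == "AH" and stress == "0":
        return "ə"                 # unstressed AH is treated as schwa
    if base == "ER" and stress == "0":
        return "ɚ"                 # unstressed ER is treated as r-colored schwa
    return ARPABET_TO_IPA.get(base)


def merge_counts(arpabet_counts):
    """Merge stress variants into per-IPA-symbol totals, skipping unknown codes."""
    totals = Counter()
    for code, n in arpabet_counts.items():
        symbol = to_ipa(code)
        if symbol is not None:
            totals[symbol] += n
    return totals


# A few of the counts listed above.
merged = merge_counts({"AH0": 61973, "AH1": 14446, "AH2": 130, "N": 58396})
print(merged["ə"], merged["ʌ"], merged["n"])  # 61973 14576 58396
```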
I'm an Australian, and I'd like to have the options to play in a variety of dialects (British, Australian, etc).
Also, the website boardgamegeek.com has Scrabble tiles for about 30 different languages. I've printed out the Hebrew and Greek ones, stuck them to pasteboard, and cut them apart. It takes a whole afternoon, but that's fine if you have a friend who will help, or something else to do (listen to a new CD, for example).
Tim Nelson says:
Oh, and no, I have no idea what the numbers are.