Recent Earthquakes in Japan

Japan has been experiencing increased earthquake activity since the Tōhoku earthquake and tsunami on March 11, so I put together the chart below plotting each day's earthquakes (blue dots) and the weekly average (red line), based on data from the JMA. The chart covers earthquakes registering MMS 1.0 or higher.

The weekly average is the week's total seismic moment expressed as a daily figure, with the date plotted as the midpoint of the week. I simply add all the seismic moments (M0 in the formula below) for the week, divide by 7 and convert the result to MMS. The scale is logarithmic, such that a quake registering 9.0 releases 1,000 times as much energy as one measuring 7.0, which likewise releases 1,000 times as much energy as one registering 5.0. It's not a perfect measure, but it does provide some perspective on directionality.


M_w = \frac{2}{3} \log_{10} M_0 - 10.7   (with M0 in dyne-centimeters)
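As a rough sketch of the weekly-average calculation described above, assuming the standard moment magnitude relation with M0 in dyne-centimeters. The function names are my own for illustration and are not taken from the spreadsheet:

```python
import math


def moment_from_magnitude(mw):
    """Seismic moment M0 (dyn·cm), inverting Mw = (2/3) * log10(M0) - 10.7."""
    return 10 ** (1.5 * (mw + 10.7))


def magnitude_from_moment(m0):
    """Moment magnitude Mw from seismic moment M0 (dyn·cm)."""
    return (2.0 / 3.0) * math.log10(m0) - 10.7


def weekly_average_magnitude(magnitudes, days=7):
    """Sum the week's seismic moments, divide by the number of days,
    and convert the result back to a magnitude."""
    total_moment = sum(moment_from_magnitude(m) for m in magnitudes)
    return magnitude_from_moment(total_moment / days)
```

Because the moments are summed before converting back, a single large quake dominates the week: the average for one 7.0 plus a hundred 3.0s is barely distinguishable from the average for the 7.0 alone.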

Unfortunately, the chart is missing data for early April.

A chart showing recent earthquakes in Japan

You can download the spreadsheet here.

Sources (in Japanese)

  1. http://www.jma.go.jp/jp/quake/quake_local_index.html (4/4 – current)
  2. http://www.seisvol.kishou.go.jp/eq/shindo_db/db_map/indexemg.html (3/11 – 3/31)
  3. http://www.seisvol.kishou.go.jp/cgi-bin/shindo_db.cgi (1/1 – 3/11)


Reinventing the Wheel

This is just a little post about an alternative system for writing Japanese that I came up with in my spare time, mostly to prove to myself that a language with thousands of characters and a tiny handful of syllables could be reproduced faithfully and accurately. Rather than describing every last detail, let's just say it uses the same basic shape to represent each consonant and the same accent mark for each vowel. I made an online proof-of-concept IME to illustrate how it works.

Try typing in things like hiragana or DAINIPPONKOKU. In the latter, the caps imply an on'yomi kanji reading, which cun forces into one symbol apiece to further enhance efficiency. That is, for all Chinese words, each kanji, regardless of its reading, can be replaced by a single letter.

Examples

My name in cun
My name (ブライアン・リー・マッケルビー Buraian Rī Makkerubī)

Opening lines of Snow Country
The opening lines of Yasunari Kawabata's Snow Country (国境の長いトンネルを抜けると雪国であった。 Kunizakai no nagai tonneru o nukeru to yukiguni de atta.)

Some Notes on Syllables in Japanese

Japanese syllables are broken into moras. A mora is like a unit of time, such that a short syllable contains one mora and a long syllable contains two moras. For example, the word Nippon 日本 ("Japan") can be broken into two syllables (Nip + pon) and four moras (Ni + p + po + n).

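| Word | Definition | Moras | Length |
| --- | --- | --- | --- |
| koko ここ | here | ko-ko | 2 moras |
| kokko 国庫 | national treasury | ko-k-ko | 3 moras |
| kōko 公庫 | public loan corporation | ko-o-ko | 3 moras |
| kōkō 高校 | high school | ko-o-ko-o | 4 moras |
| konkon コンコン | knocking sound | ko-n-ko-n | 4 moras |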

We can neatly group these moras together in a table. Note that some consonants undergo changes in a few places (t + i = chi, t + u = tsu), and that some entries in the "d" row were deleted because they are pronounced the same as those in the "z" row:

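|   | a | i | u | e | o | ya | yu | yo |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| - | a | i | u | e | o | ya | yu | yo |
| k | ka | ki | ku | ke | ko | kya | kyu | kyo |
| g | ga | gi | gu | ge | go | gya | gyu | gyo |
| s | sa | shi | su | se | so | sha | shu | sho |
| z | za | ji | zu | ze | zo | ja | ju | jo |
| t | ta | chi | tsu | te | to | cha | chu | cho |
| d | da | - | - | de | do | - | - | - |
| n | na | ni | nu | ne | no | nya | nyu | nyo |
| m | ma | mi | mu | me | mo | mya | myu | myo |
| h | ha | hi | fu | he | ho | hya | hyu | hyo |
| b | ba | bi | bu | be | bo | bya | byu | byo |
| p | pa | pi | pu | pe | po | pya | pyu | pyo |
| r | ra | ri | ru | re | ro | rya | ryu | ryo |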
Others: wa, n, [geminate], [long vowel]

In cun, the above is represented thusly:

The main moras in cun

Click the above image to see the entire alphabet and almost all possible symbols. What should stand out is how organized it is in comparison to hiragana.

To make a consonant geminate, you repeat the consonant part of the letter in the same amount of space, so for example ıc (あか aka) becomes ıε (あっか akka). To nasalize a vowel, you add a horizontal line, such that ı (あ a) becomes ī (あん an). Long vowels depend on which vowel it is, but you can generally see that in the IME.

But in any case, that's 103 moras (13 rows × 8 columns - 5 duplicates + 4 special moras) necessary to describe every distinct unit of time in Japanese. Yet the Japanese are expected to learn 1,006 kanji in primary school and another 939 in secondary school. By contrast, the basic unit in English is a syllable, of which approximately 5,000 see actual use (Duanmu 205). Despite the complexities of English, however, only around 13% of words possess pronunciations different from their spellings. And of course, there's no reason someone couldn't use kanji for disambiguation, as they do in Korean (and increasingly sparingly, as people can usually understand from context).
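To double-check the count of 103, here is a small sketch that enumerates the moras from the table above. The romanization exceptions and the five dropped "d" entries are copied from that table; the variable and function names are just my own labels:

```python
VOWELS = ["a", "i", "u", "e", "o", "ya", "yu", "yo"]
CONSONANTS = ["", "k", "g", "s", "z", "t", "d", "n", "m", "h", "b", "p", "r"]

# Cells whose romanization differs from plain consonant + vowel.
IRREGULAR = {
    ("s", "i"): "shi", ("s", "ya"): "sha", ("s", "yu"): "shu", ("s", "yo"): "sho",
    ("z", "i"): "ji", ("z", "ya"): "ja", ("z", "yu"): "ju", ("z", "yo"): "jo",
    ("t", "i"): "chi", ("t", "u"): "tsu",
    ("t", "ya"): "cha", ("t", "yu"): "chu", ("t", "yo"): "cho",
    ("h", "u"): "fu",
}

# "d" row entries pronounced the same as their "z" row counterparts.
MERGED_INTO_Z = {("d", v) for v in ("i", "u", "ya", "yu", "yo")}

SPECIAL = ["wa", "n", "[geminate]", "[long vowel]"]


def build_moras():
    """List every mora: 13 rows x 8 columns, minus duplicates, plus specials."""
    moras = []
    for consonant in CONSONANTS:
        for vowel in VOWELS:
            if (consonant, vowel) in MERGED_INTO_Z:
                continue
            moras.append(IRREGULAR.get((consonant, vowel), consonant + vowel))
    return moras + SPECIAL
```

Calling `len(build_moras())` gives 13 × 8 − 5 + 4 = 103, matching the figure above.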

Anyway, I wouldn't propose this as a spelling reform, but it's nonetheless interesting how inertia can outweigh a more logical approach.

  1. Tamaoka, Katsuo and Makioka, Shoga. "Frequency of Occurrence for Units of Phonemes, Morae, and Syllables Appearing in a Lexical Corpus of a Japanese Newspaper," Behavior Research Methods, Instruments, & Computers 36, no. 3 (2004), 531-547. http://brm.psychonomic-journals.org/content/36/3/531.full.pdf+html
  2. Duanmu, San. Syllable Structure: The Limits of Variation. Oxford: Oxford University Press, 2008. http://books.google.com/books?id=K5HR4oMYIlUC


Bracketology, Part 2

Part 1 here

So after the first weekend of play, there seems to be a statistically dubious relationship between computed probabilities and actual results, and if you exclude near-certain games (i.e. the 1-16 and 2-15 games), correlation is actually negative as probability increases from 50%:

Backtesting kenpom

So thus far we have little backtesting evidence of predictive reliability. That said, here's an update of the bracket with kenpom probabilities:

kenpom projections from the Sweet Sixteen onward

I've updated the original spreadsheet with stuff like a Monte Carlo sim for the final four rounds, which gives the following probabilities for each team of taking home the championship:

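| Team | Probability | Vegas Odds (less vig) |
| --- | --- | --- |
| Ohio St. | 32% | 25% |
| Kansas | 20% | 25% |
| Duke | 19% | 11% |
| Wisconsin | 10% | 4% |
| San Diego St. | 5% | 4% |
| Brigham Young | 4% | 3% |
| Kentucky | 3% | 5% |
| North Carolina | 2% | 6% |
| Florida | 2% | 4% |
| Connecticut | 2% | 4% |
| Florida St. | 1% | 2% |
| Richmond | 0% | 0% |
| Marquette | 0% | 2% |
| Arizona | 0% | 2% |
| Butler | 0% | 2% |
| Virginia Commonwealth | 0% | 0% |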
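For anyone curious how a Monte Carlo sim like this works, here is a minimal sketch rather than the spreadsheet's actual logic. It assumes each team has a kenpom-style rating strictly between 0 and 1, that teams are listed in bracket order, and that the field size is a power of two. All names and any ratings you feed it are hypothetical:

```python
import random
from collections import Counter


def win_probability(rating_a, rating_b):
    """Chance that team A beats team B on a neutral court."""
    return rating_a * (1 - rating_b) / (
        rating_a + rating_b - 2 * rating_a * rating_b
    )


def simulate_bracket(teams, ratings, rng):
    """Play out one single-elimination bracket and return the champion.
    Adjacent teams in the list meet in each round."""
    field = list(teams)
    while len(field) > 1:
        winners = []
        for i in range(0, len(field), 2):
            a, b = field[i], field[i + 1]
            p = win_probability(ratings[a], ratings[b])
            winners.append(a if rng.random() < p else b)
        field = winners
    return field[0]


def championship_odds(teams, ratings, trials=10000, seed=0):
    """Estimate each team's title probability over many simulated brackets."""
    rng = random.Random(seed)
    titles = Counter(simulate_bracket(teams, ratings, rng) for _ in range(trials))
    return {team: titles[team] / trials for team in teams}
```

With 16 teams in bracket order, `championship_odds(teams, ratings)` returns a dictionary of title probabilities that sum to 1, which can then be compared against the vig-free Vegas odds.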

Also, on the issue of the Big East's massive underperformance relative to perennially inflated expectations, it's worth noting that this is the norm for the conference. Tracking conferences over the past decade and adjusting for the size of the conference, the Big East just hasn't been that successful:

Conference tournament performance over the past decade, adjusted for conference size

Much of the reason the Big East has a reputation for being such a tough conference, one filled with teams that can beat each other, is that the conference really is filled with that many good-but-not-great teams. One team makes it through the slugfest against several Round of 32-caliber teams and is immediately anointed as "battle-tested." Nowhere is this more evident than this year, when the Big East fielded a record 11 tournament teams, but only 2 made it to the Sweet Sixteen, and even then, they only got there because they played another Big East team in the Round of 32.

All of this leads me to believe that indicators (RPI, kenpom's ratings) are being skewed heavily by playing a large number of merely good opponents. The toughest road is playing in a conference with both weak and strong opponents, yet at least with the RPI the formula would just focus on the average. For example, assuming opponents have equal SOS, beating a 20-0 team and then a 10-10 team would be the same as beating two 15-5 teams. Also, losing by 30 to a 20-0 team and beating a 10-10 team by 1 have the same effect on your RPI. Naturally, I wanted to move away from the Really Poor Indicator, but while the kenpom figures are very interesting and the site is one of the best statistical breakdowns of sports out there, the computed probabilities don't seem to matter much come tourney time.

Bottom line: Maybe the best predictive indicator for the NCAA Tournament is high-quality wins, with lower-quality games being much less relevant. I don't know how you determine that yet, but you get every team's best shot in the tourney, and profoundly above-average teams don't seem capable of handling it.


Bracketology

I made a little spreadsheet that I plan to update to look at the probabilities generated at kenpom.com for NCAA Tournament games. It includes the bracket shown in the pic below, a simulator for hypothetical games and an easy way to update it using summary data from kenpom. As for kenpom, Ken Pomeroy generates ratings for each team based on their offensive and defensive efficiency, adjusted for the caliber of their opponents, which can then be turned into probabilities of victory or final score estimates. Think of it as a more relevant version of the RPI.

You can download the spreadsheet here.

And here are the current results if the favorite wins in every match-up, with games color-coded in grades of green (clear favorite) through red (no favorite):

What's most notable is that certain match-ups you wouldn't expect to be close have evenly matched teams. Clemson, despite playing in the play-in game, could be very dangerous for West Virginia. Belmont could be a difficult out for Wisconsin. In fact, once we get to the 5-12 match-ups, the probability of the higher-seeded team winning drops off a cliff:

| Match-up | Higher Seed Win % |
| --- | --- |
| #1 vs. #16 | 98% |
| #2 vs. #15 | 91% |
| #3 vs. #14 | 88% |
| #4 vs. #13 | 82% |
| #5 vs. #12 | 56% |
| #7 vs. #10 | 54% |
| #8 vs. #9 | 51% |

Also, the four play-in games were not projected to be tremendously balanced, with one team having a 66–74% probability of winning in each match-up. Of the four, there was one "upset," VCU over USC.

Estimates

kenpom only includes estimates for scheduled games, so I had to create my own, as around half of all tourney match-ups are hypothetical at this point. His formula includes some adjustment for home and away games that I haven't looked into, but since tourney games are all technically neutral-site, from what I gather the victory % estimate is:

Victory Probability

yourVictoryPct = yourRating * (1 - opponentRating) / (yourRating + opponentRating - 2 * yourRating * opponentRating)

where rating is a value from 0 to 1.
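A direct translation of the formula above into Python, as a sketch. It leaves out any home or away adjustment, and the function name is my own:

```python
def victory_probability(your_rating, opponent_rating):
    """Neutral-site win probability from two ratings between 0 and 1.
    Undefined (division by zero) if both ratings are 0 or both are 1."""
    numerator = your_rating * (1 - opponent_rating)
    denominator = (
        your_rating + opponent_rating - 2 * your_rating * opponent_rating
    )
    return numerator / denominator
```

Two properties make this a sensible estimate: evenly rated teams come out at exactly 50%, and the two teams' probabilities in any match-up always add up to 100%.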

Possessions

possessions = yourTempo * opponentTempo / d1AverageTempo

where tempo is an estimate of possessions per game. d1Average indicates an average for all of Division I basketball.

Score

score = yourOffense * opponentDefense * d1AverageOffense / 1000000 * possessions

where offense and defense are kenpom's adjusted offensive and defensive efficiency ratings, and I adjust the losing team's score such that it is no greater than the winning team's score minus 1.
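The possessions and score estimates above can be sketched the same way. This follows the formulas exactly as written here, including the division by 1,000,000, and treats the "winning team" as whichever side the caller designates as the favorite. The function names are my own:

```python
def expected_possessions(your_tempo, opponent_tempo, d1_average_tempo):
    """Estimated possessions for a game between two teams."""
    return your_tempo * opponent_tempo / d1_average_tempo


def expected_score(offense, opponent_defense, d1_average_offense, possessions):
    """Estimated points, using the score formula as given above."""
    return (
        offense * opponent_defense * d1_average_offense / 1_000_000 * possessions
    )


def projected_final(favorite_score, underdog_score):
    """Cap the projected loser at one point below the projected winner."""
    return favorite_score, min(underdog_score, favorite_score - 1)
```

The cap in `projected_final` only matters when the raw score estimate disagrees with the victory probability about who wins, which can happen in close match-ups.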


Chrome Extension

The official download page is here.

So, for what it's worth, I drew a bit on what I've been developing for Excel to make a quick quote service app for Google Chrome. It also pulls quotes from Bloomberg, Google and Yahoo!, and then adds some charting capabilities.

We'll see how much time I have to work on it and improve things, but I made it because there doesn't seem to be a great variety of finance apps that can get you information on the fixed income, futures and stock markets. So, there you go, a simple little exercise that puts everything you need (and a lot of things you don't) a button click away:

chrome quotes popup window

GOOG's chart in Google Finance
