Saturday, May 10, 2008

Why I oughtn't be allowed out on low sleep

I was reading Charles Petzold's blog again, and it was talking about Desk Set, which is my favorite of my mom's favorite movies (creepy yet?), and I realized that the reason I think Spencer Tracy is cool is that in that movie he plays a computer salesman. Also, it's definitely the best appearance of a computer in a movie ever, including 2001, which is great, but HAL would never really do that, and WarGames, because, seriously, WOPR would have just realized that the human who told it to play Tic-Tac-Toe was a Communist and needed to be killed. SO unrealistic.

Monday, May 5, 2008

The Joy of Stylometry

From Wikipedia:

The primary stylometric method is the writer invariant: a property of a text which is invariant of its author. An example of a writer invariant is frequency of function words used by the writer.

In one such method, the text is analyzed to find the 50 most common words. The text is then broken into 5,000-word chunks and each of the chunks is analyzed to find the frequency of those 50 words in that chunk. This generates a unique 50-number identifier for each chunk. These numbers place each chunk of text into a point in a 50-dimensional space. This 50-dimensional space is flattened into a plane using principal components analysis (PCA). This results in a display of points that correspond to an author's style. If two literary works are placed on the same plane, the resulting pattern may show if both works were by the same author or different authors.
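For the curious, the procedure described above can be sketched in a few lines of Python. This is only a rough reading of the description, not code from the Wikipedia article: the tokenizer, the function names (`tokenize`, `chunk_features`, `project_to_plane`), and the choice to do the PCA step with NumPy's SVD are all assumptions made for illustration.

```python
from collections import Counter
import re

import numpy as np


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes,
    # so a contraction like "'em" survives as its own token.
    return re.findall(r"[a-z']+", text.lower())


def chunk_features(text, n_words=50, chunk_size=5000):
    """Return (vocab, matrix) with one row of word frequencies per chunk."""
    tokens = tokenize(text)
    # The n_words most common words in the whole text.
    vocab = [word for word, _ in Counter(tokens).most_common(n_words)]
    rows = []
    # Full chunks only; a trailing partial chunk is dropped.
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        counts = Counter(tokens[start:start + chunk_size])
        rows.append([counts[word] / chunk_size for word in vocab])
    return vocab, np.array(rows)


def project_to_plane(features):
    """Flatten the points onto their first two principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Calling `chunk_features(text)` and passing the resulting matrix to `project_to_plane` gives one (x, y) point per chunk, ready to scatter-plot. It needs at least two chunks and two vocabulary words to produce a plane. To put two works on the same plane, as the quote suggests, you would presumably build the vocabulary from both texts together and project all the chunks at once; the sketch above handles a single text only.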

Early efforts were not always successful: in 1901, one researcher attempted to use John Fletcher's preference for "'em," the contractional form of "them," as a marker to distinguish between Fletcher and Philip Massinger in their collaborations—but he mistakenly employed an edition of Massinger's works in which the editor had expanded all instances of "'em" to "them."

In the early 1960s, Rev. A. Q. Morton produced a computer analysis of the fourteen Epistles of the New Testament attributed to St. Paul, which showed that six different authors had written that body of work. A check of his method, applied to the works of James Joyce, gave the result that Ulysses was written by five separate individuals, none of whom had any part in A Portrait of the Artist as a Young Man.