I love this paper on communication styles in the Linux kernel:
It trains a classifier to distinguish two particular authors, both leaders in the Linux development community, based on their lexical choices. The use of "sorry", "thanks", "actually", "never" and expletives is the most discriminating.
It makes me wonder whether this would also be an interesting characteristic for comparing one mailing list to another, in addition to distinguishing individual authors. "Where does your open source community fall on the Actually-Thanks Spectrum (TM)?"
I'd love to see this as part of BigBang, particularly if that kind of lexical analysis or Bayes classification would be useful for lots of research questions.
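As a rough illustration of what that kind of Bayes classification could look like, here is a minimal sketch of a multinomial naive Bayes classifier over the marker words mentioned above (expletives left out). It is not the paper's method and not an existing BigBang API: the function names, the labels "list_a" and "list_b", and the toy messages are all made up for the example.

```python
import math
import re
from collections import Counter

# Marker words taken from the summary above; expletives are omitted here.
MARKERS = ("sorry", "thanks", "actually", "never")


def marker_counts(text):
    """Count how often each marker word occurs in one message."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in MARKERS)


def train(labeled_messages):
    """Build per-label message counts and marker-word counts.

    labeled_messages is an iterable of (label, text) pairs.
    """
    doc_counts = Counter()
    word_counts = {}
    for label, text in labeled_messages:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(marker_counts(text))
    return doc_counts, word_counts


def classify(model, text):
    """Return the label with the highest log-posterior for the message."""
    doc_counts, word_counts = model
    total_docs = sum(doc_counts.values())
    observed = marker_counts(text)
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word, n in observed.items():
            # Laplace (add-one) smoothing over the marker vocabulary.
            p = (word_counts[label][word] + 1) / (total_words + len(MARKERS))
            score += n * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy data: two hypothetical mailing lists, not real messages.
TOY_MESSAGES = [
    ("list_a", "Thanks for the patch, sorry about the delay."),
    ("list_a", "Thanks, that works."),
    ("list_b", "Actually, that never worked."),
    ("list_b", "Actually this is wrong and never compiles."),
]

model = train(TOY_MESSAGES)
print(classify(model, "Thanks a lot, and sorry for the noise"))
print(classify(model, "Actually, never mind"))
```

The labels could just as well be authors as mailing lists; only the (label, text) pairs fed to train() change. A real analysis would need far more messages per label and a held-out set to check how well the marker words actually discriminate.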
BigBang-dev mailing list