I love this paper on communication styles in the Linux kernel:
It trains a classifier to distinguish two particular authors, both leaders in the Linux development community, based on lexical choices. Use of "sorry", "thanks", "actually", "never" and expletives is most discriminating.
It makes me wonder whether this would also be an interesting characteristic of one mailing list compared to another, in addition to distinguishing individual authors. "Where does your open source community fall on the Actually-Thanks Spectrum (TM)?"
I'd love to see this as part of BigBang, particularly if that kind of lexical analysis or Bayes classification would be useful for lots of research questions.
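To make the idea concrete, here is a minimal sketch of that kind of Bayes classification over word choices. It is not the paper's method and not an existing BigBang API: it is a plain multinomial Naive Bayes with add-one smoothing written against the standard library, and the author labels, the toy messages, and every function name are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: (author label, message text). Invented for illustration.
TOY_MESSAGES = [
    ("author_a", "thanks for the patch, sorry for the delay"),
    ("author_a", "sorry, thanks, applied"),
    ("author_b", "actually this is never going to work"),
    ("author_b", "never do that, actually read the code"),
]


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(messages):
    """Count words per author and messages per author."""
    word_counts = {}
    doc_counts = Counter()
    for author, text in messages:
        doc_counts[author] += 1
        word_counts.setdefault(author, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def log_likelihood(word, author, word_counts, vocab):
    """log P(word | author) with add-one (Laplace) smoothing."""
    counts = word_counts[author]
    return math.log((counts[word] + 1) / (sum(counts.values()) + len(vocab)))


def classify(text, word_counts, doc_counts):
    """Return the author with the highest posterior log score for the text."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    scores = {}
    for author in word_counts:
        score = math.log(doc_counts[author] / total_docs)  # log prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += log_likelihood(word, author, word_counts, vocab)
        scores[author] = score
    return max(scores, key=scores.get)


def most_discriminating(a, b, word_counts, n=5):
    """Rank words by the absolute log-likelihood ratio between authors a and b."""
    vocab = set().union(*word_counts.values())
    ratio = {
        w: log_likelihood(w, a, word_counts, vocab)
        - log_likelihood(w, b, word_counts, vocab)
        for w in vocab
    }
    return sorted(vocab, key=lambda w: abs(ratio[w]), reverse=True)[:n]


word_counts, doc_counts = train(TOY_MESSAGES)
print(classify("thanks and sorry about that", word_counts, doc_counts))
print(most_discriminating("author_a", "author_b", word_counts))
```

Swapping the author labels for mailing list names would give the list-versus-list comparison. On real archives you would want far more data, and probably a stop-word list so that filler words do not dominate the ranking.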
Hello BigBang developers,
I'm Harsh Gupta, a fifth-year undergrad student at the Indian Institute of
Technology Kharagpur, studying Mathematics and Computing. I was an
intern at CIS India this summer and used BigBang to do diversity
analysis on the participants of the Encrypted Media Extensions (EME)
debate happening at W3C. I met Sebastian at SciPy in July where he
told me about the plans to use BigBang to analyze ICANN. I'm in
general interested in the social + political dimensions of technology.
Where can I read more details of the research you plan to carry out, and
what are the ways I can get involved?
I'm hoping to churn a thesis out of the work, a part of which I need to
submit by the end of this semester.
Many of the new notebooks from the DMI Summer School are designed to work
with a subset of ICANN email data having to do with human rights.
Ideally, what gets included in the core BigBang repository is easy for
people to get started with. That's why all the other notebooks have used just a
few SciPy mailing lists.
I'm wondering whether we should include the ICANN data in the core BigBang
repository.
I don't think there's a privacy issue with that, though maybe somebody else
might have a reason to object.
It would also be a strong signal that BigBang is now intended to be used to
analyze Internet governance, not just open source communities.
I've merged in the outstanding pull requests to the core BigBang repo here:
As a next step, I'm going to merge in the changes from Niels's fork that we
used at the DMI Summer School:
The process for this is going to be somewhat labor intensive, as it will
require a local merge between two widely diverged forks. So stay tuned for
changes on this in the coming weeks.
It's my pleasure to make a number of announcements regarding the BigBang
*Successful DMI Summer School Workshop*
A working group of roughly 12 people used BigBang to study the impact of
civil society in promoting human rights in ICANN at the Digital Methods
Initiative Summer School
<https://www.digitalmethods.net/Dmi/SummerSchool2016> this summer. Huge
thanks to Niels ten Oever, Davide Beraldo, and DATACTIVE
<https://data-activism.net/> for their hard work and support in making this
The hard work for this workshop happened on Niels' BigBang fork. Check it
In addition to the opportunity to use BigBang as a tool for data science
education, this initiative was an important turning point for BigBang's
anticipated use cases and development.
*Release v0.2 "Tulip Revolution" Upcoming*
The next planned release of BigBang will be version 0.2, "Tulip
Revolution". The purpose of this release will be to consolidate
contributions made in preparation for, during, and following the workshop.
*Anticipated change in governance structure and primary repository*
With the contributions of DATACTIVE, there's great promise in BigBang's
growth as a community project.
These changes will require some changes to the governance structure. Over
the course of the work towards, we will be transitioning from a "Benevolent
Dictator" governance model to a "Consensus Based Democracy" model.
This is to give more of the contributors ownership over the project as we
I will be drafting community bylaws during the transition. Voting rights
will be, as is standard, tied to commit access rights.
As part of this change, the primary repository for BigBang will shift from
my personal GitHub account to the datactive organization on GitHub:
*What happens next*
These changes are big. They will happen slowly over the course of the next
couple of months. As current Benevolent Dictator, I want to put the project
on the surest footing moving forward. So please stay tuned for further
What happens next is merging together all the great work that's happened
If you think you will have time to work on this, please let me know and
we'll work out a system for delegating tasks.