Hello everyone,
I wanted to make a quick announcement: today I've finished the merge from:
* the nllz master branch used at the DMI Summer School, to
* the sbenthall master branch, currently the project "trunk"
Thanks so much to Davide and Niels for their contributions via the summer
school!
In the process of this work, I removed some of the more unfinished and
duplicate notebooks, but most of the notebooks were merged in as-is.
This puts off some important decisions to be made about data and notebooks
in the project trunk, but I think it's better if these decisions are made
with more community dialogue. I'll raise a number of proposals for this as
we approach a 0.2 release.
All best,
Seb
I love this paper on communication styles in the Linux kernel:
http://www.opensym.org/os2016/proceedings-files/p101-schneider.pdf
It trains a classifier to distinguish two particular authors, both leaders in the Linux development community, based on their lexical choices. Use of "sorry", "thanks", "actually", "never" and expletives is most discriminating.
It makes me wonder whether this would also be an interesting characteristic of one mailing list compared to another, in addition to distinguishing individual authors. "Where does your open source community fall on the Actually-Thanks Spectrum (TM)?"
I'd love to see this as part of BigBang, particularly if that kind of lexical analysis or Bayes classification would be useful for lots of research questions.
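As a rough illustration of the idea, here is a minimal sketch of a naive Bayes classifier over a handful of marker words. It is not the paper's method and not an existing BigBang function: the marker list, the function names, the labels and the toy messages are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Marker words mentioned above. This short list is an assumption for the
# sketch; it is not the feature set used in the paper.
MARKERS = ["sorry", "thanks", "actually", "never"]


def marker_counts(text):
    """Count how often each marker word appears in one message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in MARKERS)
    return [counts[m] for m in MARKERS]


def train(messages_by_label):
    """Fit multinomial naive Bayes parameters with add-one smoothing."""
    model = {}
    total_msgs = sum(len(msgs) for msgs in messages_by_label.values())
    for label, msgs in messages_by_label.items():
        totals = [0] * len(MARKERS)
        for msg in msgs:
            for i, c in enumerate(marker_counts(msg)):
                totals[i] += c
        denom = sum(totals) + len(MARKERS)
        model[label] = {
            "log_prior": math.log(len(msgs) / total_msgs),
            "log_likelihood": [math.log((t + 1) / denom) for t in totals],
        }
    return model


def classify(model, text):
    """Return the label with the highest posterior log score."""
    counts = marker_counts(text)

    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            c * ll for c, ll in zip(counts, params["log_likelihood"])
        )

    return max(model, key=score)


# Hypothetical toy data: two made-up "lists" with different tones.
TOY = {
    "list_a": ["Actually, that never worked.", "Actually no. Never do that."],
    "list_b": ["Thanks for the patch!", "Sorry for the delay, and thanks again."],
}
model = train(TOY)
print(classify(model, "Thanks, and sorry about that."))
```

The same functions would work with one label per mailing list instead of one per author, which is the comparison suggested above.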
Thanks,
Nick
Hello BigBang developers,
I'm Harsh Gupta, a fifth year undergrad student at Indian Institute of
Technology Kharagpur, studying Mathematics and Computing. I was an
intern at CIS India [1] this summer and used BigBang to do diversity
analysis on the participants of the Encrypted Media Extensions (EME)
debate happening at W3C [2]. I met Sebastian at SciPy in July where he
told me about the plans to use BigBang to analyze ICANN. I'm in
general interested in the social + political dimensions of technology.
Where can I read more details of the research you plan to carry out, and
what are the ways I can get involved?
I'm hoping to churn out a thesis out of the work, a part of which I need to
submit by the end of this semester.
[1]: http://cis-india.org/
[2]: https://github.com/sbenthall/bigbang/pull/259
--
Harsh Gupta
mail(a)hargup.in
Many of the new notebooks from the DMI Summer School are designed to work
with a subset of ICANN email data having to do with human rights.
Ideally, what gets included in the core BigBang repository is easy for
people to get started with. That's why all the other notebooks have used just a
few SciPy mailing lists.
I'm wondering whether we should include the ICANN data in the core BigBang
repository.
I don't think there's a privacy issue with that, though maybe somebody else
might have a reason to object.
It would also be a strong signal that BigBang is now intended to be used to
analyze Internet governance, not just open source communities.
Thoughts?
- s
Hello,
I've merged in the outstanding pull requests to the core BigBang repo here:
https://github.com/sbenthall/bigbang
As a next step, I'm going to merge in the changes from Niels's fork that we
used at the DMI Summer School:
https://github.com/nllz/bigbang
The process for this is going to be somewhat labor intensive, as it will
require a local merge between two widely diverged forks. So stay tuned for
changes on this in the coming weeks.
All best,
Seb
It's my pleasure to make a number of announcements regarding the BigBang
project.
*Successful DMI Summer School Workshop*
A working group of roughly 12 people used BigBang to study the impact of
civil society in promoting human rights in ICANN at the Digital Methods
Initiative Summer School
<https://www.digitalmethods.net/Dmi/SummerSchool2016> this summer. Huge
thanks to Niels ten Oever, Davide Beraldo, and DATACTIVE
<https://data-activism.net/> for their hard work and support in making this
happen.
The hard work for this workshop happened on Niels' BigBang fork. Check it
out!
https://github.com/nllz/bigbang
In addition to the opportunity to use BigBang as a tool for data science
education, this initiative was an important turning point for BigBang's
anticipated use cases and development.
*Release v0.2 "Tulip Revolution" Upcoming*
The next planned release of BigBang will be version 0.2, "Tulip
Revolution". The purpose of this release will be to consolidate
contributions made in preparation for, during, and following the workshop.
*Anticipated change in governance structure and primary repository*
With the contributions of DATACTIVE, there's great promise in BigBang's
growth as a community project.
This growth will require some changes to the governance structure. Over
the course of the work towards the next release, we will be transitioning
from a "Benevolent
Dictator <http://producingoss.com/en/producingoss.html#benevolent-dictator>"
governance model to a "Consensus Based Democracy
<http://producingoss.com/en/producingoss.html#consensus-democracy>" model.
This is to give more of the contributors ownership over the project as we
move forward.
I will be drafting community bylaws during the transition. Voting rights
will be, as is standard, tied to commit access rights.
As part of this change, the primary repository for BigBang will shift from
my personal GitHub account to the datactive organization on GitHub:
https://github.com/datactive/bigbang
*What happens next*
These changes are big. They will happen slowly over the course of the next
couple of months. As current Benevolent Dictator, I want to put the project
on the surest footing moving forward. So please stay tuned for further
announcements.
What happens next is merging together all the great work that's happened
this summer.
If you think you will have time to work on this, please let me know and
we'll work out a system for delegating tasks.
All best,
Seb
As is clear from the issue tracker, I've been a bit negligent of the
BigBang project in the last year.
One sign of this is that I haven't done a 0.2 release despite having that
slated as a milestone due over a year ago.
Thanks to the new interest in the project, I've decided to scrap that
milestone. The next release, 0.2, is going to be dedicated to meeting the
needs of the Data Active group and the analysis of human rights language on
IETF and other related mailing lists.
https://github.com/sbenthall/bigbang/issues?q=is%3Aopen+is%3Aissue+mileston…
This sort of project-based release cycle seems like the most realistic path
forward at this point.
Milestone 0.2 is code named Tulip Revolution, which was apparently the name
of a real revolution in Kyrgyzstan in 2005 similar to Ukraine's Orange
Revolution, but in this case the name is chosen in honor of revolutionary
sentiment everywhere and also the Dutch.
Please assign any tickets that are pressing for work on these projects to
this milestone.
Cheers,
Seb
Hi BigBang dev,
I've been turning back to this project and trying to get the code on my machine up to date with the subsequent changes to BigBang; in particular, the Analyze Senders notebook.
This pull request (using changes from Niels and some fixes of my own) restores functionality for generating a matrix of similarities, using the new from_header_distance function. The notebook walks through this similarity matrix: visualizing it with a color map, finding a cutoff for similarities, and consolidating senders.
https://github.com/sbenthall/bigbang/pull/242
However, I see also that Seb was working on a separate function to do this with some graph functionality, in `resolve_sender_entities`. When I ran that function on my test mailing list, however, it didn't seem to consolidate anything. Maybe I'm misunderstanding how this function works, but it would be great to know, especially if it gets more accurate similarity calculations or does them faster.
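For readers who have not seen the notebook, the general workflow (pairwise similarities, a cutoff, then consolidation) can be sketched roughly as below. This is not BigBang's code: header_similarity is a stand-in built on the standard library's difflib rather than from_header_distance, the function names and the 0.9 cutoff are hypothetical, and the sender strings are invented.

```python
from difflib import SequenceMatcher


def header_similarity(a, b):
    """Similarity in [0, 1] between two From headers (stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(senders):
    """Square matrix of pairwise similarities, in the order of `senders`."""
    return [[header_similarity(a, b) for b in senders] for a in senders]


def consolidate(senders, cutoff):
    """Group senders linked by a pairwise similarity of at least `cutoff`."""
    matrix = similarity_matrix(senders)
    parent = list(range(len(senders)))

    def find(i):
        # Follow parent links to the group's root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(senders)):
        for j in range(i + 1, len(senders)):
            if matrix[i][j] >= cutoff:
                parent[find(j)] = find(i)

    groups = {}
    for i, sender in enumerate(senders):
        groups.setdefault(find(i), []).append(sender)
    return list(groups.values())


# Invented example headers; the first two differ only in capitalization.
SENDERS = [
    "Ada Lovelace <ada@example.org>",
    "ada lovelace <ada@example.org>",
    "Grace Hopper <grace@example.org>",
]
groups = consolidate(SENDERS, cutoff=0.9)
for group in groups:
    print(group)
```

Because groups are formed transitively, a low cutoff can chain unrelated senders together, so the cutoff is worth checking against the color map before consolidating.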
Thanks,
Nick
The notebooks in the BigBang examples directory have been in notebook format
version 3 since the project started.
In the most recent version of IPython, the notebook format has been updated to
v4 to include (among other things?) the language-agnostic Jupyter branding.
I'm going to update these notebooks in a commit soon, as per this issue:
https://github.com/sbenthall/bigbang/issues/174
Please update your IPython packages in your Python environment so that you
can use the new notebooks.
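If you want to check which notebooks in a local checkout are still in the older format, a small sketch like the following may help. It relies only on the top-level "nbformat" field stored in each .ipynb file; the function names are made up for this example, and it only reports version numbers without converting anything.

```python
import json
import tempfile
from pathlib import Path


def notebook_format_version(path):
    """Return the major format version recorded in a notebook file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("nbformat")


def notebooks_needing_upgrade(directory, target=4):
    """List .ipynb files under `directory` with a major version below `target`."""
    stale = []
    for path in sorted(Path(directory).rglob("*.ipynb")):
        version = notebook_format_version(path)
        if isinstance(version, int) and version < target:
            stale.append(path)
    return stale


# Demonstration on two throwaway notebooks written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old = {"nbformat": 3, "nbformat_minor": 0, "metadata": {}, "worksheets": []}
    new = {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": []}
    Path(tmp, "old.ipynb").write_text(json.dumps(old), encoding="utf-8")
    Path(tmp, "new.ipynb").write_text(json.dumps(new), encoding="utf-8")
    stale_names = [p.name for p in notebooks_needing_upgrade(tmp)]

print(stale_names)
```

Pointing notebooks_needing_upgrade at your local examples directory would list the files that have not yet been saved in the v4 format.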
Thanks,
Seb