As is clear from the issue tracker, I've neglected the BigBang project
somewhat over the last year.
One sign of this is that I never cut a 0.2 release, despite having slated
it as a milestone due more than a year ago.
Thanks to the new interest in the project, I've decided to scrap that
milestone as originally planned. The next release, 0.2, will instead be
dedicated to meeting the needs of the Data Active group and the analysis of
human rights language on the IETF and other related mailing lists.
This sort of project-based release cycle seems like the most realistic path
forward at this point.
Milestone 0.2 is code-named Tulip Revolution. That was apparently the name
of a real revolution in Kyrgyzstan in 2005, similar to Ukraine's Orange
Revolution, but here the name is chosen in honor of revolutionary sentiment
everywhere, and also the Dutch.
Please assign any tickets that are pressing for work on these projects to
Hi BigBang dev,
I've been returning to this project and trying to bring the code on my machine up to date with subsequent changes to BigBang, in particular the Analyze Senders notebook.
This pull request (using changes from Niels plus some fixes of my own) restores the functionality for generating a similarity matrix, now using the new `from_header_distance` function. The notebook walks through this similarity measure, visualizes it with a color map, finds a cutoff for similarities, and consolidates senders.
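The matrix-and-cutoff approach can be sketched roughly as follows. This is a standalone illustration, not BigBang code: `header_distance` is a hypothetical stand-in for `from_header_distance` (whose actual scoring may well differ), and `distance_matrix` and `consolidate` are illustrative names. Distances run from 0.0 (identical) to 1.0 (completely different), and each sender is mapped to the first earlier sender that falls within the cutoff.

```python
from difflib import SequenceMatcher
from email.utils import parseaddr


def header_distance(a: str, b: str) -> float:
    """Hypothetical stand-in for from_header_distance.

    Returns 0.0 for a perfect match and 1.0 for no match. The display
    names are compared fuzzily and the addresses exactly (ignoring case);
    whichever signal is stronger wins.
    """
    name_a, addr_a = parseaddr(a)
    name_b, addr_b = parseaddr(b)
    if name_a and name_b:
        name_sim = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    else:
        name_sim = 0.0
    addr_sim = 1.0 if addr_a.lower() == addr_b.lower() else 0.0
    return 1.0 - max(name_sim, addr_sim)


def distance_matrix(senders):
    """Pairwise distances as a nested list (row i, column j)."""
    return [[header_distance(s, t) for t in senders] for s in senders]


def consolidate(senders, cutoff):
    """Greedily map each sender to the canonical form of the first
    earlier sender whose distance is at or below `cutoff`."""
    matrix = distance_matrix(senders)
    canonical = {}
    for i, sender in enumerate(senders):
        canonical[sender] = sender
        for j in range(i):
            if matrix[i][j] <= cutoff:
                canonical[sender] = canonical[senders[j]]
                break
    return canonical


senders = [
    "Jane Doe <jane@example.org>",
    "jane doe <jdoe@example.com>",   # same name, different address
    "Jane <JANE@example.org>",       # same address, different name
    "Bob Smith <bob@example.net>",
]
for sender, canon in consolidate(senders, cutoff=0.2).items():
    print(f"{sender!r} -> {canon!r}")
```

Here the first three headers collapse onto "Jane Doe <jane@example.org>" and Bob stays separate. The nested list from `distance_matrix` can also be drawn as a heat map, for example with matplotlib's `imshow`, which is a convenient way to eyeball where a sensible cutoff lies.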
I also see that Seb has been working on a separate function, `resolve_sender_entities`, that does this using graph functionality. When I ran it on my test mailing list, though, it didn't seem to consolidate anything. Maybe I'm misunderstanding how the function works, but it would be great to know, especially if it computes similarities more accurately or more quickly.
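For comparison, a generic graph-based consolidation treats every pair of senders that clears the similarity cutoff as an edge, then merges each connected component into one entity. The sketch below is hypothetical and is not how `resolve_sender_entities` is actually implemented; `resolve_entities` and its `pairs` argument are illustrative names. One design difference from a greedy pairwise mapping is that matches are transitive: if A matches B and B matches C, all three merge even when A and C are not directly similar.

```python
from collections import defaultdict


def resolve_entities(senders, pairs):
    """Hypothetical graph-based consolidation.

    `pairs` lists (sender, sender) tuples judged to be the same person,
    e.g. all pairs whose similarity passed a cutoff. Each connected
    component of the resulting graph becomes one entity.
    """
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    entities = []
    for start in senders:
        if start in seen:
            continue
        # Iterative depth-first search collects the whole component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        entities.append(sorted(component))
    return entities


senders = [
    "Alice <alice@a.org>",
    "alice <alice@b.org>",
    "A. Smith <alice@b.org>",
    "Bob <bob@c.org>",
]
# Alice's first header matches her second, which matches her third.
pairs = [(senders[0], senders[1]), (senders[1], senders[2])]
print(resolve_entities(senders, pairs))
```

In this sketch, if no pair clears the cutoff, `pairs` is empty and every sender ends up in its own group.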