Re: [Bigbang-dev] Default ICANN data in BigBang package?

16 Aug 2016

I agree that it doesn't seem like a good practice to include the data itself in the
research tool repository. Maybe we could just include a version of the script that would
download all the data needed for that group of notebooks? So for people getting started
with that set of notebooks, the instructions would be to clone the repository, run setup
as necessary, and then run a script that downloads the relevant mail.
I think many of the notebooks already contain function calls with URLs for gathering
the appropriate mail archive, and if the archive has already been downloaded, they don't
repeat the download.
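For illustration, a download-once helper along these lines could back such a setup script.
This is a minimal sketch, not BigBang's actual code: the function names `download_if_missing`
and `fetch_archives`, the archive directory, and any URLs passed in are hypothetical, and it
assumes each archive is a single file that can be fetched over HTTP(S).

```python
from pathlib import Path
from typing import Callable, Iterable, List
import urllib.request


def fetch_url(url: str) -> bytes:
    """Fetch the raw bytes at url (plain HTTP(S) GET via the standard library)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def download_if_missing(url: str, dest: Path,
                        fetch: Callable[[str], bytes] = fetch_url) -> bool:
    """Hypothetical helper: download url to dest unless dest already exists.

    Returns True if a download happened, False if the cached copy was reused.
    """
    if dest.exists():
        return False  # archive already on disk, skip the network entirely
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # is not mistaken for a complete archive on the next run.
    tmp = dest.with_name(dest.name + ".part")
    tmp.write_bytes(fetch(url))
    tmp.replace(dest)
    return True


def fetch_archives(urls: Iterable[str], archive_dir: str,
                   fetch: Callable[[str], bytes] = fetch_url) -> List[str]:
    """Hypothetical setup-script entry point: fetch every archive a notebook
    group needs into archive_dir, returning the names actually downloaded."""
    downloaded = []
    for url in urls:
        name = url.rstrip("/").rsplit("/", 1)[-1]  # last path segment as filename
        if download_if_missing(url, Path(archive_dir) / name, fetch):
            downloaded.append(name)
    return downloaded
```

Running it a second time downloads nothing, which is the behavior described above; the `fetch`
parameter is only there so the download step can be swapped out (for example, in tests).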
...
  On Aug 16, 2016, at 11:44 AM, Sebastian Benthall <sbenthall(a)gmail.com> wrote:
 Generally it's good not to have all the data one works with checked into version
control.
 Actually currently no data is checked into version control. When you install BigBang you
have to run the collect_mail scripts before getting anything out of the notebooks.
 If there's a project that uses BigBang for extensive analysis of data from a single
source, then it's probably best to keep that as a fork and have it update from the
core repository.
 What I'm wondering now is whether all, some, or none of the Summer School notebooks
should make it in as-is. Currently there are many near-duplicate notebooks in the
examples/ directory, along with a lot of other material from previous uses of the software.
 Some hard work that's going to need to happen soon is pruning and standardizing the
stuff in that directory. Along the way we should come up with code quality guidelines and
standards for new notebooks.
 On Tue, Aug 16, 2016 at 11:23 AM, Niels ten Oever <niels(a)article19.org> wrote:
 Hi Sebastian,
 We can include the ICANN data, and soon we should also be able to
 introduce IETF data :)
 Cheers,
 Niels
 Niels ten Oever
 Head of Digital
 Article 19
 www.article19.org <http://www.article19.org/>
 PGP fingerprint    8D9F C567 BEE4 A431 56C4
                    678B 08B5 A0F2 636D 68E9
 On 08/16/2016 12:31 PM, Sebastian Benthall wrote:
  Many of the new notebooks from the DMI Summer School are designed to
 work with a subset of ICANN email data having to do with human rights.
 Ideally, what gets included in the core BigBang repository is easy for
 people to get started with. That's why all the other notebooks have used
 just a few SciPy mailing lists.
 I'm wondering whether we should include the ICANN data in the core
 BigBang repository.
 I don't think there's a privacy issue with that, though somebody
 else might have a reason to object.
 It would also be a strong signal that BigBang is now intended to be used
 to analyze Internet governance, not just open source communities.
 Thoughts?
 - s 
 _______________________________________________
 BigBang-dev mailing list
 BigBang-dev(a)lists.sudoroom.org
 https://sudoroom.org/lists/listinfo/bigbang-dev
