Default ICANN data in BigBang package?


Sebastian Benthall

16 Aug 2016

10:31 a.m.

Many of the new notebooks from the DMI Summer School are designed to work with a subset of ICANN email data having to do with human rights.

Ideally, what gets included in the core BigBang repository is easy for people to get started with. That's why all the other notebooks have used just a few SciPy mailing lists.

I'm wondering whether we should include the ICANN data in the core BigBang repository. I don't think there's a privacy issue with that, though somebody else might have a reason to object. It would also be a strong signal that BigBang is now intended to be used to analyze Internet governance, not just open source communities.

Thoughts?

- s


Niels ten Oever

16 Aug 2016

11:23 a.m.

Hi Sebastian,

We can include the ICANN data, and soon we should also be able to introduce IETF data :)

Cheers,
Niels

Niels ten Oever
Head of Digital
Article 19
www.article19.org
PGP fingerprint 8D9F C567 BEE4 A431 56C4 678B 08B5 A0F2 636D 68E9

On 08/16/2016 12:31 PM, Sebastian Benthall wrote:

...

Sebastian Benthall

11:44 a.m.

Generally it's good not to have all the data one works with checked into version control. In fact, *no* data is currently checked into version control. When you install BigBang you have to run the collect_mail scripts before getting anything out of the notebooks.

If there's a project that uses BigBang for extensive analysis of data from a single source, then it's probably best to keep that as a fork and have it update from the core repository.

What I'm wondering now is whether all, some, or none of the Summer School notebooks should make it in as is. Currently there are many near-duplicate notebooks in the examples/ directory, along with a lot of other stuff from previous uses of the software. Some hard work that needs to happen soon is pruning and standardizing what is in that directory. Along the way we should come up with code quality guidelines and standards for new notebooks.

On Tue, Aug 16, 2016 at 11:23 AM, Niels ten Oever <niels(a)article19.org> wrote:

...

Nick Doty

11:49 a.m.

I agree that it doesn't seem like a good practice to include the data itself in the research tool repository. Maybe we could just include a version of the script that would download all the data needed for that group of notebooks? So for people getting started with that set of notebooks, the instructions would be to clone the repository, run setup as necessary, and then run a script that downloads the relevant mail.

I think with many of the notebooks now, there are function calls with URLs in place for gathering the appropriate mail archive, and that if the archive is already downloaded, it doesn't have to repeat the process.

On Aug 16, 2016, at 11:44 AM, Sebastian Benthall <sbenthall(a)gmail.com> wrote:

...

_______________________________________________
BigBang-dev mailing list
BigBang-dev(a)lists.sudoroom.org
https://sudoroom.org/lists/listinfo/bigbang-dev

Sebastian Benthall

11:56 a.m.

I like this idea! For the summer school there was extra documentation on the wiki: https://github.com/nllz/bigbang/wiki I forgot that I had the ICANN data stored on the I School server for it. We may be able to use GitHub for storage of static files to make getting started with the notebooks even easier/less dependent on potentially finicky remote email archive servers. On Tue, Aug 16, 2016 at 11:49 AM, Nick Doty <npdoty(a)ischool.berkeley.edu> wrote:

...

I agree that it doesn't seem like a good practice to include the data itself in the research tool repository. Maybe we could just include a version of the script that would download all the data needed for that group of notebooks? So for people getting started with that set of notebooks, the instructions would be to clone the repository, run setup as necessary, and then run a script that downloads the relevant mail. I think with many of the notebooks now, there are function calls with URLs in place for gathering the appropriate mail archive, and that if the archive is already downloaded, it doesn't have to repeat the process.
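The download-once workflow described above (fetch the mail a group of notebooks needs, and skip anything already on disk) could look roughly like the following. This is only a sketch under stated assumptions: `collect_archives`, `local_path_for`, `fetch_url`, and the directory layout are hypothetical names made up for illustration, and none of this is BigBang's actual collect_mail code.

```python
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Download one URL and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def local_path_for(url: str, data_dir: Path) -> Path:
    """Map an archive URL to a file under data_dir.

    The layout (one folder per host, slashes flattened to underscores)
    is an assumption for this sketch, not an existing convention.
    """
    parsed = urlparse(url)
    name = parsed.path.strip("/").replace("/", "_") or "index"
    return data_dir / parsed.netloc / name


def collect_archives(
    urls: Iterable[str],
    data_dir: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[Path]:
    """Download each archive unless it is already on disk.

    Returns only the files downloaded during this call, so a second
    run over the same URLs returns an empty list.
    """
    downloaded: List[Path] = []
    for url in urls:
        target = local_path_for(url, data_dir)
        if target.exists():
            # Already collected on an earlier run: do not repeat the download.
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(url))
        downloaded.append(target)
    return downloaded
```

A notebook group would then ship only a short list of archive URLs and call `collect_archives(urls, Path("archives"))` once after setup. Passing `fetch` as a parameter keeps the caching logic testable without touching a remote archive server.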
