[Bigbang-dev] Default ICANN data in BigBang package?

Tue Aug 16 11:56:25 PDT 2016

I like this idea!

For the summer school there was extra documentation on the wiki:

https://github.com/nllz/bigbang/wiki

I forgot that I had the ICANN data stored on the I School server for it.

We may be able to use GitHub to store static files, which would make
getting started with the notebooks easier and less dependent on
potentially finicky remote email archive servers.
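
A minimal sketch of the "download once, then reuse the local copy" idea
discussed in this thread. It uses only the Python standard library and is
not BigBang's actual collection code; the function names, the `download`
parameter, and the example URL are hypothetical and only for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def _download(url: str) -> bytes:
    """Fetch the raw bytes at `url` (hypothetical helper, stdlib only)."""
    with urlopen(url) as response:
        return response.read()


def fetch_if_missing(url: str, dest_dir: str, download=_download) -> Path:
    """Return the local path for `url`, downloading only if no copy exists."""
    # Derive a file name from the URL path; fall back to a fixed name.
    name = Path(urlparse(url).path).name or "archive"
    target = Path(dest_dir) / name
    if target.exists():
        # A local copy is already present, so the remote server is skipped.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(download(url))
    return target
```

Calling `fetch_if_missing(url, "archives")` twice with the same URL would
contact the remote server only the first time. The same function would work
whether the static files live on GitHub or on the original archive server.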

On Tue, Aug 16, 2016 at 11:49 AM, Nick Doty <npdoty at ischool.berkeley.edu>
wrote:

> I agree that it doesn't seem like a good practice to include the data
> itself in the research tool repository. Maybe we could just include a
> version of the script that would download all the data needed for that
> group of notebooks? So for people getting started with that set of
> notebooks, the instructions would be to clone the repository, run setup as
> necessary, and then run a script that downloads the relevant mail.
>
> I think with many of the notebooks now, there are function calls with URLs
> in place for gathering the appropriate mail archive, and that if the
> archive is already downloaded, it doesn't have to repeat the process.
>
> On Aug 16, 2016, at 11:44 AM, Sebastian Benthall <sbenthall at gmail.com>
> wrote:
>
> Generally it's good not to have all the data one works with checked into
> version control.
>
> Actually, *no* data is currently checked into version control. When you
> install BigBang you have to run the collect_mail scripts before getting
> anything out of the notebooks.
>
> If there's a project that uses BigBang for extensive analysis of data from
> a single source, then it's probably best to keep that as a fork and have it
> update from the core repository.
>
> What I'm wondering now is whether all, some, or none of the Summer School
> notebooks should make it in as is. Currently there are many near-duplicate
> notebooks in the examples/ directory, along with a lot of other stuff from
> previous uses of the software.
>
> Some hard work that's going to need to happen soon is pruning and
> standardizing the stuff in that directory. Along the way we should come up
> with code quality guidelines and standards for new notebooks.
>
> On Tue, Aug 16, 2016 at 11:23 AM, Niels ten Oever <niels at article19.org>
> wrote:
>
>> Hi Sebastian,
>>
>> We can include the ICANN data, and soon we should also be able to
>> introduce IETF data :)
>>
>> Cheers,
>>
>> Niels
>>
>>
>> Niels ten Oever
>> Head of Digital
>>
>> Article 19
>> www.article19.org
>>
>> PGP fingerprint    8D9F C567 BEE4 A431 56C4
>>                    678B 08B5 A0F2 636D 68E9
>>
>> On 08/16/2016 12:31 PM, Sebastian Benthall wrote:
>> > Many of the new notebooks from the DMI Summer School are designed to
>> > work with a subset of ICANN email data having to do with human rights.
>> >
>> > Ideally, what gets included in the core BigBang repository is easy for
>> > people to get started with. That's why all the other notebooks have used
>> > just a few SciPy mailing lists.
>> >
>> > I'm wondering whether we should include the ICANN data in the core
>> > BigBang repository.
>> >
>> > I don't think there's a privacy issue with that, though maybe somebody
>> > else might have a reason to object.
>> >
>> > It would also be a strong signal that BigBang is now intended to be used
>> > to analyze Internet governance, not just open source communities.
>> >
>> > Thoughts?
>> >
>> > - s
>>
>>
> _______________________________________________
> BigBang-dev mailing list
> BigBang-dev at lists.sudoroom.org
> https://sudoroom.org/lists/listinfo/bigbang-dev
>
>
>