Hi all,
I hope this email finds you all very well!
In preparation for the IETF hackathon, I have spent three days
downloading ~33 GB of IETF mailing list archives from
ftp.ietf.org/ietf-mail-archive/
I will bring it on a disk to the hackathon in London, but I am also going to
make it available as a zip file from a server, which should allow for a
much quicker download. I will share the IP address here soon.
I will also put an archive of all ICANN mailing lists there.
I am still looking for a way to create CSVs from the archives so that they
can be used directly in BigBang (see the discussion with Sebastian below).
All input appreciated!
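One possible approach, sketched with only Python's standard library `mailbox` and `csv` modules. It assumes the files downloaded from ftp.ietf.org are in mbox format. The column names are an assumption, not a confirmed description of the CSV layout BigBang's Archive expects, so please check them against the BigBang code first. `mbox_to_csv`, `header_text` and `body_text` are hypothetical helpers, not part of BigBang.

```python
import csv
import mailbox
from email.header import decode_header, make_header

# Assumed column layout: check against what BigBang's Archive class expects.
FIELDS = ["Message-ID", "From", "Subject", "Date", "In-Reply-To", "References", "Body"]


def header_text(msg, name):
    """Return a header as text, decoding RFC 2047 encoded words like =?utf-8?q?...?=."""
    raw = msg.get(name)
    if raw is None:
        return ""
    try:
        return str(make_header(decode_header(str(raw))))
    except Exception:
        # Malformed encoded words or unknown charsets: keep the raw header text.
        return str(raw)


def body_text(msg):
    """Return the first text/plain part, decoded leniently."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and not part.is_multipart():
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:
                # The declared charset is unknown to Python.
                return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_csv(mbox_path, csv_path):
    """Write one CSV row per message in an mbox file and return the row count."""
    count = 0
    # errors="replace" turns characters that cannot be encoded as UTF-8
    # (e.g. leftovers from undecodable header bytes) into "?" instead of aborting.
    with open(csv_path, "w", newline="", encoding="utf-8", errors="replace") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for msg in mailbox.mbox(mbox_path, create=False):
            row = {name: header_text(msg, name) for name in FIELDS[:-1]}
            row["Body"] = body_text(msg)
            writer.writerow(row)
            count += 1
    return count
```

Writing UTF-8 explicitly is meant to sidestep the kind of 'ascii' codec errors that Python 2 raises when exporting non-ASCII text. Running it once per downloaded file would give one CSV per list per archive file.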
Best,
Niels
-------- Forwarded Message --------
Subject: Re: quick q
Date: Wed, 14 Mar 2018 22:03:36 +0100
From: Niels ten Oever <niels(a)article19.org>
To: Sebastian Benthall <sbenthall(a)gmail.com>
Hi Sebastian,
I'm not sure I am doing what you're suggesting correctly, but:
On 03/14/2018 08:20 PM, Sebastian Benthall wrote:
> You may have trouble getting all 33 gigs into memory at the same time.
> I've never tried that.
>
> Have you tried creating an Archive object for just one group, as is
> illustrated in the example notebooks?
>
If I use, for instance:
$ python2 bin/collect_mail.py -u https://www.ietf.org/mail-archive/text/ietf/
I get endless chardet errors, and it ends with:
DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
tzinfo.utcoffset() returned 1440; must be in -1439 .. 1439
'ascii' codec can't encode character u'\xe4' in position 1084: ordinal not in range(128)
Can't export data. Aborting.
So this was not a reliable way to get all the mailing lists for the
hackathon, which is why I used wget to download them.
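The "tzinfo.utcoffset() returned 1440" error above suggests that at least one message has a Date header with an offset of 24 hours (1440 minutes) or more, for example "+2400", which Python's datetime cannot represent. A minimal sketch of one workaround, assuming it is acceptable to read such dates as UTC: parse the header with the standard library's `email.utils.parsedate_tz` and drop impossible offsets instead of aborting. `parse_mail_date` is a hypothetical helper, not BigBang's own date handling.

```python
import datetime
from email.utils import mktime_tz, parsedate_tz

MAX_OFFSET = 24 * 3600  # datetime.timezone requires |offset| < 24 hours


def parse_mail_date(value):
    """Parse an RFC 2822 Date header into an aware UTC datetime, or return None.

    Missing offsets and offsets of 24 hours or more (e.g. "+2400") are
    assumed to be UTC instead of raising an error.
    """
    if not value:
        return None
    try:
        parsed = parsedate_tz(value)
        if parsed is None:
            return None
        offset = parsed[9]
        if offset is None or abs(offset) >= MAX_OFFSET:
            # Assumption: treat an unknown or impossible offset as UTC.
            parsed = tuple(parsed[:9]) + (0,)
        timestamp = mktime_tz(parsed)
        return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        return None
```

Returning None for unparseable dates keeps one bad message from stopping the whole export; the trade-off is that such messages end up without a usable timestamp.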
Now I am looking for a way to make them easily usable for the
participants, but I am not sure how to do this.
I'm not sure which example notebook you meant; all the ones I looked
through either need a CSV or try to download the list themselves.
Cheers,
Niels
> I believe that when it creates one from raw email it will generate a
> .CSV file of the same data for you.
>
> On Mar 14, 2018 1:41 PM, "Niels ten Oever" <niels(a)article19.org> wrote:
>
> In other words, over the past three days I downloaded all these:
>
> https://www.ietf.org/mail-archive/text/
>
> And now I would like to import them into BigBang, but I am not sure
> what command to use.
>
> When I try to use the notebooks, they ask for CSVs.
>
> Cheers,
>
> Niels
>
> Niels ten Oever
>
> Article 19
> www.article19.org
>
> PGP fingerprint 2458 0B70 5C4A FD8A 9488
> 643A 0ED8 3F3A 468A C8B3
>
> On 03/14/2018 06:31 PM, Niels ten Oever wrote:
> > Hiya Seb,
> >
> > All good? I have a quick question. Do you know how I can import
> > email lists that I have already downloaded? In other words, how do I
> > create CSVs of the 33 GB of mailing lists I just harvested :)
> >
> > Hope all is well! I think I will be churning on this stuff tonight,
> > so maybe expect some mails later ;) xx
> >
> > ~n.,
> >
>
>
Hello,
It seems as if a few emails from this list did not make it into the archive
because of an error in the import. This is no big deal; those emails
weren't that important.
http://lists.ghserv.net/pipermail/bigbang-dev/2016-October/thread.html
But I thought it would be a good opportunity to send a reminder about why
we use mailing lists with public archives.
Public archives are important for handling community growth. Our
archival record is what will make it easier for new contributors to get
involved and understand the history of the project. For reference, see
Karl Fogel's Producing Open Source Software:
http://producingoss.com/en/growth.html#using-archives
All best,
Seb
It's my pleasure to share with you this draft of the BigBang Governance
Process bylaws:
https://github.com/datactive/bigbang/wiki/Governance
I hope it is as self-explanatory as possible. Please don't hesitate to
raise any questions and concerns about it. I encourage you to review the
references in the Precedent section for more information about where
particular policies have come from.
I expect that other forms of self-regulation will come up over time, for
example, details of how we do code review or manage a release cycle. I
think these decisions can be deferred, left out of the bylaws for now, and
documented later as guidelines.
All best,
Seb
Hello,
I have just changed the software license of BigBang from GPLv2 to AGPL-3.0.
Soon, I won't have the ability to make such a decision unilaterally. This
sort of decision will only be made by community agreement. You may all soon
decide to reverse this decision and if so I won't block the change.
However, I wanted to "nudge" the project in this way and explain my
reasoning.
When we started the BigBang project years ago, we had to address the
question of software licensing. Out of commitment to Free Software
principles, I wanted to copyleft the software. This got pushback from some
potential contributors in industry and from people interested in
partnering with the project commercially.
In the end I decided to stay with the GPL license because of a project
mission that is related to Free Software: the educational mission of open
and reproducible research. One reason to build BigBang is to create a
platform for data science education that is available to everybody. Open
data from the projects that are foundational to the Internet, including
standards and protocols, as well as software, are part of the historical
record of how our world has become what it is. As much as any legal or
cultural history, an understanding of this technical history is essential
to our competence as world citizens. It is our shared inheritance. BigBang
is designed to make this technical history available to scholars and
students in the interest of an informed digital citizenry.
The GNU copyleft philosophy is aligned with this educational mission. My
hope is that as contributors to BigBang, we will be conscious of our work
being a contribution to global, civil, and collective self-understanding.
This self-understanding cannot be proprietary; it must be held in common.
GNU licenses prevent "forks" or downstream development work on the project
from being incorporated into proprietary systems. This will give us peace
of mind: our work will be harder to co-opt for the kind of private interests
that would make our civil understanding of technology even more difficult.
We met an obstacle early on in committing to these goals through our
software license. This project began at UC Berkeley. And while Berkeley has
a long history of contributions to open source software, we discovered that
it was not an environment compatible with Free Software principles. I have
to give Nick Doty tremendous credit for his patient navigation of
Berkeley's bureaucracy to get an answer to some questions about
intellectual property. Among his findings was that Berkeley permits use of
GPLv2, but forbids use of GPLv3. This is most likely because Berkeley would
like to reserve the right to patent software created by its students, even
if that software is originally licensed open source.
Therefore one of the motivations for moving the BigBang project to
DATACTIVE infrastructure is to position the software as a project of the
University of Amsterdam, rather than of UC Berkeley. I have to ask our
contributors from UvA to follow up on this point. But my hunch is that UvA
is less aggressive about defending its privilege of software patenting than
UC Berkeley.
As the likelihood of Berkeley making any claim to patent contributions to
BigBang is in fact quite low, I must admit that this change in licensing
and the timing of it is perhaps as much a symbolic gesture as a
substantive one. Putting the project under the auspices of DATACTIVE is in
more ways than one a commitment of this project to the public interest. It
may even be a way of steering the project towards a more European vision of
the public interest than a Californian one. All these questions will
ultimately be up to the community to decide.
Thanks for your patience with this long email. More information about GNU
licenses can be found here:
https://www.gnu.org/licenses/rms-why-gplv3.html
https://www.gnu.org/licenses/licenses.html#AGPL
Cheers,
Seb
As part of our transfer of the project over to Datactive, the official
website of the project has now changed. Its URL is:
http://datactive.github.io/bigbang/
This website can be edited by editing the Jekyll templates of this branch
of the git repository:
https://github.com/datactive/bigbang/tree/gh-pages
More info here: https://pages.github.com/
You will note that many of the links on this website are now *broken*,
since they point erroneously to the sbenthall/bigbang project.
These links have to be fixed as part of this ticket:
https://github.com/datactive/bigbang/issues/264
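For whoever takes this on, a minimal sketch of the bulk replacement, assuming the broken links are plain github.com/sbenthall/bigbang URLs in the Jekyll templates on the gh-pages branch. Other forms (for example sbenthall.github.io/bigbang) would need their own replacement. `fix_site` and the file suffix list are hypothetical, and the result should be reviewed with `git diff` before committing.

```python
import pathlib

OLD = "github.com/sbenthall/bigbang"
NEW = "github.com/datactive/bigbang"
# Hypothetical set of template/content file types to touch.
SUFFIXES = {".html", ".md", ".markdown", ".yml"}


def fix_links(text):
    """Point repository links at datactive/bigbang instead of sbenthall/bigbang."""
    return text.replace(OLD, NEW)


def fix_site(root):
    """Rewrite matching files under root in place and return the paths that changed."""
    changed = []
    for path in sorted(pathlib.Path(root).rglob("*")):
        if path.is_file() and path.suffix in SUFFIXES:
            original = path.read_text(encoding="utf-8")
            updated = fix_links(original)
            if updated != original:
                path.write_text(updated, encoding="utf-8")
                changed.append(path)
    return changed
```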
Is there anyone willing to take on this issue of updating the website?
All best,
Seb
*P.S. I've been CC'ing the NOW DEPRECATED bigbang-dev(a)lists.sudoroom.org
on these changes because I haven't wanted to lose anybody in this period of
rapid transition. However, I will soon stop sending updates to the old
list. If you are interested in participating in BigBang development and
have not already done so, please subscribe to:*
*http://lists.ghserv.net/mailman/listinfo/bigbang-dev*
*Thanks!*
Hello,
This is an *important* update regarding BigBang.
We are in the process of *transferring* the GitHub project to a new
location. This means we are moving the issues, milestones, and wiki of the
project. This is in order to facilitate moving the project infrastructure
into a place where it can be managed by the community. (Big thanks to
Harsh Gupta for suggesting this!)
The project currently located at:
https://github.com/sbenthall/bigbang
will be moved to the new location:
https://github.com/datactive/bigbang
Because there is already a *fork* of bigbang in the *datactive*
organization, this is a little tricky. I will be *deleting the fork* so
that there's no conflict when I *transfer the project*.
In preparation for this, I have copied the issues from datactive/bigbang to
sbenthall/bigbang, so that history is preserved.
I'll be doing these steps momentarily. This email is to make it clear what
I'm doing so nobody is surprised.
Hello everyone!
I'm excited to announce that we are officially changing our mailing list
hosts.
To stay involved in the BigBang project, please subscribe to this mailing
list:
*bigbang-dev(a)data-activism.net*
https://lists.ghserv.net/mailman/listinfo/bigbang-dev
This change is the first of several that mark the transition of core
BigBang infrastructure into the stewardship of the DATACTIVE
(https://data-activism.net/) research collective, and the expansion of our
community. Big thanks to Niels for setting up these lists.
The archives from the old lists have been transferred to the new Mailman
instance, hosted by Greenhost. Thanks to Sudo Room (Max, cc'ed) for setting
up the original lists.
Cheers,
Seb
Hello,
Thanks to Niels, we will soon be moving our official mailing lists off of
the Sudo Room mailman instance and onto Greenhost ("the social enterprise
with high security and privacy awareness with which both Datactive and
Article19 have hosted all their services"):
http://lists.ghserv.net/mailman/listinfo/bigbang-dev
http://lists.ghserv.net/mailman/listinfo/bigbang-user
Please sign up for the new dev list now!
We will continue to maintain this list until we have been able to update
all our websites and documentation to reflect the change.
We will also migrate the old mailing list archives to this new list.
Cheers,
Seb