Hi all,
I hope this email finds you all very well!
In preparation for the IETF hackathon I have spent three days
downloading ~33 GB of IETF mailing list archives from
ftp.ietf.org/ietf-mail-archive/
I will bring them on a disk to the hackathon in London, and I will also
make them available as a zip file from a server, which should allow for
a much quicker download. I will share the IP address here soon.
I will also put an archive of all ICANN mailing lists there.
I am still looking for a way to create CSVs from the archives so that
they can be used directly in BigBang (see the discussion below with
Sebastian). All input appreciated!
Best,
Niels
-------- Forwarded Message --------
Subject: Re: quick q
Date: Wed, 14 Mar 2018 22:03:36 +0100
From: Niels ten Oever <niels(a)article19.org>
To: Sebastian Benthall <sbenthall(a)gmail.com>
Hi Sebastian,
Not sure if I am correctly doing what you are suggesting, but:
On 03/14/2018 08:20 PM, Sebastian Benthall wrote:
> You may have trouble getting all 33 gigs into memory at the same time.
> I've never tried that.
>
> Have you tried creating an Archive object for just one group, as
> illustrated in the example notebooks?
>
If I use for instance:
$ python2 bin/collect_mail.py -u https://www.ietf.org/mail-archive/text/ietf/
I get endless chardet errors, and the run ends with:
DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
tzinfo.utcoffset() returned 1440; must be in -1439 .. 1439
'ascii' codec can't encode character u'\xe4' in position 1084: ordinal not in range(128)
Can't export data. Aborting.
So this was not a reliable way to get all the mailing lists for the
hackathon, which is why I used wget to get them instead.
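A note on the last two errors in the log above: the tzinfo message says
a UTC offset of 1440 minutes (a full 24 hours) was passed to datetime,
which only accepts offsets strictly within one day, and the 'ascii'
message is what Python 2 reports when non-ASCII text is implicitly
encoded as ASCII. Both probably come from individual odd messages in
the archive. A minimal Python 3 sketch of guarding against them could
look like this. It is not BigBang code; the function names
parse_date_safely and write_rows are made up for illustration, and
treating an out-of-range offset as UTC is an assumption.

```python
import csv
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz


def parse_date_safely(raw):
    """Return an aware datetime for a Date header, or None if unusable."""
    parsed = parsedate_tz(raw) if raw else None
    if parsed is None:
        return None
    offset_seconds = parsed[9] or 0  # None means no zone was given
    # datetime only accepts offsets strictly within one day.
    if abs(offset_seconds) >= 24 * 60 * 60:
        offset_seconds = 0  # assumption: treat a bogus zone as UTC
    try:
        zone = timezone(timedelta(seconds=offset_seconds))
        return datetime(*parsed[:6], tzinfo=zone)
    except ValueError:
        # Impossible calendar values (for example 31 Feb) end up here.
        return None


def write_rows(path, rows):
    """Write rows to a CSV file, encoding text as UTF-8 rather than ASCII."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

With this, a header such as "Wed, 14 Mar 2018 22:03:36 +2400" yields a
UTC timestamp instead of stopping the export, and a sender name that
contains an "ä" is written out as UTF-8.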
So now I am looking for a way to make them easily usable for the
participants, but I am not sure how to do this.
I am also not sure which example notebook you meant; all the ones I
looked through either need a CSV or try to download the list
themselves.
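One possible route for the files already on disk, as a minimal Python 3
sketch that uses only the standard library rather than BigBang itself:
it assumes the downloaded archives are mbox files, that they match the
pattern "*.mail", and that a header-only CSV is enough to start with.
The column list in FIELDS and the function names are hypothetical and
are not BigBang's own CSV layout, so the columns would still need to be
mapped to whatever the notebooks expect. Files are converted one at a
time, so the whole 33 GB never has to be in memory at once.

```python
import csv
import mailbox
from pathlib import Path

# Hypothetical column choice; not BigBang's own CSV layout.
FIELDS = ["Message-ID", "From", "Date", "Subject", "In-Reply-To"]


def mbox_to_csv(mbox_path, csv_path):
    """Write one CSV row per message in an mbox file; return the row count."""
    box = mailbox.mbox(str(mbox_path), create=False)
    count = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(FIELDS)
            for message in box:
                # str() also flattens Header objects for non-ASCII values.
                writer.writerow([str(message.get(name, "")) for name in FIELDS])
                count += 1
    finally:
        box.close()
    return count


def convert_tree(archive_root, output_dir, pattern="*.mail"):
    """Convert every matching file under archive_root, one file at a time."""
    archive_root = Path(archive_root)
    output_dir = Path(output_dir)
    totals = {}
    for mbox_path in sorted(archive_root.rglob(pattern)):
        relative = mbox_path.relative_to(archive_root)
        target = (output_dir / relative).with_suffix(".csv")
        target.parent.mkdir(parents=True, exist_ok=True)
        totals[relative.as_posix()] = mbox_to_csv(mbox_path, target)
    return totals
```

Calling convert_tree("ietf-mail-archive", "csv") would mirror the
directory layout under "csv", with one CSV per archive file; both
directory names are placeholders. Message bodies are left out here to
keep the sketch short.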
Cheers,
Niels
> I believe that when it creates one from raw email it will generate a
> .CSV file of the same data for you.
>
> On Mar 14, 2018 1:41 PM, "Niels ten Oever" <niels(a)article19.org> wrote:
>
> In other words, over the past three days I downloaded all these:
>
> https://www.ietf.org/mail-archive/text/
>
> And now I would like to import them into BigBang, but I am not sure
> which command to use.
>
> When I try to use the notebooks, they ask for CSVs.
>
> Cheers,
>
> Niels
>
> Niels ten Oever
>
> Article 19
> www.article19.org
>
> PGP fingerprint 2458 0B70 5C4A FD8A 9488
> 643A 0ED8 3F3A 468A C8B3
>
> On 03/14/2018 06:31 PM, Niels ten Oever wrote:
> > Hiya Seb,
> >
> > All good? I have a quick question. Do you know how I can import
> > mailing lists that I have already downloaded? In other words, how do I
> > create CSVs from the 33 GB of mailing lists I just harvested :)
> >
> > Hope all is well! I think I will be churning on this stuff tonight,
> > so maybe expect some mails later ;) xx
> >
> > ~n.,
> >
>
>