Hi all,
I hope this email finds you all very well!
In preparation for the IETF hackathon, I have spent three days
downloading ~33 GB of IETF mailing list archives from
ftp.ietf.org/ietf-mail-archive/
I will bring it on a disk to the hackathon in London, but I am also going to
make it available as a zip file on a server, which should allow for a
much quicker download. I will share the IP address here soon.
I will also put an archive of all ICANN mailing lists there.
I am still looking for a way to create CSVs from the archives so that they
can be used directly in BigBang (see the discussion with Sebastian below).
All input appreciated!
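One possible route, sketched here under assumptions: if each list directory under ftp.ietf.org/ietf-mail-archive/ holds plain mbox files (worth checking on a few lists first), Python 3's standard `mailbox` and `csv` modules can turn one list into one CSV while reading a single message at a time, so the 33 GB never has to sit in memory at once. The function names and the column set are hypothetical and chosen for illustration. They are not BigBang's API, and the columns would need renaming to whatever BigBang's notebooks actually expect.

```python
import csv
import mailbox
import os
from email.header import decode_header, make_header

# Hypothetical column set; adjust to whatever schema BigBang actually expects.
FIELDS = ["message_id", "from", "subject", "date", "in_reply_to", "references", "body"]


def header_text(msg, name):
    """Return a header as a Unicode string, decoding RFC 2047 encoded words."""
    raw = msg.get(name)
    if raw is None:
        return ""
    try:
        return str(make_header(decode_header(str(raw))))
    except Exception:  # malformed encoded words or unknown charsets
        return str(raw)


def body_text(msg):
    """Return the first text/plain part, replacing bytes that fail to decode."""
    for part in msg.walk():
        if part.is_multipart() or part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # charset name Python does not know
            return payload.decode("utf-8", errors="replace")
    return ""


def mbox_dir_to_csv(list_dir, out_path):
    """Write every message from the mbox files in list_dir to one CSV.

    Messages are read and written one at a time, so the whole list never
    has to fit in memory. Returns the number of messages written.
    """
    count = 0
    # errors="replace" guards against stray surrogates from undecodable headers.
    with open(out_path, "w", newline="", encoding="utf-8", errors="replace") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for name in sorted(os.listdir(list_dir)):
            path = os.path.join(list_dir, name)
            if not os.path.isfile(path):
                continue
            box = mailbox.mbox(path, create=False)
            try:
                for msg in box:
                    writer.writerow({
                        "message_id": header_text(msg, "Message-ID"),
                        "from": header_text(msg, "From"),
                        "subject": header_text(msg, "Subject"),
                        "date": header_text(msg, "Date"),
                        "in_reply_to": header_text(msg, "In-Reply-To"),
                        "references": header_text(msg, "References"),
                        "body": body_text(msg),
                    })
                    count += 1
            finally:
                box.close()
    return count
```

For example, `mbox_dir_to_csv("ietf-mail-archive/ietf", "ietf.csv")` would write one CSV for the ietf list (both paths are placeholders). Decoding with `errors="replace"` trades a few mangled characters for never aborting the whole export on one badly encoded message.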
Best,
Niels
-------- Forwarded Message --------
Subject: Re: quick q
Date: Wed, 14 Mar 2018 22:03:36 +0100
From: Niels ten Oever <niels(a)article19.org>
To: Sebastian Benthall <sbenthall(a)gmail.com>
Hi Sebastian,
Not sure if I am doing what you're suggesting correctly, but:
On 03/14/2018 08:20 PM, Sebastian Benthall wrote:
You may have trouble getting all 33 gigs into memory at the same time.
I've never tried that.
Have you tried creating an Archive object for just one group, as is
illustrated in the example notebooks?
If I use, for instance:
$ python2 bin/collect_mail.py -u https://www.ietf.org/mail-archive/text/ietf/
I get endless chardet errors, and it ends with:
DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
tzinfo.utcoffset() returned 1440; must be in -1439 .. 1439
'ascii' codec can't encode character u'\xe4' in position 1084: ordinal not in range(128)
Can't export data. Aborting.
This was not a reliable way to get all the mailing lists for the
hackathon, which is why I used wget to get them.
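The `tzinfo.utcoffset() returned 1440` line suggests that at least one message carries a Date header whose zone offset is 1440 minutes (24 hours, e.g. `+2400`), which Python's `datetime` refuses because offsets must be strictly less than a day. A minimal sketch of lenient parsing with the standard library's `email.utils.parsedate_tz` follows, written for Python 3 rather than the python2 used above. `parse_date_lenient` is a hypothetical helper, and treating an impossible offset as UTC is an assumption (dropping the date would be equally defensible).

```python
import email.utils
from datetime import datetime, timedelta, timezone

MAX_OFFSET = 24 * 3600  # datetime.timezone requires |offset| < 24 hours


def parse_date_lenient(value):
    """Parse an RFC 2822 Date header into an aware UTC datetime.

    Returns None for headers that cannot be parsed at all. Offsets of 24 hours
    or more (e.g. "+2400"), which datetime rejects, are treated as UTC.
    """
    if not value:
        return None
    parts = email.utils.parsedate_tz(value)
    if parts is None:
        return None
    offset = parts[9] or 0  # seconds east of UTC; 0 when no zone is given
    if abs(offset) >= MAX_OFFSET:
        offset = 0  # assumption: ignore an impossible offset rather than fail
    try:
        local = datetime(*parts[:6])
    except ValueError:  # e.g. day 31 in a 30-day month
        return None
    return (local - timedelta(seconds=offset)).replace(tzinfo=timezone.utc)
```

Passing each message's raw Date header through a helper like this keeps an export running past a single malformed date instead of aborting.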
So now I am looking for a way to make them easily usable for the
participants, but I am not sure how to do this.
I'm not sure which example notebook you meant I could do this with; all
the ones I looked through either need a CSV or try to download the list
themselves.
Cheers,
Niels
I believe that when it creates one from raw email it will generate a
.CSV file of the same data for you.
On Mar 14, 2018 1:41 PM, "Niels ten Oever" <niels(a)article19.org> wrote:
In other words, over the past three days I downloaded all these:
https://www.ietf.org/mail-archive/text/
And now I would like to import them into BigBang, but I'm not sure what
command to use.
When I try to use the notebooks, they ask for CSVs.
Cheers,
Niels
Niels ten Oever
Article 19
www.article19.org
PGP fingerprint 2458 0B70 5C4A FD8A 9488  643A 0ED8 3F3A 468A C8B3
On 03/14/2018 06:31 PM, Niels ten Oever wrote:
Hiya Seb,
All good? I have a quick question. Do you know how I can import mailing
lists that I have already downloaded? In other words, how do I create
CSVs of the 33 GB of mailing lists I just harvested :)
Hope all is well! I think I will be churning on this stuff tonight,
so maybe expect some emails later ;) xx
~n.,