[Bigbang-dev] IETF archive

Niels ten Oever niels at article19.org
Wed Mar 14 14:36:41 PDT 2018


Hi all,

I hope this email finds you all very well!

In preparation for the IETF hackathon I have spent three days
downloading ~33 GB of IETF mailing list archives from
ftp.ietf.org/ietf-mail-archive/

I will bring it on a disk to the hackathon in London, and will also
make it available as a zip file on a server, which should allow for a
much quicker download. I will share the IP address here soon.
I will also put an archive of all ICANN mailing lists there.

I am still looking for a way to create CSVs from the archives, so that
they can be used directly in BigBang (see the discussion below with
Sebastian). All input appreciated!
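
One possible approach, sketched here with only the Python 3 standard
library: stream each downloaded archive file through the `mailbox`
module and write one CSV row per message. This is not BigBang's own
importer. It assumes the downloaded files are in mbox format, and the
function names (`header_text`, `body_text`, `mbox_to_csv`) and the
column layout are hypothetical; the columns BigBang expects would need
to be checked against its notebooks.

```python
import csv
import mailbox
from email.header import decode_header, make_header

# Hypothetical column layout; adjust to whatever the consumer expects.
COLUMNS = ["Message-ID", "From", "Subject", "Date", "In-Reply-To", "Body"]


def header_text(value):
    """Decode an RFC 2047 encoded header to str, falling back to the raw value."""
    if value is None:
        return ""
    try:
        return str(make_header(decode_header(str(value))))
    except Exception:
        return str(value)


def body_text(message):
    """Return the first text/plain part, decoding its bytes leniently."""
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # the message names a charset Python does not know
            return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_csv(mbox_path, csv_path):
    """Write one CSV row per message in mbox_path; return the message count."""
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        # Writing UTF-8 explicitly avoids ASCII-only encoding failures.
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(COLUMNS)
            for message in box:  # messages are read one at a time
                writer.writerow([
                    header_text(message.get("Message-ID")),
                    header_text(message.get("From")),
                    header_text(message.get("Subject")),
                    header_text(message.get("Date")),
                    header_text(message.get("In-Reply-To")),
                    body_text(message),
                ])
                count += 1
    finally:
        box.close()
    return count
```

Running `mbox_to_csv` once per list file keeps memory use bounded by
the largest single message rather than by the full ~33 GB, since no
archive is loaded as a whole.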

Best,

Niels


-------- Forwarded Message --------
Subject: Re: quick q
Date: Wed, 14 Mar 2018 22:03:36 +0100
From: Niels ten Oever <niels at article19.org>
To: Sebastian Benthall <sbenthall at gmail.com>

Hi Sebastian,

Not sure if I am correctly doing what you're suggesting, but:

On 03/14/2018 08:20 PM, Sebastian Benthall wrote:
> You may have trouble getting all 33 gigs into memory at the same time.
> I've never tried that.
> 
> Have you tried creating an Archive object for just one group, as is
> illustrated in the example notebooks?
> 

If I use for instance:

$ python2 bin/collect_mail.py -u https://www.ietf.org/mail-archive/text/ietf/

I get endless chardet errors, and the run ends with:

DEBUG:chardet.charsetprober:windows-1255 Hebrew confidence = 0.0
tzinfo.utcoffset() returned 1440; must be in -1439 .. 1439
'ascii' codec can't encode character u'\xe4' in position 1084: ordinal not in range(128)
Can't export data. Aborting.
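
The `utcoffset()` message is consistent with a message whose Date
header carries an out-of-range zone such as `+2400` (1440 minutes),
although the log alone does not confirm which message triggered it. A
minimal sketch of a more tolerant date parser, using only the Python 3
standard library, is to convert every date to UTC instead of building a
timezone object from the stated offset. The function name
`parse_date_utc` is hypothetical and is not part of BigBang.

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate_tz


def parse_date_utc(date_header):
    """Parse an RFC 2822 Date header into an aware UTC datetime, or None.

    The stated offset is only subtracted arithmetically, so a bogus zone
    such as +2400 shifts the time by 24 hours instead of raising an error.
    """
    parsed = parsedate_tz(date_header) if date_header else None
    if parsed is None:
        return None
    if parsed[9] is None:
        # Assumption: treat a missing zone as UTC rather than local time.
        parsed = parsed[:9] + (0,)
    try:
        return datetime.fromtimestamp(mktime_tz(parsed), tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        return None
```

Returning `None` for unparseable dates lets the caller skip or flag a
single bad message instead of aborting the whole export.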

So this was not a reliable way to get all the mailing lists for the
hackathon, which is why I used wget to fetch them instead.

Now I am looking for a way to make them easily usable for the
participants, but am not sure how to do this.

Not sure which example notebook you meant I could do this with; all the
ones I looked through either need a CSV or try to download the list
themselves.

Cheers,

Niels


> I believe that when it creates one from raw email it will generate a
> .CSV file of the same data for you.
> 
> On Mar 14, 2018 1:41 PM, "Niels ten Oever" <niels at article19.org
> <mailto:niels at article19.org>> wrote:
> 
>     In other words, over the past three days I downloaded all these:
> 
>     https://www.ietf.org/mail-archive/text/
>     <https://www.ietf.org/mail-archive/text/>
> 
>     And now I would like to import them in BigBang, but not sure what
>     command to use.
> 
>     When I try to use the notebooks they are asking for csv's.
> 
>     Cheers,
> 
>     Niels
> 
>     Niels ten Oever
> 
>     Article 19
>     www.article19.org <http://www.article19.org>
> 
>     PGP fingerprint    2458 0B70 5C4A FD8A 9488
>                        643A 0ED8 3F3A 468A C8B3
> 
>     On 03/14/2018 06:31 PM, Niels ten Oever wrote:
>     > Hiya Seb,
>     >
>     > All good? I have a quick question. Do you know how I can import
>     > emaillists that I already have downloaded? In other words, how do I
>     > create csv's of the 33 GB of mailinglists I just harvested :)
>     >
>     > Hope all is well! I think I will be churning on this stuff this night,
>     > so maybe expect some mails later ;) xx
>     >
>     > ~n.,
>     >
> 
> 


