now this is really dangerous... Add metadata search functionality to
Wikileaks-released excessively classified diplomatic cables. Make this
comprehensible to people who aren't foreign policy geeks and the Arab
Spring will have been just warm-up practice.
it seems like it would be very useful to better understand what Wikileaks
means by their reverse engineering of the US gov metadata. it would be
_monumental_ to further enhance this treasure trove with some natural
language search and more sophisticated pattern recognition. Sudo-Leaks,
anyone?
[excerpt from http://wikileaks.org/plusd/about/]
The Kissinger Cables <http://wikileaks.org/plusd/about/#tkc>
The Kissinger Cables comprise more than 1.7 million US diplomatic records
for the period 1973 to 1976. Dating from January 1, 1973 to December 31,
1976, they cover a variety of diplomatic traffic including cables,
intelligence reports and congressional correspondence. They include more
than 320,000 originally classified records, including 286,000 full US
diplomatic cables. There are more than 12,000 documents with the sensitive
handling restriction "NODIS", 'no distribution', and more than 9,000
labelled "Eyes Only". Full cables originally classed as "SECRET" total
more than 61,000 and "CONFIDENTIAL" more than 250,000.
The records were reviewed by the United States Department of State's
systematic 25-year declassification process. At review, the records were
assessed and either declassified or kept classified with some or all of the
metadata records declassified. Both sets of records were then subject to an
additional review by the National Archives and Records Administration
(NARA). Once believed to be releasable, they were placed as individual PDFs
at the National Archives as part of their Central Foreign Policy Files
collection. Despite the review process supposedly assessing documents after
25 years, there are no diplomatic records later than 1976. The formal
declassification and review process of these extremely valuable historical
documents is therefore currently running 12 years late.
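The "12 years late" figure can be reproduced with simple arithmetic. The sketch below assumes the excerpt was written in 2013; that year is not stated in the text, so treat it as an assumption, and the names are illustrative only.

```python
REVIEW_DELAY_YEARS = 25       # systematic declassification window, from the text
LATEST_RELEASED_YEAR = 1976   # last year with released records, from the text
ASSUMED_CURRENT_YEAR = 2013   # assumption: not stated in the excerpt

def years_behind(current_year, latest_released_year, review_delay):
    """How many years the review lags behind its own schedule."""
    # Records up to this year should already have been reviewed.
    expected_latest = current_year - review_delay
    return expected_latest - latest_released_year

backlog = years_behind(ASSUMED_CURRENT_YEAR, LATEST_RELEASED_YEAR, REVIEW_DELAY_YEARS)
print(backlog)  # 2013 - 25 = 1988, and 1988 - 1976 = 12
```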
At NARA, these documents were held as 1.7 million individual PDFs.
To prepare these documents for integration into the PlusD collection,
WikiLeaks obtained and reverse-engineered all 1.7 million PDFs and
performed a detailed analysis of individual fields, developed sophisticated
technical systems to deal with the complex and voluminous data and
corrected a great many errors introduced by NARA, the State Department or
its diplomats, for example harmonizing the many different ways in which
departments, capitals and people's names were spelled. All our corrective
work is referenced and available from the links in the individual field
descriptions on the PlusD text search interface:
https://search.wikileaks.org/plusd. For more information on what WikiLeaks
did to prepare the Kissinger Cables please see here
<http://wikileaks.org/plusd/about/#ptk>.
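To give a feel for what "harmonizing" spelling variants might involve, here is a minimal sketch. It is not WikiLeaks' actual method, which is documented at the links above. The canonical names and the normalization rule (ignore case, punctuation and spacing) are assumptions chosen for illustration.

```python
import re

def normalize_key(raw: str) -> str:
    """Collapse case, punctuation and spacing so spelling variants share one key."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

# Hypothetical canonical spellings, keyed by their normalized form.
CANONICAL = {normalize_key(name): name for name in ("TEL AVIV", "NEW DELHI")}

def harmonize(raw: str) -> str:
    """Return the canonical spelling if known, else the input with whitespace trimmed."""
    return CANONICAL.get(normalize_key(raw), raw.strip())

print(harmonize("Tel-Aviv"))   # TEL AVIV
print(harmonize("newdelhi"))   # NEW DELHI
```

A real pipeline would also need a hand-maintained table for variants that differ by more than punctuation (misspellings, abbreviations), which a rule like this cannot catch.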
Not all records from the period 1973-1976 have been obtained. NARA claims
that the diplomatic records for the period 1973 to 1976 chosen for content
deletion were of an ephemeral character. These records were identified by the "TAGS"
that were attached to them. TAGS ("Traffic Analysis by Geography and
Subject") refers to the content tagging system implemented by the
Department of State for its central foreign policy files in 1973. There are
geographic, organization and subject TAGS. This system was developed to
standardise search terms for departmental uses and was not static - TAGS
were added and deleted as necessary over time. At review, all cables that
only contained "temporary" TAGS, such as embassy logistical or staffing
requests, were permanently destroyed.
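The destruction rule described above amounts to a set test: a cable is dropped when it has at least one TAG and every one of its TAGS is on the temporary list. A minimal sketch follows. The TAG names are made up (the actual temporary TAGS are not listed here), and treating untagged cables as unaffected is an assumption.

```python
# Hypothetical TAG names for illustration only; not the real State Department list.
TEMPORARY_TAGS = {"TEMP_LOGISTICS", "TEMP_STAFFING"}

def has_only_temporary_tags(tags, temporary=TEMPORARY_TAGS):
    """True when the cable has TAGS and every one of them is temporary."""
    tag_set = set(tags)
    # Subset test: no TAG falls outside the temporary list.
    return bool(tag_set) and tag_set <= temporary

print(has_only_temporary_tags(["TEMP_LOGISTICS"]))                  # True
print(has_only_temporary_tags(["TEMP_LOGISTICS", "SUBJECT_TAG"]))   # False
```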
Tens of thousands of documents were irreversibly corrupted in this data set
due to technical errors when the documents were moved as computer systems
were upgraded, or so the US Department of State claims. This caused the
content of the document to be lost, though the metadata is still available.
These are often marked by an error message in the content of the document.
Most documents from the following periods were lost in this manner:
- December 1, 1975 to December 15, 1975
- March 8, 1976 to April 2, 1976
- May 25, 1976 to July 1, 1976
You can see the absence of these weeks by constructing a Timegraph of
"TAGS", as this term occurs in the content of nearly every document:
https://search.wikileaks.org/plusd/graph
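The same kind of gap can be found offline from a list of document dates. The sketch below is a simplification under stated assumptions: it treats a day as missing only when no document at all carries that date, whereas the text says "most" documents were lost, so real data would need a count threshold instead. The function name and the `min_days` parameter are illustrative.

```python
from datetime import date, timedelta

def find_gaps(dates, start, end, min_days=7):
    """Return (first, last) pairs for runs of at least min_days empty days in [start, end]."""
    present = set(dates)
    gaps, run_start = [], None
    day = start
    while day <= end:
        if day not in present:
            if run_start is None:
                run_start = day          # a run of empty days begins
        else:
            if run_start is not None and (day - run_start).days >= min_days:
                gaps.append((run_start, day - timedelta(days=1)))
            run_start = None             # the run (if any) ends here
        day += timedelta(days=1)
    # Close a run that extends to the end of the range.
    if run_start is not None and (end - run_start).days + 1 >= min_days:
        gaps.append((run_start, end))
    return gaps

# Synthetic data: one document per day, except December 1-15, 1975.
start, end = date(1975, 11, 20), date(1975, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
sample = [d for d in days if not date(1975, 12, 1) <= d <= date(1975, 12, 15)]
print(find_gaps(sample, start, end))
```

On the synthetic sample this reports a single gap from December 1 to December 15, 1975.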
Top Secret documents are also not available. During a migration of records
the Department of State printed out all Top Secret documents for
"preservation purposes" and the electronic versions were destroyed
permanently. These documents now only exist as hardcopies and so are
unavailable online in any form, even if declassified.
The documents not deleted either remained classified (or were deemed
unreleasable for other reasons), or were declassified and publicly
released. For the former, a "withdrawal card" was provided giving some
limited metadata about the document; the fields deemed releasable vary
from document to document. This metadata provides some
information about the document, for example the date and destination, that
can be used for research purposes and also allows a detailed FOIA request
to be made for the document. These FOIA requests can be directed to NARA's
Special Access and FOIA staff. For more information about this, please see
their online guide here <http://www.archives.gov/foia/foia-guide.html>. You
will need the document number and the To and From information.
There are nine different "Types" of document included in the Kissinger
Cables. The majority are of type "TE" - telegram (cable), which are
official diplomatic messages sent between embassies and the US Secretary of
State conveying official information about policy proposals and
implementation, program activities, or personnel and diplomatic post
operations. From 1973 onwards, diplomatic cables were mostly electronic,
so most cables made releasable include the body (content) of the
cable. However, the other types of documents are paper records, including
airgrams and diplomatic notes. These are stored on microfilm (from 1974
onwards, as the Department of State did not microfilm documents until then)
and so were not released with the full content of the documents, even if
marked for public release. Although the body of the message is not
available online, the full index (metadata) is provided for those "P-reel"
documents that were marked for release. Even though the whole document has
not been digitised, the metadata is still useful for research purposes and
the documents can be requested under the Freedom of Information Act. For
those documents on P-reel that were not declassified and released, a P-reel
"withdrawal card" is provided giving limited metadata. To access P-reel
documents that have a withdrawal card you should follow the same FOIA
procedure as for Telegram withdrawal cards. For the content of P-reel
documents which have been released, the process depends slightly on which
year the document you are requesting was created, but all requests should
be directed to: archives2reference@nara.gov.