Yo's-
Interesting. A couple of things to keep in mind (speaking from more
than casual knowledge here):
One can start with unclassified material, apply analysis, and produce an
output that itself can become classified material. This is how OSINT
works (open-source intelligence, where "open source" means "not
classified sources"). So it is possible that some of what Wikileaks has
produced may be classifiable material. If it's deemed classified, it
may or may not be advisable to access, even if publicly available (for
example if one ever applies for certain kinds of jobs for which one
might be subject to extensive screening questions). In my experience
one should either err on the side of caution or give thought to the
question of how one might defend one's actions if need be.
General rule in intel: Collection is easy, post-processing is tedious,
and analysis is hard. Wikileaks appears to have collected the material
lawfully from released archives, and post-processed it. What Eddan is
calling for here is to add software functionality that simplifies the
analytic task. Adding that functionality to software used for analysis
of material shouldn't raise any controversies in and of itself,
especially if the software is useful for other tasks aside from
analyzing leaked data dumps.
Analysis requires a certain amount of training and a certain mindset.
By way of training, there is extensive unclassified published literature
we can add to our library if anyone's interested. Some of it is
technique-specific, most of it is general but still useful to read by
way of acquiring certain ways of thinking about data. Anyone else here
with cog sci background may find themselves amused at the degree to
which the US Gov is behind the times in that area: surely we can do
better, and we might consider a project to improve upon gov-recommended
techniques.
The intel mindset is a personality trait that may not be easy to
acquire, but one approximation is a combination of above-average pattern
sense (roughly equivalent to mild paranoia;-), the ability to doubt
your own hypotheses and conclusions, the ability to empathize with
others (such as the subjects of one's inquiries, or at least being able
to approximate an understanding of their personality traits), and the
relentless application of Occam's razor and other reasoning tools to
sort the wheat from the chaff. There is a tendency, which must be
overcome, to get stuck in a groove of either overestimating or
underestimating apparent patterns in the data. Getting the balance
right is very difficult, but it can be refined with training.
And of course one needs to adopt a scientific attitude of objectivity, or
at least non-prejudice, about one's subject matter. For example, one can't
go into an exercise with predetermined notions about the motives of
individuals etc. (In other words, don't jump to conclusions based on
the names "Nixon" and "Kissinger.")
Very often it turns out that the key to an analytic exercise is not
something obvious in and of itself, such as ferreting out a damning
quote from a subject, but rather something that emerges in the
relationships between two or more pieces of data each of which is
unremarkable.
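To make that concrete, here is a minimal sketch of the idea, not a real
tool. It assumes hypothetical records with made-up "id", "date",
"origin" and "tags" fields (these are illustrative names, not the PlusD
schema) and flags pairs of records that share a tag, come from
different origins, and are dated close together:

```python
from datetime import date
from itertools import combinations

# Hypothetical sample records; field names and values are invented
# for illustration and are not taken from the PlusD data.
RECORDS = [
    {"id": "A1", "date": date(1974, 3, 1), "origin": "LONDON", "tags": {"PFOR", "UK"}},
    {"id": "B7", "date": date(1974, 3, 2), "origin": "PARIS", "tags": {"PFOR", "FR"}},
    {"id": "C3", "date": date(1975, 6, 9), "origin": "LONDON", "tags": {"ETRD"}},
]


def linked_pairs(records, max_days=3):
    """Return (id, id, shared_tags) for record pairs that share at least
    one tag, have different origins, and are within max_days of each other."""
    pairs = []
    for a, b in combinations(records, 2):
        shared = a["tags"] & b["tags"]
        close = abs((a["date"] - b["date"]).days) <= max_days
        if shared and close and a["origin"] != b["origin"]:
            pairs.append((a["id"], b["id"], sorted(shared)))
    return pairs


print(linked_pairs(RECORDS))
```

Neither record in a flagged pair needs to be interesting on its own;
the pairing is only a prompt for a human to go read both.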
Anyway, if anyone's interested, we can discuss further. Though, this
week I'm majorly busy with work.
laters-
-G.
=====
On 13-04-08-Mon 2:53 AM, Eddan wrote:
now this is really dangerous... Add metadata search functionality to
Wikileaks-released excessively classified diplomatic cables. Make this
comprehensible to people who aren't foreign policy geeks and the Arab
Spring will have been just warm-up practice.
it seems like it would be very useful to better understand what
Wikileaks means by their reverse engineering of the US gov metadata.
it would be _monumental_ to further enhance this treasure trove with
some natural language search and more sophisticated pattern
recognition. Sudo-Leaks, anyone?
[excerpt from http://wikileaks.org/plusd/about/]
The Kissinger Cables <http://wikileaks.org/plusd/about/#tkc>
The Kissinger Cables comprise more than 1.7 million US diplomatic
records for the period 1973 to 1976. Dating from January 1, 1973 to
December 31, 1976 they cover a variety of diplomatic traffic including
cables, intelligence reports and congressional correspondence. They
include more than 320,000 originally classified records, including
286,000 full US diplomatic cables. There are more than 12,000
documents with the sensitive handling restriction "NODIS", 'no
distribution', and more than 9,000 labelled "Eyes Only". Full cables
originally classed as "SECRET" total more than 61,000 and
"CONFIDENTIAL" more than 250,000.
The records were reviewed by the United States Department of State's
systematic 25-year declassification process. At review, the records
were assessed and either declassified or kept classified with some or
all of the metadata records declassified. Both sets of records were
then subject to an additional review by the National Archives and
Records Administration (NARA). Once believed to be releasable, they
were placed as individual PDFs at the National Archives as part of
their Central Foreign Policy Files collection. Despite the review
process supposedly assessing documents after 25 years there are no
diplomatic records later than 1976. The formal declassification and
review process of these extremely valuable historical documents is
therefore currently running 12 years late.
The form in which these documents were at NARA was 1.7 million
individual PDFs. To prepare these documents for integration into the
PlusD collection, WikiLeaks obtained and reverse-engineered all 1.7
million PDFs and performed a detailed analysis of individual fields,
developed sophisticated technical systems to deal with the complex and
voluminous data and corrected a great many errors introduced by NARA,
the State Department or its diplomats, for example harmonizing the
many different ways in which departments, capitals and people's names
were spelled. All our corrective work is referenced and available from
the links in the individual field descriptions on the PlusD text
search interface:
https://search.wikileaks.org/plusd. For more
information on what WikiLeaks did to prepare the Kissinger Cables
please see here <http://wikileaks.org/plusd/about/#ptk>.
Not all records from the period 1973-1976 have been obtained. NARA
claims diplomatic records for the period 1973 to 1976 chosen for
content deletion were of an ephemeral character. These records were
identified by the "TAGS" that were attached to them. TAGS ("Traffic
Analysis by Geography and Subject") refers to the content tagging
system implemented by the Department of State for its central foreign
policy files in 1973. There are geographic, organization and subject
TAGS. This system was developed to standardise search terms for
departmental uses and was not static - TAGS were added and deleted as
necessary over time. At review, all cables that only contained
"temporary" TAGS, such as embassy logistical or staffing requests,
were permanently destroyed.
Tens of thousands of documents were irreversibly corrupted in this
data set due to technical errors when the documents were moved as
computer systems were upgraded, or so the US Department of State
claims. This caused the content of the document to be lost, though the
metadata is still available. These are often noted by an error message
in the content of the document. The documents lost in this manner are
most documents from the following periods:
* December 1, 1975 to December 15, 1975
* March 8, 1976 to April 2, 1976
* May 25, 1976 to July 1, 1976
You can see the absence of these weeks by constructing a Timegraph of
"TAGS" as this term occurs in the content of nearly every
document: https://search.wikileaks.org/plusd/graph
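[Aside, not part of the Wikileaks excerpt: the same check could be
approximated offline. This is a minimal sketch, assuming a hypothetical
list of (date, content) pairs rather than any real PlusD export or API.
It lists the days on which no document content contains a given term.]

```python
from collections import Counter
from datetime import date, timedelta


def days_without_term(docs, term, start, end):
    """Return every day in [start, end] on which no document's content
    contains `term`. `docs` is an iterable of (date, content) pairs."""
    hits = Counter(day for day, content in docs if term in content)
    gaps = []
    day = start
    while day <= end:
        if hits[day] == 0:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Hypothetical sample: the middle document lost its content.
sample = [
    (date(1975, 11, 30), "TAGS: PFOR"),
    (date(1975, 12, 1), "error: content lost"),
    (date(1975, 12, 2), "TAGS: ETRD"),
]
print(days_without_term(sample, "TAGS", date(1975, 11, 30), date(1975, 12, 2)))
```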
Top Secret documents are also not available. During a migration of
records the Department of State printed out all Top Secret documents
for "preservation purposes" and the electronic versions were destroyed
permanently. These documents now only exist as hardcopies and so are
unavailable online in any form, even if declassified.
The documents not deleted either remained classified (or were deemed
unreleasable for other reasons), or were declassified and publicly
released. For the former, a "withdrawal card" was provided giving some
limited metadata about the document; which fields were deemed
releasable varies from document to document. This metadata
provides some information about the document, for example the date and
destination, that can be used for research purposes and also allows a
detailed FOIA request to be made for the document. These FOIA requests
can be directed to NARA's Special Access and FOIA staff. For more
information about this, please see their online guide here
<http://www.archives.gov/foia/foia-guide.html>. You will need the
document number and the To and From information.
There are nine different "Types" of document included in the Kissinger
Cables. The majority are of type "TE" - telegram (cable), which are
official diplomatic messages sent between embassies and the US
Secretary of State conveying official information about policy
proposals and implementation, program activities, or personnel and
diplomatic post operations. From 1973 onwards diplomatic cables were
mostly electronic, therefore most cables made releasable include the
body (content) of the cable. However, the other types of documents are
paper records, including airgrams and diplomatic notes. These are
stored on microfilm (from 1974 onwards, as the Department of State did
not microfilm documents until then) and so were not released with the
full content of the documents, even if marked for public release.
Although the body of the message is not available online the full
index (metadata) is provided for those "P-reel" documents that were
marked for release. Even though the whole document has not been
digitised the metadata is still useful for research purposes and the
documents can be requested under the Freedom of Information Act. For
those documents on P-reel that were not declassified and released a
P-reel "withdrawal card" is provided giving limited metadata. To
access P-reel documents that have a withdrawal card you should follow
the same FOIA procedure as for Telegram withdrawal cards. For the
content of P-reel documents which have been released, the process
depends slightly on which year the document you are requesting was
created, but all requests should be directed to:
archives2reference@nara.gov.
_______________________________________________
sudo-discuss mailing list
sudo-discuss(a)lists.sudoroom.org
http://lists.sudoroom.org/listinfo/sudo-discuss