Jackie Li

Emoveo: Declassification Software

Proper access to information, and proper restriction of it, is more important than ever. When the 113th Congress revisited the Freedom of Information Act last December (and failed to revise it), the question of proper access to government documents was brought to light. Traditionally, redaction has removed either too much or too little, which is problematic both for security and for the ethical goal of holding the government accountable. Emoveo attempts to redact while maintaining a balance of information, so that the end product is sanitized yet still readable.

The software will be a Perl script with a built-in methodology that the user follows. The user will be presented with keywords for the document and will be able to redact information that is semantically and contextually related to those keywords. The script uses quick content analysis to help the user identify the parts of the text to redact (paragraphs, parts of sentences). The user can choose to eliminate entire sections or just parts of them. The program also has built-in automation to quickly identify and remove proper nouns, phrases, country names, etc. The document starts in a plain-text format and ends up as a redacted RTF document, so that no metadata is stored in the document that might reveal what the redacted text was.
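The workflow described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the Perl script itself. The function names, the `[REDACTED]` marker, the sentence splitter, and the capitalization-based proper-noun heuristic are all assumptions made for this example. The `approve` callback stands in for the user's decision on each flagged sentence. RTF output is omitted.

```python
import re

REDACTION_MARK = "[REDACTED]"  # hypothetical marker, chosen for this sketch


def split_sentences(text):
    """Naively split text into sentences on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def flag_sentences(text, keywords):
    """Pair each sentence with the keywords it contains, for user review."""
    flagged = []
    for sentence in split_sentences(text):
        hits = [
            k for k in keywords
            if re.search(r"\b" + re.escape(k) + r"\b", sentence, re.IGNORECASE)
        ]
        flagged.append((sentence, hits))
    return flagged


def redact_proper_nouns(sentence):
    """Replace capitalized words that are not sentence-initial.

    This is a crude stand-in for real proper-noun detection.
    """
    out = []
    for i, word in enumerate(sentence.split(" ")):
        if i > 0 and re.match(r"[A-Z][a-z]+", word):
            # Preserve any trailing punctuation such as "." or ","
            trailing = re.search(r"[^\w]*$", word).group(0)
            out.append(REDACTION_MARK + trailing)
        else:
            out.append(word)
    return " ".join(out)


def redact(text, keywords, approve=lambda sentence, hits: True):
    """Remove approved keyword sentences whole; scrub proper nouns elsewhere."""
    result = []
    for sentence, hits in flag_sentences(text, keywords):
        if hits and approve(sentence, hits):
            result.append(REDACTION_MARK)
        else:
            result.append(redact_proper_nouns(sentence))
    return " ".join(result)


sample = (
    "The envoy met Kim in Seoul. The budget was approved. "
    "Talks on the missile program continue."
)
print(redact(sample, ["missile"]))
```

Passing a different `approve` function (for example, one that prompts the user) lets a person accept or reject each flagged sentence, which mirrors the balance between automation and user control that the project aims for.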

Machine learning will be necessary to categorize syntactic structure. Because the text will vary in size and genre, achieving a good accuracy rate in the removal of words that don’t provide information is a difficult problem to solve. This relies heavily on perceived social and cultural ideas of what words mean and how they fit into context. The program relies heavily on content and syntax analysis to perform well, so that users can perform the semantic and pragmatic analyses. I balance automation with user control in order to create a method that allows peers to review each redaction step and to scrutinize how the redacted text relates to the original text.

Bio:

My name is Jackie Li, and my primary interest is the implementation of language and security in various technologies (forensics (e-discovery), steganography, machine translation). I focus most of my language work on Chinese, Korean, and Japanese. These languages have developed into a niche field of CJKV information processing. I am also interested in Celtic languages, mainly Irish Gaelic, Scots Gaelic, and Welsh. While I have studied Breton, Cornish, and Manx, I’m intrigued by the historical changes which have occurred in Irish, Scottish, and Welsh. I have experience working at IT help desks and labs at the George Washington University. My jobs have allowed me to attempt my own programs, ranging from Arabic name identification and file converters to identifying the morphology of East Asian languages.

Project Video:

Documentation: