Homeland security and law enforcement agencies generate intelligence using manual or electronic surveillance. However, they face challenges, such as:

  • Large volume of raw data to sift through
  • Unreliable quality of HUMINT (human intelligence)
  • No real-time analysis to generate actionable intelligence

There is a perpetual debate about the better way to collect intelligence. Human intelligence and electronic intelligence both have strengths and weaknesses. However, in situations where a target has to be identified by its electronic communication, there is hardly a choice. The difficulty, though, is that the volume of electronic data is massive and much of it is text (unstructured data), which humans have to review and analyze. This difficulty has paralyzed security agencies for many years.


Octillion solutions use artificial intelligence, applying advanced unstructured data analytics techniques, to identify a suspicious source.

 

Sentiment Detection

Detecting sentiment in text is useful for a number of significant applications, such as determining citizens’ attitudes toward government policies or other public issues, and especially tracking how these attitudes or opinions change over time. Commercial applications can detect whether consumers have favorable or unfavorable opinions of certain products in the market, which helps to predict sales and measure the effectiveness of advertising campaigns.

Sentiment or attitude detection applies to any database or stream of textual information, especially user-generated content such as blogs, web pages, bulletin boards, emails, and chat rooms. The process for detecting attitudes and sentiments requires three basic steps:

  1. Collecting a training corpus of texts, annotated by humans for sentiment
  2. Building a set of patterns associated with positive, negative and neutral sentiments
  3. Training a statistical machine learning system, possibly augmented with a suite of interpretation rules, that can classify new texts into the desired categories

 

Most previous work in sentiment detection has skipped the second step and essentially used words instead of patterns. For instance, words such as “great”, “desirable”, “better” are associated with positive sentiments, and words such as “awful”, “terribly”, “dangerous” are associated with negative sentiments. However, this simplistic approach, which avoids language analysis, has proven unreliable. A text can say that a policy is “not at all desirable” (negative sentiment), or that a product is “terribly good” (positive sentiment), and neither can be predicted from simple word lists. A better approach is to extract patterns, such as: “not *3 <pos-word>” → negative sentiment (*3 = up to 3 words, and <pos-word> = any word in the positive list). Our approach uses both patterns and words (the former with greater weight) to improve performance.
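The pattern idea can be illustrated with a minimal Python sketch. It is not the production system: the word lists, the weights, the `INTENSIFIERS` set and the second pattern ("intensifier followed by a positive word") are assumptions chosen only to reproduce the two examples above, and a real system would learn patterns and weights from an annotated corpus.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE_WORDS = {"great", "desirable", "better", "good"}
NEGATIVE_WORDS = {"awful", "terribly", "dangerous"}
# Assumed: negative-looking adverbs that intensify a following positive word.
INTENSIFIERS = {"terribly", "awfully"}

PATTERN_WEIGHT = 2.0  # assumed: a pattern match outweighs a single word
WORD_WEIGHT = 1.0
MAX_GAP = 3           # the "*3" in the pattern: up to 3 intervening words


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return a signed score: below zero is negative, above zero is positive."""
    tokens = tokenize(text)
    score = 0.0
    consumed = set()  # token positions already explained by a pattern

    # Pattern 1: "not *3 <pos-word>" -> negative sentiment.
    for i, tok in enumerate(tokens):
        if tok != "not":
            continue
        last = min(i + MAX_GAP + 1, len(tokens) - 1)
        for j in range(i + 1, last + 1):
            if tokens[j] in POSITIVE_WORDS:
                score -= PATTERN_WEIGHT
                consumed.update(range(i, j + 1))
                break

    # Pattern 2 (assumed): "<intensifier> <pos-word>" -> positive sentiment.
    for i in range(len(tokens) - 1):
        if i in consumed:
            continue
        if tokens[i] in INTENSIFIERS and tokens[i + 1] in POSITIVE_WORDS:
            score += PATTERN_WEIGHT
            consumed.update((i, i + 1))

    # Fallback: single words not already covered by a pattern.
    for i, tok in enumerate(tokens):
        if i in consumed:
            continue
        if tok in POSITIVE_WORDS:
            score += WORD_WEIGHT
        elif tok in NEGATIVE_WORDS:
            score -= WORD_WEIGHT
    return score


def classify(text):
    """Map the signed score to one of three labels."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these assumed lists, `classify("This policy is not at all desirable")` yields "negative" and `classify("The product is terribly good")` yields "positive", whereas a pure word count would score the second sentence as neutral.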

We also differ from earlier approaches in measuring possible levels of sentiment (e.g., strongly negative vs. weakly negative), and in tracking changes over time (e.g., opinion gradually shifting towards the positive).
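As a rough illustration of graded levels and trend tracking, the Python sketch below maps a score to one of five levels and estimates the direction of change with a least-squares slope. The score range of -1 to 1, the thresholds and the function names are all assumptions made for the example, not a description of the actual system.

```python
from statistics import mean


def level(score, strong=0.6, weak=0.2):
    """Map a score in [-1, 1] to a graded label (thresholds are assumed)."""
    if score >= strong:
        return "strongly positive"
    if score >= weak:
        return "weakly positive"
    if score > -weak:
        return "neutral"
    if score > -strong:
        return "weakly negative"
    return "strongly negative"


def trend(scores):
    """Least-squares slope of scores taken in time order.

    A positive slope suggests opinion is shifting towards the positive.
    """
    n = len(scores)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(scores)
    num = sum((i - x_mean) * (s - y_mean) for i, s in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den
```

For example, a series of period averages such as `[-0.5, -0.2, 0.1, 0.4]` gives a positive slope, which would be read as opinion gradually moving from negative to positive.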

We have also found that practical applications need to combine sentiment detection with topic detection. If a stream of texts contains mixed attitudes, it could be that the positive ones refer to one topic (or product or policy) and the negative ones to a different one. Topic detection is a well-studied field, also addressed by statistical learning methods, with which Carnegie Mellon University and Octillion Technology are very familiar.
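To see why the combination matters, consider the small Python sketch below. It assumes that upstream topic and sentiment detectors have already produced `(topic, score)` pairs; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def sentiment_by_topic(records):
    """Average sentiment score per topic.

    `records` is an iterable of (topic, score) pairs, assumed to come
    from separate topic-detection and sentiment-detection steps.
    """
    scores = defaultdict(list)
    for topic, score in records:
        scores[topic].append(score)
    return {topic: sum(vals) / len(vals) for topic, vals in scores.items()}


# Hypothetical stream with mixed attitudes.
records = [
    ("policy A", 0.8),
    ("policy B", -0.6),
    ("policy A", 0.4),
    ("policy B", -0.8),
]
per_topic = sentiment_by_topic(records)
overall = sum(score for _, score in records) / len(records)
```

In this made-up stream the overall average is close to zero and looks neutral, while the per-topic averages show a clearly positive attitude towards one policy and a clearly negative attitude towards the other.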