Written by Karen E. Thuermer
MIT 2012 Volume: 16 Issue: 5 (June)
Social media offers great opportunities for understanding the pulse of a population: reactions to events, opinions on important issues, political sentiment, calls for protests and much more. Social media also provides early alerts for defense, intelligence and homeland security analysts about potential crises such as the next Arab Spring, military conflict or natural disaster.
If there has been a recent event, and individuals who live in a certain region are now angry at U.S. troops, the extent of their anger or displeasure can increase or decrease over time. Knowledge of those ebbs and flows could help improve the safety of those troops.
“Those of us watching on TV may think it is not so bad, whereas social media may show some very influential individuals with negative perceptions who are influencing others to think as they think,” said Rebecca Garcia, director, SAS Federal National Security Group. “This could jeopardize the safety of U.S. personnel if they are not aware of this line of thinking.”
But the sheer volume of the data can make it difficult to process and analyze. “Additionally, the amount of noise in the data—information irrelevant to the problem at hand—can be staggering,” said Dr. Robert McCormack, associate director of the analytics, modeling and simulation division at Aptima.
Disentangling key memes of interest from the ocean of noise is a difficult undertaking. The overwhelming profusion of user-generated, publicly accessible content, like that from tweets, on blogs and in many online communities, demands an automated solution.
Enter advanced analytic technologies. These technologies help find important topics and trends and help those who have a need to know understand their impact on the population.
There are obvious global applications for social data analytics, as was evident with events last year in Egypt and Libya, as well as the natural disasters in Japan and Haiti. The first global news about the breach of Osama bin Laden’s Pakistan compound came from a neighbor’s tweets. Social media networks have provided original on-the-scene reporting of planned protests, demonstrations and operations.
“This technology’s potential to harness the ocean of publicly available information on the Internet makes it particularly useful in social media applications,” commented Sean Love, geospatial business development director for Northrop Grumman. “Being able to hone in on specific information on a specific topic, without having to wade through petabytes of data, saves a significant amount of time and allows end users to focus the majority of their time on the mission instead of on data mining.”
Such analytical technology must effectively manage social media data in all its forms, be it structured, unstructured and/or semi-structured, including both video and audio content.
“For military and intelligence applications, the same needs apply, all the way from the military recruiter, who finds publicly available data on recruitment issues important, to the frontline soldiers who want to know what the current sentiment is toward U.S. military presence in a specific town or region,” said Tony Jimenez, president and chief executive officer of MicroTech. “Social media data requires analysis that is often beyond the capability of an individual or even a group.”
The issue is sifting through the plethora of data to get to actionable information.
In addition to the public or external conversations that inform operations, internal operations can benefit from social media analytics as well. The Pew Research Center’s Internet and American Life Project indicates that 65 percent of all adult Internet users now use social networking sites.
“The military is a very large operation and could certainly incorporate social media into reaching out and engaging with service personnel via this medium that is now becoming so prolific,” Jimenez remarked. “Analysis of service personnel concerns, trends and issues, in the proper mindset, could yield far greater efficiencies and mission success.”
Consequently, social media analytics provides yet another opportunity for increased efficiency and support of operations with information discovery from the wealth of publicly available data.
A number of companies currently offer advanced analytic technologies for social media.
Northrop Grumman, for example, offers a tool that uses algorithms to search through publicly available information and then narrows that data into predetermined subjects, categories and other criteria. “That information is then sorted, providing the end-user with data that is relevant, focused and manageable,” said Love.
Northrop Grumman’s tool is designed to alert officials of potential crises, conflicts and social trends.
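The narrowing step Love describes, sorting public posts into predetermined subjects, can be illustrated with a very small sketch. This is not Northrop Grumman's algorithm, which the article does not detail; the category names, keyword lists and the `categorize` function are hypothetical, and simple keyword matching stands in for whatever the real tool does.

```python
import re
from collections import defaultdict

# Hypothetical subjects and keywords, chosen only for illustration.
CATEGORIES = {
    "protest": {"protest", "demonstration", "march"},
    "disaster": {"earthquake", "flood", "tsunami"},
}


def categorize(posts, categories=CATEGORIES):
    """Group posts under every category whose keywords they mention.

    Posts that match no category are dropped, which is the
    "narrowing" effect: the analyst only sees on-topic material.
    """
    buckets = defaultdict(list)
    for post in posts:
        # Lowercase and split into words so matching ignores case.
        words = set(re.findall(r"[a-z']+", post.lower()))
        for name, keywords in categories.items():
            if words & keywords:
                buckets[name].append(post)
    return dict(buckets)


if __name__ == "__main__":
    sample = ["Huge protest downtown", "Nice lunch", "Flood warning and a march"]
    for subject, matched in categorize(sample).items():
        print(subject, matched)
```

A production system would need far more than keyword sets (stemming, multiple languages, relevance scoring), but the shape is the same: raw posts in, a smaller set of labeled posts out.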
Aptima is developing a technology called Epidemiological Modeling of the Evolution of Messages (E-MEME), which combines advanced analytic techniques from natural language processing (NLP) with core concepts from epidemiological modeling. E-MEME applies NLP methods to scour large sets of Internet data sources and documents, extracting the key memes and topics propagating through blogs, news sites and real-time social platforms like Twitter. These techniques are used to characterize and quantify topics being discussed, such as “protests” and “elections.”
Mathematical epidemiological models plot how such ideas proliferate and spread among populations both geographically and over time. “Epidemiology provides us a starting point for understanding the problem, as well as a wealth of models and techniques for formally analyzing the data,” McCormack said.
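To make the epidemiological analogy concrete, the sketch below runs a textbook discrete-time SIR (susceptible, infected, recovered) model, reading "infected" as "actively spreading a meme." It is an assumption that something like SIR is relevant here; the article does not say which models E-MEME uses, and the function name and the parameters `beta` (spread rate) and `gamma` (rate of losing interest) are illustrative only.

```python
def simulate_sir(population, initial_spreaders, beta, gamma, steps):
    """Simulate a simple discrete-time SIR model of meme spread.

    S: people who have not yet picked up the meme
    I: people actively spreading it
    R: people who have lost interest
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s = float(population - initial_spreaders)
    i = float(initial_spreaders)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        # New spreaders depend on contact between S and I.
        new_spreaders = beta * s * i / population
        # A fixed fraction of spreaders lose interest each step.
        new_recovered = gamma * i
        s -= new_spreaders
        i += new_spreaders - new_recovered
        r += new_recovered
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    for day, (s, i, r) in enumerate(simulate_sir(1000, 10, 0.5, 0.1, 10)):
        print(f"day {day}: susceptible={s:.1f} spreading={i:.1f} lost interest={r:.1f}")
```

Changing `beta` or `gamma` and rerunning is the simplest form of experimenting with how quickly an idea might take hold in a population.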
On one level, McCormack explained, the aim of E-MEME is to provide the intelligence analyst with better information on the current situation of interest based on what is happening in social media, blogs and news. “If they are interested in protests, for example, E-MEME will provide prevalence of that topic in the media broken down by several dimensions, such as locations, groups or media type,” he said.
In addition, E-MEME provides information on past trends on topics, allowing an analyst to see, for example, if talk of protests in a particular location is on the rise. “Beyond that, the epidemiologically based models will provide the ability to measure susceptibility of different populations to various memes, based on historical data and other factors,” he said.
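Checking whether talk of a topic is "on the rise" in a location can be approximated by comparing recent mention counts against the preceding period. The sketch below is one minimal way to do that, not E-MEME's method; the `rising_locations` function, the window size and the ratio threshold are all assumptions made for illustration.

```python
from collections import Counter


def rising_locations(mentions, window=3, ratio=1.5):
    """Return locations where a topic's mentions are increasing.

    mentions: iterable of (day_index, location) pairs, one per post
              that mentions the topic.
    A location is "rising" if its count in the most recent `window`
    days is at least `ratio` times its count in the window before.
    """
    mentions = list(mentions)
    if not mentions:
        return []
    last_day = max(day for day, _ in mentions)
    recent, previous = Counter(), Counter()
    for day, location in mentions:
        age = last_day - day
        if age < window:
            recent[location] += 1
        elif age < 2 * window:
            previous[location] += 1
    # Treat an empty previous window as 1 to avoid flagging a single stray post.
    return sorted(
        loc for loc in recent if recent[loc] >= ratio * max(previous[loc], 1)
    )


if __name__ == "__main__":
    data = [(0, "A"), (1, "A"), (3, "A"), (4, "A"), (5, "A"), (5, "A"),
            (0, "B"), (1, "B"), (2, "B"), (5, "B")]
    print(rising_locations(data))
```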
Additionally, intelligence analysts will be able to perform “what if” analyses, such as measuring the potential spread of memes or the likelihood that a particular region will adopt an idea.
MicroTech, which offers solutions to establish an effective social media practice, has found it helpful to offer scalable social media solutions in several different sizes and configurations that address the wide array of needs and requirements across government agencies, using a number of different hardware/software apps.
“Social Recon Mobile offers essential social media capabilities and includes software and hardware on a portable, easily transferable cart for rapid deployment and virtually instant social media mining capability,” Jimenez said.
Social Recon MicroPodd includes an accompanying mobile MicroPodd component that affords greater storage and more capability. This option offers a plug-in solution to existing infrastructure.
“Analysts can easily monitor and track what you deem important from their current locations and workstations,” he said.
Social Recon MicroCenter is a permanent solution, custom built onsite, with additional social mining capability that allows a deep dive across the social media community.
“As data centers continue to be virtualized, consolidated and made more efficient, this option affords a decided competitive advantage to those leveraging their own facilities for the creation of the social media functionality,” Jimenez said.
Lastly, Social Media as a Service (SMaaS) offers a hosted solution that is distinct from other MicroTech solutions. SMaaS can be tailored to fit the needs of an organization and the functionality needed, be it in-depth search and discovery, concept analysis, targeted analytics and/or system alerting, all on specific topics and issues of interest.
“It’s particularly useful if you’re moving more toward an IT management strategy that allows for maximum flexibility, or you’re unable to make an investment in new equipment,” Jimenez said. “We offer analytics services aimed at providing a detailed electronic narrative with reporting on a daily, weekly or monthly basis, highlighting topics and issues of interest to you.”
The MicroTech Social Recon products manage and parse data in all its digital formats. They support topic and related searches without requiring manual tagging, and they are designed to overcome the linguistic and language issues that arise in an increasingly interconnected world. “For example, people often use different words (different semantics and syntax) to express the same idea,” Jimenez explained.
This problem becomes especially pronounced in a social media environment like Twitter, where the language is more conversational, replete with familiar expressions, slang and varying emotional undertones like sarcasm, excitement and disappointment, and stated so briefly that context is difficult to discern. The issue can be especially challenging in multilingual countries where online data can be in a number of other languages.
“Our Social Recon analysis tools provide results that are understandable and actionable,” he continued.
The tools can immediately provide contact with those who raise concerns, as well as permit engagement with them via the same social media tool they used to comment on or discuss a topic on the social web.
“Likewise, those who offered incorrect or negative comments can also be contacted using our Social Recon tools and become engaged in a dialogue on whatever issues arise,” Jimenez said.
The tool can also identify cluster areas where a popular belief may be incorrect or there may be a proliferation of misinformation.
SAS Social Media Analytics (SMA) provides ways to look at specific topics of interest, reduce the amount of irrelevant information, and gauge the sentiment of one individual or of millions of people. The tool can take information from any number of blogs, Twitter, Facebook or other publicly available social media sites of interest. Queries for specific topics or keywords can be set by the analyst, and the tool will continue to provide information 24 hours a day.
“When the analyst arrives at work they have new, up-to-the-minute information and continue to receive updates throughout the day,” Garcia said.
The SMA solution offered by SAS also allows for multiple individuals to interact with the data based on similar areas of interest. Analysts can further manage the data being received through tools that can refine searches on the fly as they see information that is more or less relevant to their needs.
“There is also the capability to geo-locate the information,” Garcia added. “SAS is partnering with AGI to provide geospatial information to users based on the location of the social media user. This can be a critical asset to the warfighter when trying to assess a threat to troops or rescue someone who is in trouble and is unable to utilize traditional communication channels.”
SAS can analyze sentiment in 28 languages natively, with the 29th language, Farsi, in beta testing. Languages are not translated into English but are assessed in their native form, which provides much more accurate sentiment scoring. “This is critical when assessing possible threats, since changes in mood can be subtle,” Garcia explained.
SAS is working with existing customers to build mood states for those who need to know when subtle changes occur.
“It’s rare for a person to go from very positive to very negative sentiment based on a single event,” she remarked. “So mood states allow for assessment of changes in opinion or feeling towards a topic over a period of time. This can help personnel in other countries be better informed about how specific behaviors or actions could create a positive or negative response among the civilian population.”
The goal would be for military members to have more positive interactions with civilians based on greater insights into their culture or based on past reactions to similar interactions.
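Garcia's point that sentiment rarely flips on a single event suggests smoothing scores over time before assigning a mood. The sketch below uses an exponential moving average for that purpose. It is an illustration only, not how SAS builds mood states; the score range of -1 to 1, the `alpha` weight and the `threshold` cutoff are assumed values.

```python
def smooth_sentiment(scores, alpha=0.3):
    """Exponentially smooth a sequence of sentiment scores.

    scores: per-period sentiment values, assumed to lie in [-1, 1].
    alpha:  weight given to the newest score; lower values make the
            smoothed level react more slowly to single events.
    """
    smoothed = []
    level = None
    for score in scores:
        level = score if level is None else alpha * score + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


def mood_state(level, threshold=0.2):
    """Map a smoothed sentiment level to a coarse mood label."""
    if level > threshold:
        return "positive"
    if level < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    daily = [0.5, 0.5, -1.0, 0.5, -0.8, -0.9, -0.7]
    for raw, level in zip(daily, smooth_sentiment(daily)):
        print(f"raw={raw:+.2f} smoothed={level:+.2f} mood={mood_state(level)}")
```

With smoothing, one sharply negative day moves the level only part of the way, while a sustained run of negative days eventually changes the mood label.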
Open Source Pitfalls
The primary advantage of open source data is the rate at which it refreshes. New information is constantly available. By the same token, the sheer amount of available data is a challenge.
“While technologies are being developed to ‘slim down’ how much data an end user is faced with, the data set is growing exponentially every year so those technologies must adapt to keep pace with that,” Love said.
Additionally, given inequalities in access to technology, social media does not necessarily provide a representative picture of the population at large. Some of the specific issues currently being addressed in the research community include analysis of multiple foreign languages and the unique idiosyncrasies of particular types of social media.
With respect to analysis of foreign languages, at a basic level the statistical techniques used for deriving topics are language independent.
“But, there are definitely difficult issues that arise when dealing with foreign languages,” said McCormack. “Tools like Google Translate and Yahoo! Babel Fish can give you a rough sense of the discussion, but fail to convey the more subtle nuances of more idiomatic languages.” This is an active area of research across the NLP community.
Spelling and lexical variation across different forms of media also poses a significant challenge. In Twitter especially, misspellings, abbreviations and stylistic spelling variations all make standard normalization techniques difficult. Automated clustering techniques become necessary in this case.
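As a rough illustration of clustering spelling variants, the sketch below greedily groups tokens by string similarity using Python's standard `difflib.SequenceMatcher`. The greedy strategy, the `cluster_variants` name and the 0.75 threshold are assumptions for this example; research systems typically use more sophisticated clustering.

```python
from difflib import SequenceMatcher


def cluster_variants(tokens, threshold=0.75):
    """Greedily group tokens that look like variants of the same word.

    Each token is compared with the first member of every existing
    cluster; it joins the first cluster whose similarity ratio meets
    the threshold, otherwise it starts a new cluster.
    """
    clusters = []
    for token in tokens:
        for cluster in clusters:
            similarity = SequenceMatcher(
                None, token.lower(), cluster[0].lower()
            ).ratio()
            if similarity >= threshold:
                cluster.append(token)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([token])
    return clusters


if __name__ == "__main__":
    print(cluster_variants(["protest", "protests", "election", "protst"]))
```

Character-level similarity handles misspellings and plural forms, but it will not connect abbreviations or slang that share few letters with the standard form.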
Garcia adds that there are other issues as well, such as how individuals can create new identities on blogs, Twitter or other sites. Individuals or groups can mask their identity and location based on security settings.
“Anyone can say anything about any person or subject, and it does not have to be accurate or true,” she said. “This type of information source requires confirmation and careful assessment of possible impacts if the comments are found to be even partially untrue.”
There are also the challenges of perception. Many individuals can witness an event and perceive very different things based on their angle of observation and personal bias.
Since social media is a forum where there is no real filter for bias, angle of observation or desire to mislead, Garcia noted, such a powerful tool must be used prudently. The analyst must make value judgments based on his or her experience, understanding and knowledge.
Social media is one data source, and is not more definitive than any other single source of data. It may be less definitive, depending on the reliability of the individual who is providing the information.
“Since that could be anyone in the world, the veracity of comments will likely be as divergent as the honesty of each individual on the planet, and still relies on our ability to correctly interpret the message,” she said.
Over the next five years, there will be a large number of new tools and approaches to leveraging that ever-increasing data set as more and more customers latch on to social media exploitation as a viable means for information gathering and analysis, Love predicts.
Jimenez contends that mobile and social applications will continue to grow and devices with increased capabilities will proliferate. “Augmented reality capabilities, such as geographical knowledge augmentation (where, for example, you can hold your phone up and see what stores, restaurants, and/or installations are in a certain direction), exist now, but they will become far more accurate and useful as the industry matures and evolves,” he said.
Social media is also starting to penetrate the enterprise. Organizations are implementing social communication tools both internally and externally in an effort to be better informed and break down silos that hinder growth and efficiencies. Organizations experiencing demographic changes and shifts to younger generations have already adopted these types of tools as a method to engage and communicate in ways that these individuals have already adapted to and understand.
McCormack contends that as the Department of Defense and intelligence community move into more open source analytics, there will be an increase in demand for advanced analytics capable of answering both strategic and tactical questions.
“In terms of technology, we’ll start seeing increases in the use of distributed and cloud computing for dealing with the massive amounts of real-time streaming data,” McCormack added. “Adapting the analytic techniques, from the statistical language models to the dynamic trend analysis models, to these environments will likely be an active area of research.”
Finally, much current work is focused on retrospective analysis of events in social media (such as the Arab Spring), because the analytical techniques are still nascent.
“The true test of these tools in the next five years will be to see if they can usefully predict trends in social media before they become yesterday’s news,” he said.