SNDM Project

Brief description

The main goal of the project was to investigate approaches to mining, monitoring and analyzing communications between the employees of a given company. The project resulted in a prototype of a DLP-like (data loss prevention) system. The main functions of the prototype include mining the information flow, building and classifying user profiles, and detecting unusual events inside the community.


Data collected from the corporate intranet instant messenger was used as input.
Main characteristics of the dataset:

Some results

Messaging intensity statistics

The average daily message flow looks like this:

SNDM average intraday communication frequency

Using this chart as a baseline, several text-based patterns can be revealed by highlighting subsets of specific messages. For example, adding statistics for greetings and farewells to the chart above gives the following chart:

SNDM average intraday greetings communication frequency
Three peaks of "hellos" revealed the three-shift working cycle of the organization.
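Both the baseline curve and a keyword subset curve can be sketched with one helper that counts messages per hour of day and divides by the number of observed days, optionally keeping only messages that contain a keyword. This is a minimal sketch, not the project's code: the (timestamp, text) input format, the `GREETINGS` keyword set and the names `intraday_profile` and `local_peaks` are assumptions. `local_peaks` shows one simple way to pick out peaks such as the shift-start greetings.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical keyword set; the project's real vocabulary is not documented.
GREETINGS = {"hello", "hi", "bye", "goodbye"}

def intraday_profile(messages, keywords=None):
    """Average number of messages per hour of day.

    messages: list of (datetime, text) pairs (assumed input format).
    keywords: if given, count only messages containing at least one of them.
    Counts are divided by the number of distinct days that contain any message,
    so the baseline and keyword curves share the same normalization.
    """
    days = {ts.date() for ts, _ in messages}
    if not days:
        return [0.0] * 24
    per_hour = Counter()
    for ts, text in messages:
        # \w+ is Unicode-aware in Python 3, so non-English words tokenize too.
        if keywords is None or set(re.findall(r"\w+", text.lower())) & keywords:
            per_hour[ts.hour] += 1
    return [per_hour[h] / len(days) for h in range(24)]

def local_peaks(profile, min_height=0.0):
    """Hours whose value exceeds both neighbours (hour 23 and hour 0 are adjacent)."""
    n = len(profile)
    return [h for h in range(n)
            if profile[h] > min_height
            and profile[h] > profile[h - 1]
            and profile[h] > profile[(h + 1) % n]]

# Toy data for illustration only.
msgs = [
    (datetime(2012, 3, 1, 8, 0), "Hello all"),
    (datetime(2012, 3, 1, 8, 30), "the report is ready"),
    (datetime(2012, 3, 2, 16, 0), "bye, see you"),
]
baseline = intraday_profile(msgs)
greetings = intraday_profile(msgs, GREETINGS)
print(baseline[8], greetings[8], local_peaks(greetings))  # 1.0 0.5 [8, 16]
```

Normalizing both curves by the same day count keeps the subset curve directly comparable with the baseline when they are overlaid.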

Searching for the words "yesterday" and "tomorrow" gave the following chart:

SNDM average intraday frequency of communications about yesterday/tomorrow

Behaviour segmentation

Users were clustered by their intraday activity curves, which revealed two groups: the core "typical" users and the anomalous ones (with large deviations in behaviour).

SNDM the core users and the anomalous ones
Closer examination showed that everyone in the anomalous group was either an IT specialist or an internal security department employee :)
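The clustering algorithm itself is not described here. As a hedged stand-in, the sketch below splits users by how far their normalized intraday curve lies from the population mean curve, flagging those whose Euclidean distance has a z-score above a threshold. This is a simple outlier split, not necessarily the project's method; `split_core_and_anomalous`, `z_threshold` and the dict-of-24-counts input are assumptions.

```python
import math

def normalize(curve):
    """Scale a 24-bin activity curve so that it sums to 1 (shape only, not volume)."""
    total = sum(curve)
    return [x / total for x in curve] if total else [0.0] * len(curve)

def split_core_and_anomalous(curves, z_threshold=2.0):
    """curves: non-empty dict user -> 24 hourly message counts (assumed input).

    Each user's normalized curve is compared with the mean normalized curve by
    Euclidean distance; users whose distance z-score exceeds z_threshold are
    returned as anomalous, the rest as core.
    """
    norm = {u: normalize(c) for u, c in curves.items()}
    n = len(norm)
    mean = [sum(col) / n for col in zip(*norm.values())]
    dist = {u: math.dist(v, mean) for u, v in norm.items()}
    mu = sum(dist.values()) / n
    sd = math.sqrt(sum((d - mu) ** 2 for d in dist.values()) / n)
    anomalous = {u for u, d in dist.items() if sd > 0 and (d - mu) / sd > z_threshold}
    return set(norm) - anomalous, anomalous

# Toy example: nine daytime users and one night-time user.
day = [5 if 9 <= h < 18 else 0 for h in range(24)]
night = [5 if h < 6 else 0 for h in range(24)]
curves = {f"user{i}": day for i in range(9)}
curves["night_owl"] = night
core, anomalous = split_core_and_anomalous(curves)
print(sorted(anomalous))  # ['night_owl']
```

Normalizing first means users are compared by the shape of their day rather than by how much they write; `math.dist` requires Python 3.8 or later.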

Communication graph microclustering

As the next step, the communication links were thresholded, revealing the narrow communication circle of each user. Overlaps of such circles can form stable closures (such as couples or more complex groups of users) or reveal one-way dependencies between users.
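One way to express link thresholding, assuming a hypothetical rule: a contact belongs to a user's narrow circle if it receives at least a fixed share of that user's outgoing messages. Mutual membership then yields candidate stable closures (pairs), and non-reciprocated membership yields one-way dependencies. The rule, the `share` default and the function names are illustrative assumptions, not the project's actual thresholding.

```python
from collections import defaultdict

def narrow_circles(counts, share=0.2):
    """counts: dict (sender, receiver) -> number of messages.

    A receiver is in the sender's narrow circle if it gets at least `share`
    of the sender's outgoing messages (hypothetical thresholding rule).
    """
    out_total = defaultdict(int)
    for (sender, _), c in counts.items():
        out_total[sender] += c
    circles = defaultdict(set)
    for (sender, receiver), c in counts.items():
        if c >= share * out_total[sender]:
            circles[sender].add(receiver)
    return circles

def closures_and_dependencies(circles):
    """Split circle links into mutual pairs and one-way (a -> b) dependencies."""
    mutual, one_way = set(), set()
    for a, contacts in circles.items():
        for b in contacts:
            # .get avoids inserting new keys into the defaultdict while iterating
            if a in circles.get(b, set()):
                mutual.add(frozenset((a, b)))
            else:
                one_way.add((a, b))  # a keeps b close, b does not reciprocate
    return mutual, one_way

# Toy message counts for illustration only.
counts = {("ann", "bob"): 40, ("bob", "ann"): 35, ("bob", "carl"): 5,
          ("carl", "ann"): 30, ("carl", "bob"): 2, ("ann", "dan"): 3}
mutual, one_way = closures_and_dependencies(narrow_circles(counts))
print(sorted(tuple(sorted(p)) for p in mutual), sorted(one_way))
# [('ann', 'bob')] [('carl', 'ann')]
```

This only finds pairs; larger closed groups would need an extra step, such as looking for cliques in the graph of mutual links.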

Users were additionally colored by gender, as predicted by a naive Bayes classifier built on text and behaviour features (measured precision was about 89% and recall about 68%).
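For reference, here is a generic multinomial naive Bayes with add-alpha smoothing over message tokens, plus the precision and recall computation behind figures like those quoted. It is a textbook sketch under stated assumptions: the behaviour features are omitted, the class labels (`"A"`/`"B"`) are placeholders, and all names are hypothetical rather than the project's model. Precision is the share of predicted members of the positive class that are correct. Recall is the share of true members of that class that the classifier finds.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial naive Bayes over token lists with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, y in zip(docs, labels):
            self.token_counts[y].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        self.totals = {y: sum(c.values()) for y, c in self.token_counts.items()}
        return self

    def log_posterior(self, tokens, y):
        """Unnormalized log P(y | tokens) = log prior + sum of log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[y] / n_docs)
        denom = self.totals[y] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[y][t] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda y: self.log_posterior(tokens, y))

def precision_recall(y_true, y_pred, positive):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy training data with placeholder labels.
docs = [["report", "deadline"], ["meeting", "report"], ["party", "weekend"], ["weekend", "movie"]]
nb = MultinomialNB().fit(docs, ["A", "A", "B", "B"])
print(nb.predict(["report"]), nb.predict(["weekend"]))  # A B
print(precision_recall(["A", "A", "B", "B"], ["A", "B", "B", "B"], "A"))  # (1.0, 0.5)
```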

Stable closures

SNDM microclusters stable closures
Most of the revealed pairs were later identified by the community expert as married couples or people bound together by friendship or other out-of-office relationships.

One-way dependencies

SNDM microclusters one-way dependencies

Microclusters dynamics

Next, a special multilayered layout of sequential time slices of the communication graph was built to visualize the dynamics of these structures.

SNDM multilayered layout of sequenced time slices

This technique was helpful for investigating how couples form and break up.
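The time-slice layers can be prepared by bucketing messages into fixed-width windows and comparing the edge sets of consecutive slices. Pairs that appear or disappear between slices are candidate formation and disruption events. The drawing of the multilayered layout itself is not shown, and the window width, `min_messages` and the function names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

def slice_edges(messages, start, width, n_slices, min_messages=1):
    """messages: iterable of (datetime, sender, receiver).

    Returns one set of undirected pairs (frozensets) per time slice, keeping
    pairs that exchanged at least `min_messages` messages in that slice.
    """
    counts = [Counter() for _ in range(n_slices)]
    for ts, sender, receiver in messages:
        idx = (ts - start) // width  # timedelta // timedelta gives an int
        if 0 <= idx < n_slices:
            counts[idx][frozenset((sender, receiver))] += 1
    return [{pair for pair, c in cnt.items() if c >= min_messages} for cnt in counts]

def pair_events(slices):
    """Return (slice_index, 'formed' or 'broken', pair) for consecutive slices."""
    events = []
    for i in range(1, len(slices)):
        for pair in slices[i] - slices[i - 1]:
            events.append((i, "formed", pair))
        for pair in slices[i - 1] - slices[i]:
            events.append((i, "broken", pair))
    return events

# Toy messages bucketed into weekly slices.
msgs = [(datetime(2012, 1, 2), "a", "b"), (datetime(2012, 1, 9), "b", "a"),
        (datetime(2012, 1, 10), "a", "c"), (datetime(2012, 1, 16), "c", "a")]
slices = slice_edges(msgs, datetime(2012, 1, 1), timedelta(days=7), 3)
print([(i, kind, sorted(pair)) for i, kind, pair in pair_events(slices)])
# [(1, 'formed', ['a', 'c']), (2, 'broken', ['a', 'b'])]
```

Raising `min_messages` filters out incidental contacts, so only pairs with sustained exchange in a slice count as present.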

Information percolation

Several approaches to clustering users based on percolation processes on the communication graph were studied. As a result, several main clusters of information exchange were identified, along with the most important members of each.

SNDM Information percolation
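The specific percolation approaches are not detailed here. A simple bond-percolation-style sketch keeps only the edges whose message count reaches a threshold and takes the connected components (found with union-find) as clusters. It then ranks members by weighted degree inside the kept graph as a rough measure of importance. Lowering the threshold lets clusters merge, mimicking a percolation process. The threshold, the ranking rule and `percolation_clusters` are assumptions, not the project's method.

```python
from collections import defaultdict

def percolation_clusters(weights, threshold):
    """weights: dict frozenset({u, v}) -> message count (undirected, assumes u != v).

    Keeps edges with weight >= threshold and returns connected components,
    largest first, each sorted by weighted degree within the kept graph.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = defaultdict(int)
    for pair, w in weights.items():
        if w >= threshold:
            u, v = tuple(pair)
            parent[find(u)] = find(v)  # union the two components
            degree[u] += w
            degree[v] += w
    groups = defaultdict(list)
    for node in degree:
        groups[find(node)].append(node)
    clusters = [sorted(members, key=lambda m: -degree[m]) for members in groups.values()]
    return sorted(clusters, key=len, reverse=True)

# Toy weighted graph for illustration only.
weights = {frozenset(("a", "b")): 10, frozenset(("b", "c")): 8, frozenset(("c", "a")): 1,
           frozenset(("d", "e")): 6, frozenset(("e", "f")): 2}
print(percolation_clusters(weights, 5))  # [['b', 'a', 'c'], ['d', 'e']]
```

Sweeping `threshold` from high to low and recording when components merge gives a simple view of how information-exchange clusters join into larger ones.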

Individual rhythms of communication

A slightly modified authorlines formalism helped to discover several distinct communication styles among the given set of users. Here are some of them:

Work communication

SNDM working rhythms
Key features:

Personal communication

SNDM personal rhythms
Key features:

"Search for a partner"

SNDM Search for a partner rhythms
Key features:

Nonreciprocal interest

SNDM Nonreciprocal interest
Key features:

Overall results