Research Programme

Clique conducts research on the themes of flow, communities, anomalous structure and centrality. The research programme has three research strands. The central strand addresses the development of new techniques for the analysis and visualisation of network data. In addition, we have two application strands tied to biological and social network data. For a detailed overview of our research interests, please download our summary document.

Importantly for Clique, the research programme is driven by a set of challenges identified by our three industrial partners. Our industrial partners provide access to large volumes of data and to information about the characteristics of real data. On the biological side, our collaborators in the UCD Conway Institute and in the Krogan Lab at the University of California-San Francisco provide similar access and expertise. In the coming years, research in network data analysis will be transformed by access to large-scale dynamic data resources.

Clique represents a research programme in network analysis that addresses issues in internet services, fraud detection and bioinformatics. Our specific work package details include:

WP1.1: Matrix Analysis Techniques

The fundamental observation that drives social network analysis, and that connects all the research challenges, is that relationships between data entities are critical to understanding the interesting characteristics and features of the data. Network or graph representations of the data, and graph-theoretic tools to analyse these representations, are therefore central to this research programme. This work package focuses on the development of fundamental network analysis tools.
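As a small illustration of how a matrix view of a network supports analysis, the sketch below stores an undirected graph as an adjacency matrix and estimates eigenvector centrality by power iteration. It is not taken from the Clique programme: the function name, the example graph and the choice to iterate on the shifted matrix A + I (so that the iteration also settles on bipartite graphs) are all assumptions made for illustration.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-12):
    """Estimate eigenvector centrality by power iteration on A + I.

    `adj` is a square, symmetric 0/1 adjacency matrix (list of lists) for a
    connected undirected graph. Adding the identity is an illustrative
    choice: it leaves the eigenvectors unchanged but avoids oscillation on
    bipartite graphs.
    """
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I): each node keeps its score and adds its neighbours'.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical example: a triangle (nodes 0, 1, 2) with node 3 attached to node 0.
ADJ = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

scores = eigenvector_centrality(ADJ)
```

In this toy graph node 0 receives the largest score and the pendant node 3 the smallest, which matches the intuition that a node is central when its neighbours are themselves well connected.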

WP1.2: Probabilistic Techniques

The probabilistic techniques coming from statistics offer an alternative to the matrix decomposition approach to network analysis addressed in WP1.1. It is easier to handle complex forms of data in the probabilistic paradigm, but scaling these techniques to work with large networks is a challenge.

WP1.3: Visualisation

The interactive visual representation of abstract data, to aid human exploration and understanding of it, is a key research challenge in bioinformatics and social network analysis. Network visualisation is concerned with the sourcing, management, layout, drawing and viewing of relational data, and with interaction with it. Visualisation relies on a human to guide the application of methods, the structuring of queries and the control of the interaction in the pursuit of understanding. The focus of this work package is on the development of the fundamental algorithms, methods and interaction techniques required to visualise large and dynamic networks possessing latent structure. Such structures include centrality, flow, communities and anomalous structure.

WP1.4: Data Representation and Management

To research and analyse social networks and their graph properties, access to large-scale collections of real-world data is essential. This information can be collected directly if access to the data source exists (via an API or an exported data collection), but more frequently it will have to be gathered and indexed by “crawling” Web sources. Data collection for this project will focus on building graph representations, which will involve maintaining and indexing not only the information but also the links between information entities and the semantic properties of these links. This will entail anonymising data provided by the industrial partners and collecting datasets from the internet. Concerns about security, robustness, privacy and trustworthiness are intimately tied to the questions of what information to store in social-networking databases and what information within these databases to make publicly available. It therefore makes sense to examine these questions in tandem with data gathering, and hence these issues will also be addressed in this work package.
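To make the idea of indexing both the links and their semantic properties concrete, here is a minimal sketch of an in-memory graph store with typed links. The class names, the relation labels and the entity identifiers are hypothetical and chosen only for illustration; they do not describe the storage system used in the project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    relation: str  # semantic property of the link, e.g. "follows"


class LinkedGraph:
    """Toy store that indexes entities, their links and the link types."""

    def __init__(self):
        self.entities = {}                    # entity id -> attribute dict
        self.outgoing = defaultdict(list)     # entity id -> links leaving it
        self.by_relation = defaultdict(list)  # relation label -> links

    def add_entity(self, entity_id, **attributes):
        self.entities[entity_id] = attributes

    def add_link(self, source, target, relation):
        # Register endpoints that have not been seen before.
        for end in (source, target):
            self.entities.setdefault(end, {})
        link = Link(source, target, relation)
        self.outgoing[source].append(link)
        self.by_relation[relation].append(link)
        return link

    def neighbours(self, entity_id, relation=None):
        # Optionally filter the outgoing links by their semantic type.
        return [
            link.target
            for link in self.outgoing.get(entity_id, [])
            if relation is None or link.relation == relation
        ]


# Hypothetical data: two users and one blog post.
g = LinkedGraph()
g.add_entity("post:1", kind="blog_post")
g.add_link("user:a", "post:1", "authored")
g.add_link("user:b", "post:1", "commented_on")
g.add_link("user:a", "user:b", "follows")
```

Keeping a separate index per relation label means that a query such as "all follows links" does not require a scan of every link, at the cost of storing each link twice.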

WP2.1: Modelling information flow

Modelling the mechanisms for information flow in human communication networks is the major focus of this work package. Bulletin boards, blogs and wikis are part of the new generation of community-based Web applications over which ideas, opinions and knowledge are shared, argued about, defended and refined at an unprecedented scale. While Web technologies underpin these ‘information ecosystems’, we lack a formal understanding of the relationships, social structures and behaviours that allow some ideas to emerge and grow at the expense of others. Such an understanding is crucial if online communities are to be developed and used as sources of commercial and social value in the knowledge-based economy.

WP2.2: Discovering structure in large networks

The real-world networks of interest to our industrial partners are typically very large, containing 10^6–10^9 nodes. Methods to deal with such large networks are the major theme of this work package. The research carried out in this work package will feed off and into research in WP1.1, WP1.2, WP2.1 and WP2.3. The goals of the work package may be summarised as: 1) the development of efficient algorithms for community-finding in social networks of more than 10^6 nodes; 2) the development of parallel algorithms for community-finding; 3) the incorporation of attribute information into large-scale community-finding algorithms; and 4) the development of incremental methods for efficiently updating communities identified in dynamically changing graphs.
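Community-finding algorithms need some way to judge whether a proposed grouping of nodes is a good one. The sketch below computes modularity, one widely used quality score, for a given assignment of nodes to communities. The text above does not say which quality measure or algorithm the work package uses, so the choice of modularity, the function name and the toy graph are assumptions for illustration only.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of a simple undirected graph.

    `edges` lists each undirected edge once as a (u, v) pair and
    `community_of` maps every node to a community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical example: two triangles joined by a single bridging edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

q_split = modularity(EDGES, split)    # 5/14, roughly 0.357
q_merged = modularity(EDGES, merged)  # 0.0
```

The function makes a single pass over the edge list, so scoring a partition costs time linear in the number of edges. Separating the two triangles scores higher than lumping all six nodes together, as expected.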

WP2.3: Discovering anomalous structure

Discovering remarkable or anomalous structure is a key research challenge in the analysis of financial data and in recommender and opinion-based systems. In the analysis of biological networks, the identification of unusual motifs that are the basic modules of information processing is of great interest. In social networks and in the social web, the discovery of network structure that is in some way false or fraudulent has also attracted a lot of research interest.

WP3.1: Challenges in biomolecular interaction networks

Large genomic datasets have predictive power to identify biological classes, and the number and quality of such datasets are expected to increase dramatically as new technologies (e.g. deep sequencing, quantitative mass spectrometry) are applied in model organisms and clinical settings. For example, gene expression patterns have been used to predict cancer types or prognosis, while gene expression matrices obtained in different conditions have been used to classify genes. Matrices may be rectangular and asymmetric, e.g. of genes by conditions or of genes by promoter motifs, or they may be square and symmetric, such as matrices of protein physical interactions or of pairwise gene functional interactions. The square, symmetric matrices are particularly informative for network structure.
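The difference between the two kinds of matrix can be shown with a toy example. The sketch below takes a rectangular genes-by-motifs matrix and multiplies it by its own transpose, which yields a square, symmetric genes-by-genes matrix counting the motifs each pair of genes shares. This is only one simple way to derive a pairwise matrix from a rectangular one; the gene names, motif columns and values are invented for illustration and are not data from the programme.

```python
def shared_feature_matrix(membership):
    """Return M * M^T for a rectangular genes-by-features 0/1 matrix.

    Entry (i, j) of the result counts the features that gene i and gene j
    have in common, so the result is square and symmetric.
    """
    n = len(membership)
    return [
        [
            sum(a * b for a, b in zip(membership[i], membership[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical data: 3 genes (rows) by 4 promoter motifs (columns).
GENES = ["geneA", "geneB", "geneC"]
GENE_BY_MOTIF = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

SHARED = shared_feature_matrix(GENE_BY_MOTIF)
# SHARED == [[2, 1, 0],
#            [1, 2, 0],
#            [0, 0, 1]]
```

Read as a weighted adjacency matrix, the off-diagonal entries of the result link geneA and geneB (one shared motif) and leave geneC isolated, which is the sense in which square, symmetric matrices map directly onto network structure.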
