Ours not to do and die, ours but to reason why.

Distributed System

FreeLauncher: Lossless Failure Recovery of Parameter Servers with Ultralight Replication

Modern distributed machine learning (ML) systems leverage large-scale computing infrastructures to achieve fast model training. When many servers jointly train a model, failure recovery becomes an important challenge once a training task can be accomplished in minutes rather than days. The state-of-the-art checkpointing mechanism cannot meet the need for efficient recovery in large-scale ML, because its high cost prevents timely checkpointing, and a server failure will likely cause a substantial loss of intermediate results when the checkpointing intervals are comparable to the entire training time. We propose FreeLauncher (FLR), a lossless recovery mechanism for large-scale ML that performs ultralight replication (instead of checkpointing) to guarantee that all intermediate training results (parameters) are replicated in a timely manner. Our key insight is that in the parameter-server (PS) architecture there already exist multiple copies of each intermediate result, not only on the server but also on the workers, most of which are qualified for failure recovery. FLR addresses the challenges of parameter sparsity (e.g., when training LDA) and staleness (e.g., when adopting relaxed consistency) by selectively replicating the latest copies of the sparse/stale parameters to ensure that at least k up-to-date copies exist, which can handle any k−1 failures by relaunching the failed servers with parameters recovered from workers. We implement FLR on TensorFlow. Evaluation results show that FLR achieves lossless failure recovery (requiring almost no recomputation) at little cost.
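The recovery idea above — relaunch a failed server and rebuild its parameter shard from the freshest copies cached on workers — can be sketched as follows. This is a minimal illustrative sketch, not FLR's actual implementation; the function names, cache layout, and version scheme are all assumptions.

```python
# Hypothetical FLR-style recovery sketch: after a server fails, the
# relauncher polls the surviving workers' parameter caches and keeps the
# highest-versioned copy of every parameter the failed server owned.

def recover_shard(worker_caches, shard_keys):
    """worker_caches: list of dicts {param_key: (version, value)} held by workers.
    shard_keys: parameter keys owned by the failed server.
    Returns the freshest available copy of each parameter in the shard."""
    recovered = {}
    for key in shard_keys:
        best = None  # (version, value) with the highest version seen so far
        for cache in worker_caches:
            if key in cache:
                version, value = cache[key]
                if best is None or version > best[0]:
                    best = (version, value)
        if best is not None:
            recovered[key] = best
    return recovered

# Toy example: three workers hold partially stale copies of parameters w0, w1.
caches = [
    {"w0": (3, 0.5), "w1": (1, 0.1)},
    {"w0": (2, 0.4)},
    {"w1": (4, 0.2)},
]
recovered = recover_shard(caches, ["w0", "w1"])  # freshest: w0 at v3, w1 at v4
```

Ensuring at least k up-to-date copies across workers (the paper's replication invariant) is what makes this lookup guaranteed to find a latest copy after up to k−1 failures.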

HotML: A DSM-based Machine Learning System

In the big data era, social networks such as Twitter, Weibo, and Facebook are becoming increasingly popular worldwide. To support social network analysis, many machine learning (ML) algorithms have been adopted, e.g., for user classification, link prediction, sentiment analysis, and recommendation. However, the dataset can be so large that training a model on a machine learning system may take days, so performance must be considered to speed up training. In this paper, we propose HotML, a general machine learning system. HotML follows the parameter-server (PS) architecture, in which the servers manage the globally shared parameters organized in a tabular structure, and the workers process the dataset in parallel and update the global parameters. HotML supports high-level data abstractions and Map/Reduce-like data-flow operations with a user-friendly interface, flexible consistency models such as SSP, fault tolerance including consistent server-side checkpoints and flexible worker-side checkpoints, and workload balancing. Experimental results show that HotML can reduce networking time by about 74% and achieve up to 1.9× the performance of the popular ML system Petuum.
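The SSP (Stale Synchronous Parallel) consistency model mentioned above bounds how far workers may drift apart. A minimal sketch of the bound, assuming illustrative names rather than HotML's actual API:

```python
# Stale Synchronous Parallel (SSP) bound: a worker at iteration clock c may
# start its next iteration only if the slowest worker is within `staleness`
# iterations; otherwise it must wait for the slowest worker to catch up.

def can_proceed(worker_clock, slowest_clock, staleness):
    """True if a worker at `worker_clock` may run its next iteration."""
    return worker_clock - slowest_clock <= staleness

# With staleness 3, a worker at clock 10 may run while the slowest worker
# is at clock 7, but must wait if the slowest is still at clock 6.
ok = can_proceed(10, 7, 3)       # within the staleness bound
blocked = can_proceed(10, 6, 3)  # too far ahead; must wait
```

Setting staleness to 0 recovers Bulk Synchronous Parallel execution; larger values trade parameter freshness for less synchronization waiting.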

SPFC: Asynchronous Graph Processing System

The real-world demands of mining big graph-structured data have led to active research on distributed graph processing. Many distributed graph processing systems adopt a vertex-centric programming paradigm, in which messages are passed between vertices to propagate the latest states. Communication efficiency and the high overhead of synchronization are two key considerations for these systems. In this work, we propose a Slow Passing Fast Consuming (SPFC) approach that effectively improves the overall performance of vertex-centric graph processing systems. In our approach, message passing is slow but message consuming is fast: at the sender side, priority is given to those "smart" messages that contribute more to algorithm convergence, and at the receiver side, messages are consumed right after arrival, without any delay or intermediate buffer. Moreover, by using a two-phase termination-check protocol, the global synchronous barrier can be completely eliminated. In addition, based on the slow-message-passing strategy, further performance improvement can be achieved, at some accuracy loss, by eliminating messages that are less useful for convergence. We implement our approach on Apache Giraph and evaluate it on a 12-machine cluster. The experimental results show that our method effectively reduces message traffic and achieves up to an order-of-magnitude performance improvement over Giraph and GraphLab.
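The two sides of SPFC — priority at the sender, immediate application at the receiver — can be sketched as below. This is an illustrative sketch, not Giraph code; the class and the delta-magnitude priority heuristic are assumptions (delta magnitude is a natural proxy for "contributes more to convergence" in PageRank-style computations).

```python
import heapq

# Sender side: a priority queue that emits messages carrying the largest
# value changes (deltas) first, since they matter most for convergence.
class PrioritySender:
    def __init__(self):
        self._heap = []  # max-heap on |delta|, via negated priority

    def enqueue(self, target, delta):
        heapq.heappush(self._heap, (-abs(delta), target, delta))

    def drain(self):
        """Yield messages in decreasing order of importance."""
        while self._heap:
            _, target, delta = heapq.heappop(self._heap)
            yield target, delta

# Receiver side: fold each message into vertex state on arrival,
# with no per-superstep buffer.
def consume(state, target, delta):
    state[target] = state.get(target, 0.0) + delta

sender = PrioritySender()
for tgt, d in [("v1", 0.01), ("v2", 0.5), ("v3", -0.2)]:
    sender.enqueue(tgt, d)

state = {}
messages = list(sender.drain())  # ordered by importance: v2, v3, v1
for tgt, d in messages:
    consume(state, tgt, d)       # applied immediately, no buffering
```

The "slow passing" optimization in the abstract corresponds to simply dropping the tail of this priority order (the smallest-delta messages), trading a little accuracy for less traffic.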

Machine Learning

BIGAT2VEC: Attributed Bipartite Network Embedding

This work investigates modeling attributes along with network structure for representation learning of bipartite networks. Most attributed network representation learning (NRL) methods consider only homogeneous networks; when applied to bipartite networks, they may fail to learn an informative representation of nodes for predictive analysis. Hence, we propose the BIGAT2VEC framework, which examines inter-node relationships in the form of direct and indirect relations between nodes of different types, as well as of the same type, in a bipartite network, preserving both structure and attribute context. In BIGAT2VEC, learning is enforced on two levels: (1) direct inter-node relationships between nodes of different types (through either edge or attribute-similarity perspectives), by minimizing probabilities through KL divergence; (2) indirect inter-node relationships within the same node type (through either second-order neighborhood proximity or attribute-similarity perspectives), by employing a shallow neural network model that maximizes probabilities. These two levels are optimized separately, and their learned embeddings are combined through late fusion to perform network mining tasks such as link prediction, node classification (multi-class and multi-label), and visualization. We perform extensive experiments on various datasets and evaluate our method against several baselines. The results show BIGAT2VEC's efficacy compared to other (non-)attributed representation learning methods.
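For intuition on the first level above, a common way to model a direct edge across the two node types is a sigmoid of the embedding inner product, trained by minimizing negative log-likelihood (which, for an empirical edge distribution, amounts to minimizing a KL divergence up to a constant). The sketch below is a hedged illustration of that generic objective, not BIGAT2VEC's actual implementation; the embeddings and function names are assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_nll(u, v):
    """Negative log-likelihood of an observed edge between nodes with
    embeddings u and v, under p(edge) = sigmoid(u . v)."""
    score = sum(a * b for a, b in zip(u, v))
    return -math.log(sigmoid(score))

# A well-aligned embedding pair yields a low loss; an anti-aligned pair a
# high one, so gradient descent pulls connected nodes together.
aligned = edge_nll([1.0, 0.5], [1.0, 0.5])    # small loss
opposed = edge_nll([1.0, 0.5], [-1.0, -0.5])  # large loss
```

The second level (indirect, same-type proximity) would use an analogous objective over second-order neighbors, with the two sets of embeddings fused only afterward, matching the late-fusion design described above.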