The explosive increase in the number of images on the Internet has brought with it the great challenge of how to effectively index, retrieve, and organize these resources. Assigning proper tags to the visual content is key to the success of many applications such as image retrieval and content mining. Although recent years have witnessed many advances in image tagging, these methods are limited by their reliance on high-quality, large-scale training data that are expensive to obtain. In this paper, we propose a novel semantic neighbor learning method based on user-contributed social image datasets that can be acquired from the Web’s inexhaustible social image content. In contrast to existing image tagging approaches that rely on high-quality image-tag supervision, we acquire weak supervision for our neighbor learning method by progressive neighborhood retrieval from noisy and diverse user-contributed image collections. The retrieved neighbor images are not only visually alike and partially correlated but also semantically related. We offer a step-by-step and easy-to-use implementation for the proposed method. Extensive experimentation on several datasets demonstrates that the proposed method significantly outperforms others.
The distance dynamics model is an excellent tool for uncovering the community structure of a complex network. However, one issue that must be addressed by this model is its very long computation time in large-scale networks. To identify the community structure of a large-scale network with high speed and high quality, in this paper, we propose a fast community detection algorithm, the F-Attractor, which is based on the distance dynamics model. The main contributions of the F-Attractor are as follows. First, we propose the use of two prejudgment rules from two different perspectives: node and edge. Based on these two rules, we develop a strategy of internal edge prejudgment for predicting the internal edges of the network. Internal edge prejudgment can reduce the number of edges and their neighbors that participate in the distance dynamics model. Second, we introduce a triangle distance to further enhance the speed of the interaction process in the distance dynamics model. This triangle distance uses two known distances to measure a third distance without any extra computation. We combine the above techniques to improve the distance dynamics model and then describe the community detection process of the F-Attractor. The results of an extensive series of experiments demonstrate that the F-Attractor offers high-speed community detection and high partition quality.
Word embedding has drawn a lot of attention due to its usefulness in many NLP tasks. So far, a handful of neural-network-based word embedding algorithms have been proposed without considering the effects of pronouns in the training corpus. In this paper, we propose using co-reference resolution to improve the word embedding by extracting better context. We evaluate four word embeddings that take co-reference resolution into account and compare the quality of the word embeddings on the tasks of word analogy and word similarity on multiple data sets. Experiments show that by using co-reference resolution, the word embedding performance in the word analogy task can be improved by around 1.88%. We find that the words that are names of countries are affected the most, which is as expected.
Link prediction is an important task that estimates the probability of there being a link between two disconnected nodes. The similarity-based algorithm is a very popular method that employs node similarities to find links. Most algorithms of this type focus only on the contribution of common neighborhoods between two nodes. In sociological theory, relationships within three degrees are the strong ties that can trigger social behaviors. Thus, strong ties can provide more connection opportunities for unconnected nodes in the networks. As critical topological properties in networks, node degrees and node clustering coefficients are well-suited for describing the tightness of connections between nodes. In this paper, we characterize node similarity by utilizing the strong ties of the ego network (i.e., paths within three degrees) and its close connections (node degrees and node clustering coefficients). We propose a link prediction algorithm that combines topological properties with strong ties, which we call the TPSR algorithm. This algorithm includes the TPSR2, TPSR3, and TPSR4 indices. We evaluate the performance of the proposed algorithm using the metrics of precision and the Area Under the Curve (AUC). Our experimental results show that the TPSR algorithm performs remarkably better than others.
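As an illustration of the common-neighbor baseline that the abstract above contrasts itself with, the following minimal sketch scores every unconnected node pair by its number of shared neighbors. It is not the TPSR algorithm: the function name `common_neighbor_scores` and the toy edge list are assumptions made only for this example.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score each unconnected node pair by its number of common neighbors."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v not in adjacency[u]:  # only candidate (missing) links are scored
            scores[(u, v)] = len(adjacency[u] & adjacency[v])
    return scores


# Hypothetical toy network used only for illustration.
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
toy_scores = common_neighbor_scores(toy_edges)

# Candidate links ranked from most to least likely under this baseline.
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

A higher score suggests a more likely link under this baseline. Indices that also use longer paths, node degrees, or clustering coefficients would replace the set-intersection line with a richer similarity.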
Truth discovery aims to resolve conflicts among multiple sources and find the truth. Conventional methods for truth discovery mainly investigate the mutual effect between the reliability of sources and the credibility of statements. These methods represent reliability with real numbers, which have a lower representation capability than vectors. In addition, neural networks have not been used for truth discovery. In this work, we propose memory-network-based models to address truth discovery. Our proposed models use feedforward and feedback memory networks to learn the representation of the credibility of statements. Specifically, our models adopt a memory mechanism to learn the reliability of sources for truth prediction. The proposed models use categorical and continuous data during model learning by automatically assigning different weights to the loss function on the basis of their own effects. Experimental results show that our proposed models outperform state-of-the-art methods for truth discovery.
Long-document semantic measurement has great significance in many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model for incorporating a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
Neural networks have been widely used for English name tagging and have delivered state-of-the-art results. However, for low resource languages, due to the limited resources and lack of training data, taggers tend to have lower performance, in comparison to the English language. In this paper, we tackle this challenging issue by incorporating multi-level cross-lingual knowledge as attention into a neural architecture, which guides low resource name tagging to achieve a better performance. Specifically, we regard entity type distribution as language independent and use bilingual lexicons to bridge cross-lingual semantic mapping. Then, we jointly apply word-level cross-lingual mutual influence and entity-type level monolingual word distributions to enhance low resource name tagging. Experiments on three languages demonstrate the effectiveness of this neural architecture: for Chinese, Uzbek, and Turkish, we are able to yield significant improvements in name tagging over all previous baselines.
In this paper, we present a new challenging task for emotion analysis, namely emotion cause extraction. In this task, we focus on the detection of the emotion cause, i.e., the reason or the stimulant of an emotion, rather than the regular emotion classification or emotion component extraction. Since there is no open dataset available for this task, we first designed and annotated an emotion cause dataset which follows the scheme of the W3C Emotion Markup Language. We then present an emotion cause detection method using an event extraction framework, where a tree structure-based representation method is used to represent the events. Since the distribution of events is imbalanced in the training data, we propose an under-sampling-based bagging algorithm to solve this problem. Even with a limited training set, the proposed approach may still extract sufficient features for analysis by means of a bagging of multi-kernel-based SVMs. Evaluations show that our approach achieves an F-measure 7.04% higher than the state-of-the-art methods.
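The under-sampling-based bagging idea mentioned above can be sketched as follows, under stated assumptions: each bag draws the same number of training indices from every class, so every base classifier sees balanced data, and per-bag predictions are combined by majority vote. The function names `balanced_bags` and `majority_vote` are hypothetical, and the base classifier (multi-kernel SVMs in the abstract) is deliberately left out.

```python
import random


def balanced_bags(labels, n_bags=5, seed=0):
    """Return index lists in which every class is under-sampled to the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    minority_size = min(len(indices) for indices in by_class.values())
    bags = []
    for _ in range(n_bags):
        bag = []
        for indices in by_class.values():
            # Sampling without replacement; the minority class is kept whole.
            bag.extend(rng.sample(indices, minority_size))
        bags.append(sorted(bag))
    return bags


def majority_vote(predictions):
    """Combine the per-bag predictions for one sample by majority vote."""
    return max(set(predictions), key=predictions.count)


# Hypothetical imbalanced label vector: 2 positive and 8 negative events.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_bags = balanced_bags(toy_labels, n_bags=3, seed=1)
```

Each index list in `toy_bags` would be used to train one base classifier, and the outputs of those classifiers on a new sample would be passed to `majority_vote`.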
This paper presents a survey of image synthesis and editing with Generative Adversarial Networks (GANs). GANs consist of two deep networks, a generator and a discriminator, which are trained in a competitive way. Due to the power of deep networks and the competitive training manner, GANs are capable of producing reasonable and realistic images, and have shown great capability in many image synthesis and editing applications. This paper surveys recent GAN papers regarding topics including, but not limited to, texture synthesis, image inpainting, image-to-image translation, and image editing.
Managing software packages in a scientific computing environment is a challenging task, especially in the case of heterogeneous systems. Installing and updating software packages in a sophisticated computing environment is error prone. Testing and performance evaluation in an on-the-fly manner is also a troublesome task for a production system. In this paper, we discuss a package management scheme based on containers. The newly developed method can ease the maintenance complexity and reduce human mistakes. We can benefit from the self-containing and isolation features of container technologies for maintaining the software packages among intricately connected clusters. By deploying the SuperComputing application Store (SCStore) over the world's largest clusters connected via WAN, we show that it can greatly reduce the effort required to maintain the consistency of the software environment and help to achieve automation.
Time headway is an important index used in characterizing dangerous driving behaviors. This research focuses on the decreasing tendency of time headway and investigates its association with crash occurrence. An autoregressive (AR) time-series model is improved and adopted to describe the dynamic variations of average daily time headway. Based on the model, a simple approach is proposed for recognizing dangerous driving behavior characterized by significantly decreasing headway. The effectiveness of the proposed approach is validated by means of empirical data collected from a medium-sized city in northern China. Finally, a practical early-warning strategy focused on both the remaining life and low headway is proposed to remind drivers to pay attention to their driving behaviors and the possible occurrence of crash-related risks.
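To make the autoregressive idea concrete, the sketch below fits a plain first-order AR model, x_t = c + phi * x_(t-1), by ordinary least squares and produces a one-step forecast. This is the textbook AR(1) model, not the improved model described in the abstract above; the function names and the headway values are assumptions chosen for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_(t-1); returns (c, phi)."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    # Slope and intercept of the regression of x_t on x_(t-1).
    sxx = sum((p - mean_prev) ** 2 for p in previous)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(previous, current))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observation."""
    return c + phi * series[-1]


# Hypothetical average daily time headways (seconds) with a decreasing tendency.
headways = [4.0, 3.0, 2.5, 2.25, 2.125, 2.0625]
c, phi = fit_ar1(headways)
next_headway = forecast_next(headways, c, phi)
```

A recognition rule could then compare the forecast or the fitted trend against a low-headway threshold; the choice of such a threshold is outside what the abstract specifies.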
The Extreme Learning Machine (ELM) is an effective learning algorithm for a Single-Layer Feedforward Network (SLFN). It performs well in managing some problems due to its fast learning speed. However, in practical applications, its performance might be affected by the noise in the training data. To tackle the noise issue, we propose a novel heterogeneous ensemble of ELMs in this article. Specifically, correntropy is used to make performance insensitive to outliers, while Negative Correlation Learning (NCL) is implemented to enhance diversity among the ensemble members. The proposed Heterogeneous Ensemble of ELMs (HE2LM) for classification has different ELM algorithms including the Regularized ELM (RELM), the Kernel ELM (KELM), and the L2-norm-optimized ELM (ELML2). The ensemble is constructed by training a randomly selected ELM classifier on a subset of the training data selected through random resampling. Then, the class label of unseen data is predicted using a maximum weighted sum approach. After splitting the training data into subsets, the proposed HE2LM is tested through classification and regression tasks on real-world benchmark datasets and synthetic datasets. The simulation results show that, compared with other algorithms, our proposed method can achieve higher prediction accuracy, better generalization, and less sensitivity to outliers.
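For readers unfamiliar with ELMs, the following minimal sketch shows a single regularized ELM for binary classification: the hidden-layer weights are drawn at random and left untrained, and only the output weights are obtained in closed form by ridge regression. It does not implement HE2LM, correntropy, or NCL. The class name `RegularizedELM`, the helper `solve_linear`, and all hyperparameter values are assumptions for this example.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


class RegularizedELM:
    """Single-hidden-layer network with random hidden weights and ridge output weights."""

    def __init__(self, n_hidden=10, ridge=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.weights, self.biases)]

    def fit(self, X, y):
        n_features = len(X[0])
        # Hidden-layer parameters are drawn once at random and never trained.
        self.weights = [[self.rng.uniform(-1, 1) for _ in range(n_features)]
                        for _ in range(self.n_hidden)]
        self.biases = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        H = [self._hidden(x) for x in X]
        size = self.n_hidden
        # Ridge normal equations: (H^T H + ridge * I) beta = H^T y.
        gram = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                 for j in range(size)] for i in range(size)]
        rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(size)]
        self.beta = solve_linear(gram, rhs)
        return self

    def predict(self, X):
        """Return labels in {-1, +1} from the sign of the network output."""
        return [1 if sum(b * h for b, h in zip(self.beta, self._hidden(x))) >= 0 else -1
                for x in X]
```

Because training reduces to one linear solve, fitting is fast, which is the property the abstract attributes to ELMs. An ensemble in the spirit of the abstract would train several such models on resampled subsets and combine their outputs.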
Tor is pervasively used to conceal target websites that users are visiting. A de-anonymization technique against Tor, referred to as a website fingerprinting attack, aims to infer the websites accessed by Tor clients by passively analyzing the patterns of encrypted traffic at the Tor client side. However, HTTP pipelining and Tor circuit multiplexing techniques can affect the accuracy of the attack by mixing the traffic that carries web objects in a single TCP connection. In this paper, we propose a novel active website fingerprinting attack by identifying and delaying the HTTP requests at the first hop Tor node. Then, we can separate the traffic that carries distinct web objects to derive a more distinguishable traffic pattern. To fulfill this goal, two algorithms based on statistical analysis and objective function optimization are proposed to construct a general packet delay scheme. We evaluate our active attack against Tor in empirical experiments and obtain the highest accuracy of 98.64%, compared with 85.95% for the passive attack. We also perform experiments in the open-world scenario. When the parameter k of the k-NN classifier is set to 5, we obtain a true positive rate of 90.96% with a false positive rate of 3.9%.
With the rapid development of pervasive intelligent devices and ubiquitous network technologies, new network applications are emerging, such as the Internet of Things, smart cities, smart grids, virtual/augmented reality, and unmanned vehicles. Cloud computing, which is characterized by centralized computation and storage, is having difficulty meeting the needs of these developing technologies and applications. In recent years, a variety of network computing paradigms, such as fog computing, mobile edge computing, and dew computing, have been proposed by the industrial and academic communities. Although they employ different terminologies, their basic concept is to extend cloud computing and move the computing infrastructure from remote data centers to edge routers, base stations, and local servers located closer to users, thereby overcoming the bottlenecks experienced by cloud computing and providing better performance and user experience. In this paper, we systematically summarize and analyze the post-cloud computing paradigms that have been proposed in recent years. First, we summarize the main bottlenecks of technology and application that cloud computing encounters. Next, we analyze and summarize several post-cloud computing paradigms, including fog computing, mobile edge computing, and dew computing. Then, we discuss the development opportunities of post-cloud computing via several examples. Finally, we note the future development prospects of post-cloud computing.
Retrieving the most similar objects in a large-scale database for a given query is a fundamental building block in many application domains, ranging from web search and visual and cross-media retrieval to document retrieval. State-of-the-art approaches have mainly focused on capturing the underlying geometry of the data manifolds. Graph-based approaches, in particular, define various diffusion processes on weighted data graphs. Despite their success, these approaches rely on fixed-weight graphs, making ranking sensitive to the input affinity matrix. In this study, we propose a new ranking algorithm that simultaneously learns the data affinity matrix and the ranking scores. The proposed optimization formulation assigns adaptive neighbors to each point in the data based on the local connectivity, and the smoothness constraint assigns similar ranking scores to similar data points. We develop a novel and efficient algorithm to solve the optimization problem. Evaluations using synthetic and real datasets suggest that the proposed algorithm can outperform the existing methods.
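The fixed-weight, graph-based diffusion that the abstract above takes as its starting point can be sketched with classical manifold ranking: a hand-chosen Gaussian affinity matrix is normalized, and the query score is propagated along the graph. This illustrates the baseline only, not the proposed joint learning of affinities and scores; the function name, the kernel width `sigma`, and the damping factor `alpha` are assumptions.

```python
import math


def manifold_ranking(points, query_index, alpha=0.9, sigma=1.0, n_iter=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y on a fixed Gaussian affinity graph."""
    n = len(points)
    # Fixed affinities with a zero diagonal; the ranking depends on this choice.
    W = [[0.0 if i == j
          else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in W]
    # Symmetric normalization S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    y = [1.0 if i == query_index else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(n_iter):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical one-dimensional data forming two well-separated clusters.
toy_points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
toy_scores = manifold_ranking(toy_points, query_index=0)
```

Points in the same cluster as the query receive much higher scores than points in the other cluster. Changing `sigma` changes `W` and therefore the ranking, which is the sensitivity to the input affinity matrix that the abstract points out.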