IEEE 2018: Natural Language Processing
IEEE 2018: A Framework for Real-Time Spam Detection in Twitter
Abstract: With the increased popularity of online
social networks, spammers find these platforms easily accessible to trap users
in malicious activities by posting spam messages. In this work, we take the
Twitter platform and perform spam tweet detection. To stop spammers, Google
SafeBrowsing and Twitter’s BotMaker tools detect and block spam tweets. These
tools can block malicious links; however, they cannot protect users in real
time, as early as possible. Thus, industry and researchers have applied
different approaches to make social network platforms spam-free. Some of these
approaches rely only on user-based features, while others rely only on
tweet-based features. However, there is no comprehensive solution that
consolidates a tweet’s text information with the user-based features. To solve
this issue, we propose a framework that takes the user- and tweet-based features
along with the tweet text feature to classify tweets. The benefit of using the
tweet text feature is that we can identify spam tweets even if the spammer
creates a new account, which is not possible with user- and tweet-based
features alone. We have evaluated our solution with four different machine
learning algorithms, namely Support Vector Machine, Neural Network, Random
Forest and Gradient Boosting. With Neural Network, we achieve an accuracy of
91.65%, surpassing the existing solution [1] by approximately 18%.
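As a rough illustration of combining the three feature groups, the sketch below concatenates user-based and tweet-based numeric features with a bag-of-words vector of the tweet text and trains a classifier on the result. It is a minimal sketch under assumptions: the feature names (`followers`, `num_urls`, and so on), the toy tweets and the log scaling are hypothetical, and plain logistic regression stands in for the SVM, Neural Network, Random Forest and Gradient Boosting models the paper evaluates.

```python
import math
import re

# Hypothetical feature names; the abstract does not list the paper's actual features.
USER_FEATURES = ["followers", "following", "account_age_days"]
TWEET_FEATURES = ["num_urls", "num_hashtags", "num_mentions"]


def tokenize(text):
    """Lowercased word tokens, with every URL collapsed to the token 'url'."""
    text = re.sub(r"https?://\S+", " url ", text)
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    """Map every token seen in the training texts to a column index."""
    tokens = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(sample, vocab):
    """Concatenate log-scaled user/tweet features with bag-of-words text counts."""
    numeric = [math.log1p(sample[name]) for name in USER_FEATURES + TWEET_FEATURES]
    bow = [0.0] * len(vocab)
    for tok in tokenize(sample["text"]):
        if tok in vocab:  # tokens unseen during training are ignored
            bow[vocab[tok]] += 1.0
    return numeric + bow


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=300):
    """Plain SGD logistic regression (a stand-in, not the paper's classifiers)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Return 1 (spam) if the linear score is positive, else 0."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


# Toy, made-up tweets: label 1 = spam, 0 = legitimate.
train = [
    {"text": "WIN a free prize now http://x.example", "followers": 2, "following": 1500,
     "account_age_days": 1, "num_urls": 1, "num_hashtags": 3, "num_mentions": 4, "label": 1},
    {"text": "free followers click http://y.example", "followers": 5, "following": 2000,
     "account_age_days": 3, "num_urls": 1, "num_hashtags": 2, "num_mentions": 5, "label": 1},
    {"text": "lunch with friends today", "followers": 800, "following": 300,
     "account_age_days": 900, "num_urls": 0, "num_hashtags": 0, "num_mentions": 0, "label": 0},
    {"text": "great talk at the conference", "followers": 450, "following": 200,
     "account_age_days": 1200, "num_urls": 0, "num_hashtags": 1, "num_mentions": 1, "label": 0},
]
vocab = build_vocab(s["text"] for s in train)
X = [vectorize(s, vocab) for s in train]
y = [s["label"] for s in train]
w, b = train_logreg(X, y)
train_preds = [predict(w, b, x) for x in X]
```

Because the text tokens are part of the same vector, a tweet from a brand-new account can still be pushed toward the spam class by words such as `free` or by a URL, which mirrors the benefit the abstract attributes to the tweet text feature.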
IEEE 2018: A Novel Mechanism for Fast Detection of Transformed Data Leakage
Abstract: Data leakage is a growing insider threat
in information security among organizations and individuals. A series of
methods has been developed to address the problem of data leakage prevention
(DLP). However, large amounts of unstructured data need to be tested in the big
data era. As the volume of data grows dramatically and the forms of data become
more complicated, dealing with large amounts of transformed data is a new
challenge for DLP. We propose an adaptive weighted graph walk model to solve
this problem by mapping it onto weighted graphs. Our approach solves this
problem in three steps. First, adaptive weighted graphs are built to quantify
the sensitivity of the tested data based on its context. Then, improved label
propagation is used to enhance scalability for fresh data. Finally, a
low-complexity score walk algorithm is proposed to determine the ultimate
sensitivity. Experimental results show that the proposed method can detect
leaks of transformed or fresh data quickly and efficiently.
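The label propagation step can be illustrated with the classic harmonic formulation: known sensitive and known public fragments are clamped to scores 1 and 0, and every fresh fragment repeatedly takes the weighted average of its neighbours' scores. This is a generic sketch, not the paper's adaptive weighted graph, improved label propagation or score walk; the word-overlap (Jaccard) edge weights, the `threshold` value and the example fragments are assumptions made for illustration.

```python
def jaccard(a, b):
    """Word-set Jaccard similarity, used here as a hypothetical edge weight."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_graph(fragments, threshold=0.1):
    """Undirected weighted graph: i <-> j whenever their similarity >= threshold."""
    graph = {i: {} for i in range(len(fragments))}
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            w = jaccard(fragments[i], fragments[j])
            if w >= threshold:
                graph[i][j] = w
                graph[j][i] = w
    return graph


def propagate(graph, seeds, iters=100, tol=1e-9):
    """Harmonic label propagation over a weighted graph.

    seeds maps node -> known sensitivity in [0, 1]; those nodes stay clamped.
    Every other node takes the weighted mean of its neighbours' scores.
    """
    scores = {n: seeds.get(n, 0.0) for n in graph}
    for _ in range(iters):
        new = {}
        for node, nbrs in graph.items():
            if node in seeds:
                new[node] = seeds[node]
                continue
            total = sum(nbrs.values())
            new[node] = sum(w * scores[m] for m, w in nbrs.items()) / total if total else 0.0
        delta = max(abs(new[n] - scores[n]) for n in graph)
        scores = new
        if delta < tol:  # stop once scores no longer change
            break
    return scores


fragments = [
    "project apollo budget is 4 million",   # known sensitive (hypothetical)
    "the cafeteria menu for friday",        # known public (hypothetical)
    "apollo budget revised to 4 million",   # fresh, transformed copy of fragment 0
    "friday cafeteria menu has soup",       # fresh, harmless
]
scores = propagate(build_graph(fragments), seeds={0: 1.0, 1: 0.0})
```

The transformed fragment shares most of its words with the sensitive one, so it inherits a high score even though it is not an exact copy.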
IEEE 2018: A Workflow Management System for Scalable Data Mining on Clouds
Abstract: The extraction of useful
information from data is often a complex process that can be conveniently
modeled as a data analysis workflow. When very large data sets must be analyzed
and/or complex data mining algorithms must be executed, data analysis workflows
may take a very long time to complete their execution. Therefore, efficient
systems are required for the scalable execution of data analysis workflows, by
exploiting the computing services of the Cloud platforms where data is
increasingly being stored. The objective of the paper is to demonstrate how
Cloud software technologies can be integrated to implement an effective
environment for designing and executing scalable data analysis workflows. We
describe the design and implementation of the Data Mining Cloud Framework
(DMCF), a data analysis system that integrates a visual workflow language and a
parallel runtime with the Software-as-a-Service (SaaS) model. DMCF was designed
taking into account the needs of real data mining applications, with the goal
of simplifying the development of data mining applications compared to generic
workflow management systems that are not specifically designed for this domain.
The result is a high-level environment that, through an integrated visual
workflow language, minimizes the programming effort and makes it easier for
domain experts to use common patterns specifically designed for the development
and parallel execution of data mining applications. DMCF’s visual workflow
language, system architecture and runtime mechanisms are presented. We also
discuss several data mining workflows developed with DMCF and the scalability
obtained by executing such workflows on a public Cloud.
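A data analysis workflow of this kind is essentially a directed acyclic graph of tasks, and a parallel runtime can start every task whose inputs are ready at the same time. The sketch below shows that idea with Python's `graphlib.TopologicalSorter` and a thread pool; it is a hypothetical stand-in, not DMCF's visual workflow language or its Cloud runtime, and the task names are made up.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow(tasks, deps, max_workers=4):
    """Execute a DAG of tasks, running all tasks whose inputs are ready in parallel.

    tasks: dict name -> callable that receives a dict of upstream results
    deps:  dict name -> iterable of upstream task names
    """
    sorter = TopologicalSorter({name: deps.get(name, ()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every task whose predecessors have all finished.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                sorter.done(name)  # may unlock downstream tasks
    return results


# Hypothetical toy workflow: load data, compute two partial results, merge them.
tasks = {
    "load": lambda _: list(range(1, 11)),
    "sum_even": lambda r: sum(x for x in r["load"] if x % 2 == 0),
    "sum_odd": lambda r: sum(x for x in r["load"] if x % 2 == 1),
    "total": lambda r: r["sum_even"] + r["sum_odd"],
}
deps = {"sum_even": ["load"], "sum_odd": ["load"], "total": ["sum_even", "sum_odd"]}
out = run_workflow(tasks, deps)
```

Here `sum_even` and `sum_odd` depend only on `load`, so they are submitted together; on a Cloud platform the thread pool would be replaced by remote workers.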
IEEE 2018: Machine Learning Methods for Disease Prediction with Claims Data
Abstract: One of the primary challenges of
healthcare delivery is aggregating disparate, asynchronous data sources into
meaningful indicators of individual health. We combine natural language word
embedding and network modeling techniques to learn meaningful representations
of medical concepts by using the weighted network adjacency matrix in the GloVe
algorithm, which we call Code2Vec. We demonstrate that using our learned
embeddings improves neural network performance for disease prediction. However,
we also demonstrate that popular deep learning models for disease prediction
are not meaningfully better than simpler, more interpretable classifiers such
as XGBoost. Additionally, our work adds to the current literature by providing
a comprehensive survey of various machine learning algorithms on disease
prediction tasks.
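GloVe fits vectors so that w_i · c_j + b_i + b'_j ≈ log X_ij, weighting each pair by f(X_ij); the abstract describes Code2Vec as feeding a weighted network adjacency matrix into that objective in place of word co-occurrence counts. The sketch below is a minimal SGD version of the standard GloVe loss under that reading. The toy 4-code adjacency matrix, the embedding dimension and the learning rate are assumptions, and how the paper builds the adjacency matrix from claims is not described here.

```python
import math
import random


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x); x_max=100 and alpha=3/4 are the original GloVe defaults."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def train_glove(adj, dim=8, epochs=300, lr=0.05, seed=0):
    """Fit GloVe-style vectors to a weighted adjacency matrix (list of lists).

    Minimises sum over nonzero X_ij of f(X_ij) * (w_i . c_j + b_i + b'_j - log X_ij)^2.
    """
    rng = random.Random(seed)
    n = len(adj)
    W = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n)]
    C = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(n)]
    bw, bc = [0.0] * n, [0.0] * n
    pairs = [(i, j, adj[i][j]) for i in range(n) for j in range(n) if adj[i][j] > 0]
    history = []
    for _ in range(epochs):
        rng.shuffle(pairs)
        loss = 0.0
        for i, j, x in pairs:
            f = glove_weight(x)
            diff = sum(a * b for a, b in zip(W[i], C[j])) + bw[i] + bc[j] - math.log(x)
            loss += f * diff * diff
            g = f * diff  # derivative of 0.5 * f * diff^2 with respect to diff
            for k in range(dim):
                wik, cjk = W[i][k], C[j][k]
                W[i][k] -= lr * g * cjk
                C[j][k] -= lr * g * wik
            bw[i] -= lr * g
            bc[j] -= lr * g
        history.append(loss)
    # As in GloVe, the final vector for node i is the sum of its two vectors.
    return [[a + b for a, b in zip(W[i], C[i])] for i in range(n)], history


# Hypothetical 4-code network; adj[i][j] = edge weight between medical code i and j.
adj = [
    [0, 8, 1, 0],
    [8, 0, 2, 0],
    [1, 2, 0, 5],
    [0, 0, 5, 0],
]
embeddings, history = train_glove(adj)
```

The resulting per-code vectors could then be fed to a downstream disease prediction model, which is the use the abstract evaluates.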
Abstract: Prior to buying a
product, people usually inform themselves by reading online reviews. To make
more profit, sellers often try to fake user experiences. As customers are being
deceived this way, recognizing and removing fake reviews is of great
importance. This paper analyzes machine learning-based spam detection methods
and presents an overview of them along with their results.
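One common machine learning baseline for separating fake from genuine reviews is multinomial naive Bayes over review words. The sketch below is illustrative only: the abstract does not say which methods the paper covers, whitespace tokenization is a simplification, and the four labeled reviews are invented.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing over word counts."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text, c):
        """log P(c) + sum of log P(word | c), up to a shared constant."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        v = len(self.vocab)
        for tok in text.lower().split():
            score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_posterior(text, c))


# Invented toy reviews for illustration.
texts = [
    "best product ever amazing amazing buy now",
    "amazing deal best seller five stars buy",
    "battery lasted two days screen is dim",
    "arrived late but works fine after update",
]
labels = ["fake", "fake", "genuine", "genuine"]
nb = NaiveBayes().fit(texts, labels)
pred_hype = nb.predict("amazing best buy")
pred_detail = nb.predict("screen works fine")
```

Real systems usually add richer signals (reviewer behaviour, rating deviation, timing) on top of word counts; this baseline only looks at the text.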
IEEE 2017: NetSpam: a Network-based Spam Detection Framework for Reviews in Online Social Media
Abstract: Nowadays, a large share of people rely on
available content in social media for their decisions (e.g., reviews and
feedback on a topic or product). The possibility that anybody can leave a review
provides a golden opportunity for spammers to write spam reviews about products
and services for different interests. Identifying these spammers and the spam
content is a hot topic of research, and although a considerable number of
studies have recently been done toward this end, the methodologies put forth so
far still barely detect spam reviews, and none of them show the importance of
each extracted feature type. In this study, we propose a novel framework, named
NetSpam, which utilizes spam features to model review datasets as heterogeneous
information networks, mapping the spam detection procedure into a
classification problem in such networks. Using the importance of spam features
helps us obtain better results in terms of different metrics on real-world
review datasets from the Yelp and Amazon websites. The results show that
NetSpam outperforms the existing methods and that, among the four categories of
features (review-behavioral, user-behavioral, review-linguistic and
user-linguistic), the first type of features performs better than the other
categories.
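The idea of weighting feature types by importance can be sketched in a heavily simplified form: reviews that fall into the same discretized level of a feature are treated as linked through that feature (a metapath), a feature's weight is the share of linked labeled pairs in which both reviews are spam, and an unlabeled review's score is the weighted average spam rate of its labeled neighbours. This is a hypothetical approximation, not NetSpam's published equations; the feature names, the four levels and the review values are assumptions.

```python
from itertools import combinations


def discretize(value, levels=4):
    """Map a feature value in [0, 1] to one of `levels` buckets."""
    return min(int(value * levels), levels - 1)


def feature_weights(reviews, labels, features, levels=4):
    """Weight per feature = share of labeled pairs linked through it that are both spam."""
    labeled = [r for r in reviews if r["id"] in labels]
    weights = {}
    for f in features:
        linked = both_spam = 0
        for a, b in combinations(labeled, 2):
            if discretize(a[f], levels) == discretize(b[f], levels):
                linked += 1
                both_spam += labels[a["id"]] * labels[b["id"]]
        weights[f] = both_spam / linked if linked else 0.0
    return weights


def spam_score(review, reviews, labels, weights, levels=4):
    """Weighted average, over features, of the spam rate among linked labeled reviews."""
    num = den = 0.0
    for f, w in weights.items():
        peers = [labels[r["id"]] for r in reviews
                 if r["id"] in labels and r["id"] != review["id"]
                 and discretize(r[f], levels) == discretize(review[f], levels)]
        if peers and w > 0:
            num += w * sum(peers) / len(peers)
            den += w
    return num / den if den else 0.0


# Hypothetical normalized feature values; labels: 1 = spam, 0 = genuine.
features = ["review_behavioral", "user_behavioral"]
reviews = [
    {"id": 1, "review_behavioral": 0.90, "user_behavioral": 0.8},
    {"id": 2, "review_behavioral": 0.95, "user_behavioral": 0.1},
    {"id": 3, "review_behavioral": 0.10, "user_behavioral": 0.9},
    {"id": 4, "review_behavioral": 0.05, "user_behavioral": 0.2},
    {"id": 5, "review_behavioral": 0.92, "user_behavioral": 0.5},  # unlabeled
]
labels = {1: 1, 2: 1, 3: 0, 4: 0}
weights = feature_weights(reviews, labels, features)
score_5 = spam_score(reviews[4], reviews, labels, weights)
```

On this toy data the review-behavioral feature receives weight 0.5 and the user-behavioral feature 0.0, so the unlabeled review's score comes entirely from its review-behavioral neighbours.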