IEEE XPLORE PROJECT’S ABSTRACT 1

Xu Yang   Bouguettaya, A., Spirent Commun., Rockville, MD

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Feb. 2009, Volume: 21, Issue: 2, On page(s): 259-272, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10370658, Digital Object Identifier: 10.1109/TKDE.2008.157,

First Published: 2008-08-01, Current Version Published: 2008-12-30

ABSTRACT

M-services provide mobile users wireless access to Web services. In this paper, we present a novel infrastructure for supporting M-services in wireless broadcast systems. The proposed infrastructure provides a generic framework for mobile users to look up, access, and execute Web services over wireless broadcast channels. Access efficiency is an important issue in wireless broadcast systems. We discuss different semantics that have impact on the access efficiency for composite M-services. A multiprocess workflow is proposed for effectively accessing composite M-services from multiple broadcast channels based on these semantics. We also present and compare different broadcast channel organizations for M-services and wireless data. Analytical models are provided for these channel organizations. Practical studies are presented to demonstrate the impact of different semantics and channel organizations on the access efficiency.

ONLINE SCHEDULING SEQUENTIAL OBJECTS WITH PERIODICITY FOR DYNAMIC INFORMATION DISSEMINATION

Chih-Lin Hu   Ming-Syan Chen, Dept. of Commun. Eng., Nat. Central Univ., Jhongli

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Feb. 2009, Volume: 21, Issue: 2, On page(s): 273-286, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10370657, Digital Object Identifier: 10.1109/TKDE.2008.148,

First Published: 2008-07-18, Current Version Published: 2008-12-30

ABSTRACT

The scalability of data broadcasting has been manifested by prior studies on the basis of the traditional data management systems where data objects, mapped to a pair of state and value in the database, are independent, persistent, and static against simple queries. However, many modern information applications spread dynamic data objects and process complex queries for retrieving multiple data objects. Particularly, the information servers dynamically generate data objects that are dependent and can be associated into a complete response against complex queries. Accordingly, the study in this paper considers the problem of scheduling dynamic broadcast data objects in a clients-providers-servers system from the standpoint of data association, dependency, and dynamics. Since the data broadcast problem is NP-hard, we derive the lower and the upper bounds of the mean service access time. In light of the theoretical analyses, we further devise a deterministic algorithm with several gain measure functions for the approximation of schedule optimization. The experimental results show that the proposed algorithm is able to generate a dynamic broadcast schedule and also minimize the mean service access time to the extent of being very close to the theoretical optimum.


STORING AND INDEXING SPATIAL DATA IN P2P SYSTEMS

Kantere, V.   Skiadopoulos, S.   Sellis, T., Ecole Polytech. Fed. de Lausanne, Lausanne

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Feb. 2009, Volume: 21, Issue: 2, On page(s): 287-300, Location: Los Angeles, CA, USA, ISSN: 1041-4347

INSPEC Accession Number: 10370655, Digital Object Identifier: 10.1109/TKDE.2008.139,

First Published: 2008-07-15, Current Version Published: 2008-12-30

ABSTRACT

The peer-to-peer (P2P) paradigm has become very popular for storing and sharing information in a totally decentralized manner. At first, research focused on P2P systems that host 1D data. Nowadays, the need for P2P applications with multidimensional data has emerged, motivating research on P2P systems that manage such data. The majority of the proposed techniques are based either on the distribution of centralized indexes or on the reduction of multidimensional data to one dimension. Our goal is to create from scratch a technique that is inherently distributed and also maintains the multidimensionality of data. Our focus is on structured P2P systems that share spatial information. We present SpatialP2P, a totally decentralized indexing and searching framework that is suitable for spatial data. SpatialP2P supports P2P applications in which spatial information of various sizes can be dynamically inserted or deleted, and peers can join or leave. The proposed technique preserves the locality and directionality of space well.

UNSUPERVISED MULTIWAY DATA ANALYSIS: A LITERATURE SURVEY

Acar, E.   Yener, B.,  Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Jan. 2009, Volume: 21, Issue: 1, On page(s): 6-20, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10324287, Digital Object Identifier: 10.1109/TKDE.2008.112,

First Published: 2008-06-06, Current Version Published: 2008-11-25

ABSTRACT

Two-way arrays or matrices are often not enough to represent all the information in the data and standard two-way analysis techniques commonly applied on matrices may fail to find the underlying structures in multi-modal datasets. Multiway data analysis has recently become popular as an exploratory analysis tool in discovering the structures in higher-order datasets, where data have more than two modes. We provide a review of significant contributions in the literature on multiway models, algorithms as well as their applications in diverse disciplines including chemometrics, neuroscience, social network analysis, text mining and computer vision.


COMPARING SCORES INTENDED FOR RANKING

Bhamidipati, N.L.   Pal, S.K., Data Min. & Res. Group, Yahoo! Software Dev. India Pvt. Ltd., Bangalore

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Jan. 2009, Volume: 21, Issue: 1, On page(s): 21-34, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10324288, Digital Object Identifier: 10.1109/TKDE.2008.111,

First Published: 2008-06-06, Current Version Published: 2008-11-25

ABSTRACT

Often, ranking is performed on the basis of some scores available for each item. The existing practice for comparing scoring functions is to compare the induced rankings by one of the multitude of rank comparison methods available in the literature. We suggest that it may be better to compare the underlying scores themselves. To this end, a generalized Kendall distance is defined, which takes into consideration not only the final ordering that the two schemes produce, but also the spacing between pairs of scores. This is shown to be equivalent to comparing the scores after fusing with another set of scores, making it theoretically interesting. A top k version of the score comparison methodology is also provided. Experimental results clearly show the advantages score comparison has over rank comparison.
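
The limitation of rank comparison that motivates this abstract can be illustrated with the classic Kendall tau distance, which counts the item pairs that two scoring schemes order in opposite ways. The sketch below is not the generalized distance defined in the paper; the function name and the score values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kendall_tau_distance(scores_a, scores_b):
    """Count item pairs that the two score lists order in opposite ways."""
    assert len(scores_a) == len(scores_b)
    discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        # A pair is discordant when the two score differences have opposite signs.
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0:
            discordant += 1
    return discordant


# Hypothetical scores for four items under three scoring schemes.
close = [0.90, 0.89, 0.88, 0.10]    # top three items are nearly tied
spread = [0.90, 0.60, 0.30, 0.10]   # same ordering, very different spacing
swapped = [0.89, 0.90, 0.88, 0.10]  # first two items trade places

print(kendall_tau_distance(close, spread))   # 0
print(kendall_tau_distance(close, swapped))  # 1
```

The rank-based distance reports that `close` and `spread` are identical even though their spacing differs greatly, and it penalizes `swapped` for a near-tie reversal. This is the kind of information a score-level comparison can take into account.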

ONLINE SKYLINE ANALYSIS WITH DYNAMIC PREFERENCES ON NOMINAL ATTRIBUTES

Wong, R.C.-W.   Jian Pei   Fu, A.W.-C.   Ke Wang, Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Kowloon

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Jan. 2009, Volume: 21, Issue: 1, On page(s): 35-49, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10324289, Digital Object Identifier: 10.1109/TKDE.2008.115,

First Published: 2008-06-17, Current Version Published: 2008-11-25

ABSTRACT

The importance of skyline analysis has been well recognized in multi-criteria decision making applications. All of the previous studies assume a fixed order on the attributes in question. However, in some applications, users may be interested in skylines with respect to various total or partial orders on nominal attributes. In this paper, we identify and tackle the problem of online skyline analysis with dynamic preferences on nominal attributes. We investigate how changes of orders in attributes lead to changes of skylines. We address two novel types of interesting queries: a viewpoint query returns with respect to which orders a point is (or is not) in the skylines and an order-based skyline query retrieves the skyline with respect to a specific order. We develop two methods systematically and report an extensive performance study using both synthetic and real data sets to verify their effectiveness and efficiency.
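
The two query types named in this abstract can be sketched by brute force, assuming each point has one numeric attribute (a price, lower is better) and one nominal attribute (a brand) whose preference order is supplied at query time. This is not either of the paper's two methods, it handles total orders only, and the data and function names are hypothetical.

```python
from itertools import permutations


def dominates(p, q, rank):
    """True if p is no worse than q on both attributes and better on one.

    Lower price is better; a brand is better when its rank value is smaller.
    """
    price_p, brand_p = p
    price_q, brand_q = q
    no_worse = price_p <= price_q and rank[brand_p] <= rank[brand_q]
    better = price_p < price_q or rank[brand_p] < rank[brand_q]
    return no_worse and better


def order_based_skyline(points, order):
    """Skyline with respect to one specific total order on the brands."""
    rank = {brand: i for i, brand in enumerate(order)}
    return [p for p in points if not any(dominates(q, p, rank) for q in points)]


def viewpoint_query(point, points, brands):
    """All total orders on the brands under which `point` is in the skyline."""
    return [order for order in permutations(brands)
            if point in order_based_skyline(points, order)]


# Hypothetical data: (price, brand).
points = [(100, "A"), (120, "B"), (90, "C")]

print(order_based_skyline(points, ("A", "B", "C")))
print(order_based_skyline(points, ("B", "A", "C")))
print(viewpoint_query((120, "B"), points, ("A", "B", "C")))
```

Changing the preference order from A-first to B-first brings the point `(120, "B")` into the skyline. Enumerating every order, as `viewpoint_query` does here, grows factorially with the number of nominal values, which suggests why dedicated methods are needed for online use.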

SELF-LEARNING DISK SCHEDULING

Yu Zhang   Bhargava, B.,  Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Jan. 2009, Volume: 21, Issue: 1, On page(s): 50-65, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10324290, Digital Object Identifier: 10.1109/TKDE.2008.116,

First Published: 2008-06-20, Current Version Published: 2008-11-25

ABSTRACT

Performance of disk I/O schedulers is affected by many factors, such as workloads, file systems, and disk systems. Disk scheduling performance can be improved by tuning scheduler parameters, such as the length of read timers. Scheduler performance tuning is mostly done manually. To automate this process, we propose four self-learning disk scheduling schemes: change-sensing Round-Robin, feedback learning, per-request learning, and two-layer learning. Experiments show that the novel two-layer learning scheme performs best. It integrates the workload-level and request-level learning algorithms. It employs feedback learning techniques to analyze workloads, change scheduling policy, and tune scheduling parameters automatically. We discuss schemes to choose features for workload learning, divide and recognize workloads, generate training data, and integrate machine learning algorithms into the two-layer learning scheme. We conducted experiments to compare the accuracy, performance, and overhead of five machine learning algorithms: decision tree, logistic regression, naive Bayes, neural network, and support vector machine algorithms. Experiments with real-world and synthetic workloads show that self-learning disk scheduling can adapt to a wide variety of workloads, file systems, disk systems, and user preferences. It outperforms existing disk schedulers by as much as 15.8% while consuming less than 3%-5% of CPU time.

DISCRIMINATIVE TRAINING OF THE HIDDEN VECTOR STATE MODEL FOR SEMANTIC PARSING

Deyu Zhou   Yulan He, Inf. Res. Centre, Univ. of Reading, Reading

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Jan. 2009, Volume: 21, Issue: 1, On page(s): 66-77, Location: Los Angeles, CA, USA, ISSN: 1041-4347

INSPEC Accession Number: 10324291, Digital Object Identifier: 10.1109/TKDE.2008.95,

First Published: 2008-05-16, Current Version Published: 2008-11-25

ABSTRACT

In this paper, we discuss how discriminative training can be applied to the hidden vector state (HVS) model in different task domains. The HVS model is a discrete hidden Markov model (HMM) in which each HMM state represents the state of a push-down automaton with a finite stack size. In previous applications, maximum-likelihood estimation (MLE) is used to derive the parameters of the HVS model. However, MLE makes a number of assumptions and unfortunately some of these assumptions do not hold. Discriminative training, without making such assumptions, can improve the performance of the HVS model by discriminating the correct hypothesis from the competing hypotheses. Experiments have been conducted in two domains: the travel domain for the semantic parsing task using the DARPA Communicator data and the Air Travel Information Services (ATIS) data and the bioinformatics domain for the information extraction task using the GENIA corpus. The results demonstrate modest improvements of the performance of the HVS model using discriminative training. In the travel domain, discriminative training of the HVS model gives a relative error reduction rate of 31 percent in F-measure when compared with MLE on the DARPA Communicator data and 9 percent on the ATIS data. In the bioinformatics domain, a relative error reduction rate of 4 percent in F-measure is achieved on the GENIA corpus.

EFFICIENT EVALUATION OF PROBABILISTIC ADVANCED SPATIAL QUERIES ON EXISTENTIALLY UNCERTAIN DATA

Man Lung Yiu   Mamoulis, N.   Xiangyuan Dai   Yufei Tao   Vaitis, M., Dept. of Comput. Sci., Aalborg Univ., Aalborg

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Jan. 2009, Volume: 21, Issue: 1, On page(s): 108-122, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10324294, Digital Object Identifier: 10.1109/TKDE.2008.135,

First Published: 2008-07-15, Current Version Published: 2008-11-25

ABSTRACT

We study the problem of answering spatial queries in databases where objects exist with some uncertainty and they are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for probabilistic versions of range queries, nearest neighbors, spatial skylines, and reverse nearest neighbors and conduct an extensive experimental study, which evaluates the effectiveness of proposed solutions.
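
The thresholding and ranking semantics can be illustrated for the simplest case, a range query. The sketch assumes each object has a known location and an existential probability, so that an object inside the query rectangle qualifies with exactly that probability. It scans all objects and does not use the spatial access methods the paper adapts; the tuple layout and function names are hypothetical.

```python
def range_candidates(objects, rect):
    """Return (name, probability) for every object located inside rect."""
    xmin, ymin, xmax, ymax = rect
    return [(name, prob) for name, x, y, prob in objects
            if xmin <= x <= xmax and ymin <= y <= ymax]


def thresholding_range_query(objects, rect, threshold):
    """Objects in rect whose existential probability exceeds the threshold."""
    return [name for name, prob in range_candidates(objects, rect)
            if prob > threshold]


def ranking_range_query(objects, rect, k):
    """The k objects in rect with the highest existential probabilities."""
    ranked = sorted(range_candidates(objects, rect),
                    key=lambda candidate: candidate[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical objects: (name, x, y, existential probability).
objects = [("a", 1, 1, 0.9), ("b", 2, 2, 0.4), ("c", 8, 8, 0.95), ("d", 3, 1, 0.7)]
rect = (0, 0, 5, 5)

print(thresholding_range_query(objects, rect, 0.5))  # ['a', 'd']
print(ranking_range_query(objects, rect, 1))         # ['a']
```

Object `c` has the highest probability but lies outside the rectangle, so neither query returns it. For the other query types listed in the abstract, the qualification probability of an object also depends on whether other objects exist, which this sketch does not model.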

CDNS CONTENT OUTSOURCING VIA GENERALIZED COMMUNITIES

Katsaros, D.   Pallis, G.   Stamos, K.   Vakali, A.   Sidiropoulos, A.   Manolopoulos, Y., Dept. of Comput. & Commun. Eng., Thessaly Univ., Volos

This paper appears in: Knowledge and Data Engineering, IEEE Transactions on Publication Date: Jan. 2009, Volume: 21, Issue: 1, On page(s): 137-151, Location: Los Angeles, CA, USA, ISSN: 1041-4347,

INSPEC Accession Number: 10324296, Digital Object Identifier: 10.1109/TKDE.2008.92,

First Published: 2008-05-12, Current Version Published: 2008-11-25

ABSTRACT

Content distribution networks (CDNs) balance costs and quality in services related to content delivery. Devising an efficient content
