Please wait a minute...
Tsinghua Science and Technology  2019, Vol. 24 Issue (2): 171-182    doi: 10.26599/TST.2018.9010051
    
Computing Skyline Groups: An Experimental Evaluation
Haoyang Zhu, Xiaoyong Li*, Qiang Liu, Hao Zhu
∙ Haoyang Zhu is with the Academy of Military Science of the People’s Liberation Army, Beijing 100091, China. E-mail: zhuhaoyang@nudt.edu.cn.
∙ Xiaoyong Li is with the Academy of Ocean Science and Engineering, National University of Defense Technology, Changsha 410073, China.
∙ Qiang Liu and Hao Zhu are with the College of Computer, National University of Defense Technology, Changsha 410073, China. E-mail: qiangliu06@nudt.edu.cn; haozhu@nudt.edu.cn.
Download: PDF (1018 KB)      HTML
Export: BibTeX | EndNote (RIS)       Supporting Info
Guide   

Abstract  

Skyline group, also named as combinational skyline or group-based skyline, has attracted more attention recently. The concept of skyline groups is proposed to address the problem in the inadequacy of the traditional skyline to answer queries that need to analyze not only individual points but also groups of points. Skyline group algorithms aim at finding groups of points that are not dominated by any other same-size groups. Although two types of dominance relationship exist between the groups defined in existing works, they have not been compared systematically under the same experimental framework. Thus, practitioners face difficulty in selecting an appropriate definition. Furthermore, the experimental evaluation in most existing works features a weakness, that is, studies only experimented on small data sets or large data sets with small dimensions. For comprehensive comparisons of the two types of definition and existing algorithms, we evaluate each algorithm in terms of time and space on various synthetic and real data sets. We reveal the characteristics of existing algorithms and provide guidelines on selecting algorithms for different situations.



Key wordsskyline queries      skyline groups      performance evaluation     
Received: 22 May 2017      Published: 29 April 2019
Corresponding Authors: Xiaoyong Li   
About author:

Hao Zhu received the PhD degree in the SNE group from Delft University of technology, in 2015. Now he is a researcher in the College of Computer Science, National University of Defense Technology. His research interests are in the area of green computing and GPU computing. To be particular, he studies power estimation models, power management of heterogeneous system, and real time GPU computing.

Cite this article:

Haoyang Zhu, Xiaoyong Li, Qiang Liu, Hao Zhu. Computing Skyline Groups: An Experimental Evaluation. Tsinghua Science and Technology, 2019, 24(2): 171-182.

URL:

http://tst.tsinghuajournals.com/10.26599/TST.2018.9010051     OR     http://tst.tsinghuajournals.com/Y2019/V24/I2/171

Fig. 1 A skyline example.
NotationDescription
Dd-dimensional data set
dNumber of dimensions
nNumber of points in D
QiThe i?-th point in D
QjiValue on the j?-th dimension of Qi
Preference/dominance relation
SkylineSkyline of data set D
lSize of a group
Table 1 Summary of notations.
Fig. 2 Dominance relations under different aggregate functions.
Skyline group
Permutation{Q1,Q3},{Q1,Q5},{Q3,Q4},{Q3,Q5}
SUM{Q1,Q3},{Q1,Q5},{Q3,Q5}
MAX{Q1,Q5}
MIN{Q1,Q2},{Q1,Q3},{Q2,Q3},
{Q3,Q4},{Q3,Q5}
Table 2 Skyline groups under different definitions.
Fig. 3 An example of computing 2-point skyline groups under Permutation.
n l=2l=4l=6
SUMMAXMINDef.?2SUMMAXMINDef.2SUMMAXMINDef.2
2×105G2123241011747146111292255×105513464246.3×107
V212235101146135322551341424
4×105G258323120226825917523619×10511 757107681.52×108
V258284118259139635711 7571762
6×105G2481584120276025643983691.3×10615 145367422.79×108
V248300120256432136915 1451736
8×105V163302881706110621222485×10550361604488.3×107
V16321588110619624850361448
1×106G173370892004124443182687×10552294504551.15×108
V17322889124422226852291449
Table 3 G and V, under various n, l (d=6).
d l=2l=4l=6
SUMMAXMINDef.2SUMMAXMINDef.2SUMMAXMINDef.2
3G31×106411721159163006813
V3127131614
4G19641116756701253751172283 155
V1917956112117116
5G65118356302456404775 6745911×108504.2×106
V6552352452547117150
6G173370892004124443182687×10552294504551.15×108
V17322889124422226852291449
7G2857001774475321677236053.5×10618 541391212761.2×109
V2855371773216132060518 541991276
Table 4 G and V, under various d, l (n=1 ×?×𝟏𝟎𝟔).
Fig. 4 Runtime on NBA data set.
Fig. 5 Output size on NBA data set.
Fig. 6 Experimental results on NBA data set under MAX.
Fig. 7 Experimental results on NBA data set under MIN.
Fig. 8 (a) and (b): Runtime; (c) and (d): output size, mixture of SUM, MAX, and MIN.
n=1××𝟏𝟎𝟔,l=2).">
Fig. 9 Experimental results on correlated data sets (n=1××𝟏𝟎𝟔,l=2).
n=1××𝟏𝟎𝟔,l=2).">
Fig. 10 Experimental results on independent data sets (n=1××𝟏𝟎𝟔,l=2).
n=1××𝟏𝟎𝟔,l=2).">
Fig. 11 Experimental results on anti-correlated data sets (n=1××𝟏𝟎𝟔,l=2).
AlgorithmDefinitionRuntimeOutput
unit-wise+Permutationshortlarge
DPSGSUMmedialmedial
MAXshortsmall
MINlongsmall
DPAVMAXlongsmall
MINlongsmall
Table 5 Characteristics of existing algorithms.
[1]   Im H. and Park S., Group skyline computation, Inf. Sci., vol. 188, no. 1, pp. 151-169, 2012.
[2]   Li C., Zhang N., Hassan N., Rajasekaran S., and Das G., On skyline groups, in 21st International Conference on Information and Knowledge Management, 2012, pp. 2119- 2123.
[3]   Chung Y., Su I., and Lee C., Efficient computation of combinatorial skyline queries, Inf. Syst., vol. 38, no. 3, pp. 369-387, 2013.
[4]   Liu J., Xiong L., Pei J., Luo J., and Zhang H., Finding pareto optimal groups: Group-based skyline, PVLDB, vol. 8, no. 13, pp. 2086-2097, 2015.
[5]   Zhang N., Li C., Hassan N., Rajasekaran S., and Das G., On skyline groups, IEEE Trans. Knowl. Data Eng., vol. 26, no. 4, pp. 942-956, 2014.
[6]   Su I., Chung Y., and Lee C., Top-kcombinatorial skyline queries, in Database Systems for Advanced Applications, 15th International Conference, 2010, pp. 79-93.
[7]   Zhu H., Zhu P., Li X., and Liu Q., Top-k skyline groups queries, in Proceedings of the 20th International Conference on Extending Database Technology, 2017, pp. 442-445.
[8]   Guo X., Li H., Wulamu A., Xie Y., and Fu Y., Efficient processing of skyline group queries over a data stream, Tsinghua Science and Technology, vol. 21, no. 1, pp. 29-39, 2016.
[9]   Zhu H., Zhu P., Li X., and Liu Q., Computing skyline groups: An experimental evaluation, in Proceedings of the ACM Turing 50th Celebration Conference, 2017, pp. 1-48.
[10]   B¨orzs¨onyi S., Kossmann D., and Stocker K., The skyline operator, in Proceedings of the 17th International Conference on Data Engineering, 2001, pp. 421-430.
[11]   Zhang P., Zhou C., Wang P., Gao B. J., Zhu X., and Guo L., E-tree: An efficient indexing structure for ensemble models on data streams, IEEE Trans. Knowl. Data Eng., vol. 27, no. 2, pp. 461-474, 2015.
[12]   Kossmann D., Ramsak F., and Rost S., Shooting stars in the sky: An online algorithm for skyline queries, in Proceedings of 28th International Conference on Very Large Data Bases, 2002, pp. 275-286.
[13]   Li X., Wang Y., Li X., Wang X., and Yu J., GDPS: An efficient approach for skyline queries over distributed uncertain data, Big Data Research, vol. 1, no.1, pp. 23-36, 2014.
[14]   Li X., Wang Y., Li X., and Wang Y., Parallel skyline queries over uncertain data streams in cloud computing environments, IJWGS, vol. 10, no. 1, pp. 24-53, 2014.
[15]   Lee K. C. K., Zheng B., Li H., and Lee W., Approaching the skyline in Z order, in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp. 279-290.
[16]   Chomicki J., Godfrey P., Gryz J., and Liang D., Skyline with presorting, in Proceedings of the 19th International Conference on Data Engineering, 2003, pp. 717-719.
[17]   Bartolini I., Ciaccia P., and Patella M., Efficient sort-based skyline evaluation, ACM Trans. Database Syst., vol. 33, no. 4, pp. 1-31, 2008.
[18]   Zhang S., Mamoulis N., and Cheung D. W., Scalable skyline computation using object-based space partitioning, in Proceedings of the International Conference on Management of Data, 2009, pp. 483-494.
[19]   Lee J. and Hwang S., Scalable skyline computation using a balanced pivot selection technique, Inf. Syst., vol. 39, no. 1, pp. 1-21, 2014.
[20]   Pei J., Jiang B., Lin X., and Yuan Y., Probabilistic skylines on uncertain data, in Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp. 15-26.
[21]   Lu H., Jensen C. S., and Zhang Z., Flexible and efficient resolution of skyline query size constraints, IEEE Trans. Knowl. Data Eng., vol. 23, no. 7, pp. 991-1005, 2011.
[22]   Yuan Y., Lin X., Liu Q., Wang W., Yu J., and Zhang Q., Efficient computation of the skyline cube, in Proceedings of the 31st International Conference on Very Large Data Bases, 2005, pp. 241-252.
[23]   Zhang Q., Ye P., Lin X., and Zhang Y., Skyline probability over uncertain preferences, in Joint 2013 EDBT/ICDT Conferences, 2013, pp. 395-405.
[24]   Zhang X., Li G., and Feng J., Crowd-sourced top-k algorithms: An experimental evaluation, PVLDB, vol. 9, no. 8, pp. 612-623, 2016.
[25]   Chester S., Sidlauskas D., Assent I., and B?gh K. S., Scalable parallelization of skyline computation for multi-core processors, in 31st IEEE International Conference on Data Engineering, 2015, pp. 1083-1094.
[1] Yinjun Wu,Zhen Chen,Yuhao Wen,Wenxun Zheng,Junwei Cao. COMBAT: A New Bitmap Index Coding Algorithm for Big Data[J]. Tsinghua Science and Technology, 2016, 21(2): 136-145.
[2] Xu Xiaolin, Jin Hai, Wu Song, Tang Lixiang, Wang Yihong. URMG: Enhanced CBMG-Based Method for Automatically Testing Web Applications in the Cloud[J]. Tsinghua Science and Technology, 2014, 19(1): 65-75.
[3] . High-Performance Packet Classification on Multi-Core Network Processing Platforms[J]. Tsinghua Science and Technology, 2011, 16(4): 432-439.
[4] . Performance Improvement of Distributed Systems by Autotuning of the Configuration Parameters[J]. Tsinghua Science and Technology, 2011, 16(4): 440-448.