CLC number: TP311.13
On-line Access:
Received: 2004-04-20
Revision Accepted: 2004-08-01
Crosschecked: 0000-00-00
Cited: 0
Clicked: 6168
XIANG Long-gang, FENG Yu-cai, GUI Hao. Construction and compression of Dwarf[J]. Journal of Zhejiang University Science A, 2005, 6(6): 519-527.
@article{title="Construction and compression of Dwarf",
author="XIANG Long-gang, FENG Yu-cai, GUI Hao",
journal="Journal of Zhejiang University Science A",
volume="6",
number="6",
pages="519-527",
year="2005",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.2005.A0519"
}
%0 Journal Article
%T Construction and compression of Dwarf
%A XIANG Long-gang
%A FENG Yu-cai
%A GUI Hao
%J Journal of Zhejiang University SCIENCE A
%V 6
%N 6
%P 519-527
%@ 1673-565X
%D 2005
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2005.A0519
TY - JOUR
T1 - Construction and compression of Dwarf
A1 - XIANG Long-gang
A1 - FENG Yu-cai
A1 - GUI Hao
J0 - Journal of Zhejiang University Science A
VL - 6
IS - 6
SP - 519
EP - 527
%@ 1673-565X
Y1 - 2005
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/jzus.2005.A0519
Abstract: There exists an inherent difficulty in the original algorithm for the construction of dwarf, which prevents it from constructing true dwarfs. We explained when and why it introduces suffix redundancies into the dwarf structure. To solve this problem, we proposed a completely new algorithm called PID. It bottom-up computes partitions of a fact table, and inserts them into the dwarf structure. If a partition is an MSV partition, coalesce its sub-dwarf; otherwise create necessary nodes and cells. Our performance study showed that PID is efficient. For further condensing of dwarf, we proposed Condensed dwarf, a more compressed structure, combining the strength of dwarf and Condensed Cube. By eliminating unnecessary stores of “ALL” cells from the dwarf structure, Condensed dwarf could effectively reduce the size of dwarf, especially for dwarfs of the real world, which was illustrated by our experiments. Its query processing is still simple and, only two minor modifications to PID are required for the construction of Condensed dwarf.
[1] Beyer, K., Ramakrishnan, R., 1999. Bottom-up Computation of Sparse and Iceberg CUBEs. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, ACM press, p.359-370.
[2] Fang, M., Shivkumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.D., 1998. Computing Iceberg Queries Efficiently. Proceedings of 24th International Conference on Very Large Data Bases, Morgan Kaufmann Press, p.299-310.
[3] Gray, J., Bosworth, B., Layman, A., Pirahesh, H., 1996. Data Cube: A Relational Operator Generalizing Group-by, Cross-tab, and Sub-totals. Proceedings of the 12th International Conference on Data Engineering, IEEE Computer Society Press, p.152-159.
[4] Hahn, C., Warren, S., London, J., 1994. Edited Synoptic Cloud Reports from Ships and Land Stations over the Globe, 1982–1991. http://cidiac.est.ornl.gov/ftp/ndp026b/SEP-85L.z.
[5] Lakshmanan, L.V.S., Pei, J., Han, J.W., 2002. Quotient Cube: How to Summarize the Semantics of A Data Cube. Proceedings of 28th International Conference on Very Large Data Bases, Morgan Kaufmann Press, p.766-777.
[6] Lakshmanan, L.V.S., Pei, J., Zhao, Y., 2003. QC-Trees: An Effective Summary Structure for Semantic OLAP. Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, ACM Press, p.64-75.
[7] Sismanis, Y., Deligiannakis, A., Roussopoulos, N., Kotidis, Y., 2002. Dwarf: Shrinking the PetaCube. Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, ACM Press, p.464-475.
[8] Wang, W., Feng, J.L., Lu, H.J., Yu, J.X., 2002. Condensed Cube: An Effective Approach to Reducing Data Cube Size. Proceedings of the 18th International Conference on Data Engineering, IEEE Computer Society Press, p.155-165.
[9] Xin, D., Han, J.W., Li, X.L., Wah, B.W., 2003. Star-cubing: Computing Iceberg Cubes by Top-down and Bottom-up Integration. Proceedings of 29th International Conference on Very Large Data Bases, Morgan Kaufmann Press, p.476-487.
[10] Zhao, Y., Deshpande, P., Naughton, J.F., 1997. An Array-based Algorithm for Simultaneous Multidimensional Aggregates. Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, ACM Press, p.159-170.
Open peer comments: Debate/Discuss/Question/Opinion
<1>