CLC number: TP391

Received: 2005-08-05

Revision Accepted: 2005-09-10

Journal of Zhejiang University SCIENCE A 2005 Vol.6 No.11 P.1327-1340


The Million Book Project at Bibliotheca Alexandrina

Author(s):  ELDAKAR Youssef, EL-GAZZAR Khalid, ADLY Noha, NAGI Magdy

Affiliation(s):  Bibliotheca Alexandrina, El Shatby 21526, Alexandria, Egypt

Corresponding email(s):   Noha.Adly@bibalex.org

Key Words:  Million Book Project (MBP), Digital books workflow, Digitization, Universal Digital Library, Scanning, Multilingual OCR, Digital publishing, Image-on-text, DjVu, PDF

ELDAKAR Youssef, EL-GAZZAR Khalid, ADLY Noha, NAGI Magdy. The Million Book Project at Bibliotheca Alexandrina[J]. Journal of Zhejiang University Science A, 2005, 6(11): 1327-1340.

%0 Journal Article
%T The Million Book Project at Bibliotheca Alexandrina
%A ELDAKAR Youssef
%A EL-GAZZAR Khalid
%A ADLY Noha
%A NAGI Magdy
%J Journal of Zhejiang University SCIENCE A
%V 6
%N 11
%P 1327-1340
%@ 1673-565X
%D 2005
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2005.A1327

The Bibliotheca Alexandrina (BA) has been developing and using a workflow for turning printed books into digital books as its contribution to building a universal Digital Library. The workflow consists of six phases: scanning, image processing, OCR, digital archiving, document encoding, and publishing. Over the past couple of years, the BA has defined procedures and special techniques for scanning, processing, OCR, and publishing, especially for Arabic books. The workflow has been automated, which allows the different phases to be governed and has made possible the production of 18,000 books so far. The BA has also designed and implemented a framework for encoding digital books that supports publishing, as well as a software system for managing the creation, maintenance, and publishing of the overall digital repository.
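
As a rough illustration of what governing the phases of such a workflow could look like, the sketch below tracks one book through the six phases in the order the abstract lists them and rejects out-of-order work. It is not the BA's system: the names `Phase`, `BookJob`, `next_phase`, and `complete`, and the identifier `demo-0001`, are hypothetical, and the strictly linear ordering is an assumption made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Phase(IntEnum):
    """Workflow phases, named and ordered as in the abstract."""
    SCANNING = 1
    IMAGE_PROCESSING = 2
    OCR = 3
    DIGITAL_ARCHIVING = 4
    DOCUMENT_ENCODING = 5
    PUBLISHING = 6


@dataclass
class BookJob:
    """Hypothetical record of one book's progress (not the BA's data model)."""
    book_id: str
    completed: list = field(default_factory=list)

    def next_phase(self):
        """Return the next pending phase, or None once all phases are done."""
        if len(self.completed) == len(Phase):
            return None
        return Phase(len(self.completed) + 1)

    def complete(self, phase):
        """Record a finished phase; assumes phases run strictly in order."""
        expected = self.next_phase()
        if phase is not expected:
            raise ValueError(f"expected {expected!r}, got {phase!r}")
        self.completed.append(phase)


job = BookJob("demo-0001")      # made-up identifier
job.complete(Phase.SCANNING)
pending = job.next_phase()      # Phase.IMAGE_PROCESSING
```

A real system would presumably also record operators, timestamps, and failures per phase; the sketch only shows the ordering constraint.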

