References

[1] N. Abdallah, F. Goasdoué, and M.-C. Rousset. DL-LITE$_R$ in the Light of Propositional Logic for Decentralized Data Management. In Proc. Intl. Joint Conference on Artificial Intelligence (IJCAI), 2009.

[2] S. Abiteboul, S. Alstrup, H. Kaplan, T. Milo, and T. Rauhe. Compact labeling scheme for ancestor queries. SIAM J. Comput., 35(6):1295–1309, 2006.

[3] S. Abiteboul and C. Beeri. The power of languages for the manipulation of complex values. Very Large Databases Journal (VLDBJ), 4(4):727–794, 1995.

[4] S. Abiteboul and N. Bidoit. Non first normal form relations: An algebra allowing data restructuring. J. Comput. Syst. Sci., 33(3):361–393, 1986.

[5] S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, 1999.

[6] S. Abiteboul, S. Cluet, V. Christophides, T. Milo, G. Moerkotte, and J. Simeon. Querying documents in object databases. Intl. Journal on Digital Libraries, 1:5–19, 1997.

[7] S. Abiteboul, M. Preda, and G. Cobena. Adaptive on-line page importance computation. In Proc. Intl. World Wide Web Conference (WWW), 2003.

[8] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured data. Intl. Journal on Digital Libraries, 1:68–88, 1997.

[9] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

[10] A. Abouzeid, K. Bajda-Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz. HadoopDB: An Architectural Hybrid of MAPREDUCE and DBMS Technologies for Analytical Workloads. Proceedings of the VLDB Endowment (PVLDB), 2(1):922–933, 2009.

[11] A. Calì, G. Gottlob, and T. Lukasiewicz. Datalog±: a unified approach to ontologies and integrity constraints. In Proc. Intl. Conf. on Database Theory (ICDT), 2009.

[12] A. Calì, G. Gottlob, and T. Lukasiewicz. A general Datalog-based framework for tractable query answering over ontologies. In Proc. ACM Symp. on Principles of Database Systems (PODS), 2009.

[13] A. Acciarri, D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, M. Palmieri, and R. Rosati. QuOnto: Querying ontologies. In Proc. Intl. Conference on Artificial Intelligence (AAAI), 2005.

[14] P. Adjiman, P. Chatalic, F. Goasdoué, M.-C. Rousset, and L. Simon. Distributed reasoning in a peer-to-peer setting. Journal of Artificial Intelligence Research, 25, 2006.

[15] S. Al-Khalifa, H. V. Jagadish, J. M. Patel, Y. Wu, N. Koudas, and D. Srivastava. Structural joins: A primitive for efficient XML query pattern matching. In Proc. Intl. Conf. on Data Engineering (ICDE), 2002.

[16] D. Allemang and J. Hendler. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Morgan Kaufmann, 2008.

[17] J. C. Anderson, J. Lehnardt, and N. Slater. CouchDB: the Definitive Guide. O’Reilly, 2010. Available at http://wiki.apache.org/couchdb/.

[18] V. N. Anh and A. Moffat. Inverted Index Compression Using Word-Aligned Binary Codes. Inf. Retrieval, 8(1):151–166, 2005.

[19] V. N. Anh and A. Moffat. Improved Word-Aligned Binary Compression for Text Indexing. IEEE Transactions on Knowledge and Data Engineering, 18(6):857–861, 2006.

[20] G. Antoniou and F. van Harmelen. A Semantic Web Primer. The MIT Press, 2008.

[21] A. Arasu and H. Garcia-Molina. Extracting structured data from Web pages. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 337–348, June 2003.

[22] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.

[23] J.-F. Baget, M. Croitoru, A. Gutierrez, M. Leclère, and M.-L. Mugnier. Translations between RDF(S) and conceptual graphs. In Proc. Intl. Conference on Conceptual Structures (ICCS), pages 28–41, 2010.

[24] M. Benedikt and C. Koch. XPath leashed. ACM Computing Surveys, 41(1), 2008.

[25] M. Benedikt and C. Koch. From XQuery to relational logics. ACM Trans. on Database Systems, 34(4), 2009.

[26] V. Benzaken, G. Castagna, and A. Frisch. CDuce: an XML-centric general-purpose language. SIGPLAN Notices, 38(9):51–63, 2003.

[27] G. J. Bex, F. Neven, T. Schwentick, and S. Vansummeren. Inference of concise regular expressions and DTDs. ACM Trans. on Database Systems, 35(2), 2010.

[28] G. J. Bex, F. Neven, and S. Vansummeren. Inferring XML Schema definitions from XML data. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 998–1009, 2007.

[29] K. P. Birman, editor. Reliable distributed systems: technologies, Web services, and applications. Springer, 2005.

[30] P. Blackburn, J. van Benthem, and F. Wolter. Handbook of Modal Logic. Springer, 2006.

[31] G. E. Blelloch. Programming Parallel Algorithms. Commun. ACM, 39(3):85–97, 1996.

[32] P. A. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, and J. Teubner. MonetDB/XQuery: a fast XQuery processor powered by a relational engine. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 479–490, 2006.

[33] BrightPlanet. The Deep Web: Surfacing Hidden Value. White Paper, July 2000.

[34] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 30(1–7):107–117, Apr. 1998.

[35] A. Z. Broder, S. C. Glassman, M. S. Manasse, and G. Zweig. Syntactic clustering of the Web. Computer Networks, 29(8–13):1157–1166, 1997.

[36] J. de Bruijn, E. Franconi, and S. Tessaris. Logical reconstruction of normative RDF. In Proc. OWL: Experiences and Directions Workshop (OWLED’05), 2005.

[37] N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), 2002.

[38] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-LITE Family. Journal of Automated Reasoning, 39(3):385–429, 2007.

[39] R. G. G. Cattell, editor. The Object Database Standard: ODMG-93. Morgan Kaufmann, 1994.

[40] R. Chaiken, B. Jenkins, P.-Å. Larson, B. Ramsey, D. Shakib, S. Weaver, and J. Zhou. SCOPE: easy and efficient parallel processing of massive data sets. Proceedings of the VLDB Endowment (PVLDB), 1(2):1265–1276, 2008.

[41] S. Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann, 2003.

[42] A. Chandra and M. Vardi. The implication problem for functional and inclusion dependencies is undecidable. SIAM Journal on Computing, 14(3):671–677, 1985.

[43] C.-H. Chang, M. Kayed, M. R. Girgis, and K. F. Shaalan. A survey of Web information extraction systems. IEEE Transactions on Knowledge and Data Engineering, 18(10):1411–1428, Oct. 2006.

[44] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. In Intl. Symp. on Operating System Design and Implementation (OSDI), 2006.

[45] K. C.-C. Chang, B. He, C. Li, M. Patel, and Z. Zhang. Structured Databases on the Web: Observations and Implications. SIGMOD Record, 33(3):61–70, 2004.

[46] K. C.-C. Chang, B. He, and Z. Zhang. Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web. In Proc. Intl. Conference on Innovative Data Systems Research (CIDR), Jan. 2005.

[47] M. Chein and M.-L. Mugnier. Graph-based Knowledge Representation. Springer, 2008.

[48] H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. Tree automata techniques and applications. http://www.grappa.univ-lille3.fr/tata, 2007.

[49] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. The MIT Press, 1990.

[50] A. Crainiceanu, P. Linga, J. Gehrke, and J. Shanmugasundaram. Querying Peer-to-Peer Networks Using P-Trees. In Proc. Intl. Workshop on the Web and Databases (WebDB), pages 25–30, 2004.

[51] V. Crescenzi, G. Mecca, and P. Merialdo. RoadRunner: Towards Automatic Data Extraction from Large Web Sites. In Proc. Intl. Conf. on Very Large Databases (VLDB), 2001.

[52] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s highly available key-value store. In Proc. ACM Symposium on Operating Systems Principles (SOSP), pages 205–220, 2007.

[53] R. Devine. Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm. In Intl. Conf. on Foundations of Data Organization and Algorithms (FODO), pages 101–114, 1993.

[54] D. DeWitt and M. Stonebraker. MAPREDUCE, a major Step Backward. DatabaseColumn blog, 2008. http://databasecolumn.vertica.com/database-innovation/mapreduce-a-major-step-backwards/.

[55] D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. GAMMA - A High Performance Dataflow Database Machine. In Proc. Intl. Conf. on Very Large Databases (VLDB), 1986.

[56] D. J. DeWitt and J. Gray. Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM, 35(6):85–98, 1992.

[57] P. Dietz. Maintaining order in a linked list. In Proc. ACM SIGACT Symp. on the Theory of Computing (STOC), 1982.

[58] Document Object Model. w3.org/DOM.

[59] O. Duschka, M. Genesereth, and A. Y. Levy. Recursive query plans for data integration. Journal of Logic Programming, 43(1):49–73, 2000.

[60] P. Elias. Universal code word sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194–203, 1975.

[61] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems. Addison-Wesley, 200.

[62] FaCT++. http://owl.cs.manchester.ac.uk/fact++/.

[63] R. Fagin. Combining fuzzy information from multiple systems. Journal of Computer and System Sciences, 58:83–99, 1999. Abstract published in PODS’96.

[64] R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences, 66:614–656, 2003. Abstract published in PODS’2001.

[65] G. Flake, S. Lawrence, and C. L. Giles. Efficient Identification of Web Communities. In Proc. ACM Intl. Conf. on Knowledge and Data Discovery (SIGKDD), pages 150–160, 2000.

[66] G. W. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-Organization of the Web and Identification of Communities. IEEE Computer, 35(3):66–71, 2002.

[67] D. Florescu and D. Kossmann. Storing and Querying XML Data using an RDMBS. IEEE Data Eng. Bull., 22(3):27–34, 1999.

[68] A. Fox, S. D. Gribble, Y. Chawathe, E. A. Brewer, and P. Gauthier. Cluster-Based Scalable Network Services. In Proc. ACM Symposium on Operating Systems Principles (SOSP), pages 78–91, 1997.

[69] S. Fushimi, M. Kitsuregawa, and H. Tanaka. An overview of the system software of a parallel relational database machine GRACE. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 209–219, 1986.

[70] A. Gates, O. Natkovich, S. Chopra, P. Kamath, S. Narayanam, C. Olston, B. Reed, S. Srinivasan, and U. Srivastava. Building a High-Level Dataflow System on top of MAPREDUCE: The PIG Experience. Proceedings of the VLDB Endowment (PVLDB), 2(2):1414–1425, 2009.

[71] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proc. ACM Symposium on Operating Systems Principles (SOSP), 2003.

[72] S. Gilbert and N. A. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.

[73] F. Goasdoué and M.-C. Rousset. Querying distributed data through distributed ontologies: A simple but scalable approach. IEEE Intelligent Systems (IS), 18(5):60–65, 2003.

[74] C. Goldfarb. The SGML Handbook. Clarendon Press, Oxford, 1990.

[75] R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 436–445, 1997.

[76] G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. ACM Trans. on Database Systems, 30(2):444–491, 2005.

[77] J. Gray, P. Helland, P. E. O’Neil, and D. Shasha. The Dangers of Replication and a Solution. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 173–182, 1996.

[78] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.

[79] S. Grumbach and T. Milo. An algebra for pomsets. Inf. Comput., 150(2):268–306, 1999.

[80] T. Grust. Accelerating XPath location steps. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 109–120, 2002.

[81] T. Grust, S. Sakr, and J. Teubner. XQuery on SQL hosts. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 252–263, 2004.

[82] T. Grust, M. van Keulen, and J. Teubner. Staircase join: Teach a relational DBMS to watch its (axis) steps. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 524–525, 2003.

[83] T. Grust, M. van Keulen, and J. Teubner. Accelerating XPath evaluation in any RDBMS. ACM Trans. on Database Systems, 29:91–131, 2004.

[84] I. Gupta, T. D. Chandra, and G. S. Goldszmidt. On Scalable and Efficient Distributed Failure Detectors. In Proc. ACM Intl. Symposium on Principles of Distributed Computing (PODC), 2001.

[85] Z. Gyöngyi, H. Garcia-Molina, and J. O. Pedersen. Combating Web Spam with TrustRank. In Proc. Intl. Conf. on Very Large Databases (VLDB), 2004.

[86] A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov. Schema Mediation for Large-Scale Semantic Data Sharing. Very Large Databases Journal (VLDBJ), 14(1):68–83, 2005.

[87] A. Y. Halevy. Answering queries using views: A survey. Very Large Databases Journal (VLDBJ), 10(4):270–294, 2001.

[88] A. Y. Halevy, Z. G. Ives, D. Suciu, and I. Tatarinov. Schema mediation in peer data management systems. In Proc. Intl. Conf. on Data Engineering (ICDE), 2003.

[89] E. R. Harold. Effective XML. Addison-Wesley, 2003.

[90] S. Heinz and J. Zobel. Efficient single-pass index construction for text databases. Journal of the American Society for Information Science and Technology (JASIST), 54(8):713–729, 2003.

[91] J. Hopcroft, R. Motwani, and J. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley, 2006.

[92] H. Hosoya and B. C. Pierce. XDuce: A statically typed XML processing language. ACM Trans. Internet Techn., 3(2):117–148, 2003.

[93] H. Hosoya, J. Vouillon, and B. C. Pierce. Regular expression types for XML. ACM Trans. Program. Lang. Syst., 27(1):46–90, 2005.

[94] IETF. Request For Comments 1034. Domain names—concepts and facilities. http://www.ietf.org/rfc/rfc1034.txt, Nov. 1987.

[95] IETF. Request For Comments 2616. Hypertext transfer protocol—HTTP/1.1. http://www.ietf.org/rfc/rfc2616.txt, June 1999.

[96] ISO. Specification of abstract syntax notation one (ASN.1), 1987. Standard 8824, Information Processing System.

[97] ISO. ISO/IEC 19757-2: Document Schema Definition Language (DSDL). Part 2: Regular-grammar-based validation. RELAX NG. International Standards Organization, 2008.

[98] ISO. ISO/IEC 19757-3: Document Schema Definition Language (DSDL). Part 3: Rule-based validation. Schematron. International Standards Organization, 2008.

[99] ISO/IEC 9075-14:2003, Information technology – Database languages – SQL – Part 14: XML-Related Specifications (SQL/XML), 2003.

[100] P. Jaccard. Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 1901.

[101] H. V. Jagadish, B. C. Ooi, K.-L. Tan, Q. H. Vu, and R. Zhang. Speeding up search in peer-to-peer networks with a multi-way tree structure. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 1–12, 2006.

[102] H. V. Jagadish, B. C. Ooi, and Q. H. Vu. BATON: A Balanced Tree Structure for Peer-to-Peer Networks. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 661–672, 2005.

[103] H. V. Jagadish, B. C. Ooi, Q. H. Vu, R. Zhang, and A. Zhou. VBI-Tree: A Peer-to-Peer Framework for Supporting Multi-Dimensional Indexing Schemes. In Proc. Intl. Conf. on Data Engineering (ICDE), 2006.

[104] Jena - a Semantic Web framework for Java. http://jena.sourceforge.net/.

[105] H. Jiang, H. Lu, W. Wang, and J. X. Yu. XParent: An efficient RDBMS-based XML database system. In Proc. Intl. Conf. on Data Engineering (ICDE), pages 335–336, 2002.

[106] H. Kaplan, T. Milo, and R. Shabo. Compact labeling scheme for XML ancestor queries. Theory Comput. Syst., 40(1):55–99, 2007.

[107] D. R. Karger, E. Lehman, F. T. Leighton, R. Panigrahy, M. S. Levine, and D. Lewin. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. In Proc. ACM SIGACT Symp. on the Theory of Computing (STOC), pages 654–663, 1997.

[108] M. Kay. XSLT 2.0 and XPath 2.0 Programmer’s Reference. Wrox, fourth edition, May 2008.

[109] J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46(5):604–632, 1999.

[110] M. Koster. A standard for robot exclusion. http://www.robotstxt.org/orig.html, June 1994.

[111] B. Kröll and P. Widmayer. Distributing a Search Tree Among a Growing Number of Processors. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 265–276, 1994.

[112] P.-Å. Larson. Dynamic hash tables. Commun. ACM, 31(4):446–457, 1988.

[113] A. Y. Levy, A. Rajaraman, and J. Ordille. Querying heterogeneous information sources using source descriptions. In Proc. Intl. Conf. on Very Large Databases (VLDB), 1996.

[114] W. Litwin. Linear Hashing, a new tool for file and table addressing. In Proc. Intl. Conf. on Very Large Databases (VLDB), 1980.

[115] W. Litwin, M.-A. Neimat, and D. Schneider. RP*: A Family of Order-Preserving Scalable Distributed Data Structures. In Proc. Intl. Conf. on Very Large Databases (VLDB), 1994.

[116] W. Litwin, M.-A. Neimat, and D. A. Schneider. LH* - A Scalable, Distributed Data Structure. ACM Trans. Database Syst., 21(4):480–525, 1996.

[117] B. Liu, R. L. Grossman, and Y. Zhai. Mining Web Pages for Data Records. IEEE Intelligent Systems, 19(6):49–55, 2004.

[118] J. Lu, T. W. Ling, C. Y. Chan, and T. Chen. From region encoding to extended Dewey: On efficient processing of XML twig pattern matching. In Proc. Intl. Conf. on Very Large Databases (VLDB), 2005.

[119] J. Madhavan, A. Y. Halevy, S. Cohen, X. Dong, S. R. Jeffery, D. Ko, and C. Yu. Structured Data Meets the Web: A Few Observations. IEEE Data Engineering Bulletin, 29(4):19–26, Dec. 2006.

[120] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. Online version at http://informationretrieval.org/.

[121] J. Melton and S. Buxton. Querying XML: XQuery, XPath, and SQL/XML in context. Morgan Kaufmann, Mar. 2006.

[122] M. Michael, J. Moreira, D. Shiloach, and R. Wisniewski. Scale-up x Scale-out: A Case Study using Nutch/Lucene. In Proc. Intl. Parallel Processing Symposium (IPPS), 2007.

[123] P. Michiels, I. Manolescu, and C. Miachon. Toward microbenchmarking XQuery. Inf. Systems, 33(2):182–202, 2008.

[124] T. D. Millstein, A. Y. Halevy, and M. Friedman. Query containment for data integration systems. Journal of Computer and System Sciences, 66(1):20–39, 2003.

[125] T. Milo, D. Suciu, and V. Vianu. Typechecking for XML transformers. Journal of Computer and System Sciences, 66(1):66–97, 2003.

[126] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2), 2004.

[127] OASIS. RELAX NG specification. http://www.relaxng.org/spec-20011203.html, Dec. 2001.

[128] OASIS. RELAX NG compact syntax. http://www.relaxng.org/compact-20021121.html, Nov. 2002.

[129] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: a not-so-foreign language for data processing. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 1099–1110, 2008.

[130] P. E. O’Neil, E. J. O’Neil, S. Pal, I. Cseri, G. Schaller, and N. Westbury. ORDPATHs: Insert-friendly XML node labels. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 903–908, 2004.

[131] M. T. Özsu and P. Valduriez. Principles of Distributed Database Systems, Third Edition. Prentice-Hall, 2010.

[132] Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In Proc. Intl. Conf. on Data Engineering (ICDE), 1995.

[133] A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), pages 165–178, 2009.

[134] P. Buneman, S. Davidson, and D. Suciu. Programming constructs for unstructured data. In Proc. Intl. Workshop on Database Programming Languages (DBPL), 1995.

[135] R. Pike, S. Dorward, R. Griesemer, and S. Quinlan. Interpreting the Data: Parallel Analysis with Sawzall. Scientific Programming Journal, Special Issue on Grids and Worldwide Computing Programming Models and Infrastructure, 13(4):277–298, 2005.

[136] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, July 1980.

[137] R. Pottinger and A. Y. Halevy. MiniCon: A scalable algorithm for answering queries using views. Very Large Databases Journal (VLDBJ), 10(2–3):182–198, 2001.

[138] RacerPro. http://www.racer-systems.com/.

[139] V. Ramasubramanian and E. G. Sirer. Beehive: O(1) Lookup Performance for Power-Law Query Distributions in Peer-to-Peer Overlays. In Intl. Symposium on Networked Systems Design and Implementation (NSDI), pages 99–112, 2004.

[140] S. Ratnasamy, P. Francis, M. Handley, R. M. Karp, and S. Shenker. A scalable content-addressable network. In ACM-SIGCOMM, pages 161–172, 2001.

[141] A. I. T. Rowstron and P. Druschel. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. In Middleware 2001, IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, volume 2218 of Lecture Notes in Computer Science, pages 329–350. Springer, 2001.

[142] Y. Saito and M. Shapiro. Optimistic replication. ACM Computing Surveys, 37(1):42–81, 2005.

[143] S. E. Schaeffer. Graph clustering. Computer Science Review, 1(1):27–64, 2007.

[144] F. Scholer, H. E. Williams, J. Yiannis, and J. Zobel. Compression of inverted indexes for fast query evaluation. In Proc. ACM Symp. on Information Retrieval, pages 222–229, 2002.

[145] P. Senellart, A. Mittal, D. Muschick, R. Gilleron, and M. Tommasi. Automatic Wrapper Induction from Hidden-Web Sources with Domain Knowledge. In Proc. Intl. Workshop on Web Information and Data Management (WIDM), pages 9–16, Oct. 2008.

[146] J. Shanmugasundaram, E. J. Shekita, R. Barr, M. J. Carey, B. G. Lindsay, H. Pirahesh, and B. Reinwald. Efficiently publishing relational data as XML documents. In Proc. Intl. Conf. on Very Large Databases (VLDB), pages 65–76, 2000.

[147] J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, and J. F. Naughton. Relational databases for querying XML documents: Limitations and opportunities. In Proc. Intl. Conf. on Very Large Databases (VLDB), 1999.

[148] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics, 5(2):51–53, 2007.

[149] sitemaps.org. Sitemaps XML format. http://www.sitemaps.org/protocol.php, Feb. 2008.

[150] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw., 11(1):17–32, 2003.

[151] M. Stonebraker, D. J. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin. MAPREDUCE and parallel DBMSs: friends or foes? Commun. ACM, 53(1):64–71, 2010.

[152] D. Suciu. The XML Typechecking Problem. SIGMOD Record, 31(1):89–96, 2002.

[153] A. S. Tanenbaum and M. van Steen. Distributed Systems: Principles and Paradigms. Prentice Hall, 2001.

[154] B. ten Cate and M. Marx. Navigational XPath: calculus and algebra. SIGMOD Record, 36(2):19–26, 2007.

[155] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive - A Warehousing Solution Over a Map-Reduce Framework. Proceedings of the VLDB Endowment (PVLDB), 2(2):1626–1629, 2009.

[156] J. Ullman. Principles of Database and Knowledge Base Systems, Volume I. Computer Science Press, 1988.

[157] US National Archives and Records Administration. The Soundex indexing system. http://www.archives.gov/genealogy/census/soundex.html, May 2007.

[158] S. M. van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000.

[159] M. Vardi. The Complexity of Relational Query Languages. In Proc. ACM SIGACT Symp. on the Theory of Computing (STOC), pages 137–146, 1982.

[160] J. S. Vitter. External memory algorithms and data structures. ACM Computing Surveys, 33(2):209–271, 2001.

[161] World Wide Web Consortium. http://www.w3.org/.

[162] W3C. HTML 4.01 specification, Sept. 1999. http://www.w3.org/TR/REC-html40/.

[163] W3C. XML path language (XPath). http://www.w3.org/TR/xpath/, Nov. 1999.

[164] W3C. XHTML 1.0: The extensible hypertext markup language (second edition). http://www.w3.org/TR/xhtml1/, Aug. 2002.

[165] W3C. XML Schema Part 0: Primer. http://www.w3.org/TR/xmlschema-0/, Oct. 2004.

[166] W3C. XML Schema Part 1: Structures. http://www.w3.org/TR/xmlschema-1/, Oct. 2004.

[167] W3C. XML Schema Part 2: Datatypes. http://www.w3.org/TR/xmlschema-2/, Oct. 2004.

[168] W3C. XML path language (XPath) 2.0. http://www.w3.org/TR/xpath20/, Jan. 2007.

[169] W3C. XQuery 1.0: An XML query language. http://www.w3.org/TR/xquery/, Jan. 2007.

[170] W3C. XQuery 1.0 and XPath 2.0 data model (XDM). http://www.w3.org/TR/xpath-datamodel/, Jan. 2007.

[171] W3C. XQuery 1.0 and XPath 2.0 formal semantics. http://www.w3.org/TR/xquery-semantics/, Jan. 2007.

[172] W3C. XQuery 1.0 and XPath 2.0 functions and operators. http://www.w3.org/TR/xquery-operators/, Jan. 2007.

[173] W3C. XSLT 2.0 and XQuery 1.0 serialization. http://www.w3.org/TR/xslt-xquery-serialization/, Jan. 2007.

[174] W3C. Extensible markup language (XML) 1.0. http://www.w3.org/TR/REC-xml/, Nov. 2008.

[175] W3C. SPARQL query language for RDF. http://www.w3.org/TR/rdf-sparql-query/, Jan. 2008.

[176] W3C. OWL 2 Web Ontology Language profiles. http://www.w3.org/2004/OWL/, 2009.

[177] W3C. HTML5, 2010. Working draft available at http://dev.w3.org/html5/spec/Overview.html.

[178] P. Walmsley. XQuery. O’Reilly, Mar. 2007.

[179] I. Witten, A. Moffat, and T. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, 1999.

[180] X. Wu, M. L. Lee, and W. Hsu. A prime number labeling scheme for dynamic ordered XML trees. In Proc. Intl. Conf. on Data Engineering (ICDE), 2004.

[181] Y. Wu, J. M. Patel, and H. V. Jagadish. Structural join order selection for XML query optimization. In Proc. Intl. Conf. on Data Engineering (ICDE), pages 443–454, 2003.

[182] XML Query (XQuery). http://www.w3.org/XML/Query.

[183] The Extensible Stylesheet Language Family. http://www.w3.org/Style/XSL.

[184] L. Xu, T. W. Ling, H. Wu, and Z. Bao. DDE: from Dewey to a fully dynamic XML labeling scheme. In Proc. ACM Intl. Conf. on the Management of Data (SIGMOD), 2009.

[185] M. Yoshikawa, T. Amagasa, T. Shimura, and S. Uemura. XRel: a path-based approach to storage and retrieval of XML documents using relational databases. ACM Trans. on Internet Technology, 1(1):110–141, 2001.

[186] H. Yu and A. Vahdat. Design and evaluation of a continuous consistency model for replicated services. ACM Trans. Comput. Syst., 20(3):239–282, 2002.

[187] Y. Zhai and B. Liu. Web data extraction based on partial tree alignment. In Proc. Intl. World Wide Web Conference (WWW), 2005.

[188] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. Kubiatowicz. Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications, 22(1):41–53, 2004.

[189] J. Zobel and A. Moffat. Inverted Files for Text Search Engines. ACM Computing Surveys, 38(2), 2006.