2023 | OriginalPaper | Book chapter

3. How Search Engines Capture and Process Content from the Web

Author: Dirk Lewandowski

Published in: Understanding Search Engines

Publisher: Springer International Publishing

Abstract

This chapter describes the technical foundations of search engines: how the documents available on the Web are brought into the search engine and made searchable, and how a search query is linked to the documents in the database. It explains in detail how the crawler, the indexer, and the searcher work.
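To make the three components named in the abstract more concrete, here is a minimal Python sketch of a crawler, an indexer, and a searcher. It is an illustration under stated assumptions, not the chapter's own method: the in-memory `TOY_WEB` dictionary stands in for pages fetched over HTTP, and the names `crawl`, `build_index`, and `search` are hypothetical. Production search engines add politeness rules, duplicate detection, ranking, and much more.

```python
import re
from collections import defaultdict, deque

# Hypothetical toy "Web": URL -> (page text, outgoing links).
# Assumption: a real crawler would fetch these pages over HTTP instead.
TOY_WEB = {
    "a.example": ("Search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("An index makes documents searchable", ["a.example"]),
    "c.example": ("The searcher matches a query against the index", ["b.example"]),
}

def crawl(seeds, web):
    """Crawler: breadth-first traversal of links, visiting each URL once."""
    seen, frontier, pages = set(seeds), deque(seeds), {}
    while frontier:
        url = frontier.popleft()
        if url not in web:  # unknown or unreachable URL: skip it
            continue
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Indexer: build an inverted index mapping each term to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, query):
    """Searcher: return the URLs that contain every query term (Boolean AND), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))

if __name__ == "__main__":
    pages = crawl(["a.example"], TOY_WEB)
    index = build_index(pages)
    print(search(index, "the index"))
```

Matching here is a plain Boolean AND over the inverted index; ordering the matched documents by relevance is a separate ranking step that this sketch leaves out.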

Metadata
Title: How Search Engines Capture and Process Content from the Web
Author: Dirk Lewandowski
Copyright year: 2023
DOI: https://doi.org/10.1007/978-3-031-22789-9_3

Neuer Inhalt