
2024 | Original Paper | Book Chapter

Generating 3D Reconstructions Using Generative Models

Authors: Mehdi Malah, Ramzi Agaba, Fayçal Abbas

Published in: Applications of Generative AI

Publisher: Springer International Publishing


Abstract

As the capacity for visual representation continues to evolve, there is a growing need for techniques that create three-dimensional objects realistically and efficiently. Generative models, particularly Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and recent text-to-3D methods, can produce 3D reconstructions with high-quality geometry and disentangled materials, in the text-to-3D case directly from textual descriptions. In this chapter, we present an in-depth exploration of the application of generative models to 3D reconstruction. We begin by discussing the theoretical underpinnings of these models and their applicability to 3D reconstruction, and we study how they learn to generate new instances from a given distribution. We conclude with a discussion of potential future directions and the broader impact of these technologies across industries.
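To make the abstract's central idea concrete (a generator that learns to map samples from a latent distribution to 3D shapes), the sketch below shows a minimal voxel-grid generator in PyTorch, in the spirit of 3D-GAN-style architectures such as the one by Wu et al. (2016). It is an illustrative sketch only: the framework choice, layer widths, latent dimension, and the 32x32x32 output resolution are all assumptions for exposition, not the chapter's actual implementation.

# Minimal sketch of a voxel-based 3D-GAN generator (illustrative
# assumptions throughout; not the chapter's exact architecture).
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    """Maps a latent vector z to a 32x32x32 occupancy grid."""
    def __init__(self, z_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            # Project z to a 4x4x4 feature volume, then upsample with
            # transposed 3D convolutions to the full grid.
            nn.ConvTranspose3d(z_dim, 256, kernel_size=4),           # 1 -> 4
            nn.BatchNorm3d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),    # 4 -> 8
            nn.BatchNorm3d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),     # 8 -> 16
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1),       # 16 -> 32
            nn.Sigmoid(),  # per-voxel occupancy probability in [0, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Reshape the latent vector into a 1x1x1 "volume" with z_dim channels.
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

if __name__ == "__main__":
    gen = VoxelGenerator()
    z = torch.randn(2, 128)   # two samples from the latent prior
    voxels = gen(z)
    print(voxels.shape)       # torch.Size([2, 1, 32, 32, 32])

In a full GAN this generator would be trained adversarially against a 3D-convolutional discriminator; a VAE would instead pair a 3D encoder with a decoder of the same shape and optimize a reconstruction loss plus a KL term on the latent distribution.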


Metadata
Title
Generating 3D Reconstructions Using Generative Models
Authors
Mehdi Malah
Ramzi Agaba
Fayçal Abbas
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-46238-2_20
