2024 | Original Paper | Book Chapter

DAFDeTr: Deformable Attention Fusion Based 3D Detection Transformer

Authors: Gopi Krishna Erabati, Helder Araujo

Published in: Robotics, Computer Vision and Intelligent Systems

Publisher: Springer Nature Switzerland


Abstract

Existing approaches fuse LiDAR points and image pixels through hard association, relying on highly accurate calibration matrices. We propose the Deformable Attention Fusion based 3D Detection Transformer (DAFDeTr), which attentively and adaptively fuses image features into LiDAR features with a soft association based on a deformable attention mechanism. Specifically, our detection head consists of two decoders for sequential fusion: a LiDAR decoder and an image decoder, each powered by deformable cross-attention, which link the multi-modal features to 3D object predictions through a sparse set of object queries. The refined object queries from the LiDAR decoder attentively fuse with the corresponding image features, establishing a soft association that keeps the model robust to camera malfunction. We conduct extensive experiments and analysis on the nuScenes and Waymo datasets. Our DAFDeTr-L achieves 63.4 mAP, outperforming well-established networks on the nuScenes dataset and obtaining competitive performance on the Waymo dataset. Our fusion model DAFDeTr achieves 64.6 mAP on the nuScenes dataset. We also extend the model to 3D tracking, where it outperforms state-of-the-art methods.
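
The abstract describes the image decoder as linking object queries to image features through deformable cross-attention. The sketch below illustrates that mechanism in a minimal single-scale, single-head form: each query predicts a few sampling offsets around its reference point on the image feature map and aggregates the bilinearly sampled features with learned weights. The class name, tensor shapes, offset scale, and number of sampling points are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of single-scale, single-head deformable cross-attention
# between object queries and an image feature map (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # From each object query, predict where to sample in the image
        # feature map and how much each sample contributes.
        self.offset_proj = nn.Linear(dim, num_points * 2)
        self.weight_proj = nn.Linear(dim, num_points)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, feat_map):
        # queries:    (B, N, C)   object queries refined by the LiDAR decoder
        # ref_points: (B, N, 2)   reference points projected onto the image, in [0, 1]
        # feat_map:   (B, C, H, W) image feature map from the camera backbone
        B, N, C = queries.shape
        offsets = self.offset_proj(queries).view(B, N, self.num_points, 2)
        weights = self.weight_proj(queries).softmax(dim=-1)             # (B, N, P)
        # Sampling locations in the [-1, 1] range expected by grid_sample;
        # the 0.05 scale keeping offsets local is an assumed hyper-parameter.
        locs = (ref_points.unsqueeze(2) + 0.05 * offsets) * 2.0 - 1.0   # (B, N, P, 2)
        sampled = F.grid_sample(feat_map, locs, align_corners=False)    # (B, C, N, P)
        fused = (sampled * weights.unsqueeze(1)).sum(dim=-1)            # (B, C, N)
        return self.out_proj(fused.transpose(1, 2))                     # (B, N, C)


# Usage: fuse 200 object queries with a downsampled image feature map.
attn = DeformableCrossAttention(dim=256, num_points=4)
q = torch.randn(2, 200, 256)
ref = torch.rand(2, 200, 2)
img_feat = torch.randn(2, 256, 40, 100)
out = attn(q, ref, img_feat)   # (2, 200, 256)
```

Because each query samples only a few learned locations around its projected reference point, the fusion does not require a dense pixel-to-point mapping, which is the soft association the abstract contrasts with calibration-dependent hard association.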

Metadata
Title
DAFDeTr: Deformable Attention Fusion Based 3D Detection Transformer
Authors
Gopi Krishna Erabati
Helder Araujo
Copyright Year
2024
DOI
https://doi.org/10.1007/978-3-031-59057-3_19