2024 | OriginalPaper | Book chapter

UCoD: Ensemble BERT for Hierarchical Classification of the Urdu Disinformation Corpus

Written by: Umar Farooq, Omer Beg, Faisal Riaz, Saeid Jamali, William Holderbaum, Umar Raza

Published in: The Second International Adaptive and Sustainable Science, Engineering and Technology Conference

Publisher: Springer Nature Switzerland

Abstract

Online disinformation poses a growing threat, requiring fact-checking and detection/prevention measures. To address this, we propose a hierarchical classification approach using the DistilBERT and XLM-RoBERTa ensemble architectures on the Urdu Corpus of Disinformation (UCoD). Our ensemble outperforms other models such as RNNs, LSTMs, k-nearest neighbors, random forests, and quadratic discriminant analysis, achieving a weighted F1 score of 68.7 on UCoD. These results confirm the advantage of ensembles for imbalanced corpora and support the use of deep learning techniques in combating disinformation.
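
The abstract does not say how the two models' outputs are combined, so the following is only a minimal sketch under an assumption: soft voting, where the class probabilities of two classifiers are averaged and the most probable class is chosen. It also shows how a weighted F1 score is computed, with each class's F1 weighted by its number of true instances, which is why the metric suits imbalanced corpora. The function names `soft_vote` and `weighted_f1` and the toy probabilities are hypothetical and are not taken from the paper.

```python
from collections import Counter


def soft_vote(prob_a, prob_b):
    """Average two models' class-probability rows and pick the argmax class.

    Assumption: soft voting. The paper's actual combination rule is not
    stated in the abstract.
    """
    preds = []
    for row_a, row_b in zip(prob_a, prob_b):
        avg = [(a + b) / 2 for a, b in zip(row_a, row_b)]
        preds.append(max(range(len(avg)), key=avg.__getitem__))
    return preds


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy, made-up data: an imbalanced two-class problem (four vs. two examples).
y_true = [0, 0, 0, 0, 1, 1]
prob_a = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.4, 0.6], [0.3, 0.7], [0.7, 0.3]]
prob_b = [[0.7, 0.3], [0.9, 0.1], [0.3, 0.7], [0.8, 0.2], [0.2, 0.8], [0.4, 0.6]]

y_pred = soft_vote(prob_a, prob_b)
print(y_pred)
print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy run the majority class contributes twice the weight of the minority class to the final score, so the weighted F1 sits closer to the majority-class F1 than an unweighted (macro) average would.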

Metadata
Title
UCoD: Ensemble BERT for Hierarchical Classification of the Urdu Disinformation Corpus
Written by
Umar Farooq
Omer Beg
Faisal Riaz
Saeid Jamali
William Holderbaum
Umar Raza
Copyright year
2024
DOI
https://doi.org/10.1007/978-3-031-53935-0_25
