2024 | OriginalPaper | Book chapter

4. Computational Linguistics and Biological Sequences in Artificial Intelligence

Written by: Qingfeng Chen

Published in: Association Analysis Techniques and Applications in Bioinformatics

Publisher: Springer Nature Singapore

Abstract

Researchers generally regard nucleic acid as a language rich in information. The nucleic acid language describes the structure and processes of life, and it shows as much diversity as natural language while sharing many of its characteristics. Many existing studies therefore apply results and methods from language theory to biological sequences, and computational linguistics has in turn brought new breakthroughs to their study. Association analysis is a data mining technique for discovering frequent patterns and association rules in a dataset. In computational linguistics, it can uncover association rules in text data and so help clarify the semantic and grammatical rules of natural languages. In biological sequence analysis, it can identify association rules in the genome and proteome, revealing interactions and functional relationships between genes or proteins. Such results support a better understanding of the data and further development in both fields. Applying association analysis to the study of biological sequences within a computational linguistics framework is therefore a promising research field.
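
To make the idea of frequent patterns and association rules concrete, the following is a minimal sketch, not the chapter's method. It assumes that each sequence is treated as a "transaction" whose items are its distinct k-mers (short substrings playing the role of words), and it reports rules of the form x -> y with their support and confidence. The function names `kmers` and `association_rules`, the thresholds, and the toy sequences are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers (length-k substrings) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def association_rules(sequences, k=2, min_support=0.5, min_confidence=0.8):
    """Mine pairwise rules x -> y between k-mers that co-occur in sequences.

    Assumption for this sketch: one sequence is one transaction, and its
    items are the distinct k-mers it contains.
    support(x, y)     = fraction of sequences containing both x and y
    confidence(x->y)  = support(x, y) / support(x)
    """
    n = len(sequences)
    transactions = [kmers(s, k) for s in sequences]

    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        item_counts.update(t)
        # Sorting gives each unordered pair one canonical key.
        pair_counts.update(combinations(sorted(t), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TTGT"]  # hypothetical toy sequences
    for x, y, support, confidence in association_rules(toy):
        print(f"{x} -> {y}  support={support:.2f}  confidence={confidence:.2f}")
```

On the toy input, only the pair AC and CG appears in at least half of the sequences, so the sketch reports the two rules AC -> CG and CG -> AC. A practical miner would extend this to larger itemsets (for example with Apriori-style candidate generation) and to ordered patterns, which this pairwise version ignores.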

Metadata
Title
Computational Linguistics and Biological Sequences in Artificial Intelligence
Written by
Qingfeng Chen
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-8251-6_4
