Structured abstract generator (SAG) model: analysis of IMRAD structure of articles and its effect on extractive summarization
DOI: 10.1007/s00799-024-00402-8
© The Author(s) 2024
Received: 11 July 2023
Accepted: 1 April 2024
Published: 7 May 2024
Abstract
An abstract is the most crucial element that may convince readers to read the complete text of a scientific publication. However, studies show that in terms of organization, readability, and style, abstracts are also among the most troublesome parts of the pertinent manuscript. The ultimate goal of this article is to produce more understandable abstracts with automatic methods that will contribute to scientific communication in Turkish. We propose a summarization system based on extractive techniques combining general features that have been shown to be beneficial for Turkish. To construct the data set for this aim, a sample of 421 peer-reviewed Turkish articles in the field of librarianship and information science was developed. First, the structure of the full-texts and their readability in comparison with author abstracts were examined for text quality evaluation. A content-based evaluation of the system outputs was then carried out, comparing outputs generated with and without the structural features of the full-texts. Structured outputs outperformed classical outputs in terms of content and text quality, and both output groups had better readability levels than the original abstracts. Additionally, it was discovered that higher-quality outputs are correlated with more structured full-texts, highlighting the importance of structural writing. Finally, it was determined that our system can facilitate the scholarly communication process as an auxiliary tool for authors and editors. The findings also indicate the significance of structural writing for better scholarly communication.
Keywords
Abstracts · Readability · Scholarly communication · Automatic text summarization

1 Introduction
Abstracts are the most important textual tools in enabling potential readers to decide to read the relevant full-texts among the huge stack of electronic information retrieved through the Internet. It is reported that there is a correlation between a scientific article’s readability and its impact, determined by subsequent citations or the possibility of being published in a top 5 journal in a relevant subject [1, 2]. However, compared to the relevant full-texts, abstracts are even more subject to readability issues and structural flaws in their contents [3, 4, 5, 6].
The electronic versions of scientific publications have quickly become preferred over the printed ones, thanks to advanced functionality that accelerates the access and publishing process [7]. However, the electronic formats of scientific publications are almost identical to the printed formats; thus, the electronic forms of publications have not improved the user experience in terms of readability [8]. Moreover, online communication brings new challenges to the scientific community when analyzing retrieved documents. These challenges include the distraction caused by being online, the obligation to choose from a stack of related articles, and the difficulty of maintaining focus while navigating through linked web pages [9, 10, 11]. Research has shown that reading and comprehending a lengthy electronic text, which requires scrolling and navigating back and forth, demands more mental effort than reading a printed text [12, 13]. Screen reading has been found to be inherently distracting, mainly because of the above-mentioned multitasking nature of online reading [14].
A well-organized scientific article is conventionally structured according to the IMRAD format, whose sections answer the following questions:

- Introduction: What was studied and why?
- Methods: How was the study conducted?
- Results: What were the findings?
- Discussion: What do the findings mean?
The language used in the abstract should be clear enough so that everyone can understand it, even if they don’t know much about the topic or English isn’t their first language. However, it’s often the case that abstracts are more difficult to read than the main body of an article [3, 4, 5, 18, 19]. Moreover, the abstract section should also cover the major information given in the full-text. Studies have found that skipping necessary information in abstracts is a frequently observed problem [6, 20, 21, 22].
How can abstracts be written to persuade readers to read the full text, especially when readers struggle to understand the abstract itself? Structured abstract writing may be a solution, as dividing the text under subheadings can improve readability and comprehension [23], thereby increasing the informativeness of the abstract. Compared to unstructured abstracts, structured abstracts have significantly higher information quality [24]. Structuring also improves the indexing performance of the publication, easing access and increasing relevance in search results for users with varying degrees of familiarity with the subject. The structural headings help readers locate and understand the information they need more easily. For authors, writing an abstract in a structured format is easier than writing a classical one, since the template makes it harder to omit any part of the publication; consistency between abstract and full-text therefore increases. Readers and authors alike prefer structured abstracts over the classical versions [23].
Given the critical role of abstracts in scholarly communication, this study is conducted to enhance the informativeness of abstracts by utilizing the high readability of full-text sentences and the structured ordering inherited from the full-text articles.
2 Literature review
The main research topics related to abstracts in the literature deal with organizational issues, readability issues and presentation issues in general. Many researchers have found that abstracts do not follow the structural order followed in the full-text, if the journal does not have a specific policy on this issue.
In the process of deciding whether to read the full text of an academic article, readers are most interested in descriptive information about the research problem, method, or results. Skipping information about these parts in abstracts is a frequently observed problem [6, 20, 21, 22]. The abstract of a scientific paper often contains long, inverted sentences with conjunctions and intensive use of technical terms or jargon specific to the field. The conscious preference for such sophisticated language features has resulted in abstracts becoming progressively more difficult to read over time. The readability of an abstract is usually found to be worse than that of the other parts of the article [3, 4, 5, 18, 19]. Although presentation is an element that should be considered separately from the readability context [25], it is difficult to read an abstract written in a single block without paragraphs and subtitles, in fonts smaller than the full-text, and sometimes in italics [26, 27]. The abstract formats required by journals vary, the two most dominant being classical (or traditional) abstracts and structured abstracts. Abstracts written in a single block in an unstructured format, without paragraphs and subheadings, are generally called classical; although preferred by most journals, they are not produced in a format that attracts the reader’s attention in terms of presentation. Structured abstracts, in contrast, must be produced by filling in all the structural headings specified by the journal.
Luhn [28] carried out his pioneering work in the field of automatic text summarization, aiming to save the reader time and effort in finding useful information in an article or report, at a time when the widespread use of the Internet and information technologies was not yet on the agenda. Since then, the summarization of scientific textual data has become a necessary and crucial task in Natural Language Processing (NLP) [29, 30]. However, certain difficulties remain, such as abstract generation, obtaining labeled training and test corpora, and scaling to large document collections.
Research in automatic text summarization has witnessed a proliferation of techniques since the beginning. The process generally involves several stages, including pre-processing the source document, extracting relevant features, and applying a summary generation method or algorithms. In the pre-processing stage, text documents are prepared for the next stages using linguistic techniques such as sentence segmentation, punctuation removal, stop word filtering, stemming, etc. Then, words are converted to numbers for computers to decode language patterns. Common methods include bag-of-words, n-grams, tf-idf, and word embeddings. For feature extraction, some of the commonly used features [31] that are used at both the word and sentence level to identify and extract salient sentences from documents are listed below:
- Keywords (content words): Nouns, verbs, adjectives, and adverbs with high TF-IDF scores suggesting sentence importance.
- Title words: Sentences containing words from the title are likely to be relevant to the topic of the document.
- Cue phrases: Phrases such as “conclusion”, “because”, “this information”, etc. that indicate structure or importance.
- Biased words: Domain-specific words that reflect the topic of the document are considered important.
- Capitalized words: Names or acronyms such as “UNICEF” that indicate important entities.
- Sentence location: Sentences are prioritized by position due to information hierarchy; for instance, beginning and ending sentences are likely to hold more weight.
- Length: Optimal sentence length plays an important role in identifying excessive detail or lack of information.
- Paragraph location: Similar to sentence location, beginning and ending paragraphs of the document carry higher weight.
- Sentence-sentence similarity: Sentences with higher similarity to other sentences of the document indicate their importance.
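Several of the surface features above can be computed directly from tokenized text. The sketch below is illustrative only (function names and weighting choices are ours, not those of the system described here): a per-sentence TF-IDF keyword weight, a title-overlap ratio, and a position weight that favors sentences near the document edges.

```python
import math
from collections import Counter

def tf_idf_weights(sentences):
    """Per-sentence TF-IDF weights, treating each sentence as a mini-document."""
    docs = [s.lower().split() for s in sentences]
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    n = len(docs)
    return [
        {w: (c / len(doc)) * math.log(n / df[w]) for w, c in Counter(doc).items()}
        for doc in docs
    ]

def title_overlap(sentence, title):
    """Fraction of distinct title words that also occur in the sentence."""
    s, t = set(sentence.lower().split()), set(title.lower().split())
    return len(s & t) / len(t) if t else 0.0

def position_weight(index, total):
    """1.0 at the first or last sentence, decreasing toward 0.5 in the middle."""
    rel = index / max(total - 1, 1)
    return max(1.0 - rel, rel)
```

Note that a word occurring in every sentence receives a TF-IDF weight of zero, which matches the intuition that ubiquitous words carry little discriminative content.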
Summarization of scientific papers is one of the applications of automatic summarization. Abstract generation-based applications and citation-based applications are two main branches of scientific article summarization. Other applications focus on specific problems such as the summarization of tables, figures, or specific sections of the related article [29]. Turkish text summarization studies primarily used extractive techniques due to a deficiency of trained corpora, a requirement that is still unmet in languages with limited resources like Turkish [33].
In addition, in scientific article summarization, single-article summarization with extractive techniques has predominantly been used with the high dominance of combinations of statistical and machine learning approaches, and intrinsic evaluation methods which are largely based on ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics [29]. The ROUGE evaluation of an automated scientific article summarization system that focused on the dataset containing academic articles shows that the extractive algorithms are better than the abstractive algorithms [34].
Our summarization model is based on a study [35] that evaluated the performance of 15 different extractive sentence selection methods, both individually and combined, on 20 Turkish news documents, with the aim of selecting the most important sentences in a document. The authors analyzed the outputs of the methods against summaries of sentences hand-selected by 30 evaluators. The best results were obtained when sentence position, number of common adjacencies, and inclusion of nouns were combined in a linear function with equal weights.
3 Research objectives and questions
We propose a summarization model based on extractive techniques combining general sentence selection features that have been shown by human judgments to be beneficial for Turkish [35]. Our study aims to assess the suitability of the Turkish librarianship and information science (LIS) corpus for automatic summarization methods by evaluating it from a broad perspective, rather than developing our own method. We focus on the full-text structural order to improve the extractive sentence selection process. Additionally, we compare the readability levels of full texts and abstracts to emphasize the significance of readability in scholarly communication. Raising awareness of this issue is also important, especially among LIS professionals.
The main goal of this study is to understand the benefits of generating structured abstracts using extractive methods. We aim to identify the most feasible way to generate abstracts for scholarly communication in Turkish. It is clear that choosing the most important sentences from each structural section of a scientific article and presenting them under the structural headings will facilitate the abstract generation process. Moreover, such structural sectioning increases the semantic integrity and readability of an abstract. Our main hypothesis is “Considering the structural features of full-texts in extracting abstract sentences with automatic methods will increase the quality of the outputs”. The study attempts to answer the following research questions: (1) Are the full-texts of Turkish LIS articles organized taking into consideration the basic structural features that are expected to exist in a scientific publication? (2) What is the readability of the full-texts and the abstracts of Turkish LIS articles, based on the readability scale? (3) Does using full-text structural features in extracting abstracts with automated methods improve output quality?
In our study, we examined articles published in the field of LIS with classical abstracts. The corpus was analyzed to determine whether the full-texts of the articles are more readable and better structured than the classical author abstracts. We generate a simple automatic abstract generator model that chooses the most important sentences from each structural section of each article.
4 Methodology
We utilized an extractive automatic summarization system named Structured Abstract Generator (SAG), which depends on the extraction of the most important sentences from all structural parts of the full-texts of articles. Figure 1 demonstrates the architecture of the SAG. This section describes the methodology used in the study.
4.1 Data collection and representation
To construct a corpus for the study, Türk Kütüphaneciliği – Turkish Librarianship (TL) and Bilgi Dünyası – Information World (IW), the major journals in the field of librarianship and information science in Turkey, were used. Both journals ask authors to write classical abstracts, and neither sets an IMRAD or similar explicit template for full-texts, although IW draws a framework in line with IMRAD regarding the arrangement of content. All refereed articles written in Turkish were included in the study. Since both journals are open access, there was no problem in accessing the articles. This study is the first in Turkish to conduct a detailed full-text analysis of a large corpus of LIS literature.
In the initial stage, all articles were saved in PDF format with a unique identifier that encoded the journal name, year, volume, and issue information. For example, the identifier BD200011 indicates an article published in the year 2000, which is the 1st volume of the year and the 1st article of the volume in the IW (BD in Turkish) journal.
Once the articles were identified, they were converted into .txt format using UTF-8 character encoding to ensure the correct representation of Turkish characters. Then, article metadata was automatically extracted. This included author names, titles, abstracts, body text, and keywords, which are clear indicators of the content and are located in specific places in the document.
After processing 421 documents from two journals (172 IW, 249 TL), a relational database was created using MySQL. This database enabled the efficient processing of article full-text sentences as vectors, where each component is assigned to the corresponding structural section of the document, as well as the document’s metadata. The IMRAD format, which is the most prominent organizational structure for full-text in scientific writing, was used in this study.
To facilitate further stages, web-based interfaces were developed to enable the monitoring and management of rules governing the structural layout decisions for each article. The development of a web-based system offered inherent advantages in terms of providing flexible work arrangements and enabling quick control over individuals in operator roles. The solution was designed to be compatible with both mobile and desktop devices, enabling the team to operate flexibly and remotely.
The team of operators consisted of six professionals from the Department of Information Management: two undergraduate students and four PhD students. These individuals had prior expertise regarding the structural components of scientific articles. Two roles were identified for the expert team: operator (4 experts) and administrator (2 experts).
Operators copied and pasted the body text from these interfaces according to IMRAD headings, retaining complete control over the process. After the completion of the IMRAD marking procedure for an article, operators were unable to make any additional modifications using the interface; however, administrators retained the authorization to execute final supervision and operational functions after this stage. This control was important to ensure that the IMRAD structure of the articles, which was inherited by paragraphs, was determined correctly. To ensure inter-annotator agreement on labeling decisions, each article was tagged by at least two operators and one expert doctoral student during the manual step.
By implementing this work plan, the expert team successfully achieved the systematic and efficient classification of the boundaries and structural sections (according to the IMRAD format) of each paragraph of the body text. Consequently, the work of carefully adhering to the sequential arrangement of sentences in all articles was successfully completed within a brief timeframe. This hierarchical structure of body text was further applied to the sentence level through the utilization of a relational database. At the end of the two main steps mentioned above, 101,019 sentences were extracted from 421 articles. Next, word frequency vectors and n-gram sequences were obtained using Zemberek [37] and then stored in the database.
Table 1 Data representation of a sentence of an article

| Paper id | Sentence_no | Paragraph_no | Imrad_no | Title | Text |
|---|---|---|---|---|---|
| BD200011 | 27 | 5 | 1 | geleceğe yönelik tartışmalar | tarım toplumundan sanayi toplumuna geçiş eğitimi nasıl etkilediyse sanayi toplumundan bilgi toplumuna geçiş de kurumların yapısında köklü değişiklikleri zorunlu kılmaktadır |
Table 2 Word frequency vector example

| Paper id | Sentence_no | Imrad_no | Words | Word_vector |
|---|---|---|---|---|
| BD200011 | 27 | 1 | toplum, sanayi, geç, eğitim, yapı, bilgi, tarım, kur, kıl, değiş, nasıl, etki, kök, zorunlu | 4,2,2,1,1,1,1,1,1,1,1,1,1,1 |
4.2 Stemming
Since Turkish has an agglutinative morphology, inflectional or plural suffixes may produce multiple surface forms from one root. Words that appear in different forms in the text but share the same root can thus be represented in a single way. Due to the high reduction it provides in the size of the document-term matrix, applying stemming to Turkish texts is strongly recommended [38]. For root finding, we utilized Zemberek [37], a natural language processing toolkit for Turkish. Although the sentences of the articles had been parsed under the supervision of the operators, we additionally employed data-cleaning methods on the raw data.
After the stemming and data-cleaning processes, word frequency vectors are produced. Table 2 depicts the example of a vector representation of a sentence whose raw data is seen in Table 1.
4.3 Extractive summarization and evaluation process
Extractive automatic summarization methods include the process of scoring, sorting and selecting sentences in the document. Automatic text summarization approaches and methods are employed to identify key representative sentences from the full-text. Sentences are scored based on their predetermined features, and the significance of each sentence in the document is determined by these scores. Sentence selection functions that bring together each feature by weighting are another stage of the extractive automatic summarization systems. Features used in sentence scoring are as follows.
4.3.1 Sentence position
Following Formula 1, each sentence receives a ranking score between 1 and 0 depending on its order of appearance in the article.
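The formula itself is not reproduced in this version of the text. A common position-scoring formulation consistent with the description, assuming the article has $n$ sentences indexed $i = 1, \dots, n$, would be:

```latex
SP(s_i) = \frac{n - i}{n - 1}
```

so that the first sentence receives 1, the last receives 0, and scores decrease linearly in between. The exact expression used in the paper may differ (e.g., a reciprocal decay such as $1/i$).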
4.3.2 Sentence centrality
The similarity (sim) value of each sentence with respect to the other sentences of the document is calculated using the cosine similarity measure [40]. Cosine similarity is one of the most preferred methods for comparing two texts and deciding on their similarity.
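A minimal sketch of cosine similarity over word-frequency vectors, and a centrality score computed as the average similarity of a sentence to all other sentences, is given below. Function names and the centrality definition (a plain average) are our own illustration; the paper's exact aggregation may differ.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-frequency vectors of two texts
    (stemming is assumed to have been applied upstream)."""
    va, vb = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def centrality(sentences, i):
    """Average similarity of sentence i to every other sentence in the document."""
    others = [cosine_similarity(sentences[i], s)
              for j, s in enumerate(sentences) if j != i]
    return sum(others) / len(others) if others else 0.0
```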
4.3.3 Noun score
Another feature discussed in this study is whether the sentences contain nouns. The nouns in the texts transmit the information about the content of the text. Therefore, the text summarization system gives points to the sentences containing nouns according to the number of nouns they contain. Zemberek [37] was used to calculate the score. That score (NS) of each sentence was added to the formula after normalizing by a count of all words of the related sentence.
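This normalization can be sketched as follows; in the actual system the noun set would come from Zemberek's morphological analysis, whereas here it is passed in as a plain set for illustration.

```python
def noun_score(sentence_words, noun_set):
    """NS: count of nouns in the sentence, normalized by its total word count.
    noun_set stands in for the nouns identified by a morphological analyzer."""
    if not sentence_words:
        return 0.0
    return sum(1 for w in sentence_words if w in noun_set) / len(sentence_words)
```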
4.3.4 Ranking score
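The ranking formula is not reproduced in this version of the text. Based on the underlying study [35] described above, where the selected features were combined in a linear function with equal weights, the final score of sentence $s_i$ presumably takes a form such as:

```latex
Score(s_i) = w_1 \, SP(s_i) + w_2 \, SC(s_i) + w_3 \, NS(s_i), \qquad w_1 = w_2 = w_3
```

where $SP$ is the sentence position score, $SC$ the sentence centrality score, and $NS$ the normalized noun score. Whether the weights are additionally normalized to sum to one is an assumption on our part.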
4.3.5 Generating automatic abstracts
The intended outputs of our system are automatic structured abstracts (ASA). In addition to these outputs, we evaluated the impact of considering structural features on the performance of an extractive-based text summarization system with automatic classical abstracts (ACA) without using structural features, with the same ranking function. The structural section marking of the corpus full-texts is compatible with the widely accepted and well-known IMRAD headings, so the layout of the ASA output of our system is also compatible with IMRAD.
The word limit for our system’s output was determined by reviewing the TL and IW journal guides. The journal TL does not have a word limit for abstracts, while the journal IW has a 250-word limit, which we considered reasonable. Usually, journal guides indicate a word limit for abstracts, with the range being from 150 to 300 words (APA, 2010). As such, we set a 250-word limit for the output of our automated structural abstract system.
For ASAs, the 250-word limit is divided equally among the structural sections of the article. The highest-scoring sentences are selected from each section until the word limit for that section is reached. In this step, sentences are first sorted according to their structural section and then according to their score. For ACAs, the highest-scoring sentences are selected from the entire article until the 250-word limit is reached. In this step, we only sort sentences according to their score.
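The two selection procedures can be sketched as below. This is an illustrative simplification, not the paper's implementation: the tuple layout is our own, and sections are ordered alphabetically here, whereas the actual system preserves IMRAD order.

```python
def select_sentences(scored, limit=250, structured=True):
    """Select highest-scoring sentences under a word budget.

    scored: list of (section, score, sentence) tuples; section is an IMRAD label.
    structured=True mimics ASA generation (budget split equally across sections,
    sentences grouped by section); structured=False mimics ACA generation.
    """
    if structured:
        sections = sorted({sec for sec, _, _ in scored})
        budget = limit // len(sections)  # equal share of the word limit per section
        out = []
        for sec in sections:
            used = 0
            for _, _, sent in sorted((t for t in scored if t[0] == sec),
                                     key=lambda t: -t[1]):
                n = len(sent.split())
                if used + n > budget:
                    break
                out.append(sent)
                used += n
        return out
    out, used = [], 0
    for _, _, sent in sorted(scored, key=lambda t: -t[1]):
        n = len(sent.split())
        if used + n > limit:
            break
        out.append(sent)
        used += n
    return out
```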
4.3.6 Evaluation process
In this study, the effect of selecting sentences by considering the structural features of the full-text while generating abstracts was measured using automatic methods. The evaluation is conducted in three stages. First, the distributions of the sentences selected for ASAs and ACAs within the full-text are compared to ensure that the automatic summaries are representative. Next, the full-text, the original abstract, the ASA, and the ACA are evaluated for readability to determine whether the automatic summaries are easier to understand than the author abstracts. Finally, the structural (ASA) and non-structural (ACA) automatic summaries are compared via n-gram co-occurrence with the original abstracts to measure quality and effectiveness. ROUGE scores [42] are used to compare n-grams in the reference summaries and the extracted summaries as a standard of automatic evaluation of document summarization.
ROUGE evaluation
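The study relies on the ROUGE 2.0 package; for intuition, a minimal sketch of the ROUGE-N idea (n-gram overlap expressed as recall, precision, and F1) is shown below. This is our own simplification, not the package's implementation.

```python
from collections import Counter

def rouge_n(reference, candidate, n=1):
    """Recall, precision, and F1 of n-gram overlap (a minimal ROUGE-N sketch)."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum(min(ref[g], cand[g]) for g in ref.keys() & cand.keys())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1
```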
Readability of texts
Reading is a complex process that requires readers to make sense of the given message, comprehend it, and finally interpret it [47]. The suitability of the text for the target audience can be determined through readability calculations.
Although a language-specific formula has not been produced to measure the readability of Turkish texts, an adaptation of the well-known formula called “Flesch Reading Ease” (FRE) [48] has been widely used since 1997. This adaptation is known as Atesman’s Readability Formula [49], which calculates the readability of a text based on the average syllable length of the words in the text and the average number of words per sentence.
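The calculation can be sketched with the coefficients as commonly reported for Ateşman's adaptation of FRE; the syllable counter exploits the fact that each Turkish syllable contains exactly one vowel.

```python
def atesman_readability(total_syllables, total_words, total_sentences):
    """Atesman readability value, as commonly reported:
    198.825 - 40.175 * (avg syllables per word) - 2.610 * (avg words per sentence)."""
    x1 = total_syllables / total_words      # average word length in syllables
    x2 = total_words / total_sentences      # average sentence length in words
    return 198.825 - 40.175 * x1 - 2.610 * x2

def count_turkish_syllables(word):
    """In Turkish, the syllable count of a word equals its vowel count."""
    return sum(1 for ch in word.lower() if ch in "aeıioöuü")
```

A text averaging three syllables per word and fifteen words per sentence, for example, lands in the "difficult" band of the scale.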
Atesman’s readability scale
Readability value | Readability scale of text |
---|---|
90–100 | Very easy |
70–89 | Easy |
50–69 | Fairly difficult |
30–49 | Difficult |
0–29 | Very difficult |
Academic texts are typically challenging to read since they contain considerable domain-specific jargon and lengthy sentences with conjunctions. Our study uses a domain-specific corpus of articles with similar linguistic characteristics; thus, assessing readability based on the length of sentences and words is expected to be discriminative. While examining the characteristics of the corpus, we calculated Ateşman readability values (ARVs) for the body text and the traditional abstracts of each article. Finally, we compared these calculations with the ARVs of the system outputs.
5 Results
Table 4 Count of IMRAD patterns used in the articles

| IMRAD (#) | Article (%) | Pattern | Article (#) |
|---|---|---|---|
| 1 | 4.7 | I | 19 |
| | | R | 1 |
| 2 | 45.8 | I,D | 193 |
| 3 | 6.8 | I,M,R | 3 |
| | | I,M,D | 1 |
| | | I,R,D | 25 |
| 4 | 42.5 | I,M,R,D | 179 |
However, it is important to note that every scientific article must contain research question(s) and a method adopted to investigate them, and the findings about the research question(s) should also be included. Articles with methods but no results (I,M,D), results but no methods (I,R,D), or methods and results but no discussion (I,M,R) are incompatible with academic writing, as they do not provide a complete account of the research. However, such articles are a minority in our corpus, constituting only 6.7% (5.9% + 0.4% + 0.4%) of the total. Articles consisting of a single IMRAD section, whether introduction (I) or results (R), also remain a minority (3.3% + 0.1% = 3.4%). If such incompatible structural patterns were prevalent, applying the SAG system to Turkish LIS articles would be inappropriate.
The implications of incompatible structural orders in Turkish LIS articles, particularly those without a method section (I,R,D) (5.9%) or with only an introduction section (I) (3.3%), are worth examining to determine whether they are a domain-specific format or a sign of incomplete content. Having only two IMRAD sections is also worth examining. We defer discussion of these implications to future work, as they are beyond the scope of the present study.
Figure 3 presents boxplots comparing the readability scores of different groups (original abstracts, full-text articles, ASAs, and ACAs) within the corpus. The area between the red horizontal line (y = 29) and the black horizontal line (y = 49) delimits the “difficult” band of the readability scale: the area below the red line indicates “very difficult” readability, and the area above the black line indicates “medium difficulty”. The collection of original author abstracts is located at the bottom of Fig. 3 and is almost entirely classified as “very difficult”. The full-texts fall clearly within the “difficult” readability range. The majority of ACA and ASA readability values, as well as their averages, also appear in the “difficult” area.
On the other hand, Fig. 5a, which gives the distribution of ACA sentences based on the structural format, differs clearly from both Figs. 3 and 5b. For ACAs, the weight of the output sentences taken from articles with four IMRAD sections is 41.6% (= 1.3% + 17.5% + 12.6% + 10.2%). However, of the ACAs generated from these four-section articles, only 1.3% of the sentences belong to outputs that themselves span four IMRAD sections; 17.5% belong to outputs spanning three sections, 12.6% two sections, and 10.2% a single section.
Table 5 Average F-score of ROUGE measures according to count of full-text IMRAD sections

| Count of IMRAD | ROUGE-1 ACA | ROUGE-1 ASA | ROUGE-2 ACA | ROUGE-2 ASA | ROUGE-L ACA | ROUGE-L ASA | ROUGE-SU4 ACA | ROUGE-SU4 ASA |
|---|---|---|---|---|---|---|---|---|
| 4 | 0.34480 | 0.38540 | 0.11080 | 0.13619 | 0.06021 | 0.06687 | 0.15076 | 0.17809 |
| 3 | 0.31640 | 0.32330 | 0.10244 | 0.10776 | 0.06300 | 0.05719 | 0.13947 | 0.14524 |
| 2 | 0.29663 | 0.31421 | 0.09198 | 0.10083 | 0.05016 | 0.05471 | 0.12808 | 0.13795 |
| 1 | 0.22790 | 0.22790 | 0.06265 | 0.06265 | 0.05639 | 0.05639 | 0.10350 | 0.10350 |
| All | 0.31659 | 0.34250 | 0.09967 | 0.11493 | 0.05561 | 0.06014 | 0.13733 | 0.15392 |
Figure 6 shows a graph that displays the distribution of sentences based on their output type and IMRAD patterns. The x-axis represents the abstract type, while the y-axis represents the IMRAD label. The grids at the top show the relationships between different groups of outputs based on the count of IMRAD sections, while the right outer edges of the figure show the relationships between different groups formed based on the IMRAD pattern of the related articles. The labels on the right outer edge represent the abbreviation of IMRAD pattern in the source articles, and the numbers at the top indicate the count of IMRAD in each output group. Each point in the graph shows the distribution of automatic abstract sentences based on the IMRAD count of each output group and IMRAD patterns of the articles from which they are produced.
The grids on the top and right side of Fig. 6 show how the outputs are grouped based on the number of IMRAD sections and the IMRAD pattern, respectively, helping to examine full-text representativeness across these groups. The projection of each point on the x-axis indicates the type of automatic summary the relevant sentence belongs to.
The distribution of ACAs and ASAs in full-text sentences, as shown in Fig. 6, indicates that they are completely different. ACAs are generated without considering the IMRAD structure of the full-text, while ASAs are generated from each IMRAD section. As a result, the count of IMRAD sections in an ACA is independent of the count of IMRAD sections in the full-text; for example, ACAs from full-texts with two (I,D), three (I,M,R), and four (I,M,R,D) IMRAD sections may consist of a single (I) IMRAD section.
On the other hand, ASAs are compatible with the full-text and output patterns since they are generated by selecting relevant sentences from the full-text for a specific IMRAD section.
The content-based performance of the SAG is evaluated through n-gram co-occurrences between the system outputs and ideal summaries using the ROUGE 2.0 package; at this stage, the original author abstracts served as the ideal summaries. It should be noted that abstracts are relatively short texts, which may limit the overlap between the author abstracts and the system outputs. Furthermore, differences between the author abstracts and the system outputs may stem from meaning and content, or from synonymous words and concepts. Evaluating synonyms in automatic summarization is a difficult task, as different synonyms can have different meanings and a word’s meaning can change with the context in which it is used. Since our study focuses on the structural layouts that influence the performance of automatic summarization systems, synonyms were excluded from the evaluation.
Table 5 shows the mean F-score values for each ROUGE measure, grouped by the count of IMRAD sections in the articles of the corpus. The line labeled “All” gives the values without grouping the corpus by IMRAD count. The mean F-scores are consistently highest for the group with four IMRAD sections. Additionally, ASAs outperformed ACAs on all F-scores at both four and two IMRAD sections, the dominant IMRAD patterns in the corpus.
In all cases, ROUGE-1 yields the highest n-gram overlap with the authors' abstracts. It has also been suggested that for very short outputs, such as abstracts of scientific articles, ROUGE-1 alone may be sufficient for evaluating text quality [44]. The lowest overlap values are those of ROUGE-L, which measures sentence-level structural similarity by identifying the longest common subsequence between the texts it compares. Short outputs and short authors' abstracts may therefore limit the length of the common subsequences between sentences, which explains the overall decrease in ROUGE-L scores.
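The sensitivity of ROUGE-L to text length is easy to see from its definition: the score is an F-measure over the longest common subsequence (LCS) of the two token sequences. A minimal sketch, again only an illustration of the standard formulation rather than the ROUGE 2.0 implementation:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f(candidate, reference):
    """ROUGE-L F-score (beta = 1): harmonic mean of LCS recall and precision."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return 2 * precision * recall / (precision + recall)
```

Since the LCS can never be longer than the shorter of the two texts, comparing a short system output against a short author abstract caps the achievable subsequence length, depressing ROUGE-L relative to ROUGE-1.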
Figure 7 presents the results of the content-based evaluation. Since the majority of articles in the corpus had either two or four IMRAD sections, the performance of these dominant groups was compared to better illustrate the effect of IMRAD count on the outputs. The boxplots in each panel show the F-scores of the system outputs for the ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 measures. The ACA and ASA output groups show similarly shaped distributions for all four score types. The graphs indicate that as the count of IMRAD sections in the full-texts increases, the ROUGE scores of both SAG output groups also increase, and the ASAs outperform the ACAs in all cases.
6 Discussion and conclusion
In this paper, we introduced the Structured Abstract Generator (SAG), which relies on a simple model for generating high-quality structured abstracts of scientific articles. The aim of automatically extracting abstract sentences from the relevant full-texts while considering the article structure was to improve the quality of the abstracts. Our system generates structured abstracts (ASAs). To evaluate the impact of structural features on the performance of an extractive automatic text summarization system, we compared ASAs with automatically generated classical abstracts (ACAs), which are produced without using structural features.
We also present a database that enables efficient processing of the corpus of 421 Turkish LIS articles at the sentence level, where each sentence is assigned to the corresponding structural section of the document, together with the document's metadata.
First, we explored factors that could prevent the creation of structured abstracts and showed that our corpus is formatted in a way that enables the automatic generation of structured abstracts: 89.8% of the sentences in our corpus come from articles with an acceptable IMRAD pattern of either all four (43.5%) IMRAD sections or at least two (46.3%) IMRAD sections (Introduction and Discussion). Further research is needed to determine whether having only two IMRAD sections is a domain-specific format or a sign of incomplete content. The remaining problematic articles, which were completely incompatible with academic writing conventions, constituted a minority. Our study examined article structural arrangements only with a focus on the sentence selection processes; we leave in-depth studies of articles with sections missing from their IMRAD structural order for future work.
Second, the readability levels of the full-texts of articles published in the field of Turkish LIS were calculated, and the corpus was largely classified as “difficult” on the readability scale. The abstracts written by the same authors, however, fell at the “very difficult” level. We observed that authors tend to choose difficult-to-read language in their abstracts, regardless of the language features they use in the full-texts. Both ACA and ASA outputs scored at the same readability level as the full-text articles, showing that selecting important sentences from full-texts to generate automatic abstracts improves readability relative to the author-written abstracts. Whatever the reasons that lead authors to write difficult-to-read abstracts, the widespread use of tools that select important sentences from the structural sections of full-texts may, over time, help break this habit, which hinders scientific communication.
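Turkish readability is commonly measured with the Ateşman formula, an adaptation of the Flesch Reading Ease formula; this excerpt does not name the measure used, so the following sketch should be read as an illustrative assumption, not as the study's actual computation. It exploits the fact that in Turkish the syllable count of a word equals its vowel count.

```python
TURKISH_VOWELS = set("aeıioöuü")

def atesman_score(sentences):
    """Ateşman readability score for Turkish text.

    sentences: a non-empty list of sentences, each a non-empty list of
    lowercase word tokens. Higher scores mean easier text; on the
    Ateşman scale, low bands correspond to "difficult" and
    "very difficult" text.
    """
    words = [w for sent in sentences for w in sent]
    syllables = sum(1 for w in words for ch in w if ch in TURKISH_VOWELS)
    avg_syllables_per_word = syllables / len(words)
    avg_words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * avg_syllables_per_word - 2.610 * avg_words_per_sentence
```

Because both terms are subtracted, longer words (more syllables) and longer sentences both drive the score down, which matches the observation that dense abstract prose scores worse than the full-text it summarizes.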
After assessing the quality of the SAG outputs, we found that a well-organized full-text improves the quality of both output groups. ASAs performed significantly better than ACAs. Interestingly, however, ACAs also performed better as the number of structured sections increased, despite being produced without taking the structure of the full-text into account. This could be due to an increase in the structured content of the original abstracts, resulting in greater similarity between both types of automatic abstracts and the author abstracts. Alternatively, in the context of information retrieval, it suggests that authors produce abstracts that convey information more accurately, with higher recall and precision, when the structural layout of the full-texts improves. We conclude that focusing on structural writing in full-texts alone can contribute to improving the content of the authors' original abstracts.
In the near future, we can expect to see various systems such as LLMs (Large Language Models), knowledge graphs, NER (Named Entity Recognition) systems, QA (Question Answering) systems, MT (Machine Translation) systems, and text summarization systems being used together to produce high-quality structured abstracts. We may also see the emergence of new tools specifically designed to assist researchers in communicating their findings more effectively.
Future research should explore more efficient and effective features for automatic summarization methods to generate summaries of scientific records in different languages and domains, and should investigate how the structure of the full-text can be further optimized to improve the quality of automatic summarization. Training domain-specific dictionaries would help improve the accuracy, readability, and effectiveness of the generated abstracts. We plan to train a model on our data to classify the structural sections of Turkish articles, so that the production of structured abstracts can be fully automated by learning systems. Different summarization approaches and algorithms should also be applied to obtain more readable, high-quality structured abstracts. We further plan to train models on our data to predict the structural order of abstracts. In addition, a detailed analysis of user opinions on the readability issue could be conducted; user studies could also reveal the best sentence weights for the structural sections of articles.
Finally, we verified that, by using structural sentence selection, abstract-generating systems can support scholarly communication as a supplementary tool for authors and editors.
Acknowledgements
This article is based on Özkan Çelik’s [50] Ph.D. dissertation and was supported in part by a research grant from The Scientific and Technological Research Council of Türkiye (Project No: SOBAG 115K440) [51].
Declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.