JLLT

Since its inception in 2010, the Journal of Linguistics and Language Teaching (JLLT) has been dedicated to providing a platform for academic publication. JLLT is a multilingual, open access, DOAJ-indexed journal.
For access to the journal's website and downloadable PDF files of all published issues, please navigate to:
https://www.journaloflinguisticsandlanguageteaching.com


edited by Thomas Tinnefeld

Journal of Linguistics and Language Teaching

Volume 16 (2025) Issue 2


Exploring Interlanguage Pragmatics:

An Analysis of Italian Pragmatic Markers in Learner Corpora


Katerina Florou (National and Kapodistrian University of Athens) & Dimitris Bilianos (National and Kapodistrian University of Athens)


Abstract

This study examines the use of the Italian pragmatic markers quindi, allora, and dunque in learner and native speaker corpora. While pragmatic markers in English have been extensively studied, Italian remains relatively underexplored, particularly in learner corpora research. To address this gap, we compare the frequency and functions of these markers in the UniC learner corpus, containing essays by Greek learners of Italian (B2-C1 level), and the native speakers’ Coris Corpus. Using a form-to-function methodology, we categorise their usage as semantic markers or as pragmatic devices facilitating cohesion and discourse structuring. The results indicate notable discrepancies between learners and native speakers. Greek learners predominantly use quindi in its semantic function, aligning with grammatical conventions, whereas native speakers employ it more flexibly in pragmatic contexts. Allora appears frequently as a discourse-sustaining marker in native speech but is underused by learners, who associate it more with its temporal meaning. Dunque is rarely utilised by learners, while native speakers use it across various pragmatic functions. Statistical analyses reveal significant differences in distribution, suggesting L1 influence and restricted exposure to natural discourse. This research enhances our understanding of interlanguage pragmatics in Italian, demonstrating how L1 transfer and instructional focus shape learners' pragmatic competence. The findings inform language teaching by highlighting the need for greater emphasis on pragmatic markers in Italian instruction. Enhanced exposure to authentic usage patterns can facilitate learners' development of discourse competence, improving their ability to engage in fluid and natural interactions.

Keywords: Pragmatic markers, learner corpus research, interlanguage studies



1   Introduction

This study addresses a significant research gap by exploring the pragmatic use of Italian markers in learner corpora, particularly focusing on Greek learners of Italian. Although prior research has explored pragmatic markers in English and other major languages, this study is among the first to systematically examine their usage in Italian from a learner corpus perspective, thus offering new insights into L2 Italian pragmatics.

The study of a foreign language requires an examination and analysis of communication, which extends beyond a purely grammatical perspective. This includes the analysis and interpretation of a learner’s interlanguage from a pragmatic standpoint, especially when the language in question has not been extensively studied as a foreign language. In global research, pragmatic studies, particularly those focusing on pragmatic markers, rely heavily on corpus data, primarily from spoken language. Such data facilitate the analysis of their distribution across various communicative contexts and text types (Aijmer, 2004).

This study builds on past research on pragmatic markers in corpora and compares the usage of three Italian language markers in two corpora: one composed of native speakers and one of learners. The results will provide insights into both the interlanguage of learners and the frequency and quality of pragmatic marker usage by both groups, i.e. learners and native speakers.

Research in the field has utilised various corpora, such as the London-Lund Corpus (Aijmer, 2002; Paradis, 2003; Stenstrom, 1990a; Stenstrom, 1990b; Svartvik, 1980) and the Bergen Corpus of London Teenage Language (COLT) (Andersen, 2001; Stenstrom & Andersen, 1996), to study pragmatic markers in spoken discourse. Similarly, the MICASE corpus (Michigan Corpus of Academic Spoken English) has been used to investigate the use of hedges (Mauranen, 1994), while Van Bogaert (2009) analysed spoken data from the International Corpus of English-Great Britain (ICE-GB). The defining characteristic of corpus-based pragmatic analysis is that it begins with the identification of a specific linguistic form before analysing its function (O’Keeffe, 2018). This form-to-function methodology allows for a detailed study of linguistic elements, such as pragmatic markers, in relation to their frequency, function, and position within a sentence and discourse. This approach can be applied to native speaker corpora, parallel corpora, and learner corpora, thereby enhancing the accuracy of linguistic analysis.

Smaller-scale studies have also utilised spoken data from authentic conversations. For example, in Kevoe-Feldman, Robinson, and Mandelbaum’s (2011) research, data were drawn from recordings of 193 telephone conversations between customer service representatives and customers contacting an electronics company. Their study contributed to understanding how pragmatic factors shape participants’ perceptions of what constitutes a possibly complete speech unit and acceptable turn-taking points. Vaughan & Clancy (2013) argued that smaller, carefully collected, and context-specific corpora – both spoken and written – are particularly significant for pragmatic research. Many pragmatic features, such as deixis and discourse markers, play a fundamental role in communication. These features are often realised linguistically in short discourse units that appear frequently across corpora.

The availability of historical corpora, such as A Corpus of English Dialogues 1560–1760 (Kytö & Walker, 2006) has enabled studies on the historical evolution of pragmatic markers in English. This corpus includes dialogues from trials, depositions, plays, instructional texts, and prose, facilitating research on topics like the use of hedges in historical contexts (Culpeper & Kytö, 1999).

Additionally, cross-linguistic studies of pragmatic markers use parallel corpora to allow comparisons between different languages and the identification of equivalent expressions. Such studies have compared English with languages like Swedish and Dutch (Aijmer & Simon-Vandenbergen, 2003; Simon-Vandenbergen & Aijmer, 2002/2003).

The development of multimodal corpora has further expanded research perspectives on pragmatic markers, enabling the simultaneous analysis of linguistic and non-linguistic elements. Studies of this type have examined the relationship between gestures and verbal strategies employed in active listening, such as feedback mechanisms designed to enhance interaction in spoken discourse (Knight & Adolphs, 2007).

To refine the research questions further, special attention must be given to the area of comparative analysis, which includes corpora of non-native speakers (learner corpora). These corpora provide insight into the use of pragmatic markers by learners of English as a foreign language. Extensive research in this domain was conducted by Vyatkina & Cunningham (2015). A recent study by Huschova (2021) examined the use of the modal verbs can and could in speech acts produced by learners of English. Her study aimed to analyse syntactic patterns, semantic properties, and pragmatic functions using the Spoken English Corpus of Czech learners. Findings indicated that modal verbs are frequently used as modification mechanisms in indirect speech acts, particularly in conventionalised imperative expressions.

Other studies leveraging learner corpora include Corsetti & Perna (2017), who analysed English adverbs in spoken discourse produced by Brazilian learners of English, focusing on pragmatic aspects and their alignment with CEFR descriptors. Additionally, Buysse (2012, 2017, 2020) investigated pragmatic markers such as in fact and actually using the LINDSEI corpus, Gilquin (2008) studied hesitation markers, and Hasselgren (2002) compared the English pragmatic markers well, sort of, and a bit in a corpus of native speakers and a learner corpus of Swedish-speaking learners. Aijmer (2004) also conducted extensive research on pragmatic markers in spoken learner corpora.

Few studies have explored pragmatic markers in Italian through the use of corpora. These studies have examined syntax, semantics, and usage by native and non-native speakers. Bellato & Sartori (2023) analysed the use of conjunctions and relative adverbs among secondary school students and proposed a guided approach to improve understanding. Branciforti & Duso (2023) examined the distinction between subordinating and coordinating conjunctions in school grammar, emphasising students' challenges in distinguishing between syntactic and textual structures. Ghezzi (2022) investigated three Italian pragmatic markers (un po', così, cioè) in corpora, while an earlier study with Molinelli (2014) examined the markers guarda, prego, and dai to emphasise their role in interaction-based discourse. 

Finally, De Cristofaro, Badan, & Belletti (2024) compared the use of pragmatic markers in Italian as a first (L1) and second language (L2), focusing on their syntactic positioning within sentences. Their findings showed that L1 speakers use pragmatic markers more frequently than L2 learners. Despite having the correct syntax, L2 learners often produce forms that are pragmatically inappropriate due to interference from their L1. This study adopted a cartographic approach and revealed that these markers occupy specific syntactic positions linked to modality. This offers new insights into the syntax-pragmatics interface in second language acquisition.

Overall, research on Italian as a second or foreign language remains limited in terms of pragmatic markers studied through learner corpora.


2   Research Questions

This study seeks to fill a notable gap in the field of interlanguage pragmatics by focusing on Italian, a language that remains underrepresented in learner corpus research. Specifically, we aim to understand how Greek learners of Italian use pragmatic markers compared to native speakers, thereby shedding light on cross-linguistic influence and interlanguage development.

The study of pragmatic markers has been one of the most productive areas in corpus-based research on second language (L2) pragmatics. In particular, numerous studies – primarily in English – have compared the use of pragmatic markers by native (L1) and non-native (L2) speakers, revealing that L2 learners tend to use these markers less frequently or with less variety than native speakers in similar contexts (Aijmer, 2004; Gilquin, 2008). This finding, combined with the observation that words with pragmatic functions are often interpreted purely semantically and the fact that very few studies have examined learner corpora for Italian, leads us to formulate the following research questions for a study comparing learner and native speaker corpora: 

  1. What are the frequencies of quindi, allora, and dunque in the learner corpus and the native speaker corpus?
  2. To what extent do non-native speakers use these discourse markers pragmatically (e.g., as cohesion markers, silence fillers) compared to native speakers?
  3. Are there instances of overuse or underuse of specific markers in the learner corpus?
  4. How does the learners’ respective native language (L1) influence their use of these discourse markers?


3   Corpora and Data 

3.1 Learner Corpus 

UniC is a learner corpus that includes written productions of non-native speakers of Italian with different levels of linguistic proficiency, ranging from B2 to C1 (according to the Common European Framework). Each sub-corpus represents a small collection of texts developed over the course of an academic year as part of a larger corpus, UniC (University Corpus). UniC was created primarily for studies on Greek learners of Italian as a foreign language and their interlanguage development (Florou, 2024). The dataset under consideration consists of essays written by university students across four different academic years. These essays were produced at an institution of higher education in a supervised environment, thereby ensuring that the style of writing is purely human:


Table 1: University Corpus and Sub-Corpora

All texts were saved in plain-text format so as to maintain consistency across all university years. The texts from the years 2021 and 2022 were designated by the name of each learner, whereas text files from 2023 were labelled sequentially (e.g., 001.txt, 002.txt), and those from 2024 included the year in their filenames (e.g., 2024-001.txt). This labelling system facilitated easy differentiation between the four datasets.
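The naming scheme described above can be sketched as a small classifier. This is only an illustration and not part of the UniC tooling: the function name `dataset_for` and the labels it returns are hypothetical. Because the 2021 and 2022 files are both named after learners, the filename alone cannot tell those two years apart, so they share one label here.

```python
import re

def dataset_for(filename: str) -> str:
    """Guess the UniC sub-corpus from a filename (hypothetical helper)."""
    stem = filename.rsplit(".", 1)[0]
    if re.fullmatch(r"2024-\d+", stem):
        return "2024"          # year-prefixed names, e.g. 2024-001.txt
    if re.fullmatch(r"\d+", stem):
        return "2023"          # purely sequential names, e.g. 001.txt
    return "2021-2022"         # learner-name files; year not recoverable

for name in ["001.txt", "2024-001.txt", "GIATSI.txt"]:
    print(name, "->", dataset_for(name))
```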


3.2 Native Speakers’ Corpus

The Coris Corpus (CORIS) is a corpus of authentic written texts produced by native speakers of Italian. As described by the pioneers of the project, it has been under construction at the Centre for Theoretical and Applied Linguistics of Bologna University (CILTA) since 1998. The project aims to create a representative and sizeable general reference corpus of contemporary Italian, designed to be easily accessible and user-friendly. In its initial description, the Coris Corpus contained 80 million running words, and it is updated every two years by means of a built-in monitor corpus (Rossini Favretti, 2000). It consists of a compilation of authentic texts in electronic form, which have been selected on the basis of their representativeness of written Italian. It is aimed at a broad spectrum of potential users, from Italian language scholars to Italian and foreign learners engaged in linguistic analysis based on authentic data and, in a wider perspective, all those interested in intra- and/or interlinguistic analysis. In addition to the established model, a dynamic model (CODIS) has been developed. This model facilitates the selection of sub-corpora relevant to specific research objectives, as well as the dimensions of each sub-corpus, with the intention of adapting the corpus structure to diverse comparative requirements. Several tools have been developed, both for corpus access and for corpus POS tagging and lemmatisation (Rossini Favretti, 2000; Rossini Favretti et al., 2002). The corpus comprises six sub-corpora (press, fiction, academic prose, legal and administrative prose, miscellanea, and ephemera), totalling about 165 million words up to 2021.


3.3 Size and Genre

It is important to acknowledge potential methodological limitations due to differences in writing conditions and genres between the two corpora. Learner texts were produced in supervised academic settings; by contrast, the native texts were drawn from written discourse that had occurred in natural settings. These differences could introduce biases that affect the comparability of discourse features such as pragmatic marker usage.

Given that the size of UniC is fixed (105,249 words), as well as the time period of text collection, adjustments are necessary to ensure that the two corpora are as comparable as possible. From UniC, we will use the first three sub-corpora (2021-2023), since the fourth one has not yet been fully completed in terms of resource collection and recording. The total of these sub-corpora is approximately 100,000 words. Similarly, from the Coris Corpus, we will select the subset of texts collected during the latest period 2017-2020, focusing on the text type Miscellanea. It is posited that learners' texts, in which a memory or experience is described, do not belong to a specific text type (e.g., academic writing, correspondence, or business reports).

The Miscellanea sub-corpus for the selected period contains 1,400,000 words (Tamburini, 2022). Despite the disparity in size between the two corpora, it is possible to apply normalization procedures during the analysis of the results in order to balance the differences. Consequently, the results obtained from the search in the learner corpus and the native speakers' corpus will be derived from equivalent data, aligned in terms of size, text type, temporal dimension of language, and discourse form, which, in both cases, is written.


4   Results

4.1 Data Analysis

The Coris Corpus has a search platform with a user-friendly interface. However, the search results had to be recorded immediately and processed in an Excel spreadsheet, as the platform offers no option to download or save them. The data from the learner corpus UniC were processed using the AntConc tool (Anthony, 2014) to search for occurrences of pragmatic markers, count them, and evaluate them within their context.
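The kind of keyword-in-context search described above can be approximated in a few lines of Python. This is a minimal sketch, not the AntConc implementation: the function `kwic` and its `width` parameter are hypothetical, and the sample text simply joins two learner sentences quoted elsewhere in this article.

```python
import re

def kwic(text: str, keyword: str, width: int = 30):
    """Return (left, match, right) context tuples for whole-word hits."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

# Two learner sentences quoted in this article (008.txt and GIATSI.txt)
sample = (
    "Da allora ho cominciato a frequentarli. "
    "Non sono mai venuta! Bene, allora devi venire!"
)
hits = kwic(sample, "allora")
print(len(hits), "occurrences")
for left, word, right in hits:
    print(f"{left:>30} [{word}] {right}")
```

Each hit still has to be read in context by a human annotator: the first occurrence is the temporal "da allora", while the second is a discourse use, and a string search alone cannot separate the two.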

The first step after uploading the data is the automatic extraction of all instances through keyword search. In the case of allora, 55 occurrences were found in UniC, but 34 of these had allora meaning ‘then’, i.e., as a temporal connector, e.g., Da allora ho cominciato a frequentarli (008.txt). In the Coris Corpus, there were 189 occurrences, but 60 of these were with allora as a temporal connector or even an adjective, e.g., la lettera che fu scritta dall' allora sindaco di Firenze (MONITOR2017_20:6586803). For the case of quindi, there were 91 occurrences in UniC and 748 in the Coris Corpus, while for dunque, there were 4 occurrences in UniC and 211 in the Coris Corpus. The table below presents the normalised results after removing the temporal use of the marker allora:


Table 2: University Corpus and Sub-Corpora


4.2 Categorisation of the Use

While the quantitative data provide a broad overview of usage trends, the qualitative analysis reveals specific learner difficulties. For instance, learners tend to misuse quindi in contexts that require more nuanced discourse functions. Native speaker examples often display layered pragmatic intent, combining coherence and interpersonal strategies, which are largely absent in learner usage. This suggests a gap in both pragmatic awareness and instructional focus.

The categorisation proposed here builds upon Redeker’s (1990) discourse coherence models and on the semantic versus pragmatic source of coherence. It was then partially confirmed by Carter & McCarthy (2006), who distinguished between conjunctions and pragmatic markers. They defined conjunctions as items used to mark logical relationships between words, phrases, clauses, and sentences. Neary-Sundquist (2013) applied this categorisation to English connectors by examining a learner corpus.

Gonzalez (2005) presents a protocol for distinguishing pragmatic markers. First, they are divided into semantic and pragmatic markers. In this distinction, semantic markers indicate logico-semantic, argumentative relations and have referential meaning and a function as conjunctions. In contrast, pragmatic markers can be categorised into two distinct types. Firstly, rhetorical markers are sequential and inferential in nature, serving to guide the speaker's intentions and convey illocutionary force. Secondly, structural markers provide a foundation for maintaining the coherence of the discourse network (Gonzalez, 2005).

A third category of markers as inferential components is also identified, which refers to the macro-function of the marker and is more relevant to spoken language. This third category is deemed unsuitable for use in the categorization of this study. Another category is included, which usually does not concern conjunctions but rather adverbs, adjectives, or other parts of speech: components of intensification (Fiorentini & Sansò, 2017; Pan, 2022). According to Biber (1991), intensifiers are commonly used to modify the semantic force of a word, pushing it either upwards, downwards, or somewhere in between, depending on the language user’s intention in social communication.

A classification system was thus devised to analyse and compare the use of markers in learners' interlanguage and in the language of native speakers, and it was applied to every occurrence of quindi, allora, and dunque:



Table 3: Categories of Markers

The subsequent table provides an overview of the utilisation of markers in the context of UniC:

Table 4: Uses of Markers in the UniC


The use of quindi as a logical connector predominates, while, on the other hand, allora appears primarily in its pragmatic function. The subsequent corpus samples demonstrate that specific usages of these markers are accompanied by particular marks of punctuation or collocations:


Semantic use:

I soldi che mi avevano prestato erano molto buoni, allora ho accettato. (Kali2.txt) 

Fortunatamente, Covid non esiste più, quindi è arrivato il momento... (010.txt) 

I risultati sarebbero pronti in tre giorni e dunque, l’unica cosa che potevamo fare era aspettare (SERRH.txt)


Pragmatic use 1:

Non sono mai venuta! Bene, allora devi venire! (GIATSI.txt) 

Concludendo quindi, vorrei dire che sono tornata (BASILEIOY.txt) 


Pragmatic use 2:

Allora perché incontrare il mio cane è stato un momento così importante? (SDRAK.txt) 

Quindi, sarebbe molto faticoso e richiederebbe molto tempo, allora non potrei studiare...(TZANID.txt)


Intensified use:

-Allora, suggerite! -Andiamo in Thailandia. (DIMOU.txt) 

Dunque un disastro! (022.txt) 

In the overall table of the Coris Corpus regarding the use of markers, the distribution of uses per marker varies: 


Table 5: Uses of the Markers in the Coris Corpus

Here, we observe that in some cases, learners align with native speakers. The most significant point is the overall use of dunque. While Greek learners of Italian use it considerably less, we see that in the Coris Corpus, it is extensively used across all categories, though not in a balanced manner.

The following discourse examples provide a clearer understanding of how native speakers use these markers:

Semantic use:

"Che piacere ... ma allora, vuol dire che nella nostra famiglia nasceranno altri bambini (MONITOR2017_20: 6582355)

...sono tarate in partenza, quindi le regole del gioco è come se fossero falsate. (MONITOR2017_20: 6570441)

...ha dunque come obiettivo primario quello di celebrare le potenzialità e la versatilità di un prodotto, (MONITOR2017_20: 6579268)


Pragmatic use 1:

Ecco che allora non si può dire qual è la più buona (MONITOR2017_20: 6571843)

Di conseguenza, quindi, metterei tra parentesi anche questo " odio " generazionale. (MONITOR2017_20: 6570447)

Nonostante, dunque, un gioco non certo entusiasmante, la corazzata di Allegri continua a sorreggersi su una cattiveria,.. (MONITOR2017_20: 6584761)


Pragmatic use 2:

Perché, allora, non provare a seguire un ' alimentazione che contenga con maggior frequenza quegli alimenti... (MONITOR2017_20: 6592452)

E quindi brodetti , paste all ' uovo , passatelli , salumi , arrosti e grigliate di pesce e di carne... (MONITOR2017_20: 6580643)

Vediamo, dunque, ci vuole solamente l ' inserimento di nuove tecnologie... (MONITOR2017_20: 6580720)


Intensified use:

Chi le fu vicino allora? (MONITOR2017_20: 6595812)

Ma quindi quanto sono sicuri i pagamenti digitali ? (MONITOR2017_20: 6575939)

Via dunque lo stereotipo della bambola magra...(MONITOR2017_20: 6593829)


4.3 Comparison of the Two Corpora

To enable a valid comparison between the two corpora, we first normalised the frequencies per 100,000 words, as there were significant differences in corpus size. The learners’ corpus is already approximately 100,000 words in length, meaning that its normalised frequency equals the raw count: (Raw Count / 100000) × 100000 = Raw Count.

For the other corpus, which consists of 1,400,000 words, the normalisation was performed using the formula: (Raw Count / 1400000) × 100000. 
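As a worked illustration of the two formulas, the sketch below applies them to the overall raw counts reported in Section 4.1, with the temporal and adjectival uses of allora subtracted. The function name `per_100k` and the constant names are assumptions made for this example, and the figures are per-marker totals, so they need not coincide with a breakdown by category of use.

```python
def per_100k(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per 100,000 words."""
    return raw_count / corpus_size * 100_000

UNIC_SIZE = 100_000      # approximate size of the three sub-corpora used
CORIS_SIZE = 1_400_000   # Miscellanea sub-corpus, 2017-2020

# (UniC, Coris) raw counts from Section 4.1;
# allora excludes the temporal/adjectival occurrences
raw = {
    "quindi": (91, 748),
    "allora": (55 - 34, 189 - 60),
    "dunque": (4, 211),
}
for marker, (unic, coris) in raw.items():
    print(f"{marker}: UniC {per_100k(unic, UNIC_SIZE):.1f}, "
          f"Coris {per_100k(coris, CORIS_SIZE):.1f}")
```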

Thus, after normalising the frequencies per 100,000 words, we obtained the following results:


Table 6: UniC Normalised Frequencies


Table 7: Coris Corpus Normalised Frequencies


4.4 Key Findings

An analysis of the corpora reveals distinct differences in how learners and native speakers use common Italian discourse markers:

  1. Quindi is significantly overused by learners (68.0 vs. 2.4 per 100,000 words in semantic use).
  2. Allora is much more frequent in pragmatic functions for learners compared to native speakers.
  3. Dunque is barely used by learners, but native speakers use it more variably.
  4. Intensified use is generally low across both corpora.



4.5 Statistical Analysis

In order to validate the aforementioned key findings, it is necessary to employ quantitative techniques, such as:

  • The chi-squared test (Chi2), a statistical procedure that compares the observed frequencies of word usage among learners and native speakers with the frequencies that would be expected if there were no real difference between the two groups. The resulting chi-squared statistic measures how far the observed usage departs from that expectation.
  • The p-value, which indicates the probability of observing word usage patterns at least as pronounced as those found here if there were no real difference between learners and native speakers.
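The comparison of observed and expected frequencies can be sketched as follows. This is a minimal illustration under stated assumptions: the contingency table is built here from the overall raw counts in Section 4.1 (corpus by marker), which is not necessarily the table the authors tested, so the result is not expected to reproduce the statistic reported in this section. The function name `chi_squared` is hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if marker choice were independent of corpus
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Illustrative table only. Rows: UniC, Coris.
# Columns: quindi, allora (non-temporal), dunque.
observed = [
    [91, 21, 4],
    [748, 129, 211],
]
stat, dof = chi_squared(observed)
print(f"chi2 = {stat:.2f}, dof = {dof}")
```

Working on raw counts rather than normalised frequencies matters here, because the test depends on the actual number of observations in each cell.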


Table 8: Statistical Indexes

Chi-squared statistic (52.68): A higher chi-squared value indicates a greater disparity between the observed word usage and the expected usage under the assumption of no difference. In this case, a value of 52.68 suggests a significant difference in how learners and native speakers use these words.

P-value (1.36e-09): A very small p-value (such as the one observed here, which is effectively zero) implies that the likelihood of these differences occurring purely by chance is extremely low. In this study, the p-value provides strong evidence of a genuine difference in word usage between learners and native speakers.
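The link between the statistic and the p-value can be checked with a short calculation. The sketch below uses the closed-form upper-tail probability of the chi-squared distribution, which holds for even degrees of freedom. The degrees of freedom are not stated in the text, so the value of 6 used here is an assumption (it would correspond, for example, to a table of three markers by four categories of use). The function name `chi2_sf_even_dof` is hypothetical.

```python
import math

def chi2_sf_even_dof(x: float, dof: int) -> float:
    """Upper-tail probability of a chi-squared variable for even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form used here requires a positive even dof")
    half = x / 2
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. dof/2 - 1
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

p = chi2_sf_even_dof(52.68, 6)  # dof = 6 is an assumption, not from the text
print(f"{p:.2e}")
```

Under that assumption the calculation gives roughly 1.36e-09, in line with the p-value reported above.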


5   Conclusions and Applications

These findings contribute to both theoretical and pedagogical domains. Theoretically, the study enhances our understanding of the syntax-pragmatics interface in L2 Italian, demonstrating how L1 influences shape discourse structuring. Pedagogically, the results advocate for a curriculum that integrates exposure to authentic discourse and explicit instruction on pragmatic marker usage.

The aim of this study was to utilise NS Corpora and Learner Corpora in order to investigate pragmatic markers that may or may not pose a challenge for Greek learners of Italian. Identifying the overuse or underuse of certain markers in their pragmatic use is not a straightforward task. This is primarily because there is no automated annotation for pragmatic markers (Biber, 1991). Such an analysis requires the involvement of at least two researchers or one researcher along with a proficient speaker of the target language – in this case, Italian.

However, this is not the sole challenge. Beyond the fact that pragmatic uses are more evident in spoken corpora, they also require extensive data. In a small corpus of around 100,000 words, reliability and representativeness are at risk (Vaughan & Clancy, 2013), as is the ability to compare findings with large native speaker corpora.

Nevertheless, it is evident that significant differences were observed between the two corpora. Notably, the annotation of the native speaker corpus was relatively straightforward, considering that the pragmatic or non-pragmatic use of a marker often coincided with the textual structure. Certain patterns were identified, primarily among native speakers and, to a lesser extent, among learners, in relation to punctuation and the placement of markers within sentences. Specifically, two commas often frame quindi and dunque, while allora appears less frequently in this structure when used in a purely pragmatic function. Additionally, a comma is observed before quindi when introducing a conclusive clause, indicating its semantic function.
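The punctuation patterns noted above lend themselves to a simple automatic pre-screening. The sketch below is a heuristic illustration only and not the annotation procedure used in this study, which was carried out manually: the regular expressions, the function name `punctuation_pattern`, and its labels are all assumptions. The test sentences are taken from the Coris examples in Section 4.2.

```python
import re

MARKERS = r"(quindi|dunque|allora)"
# Marker framed by commas on both sides, e.g. ", quindi,"
FRAMED = re.compile(rf",\s*{MARKERS}\s*,", re.IGNORECASE)
# Comma before the marker only, e.g. ", quindi le regole"
COMMA_BEFORE = re.compile(rf",\s*{MARKERS}\b(?!\s*,)", re.IGNORECASE)

def punctuation_pattern(sentence: str) -> str:
    """Label the comma pattern found around a marker (heuristic)."""
    if FRAMED.search(sentence):
        return "framed"
    if COMMA_BEFORE.search(sentence):
        return "comma-before"
    return "other"

examples = [
    "Di conseguenza, quindi, metterei tra parentesi anche questo odio generazionale.",
    "sono tarate in partenza, quindi le regole del gioco è come se fossero falsate.",
    "Ecco che allora non si può dire qual è la più buona",
]
for sentence in examples:
    print(punctuation_pattern(sentence), "|", sentence)
```

Such a filter could at most suggest candidates for the pragmatic or semantic reading; the final categorisation would still depend on reading each occurrence in context.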

Non-native speakers, for the most part, adopt specific patterns regarding their most frequently used markers. They tend to use quindi primarily in its semantic function rather than as a pragmatic marker, strictly following grammatical rules. Regarding the learning process, it is hypothesised that quindi has been established in learners’ minds as the equivalent of the Greek ώστε (‘so that’), which explains its predominant use in conclusive statements.

Allora appears less frequently as a marker of sustaining or intensifying discourse in the learner corpus. For learners, its acceptable use is to connect ideas and contribute to text cohesion. Notably, learners follow the rule by using allora not only as a pragmatic marker but also as a temporal adverb, though less frequently as a conclusive conjunction. It is likely that they associate allora with the Greek τότε (‘then’). These observations are not entirely novel when considering language acquisition processes. Corsetti & Perna's (2017) research posits that learners who are first exposed to a foreign language and subsequently trained in it exhibit divergent learning outcomes compared to those who are first trained and only later exposed to it. Given that Greek learners follow the latter path, it is likely that they rely more on rules and stereotypical patterns of assimilation.

This phenomenon has been previously described by Hasselgren (1994) as the Lexical Teddy Bear effect, referring to the overuse of certain words that provide a sense of security to learners. Conversely, dunque appears less frequently among non-native speakers due to its more formal nature and lacks the range of uses observed among native speakers. In fact, dunque is not typically emphasised in language instruction, leading to either a fixed single usage or underuse, a trend that has also been noted in research on other languages, such as English (Trillo, 2002).

Another observation that may be linked to the learners’ native language, Greek, is the position of quindi and allora within the sentence when they function as discourse markers used for bridging speech. They are typically found at the beginning of a clause or sentence. In Greek, the equivalents λοιπόν (‘well’) or τότε (‘then’) appear in the same position with the same function, particularly in spoken and written discourse (Georgakopoulou & Goutsos, 1998).

The findings of the present study have the potential to contribute valuable pedagogical recommendations for the teaching of pragmatic markers to learners of Italian as a foreign language. It is hypothesised that increased exposure to Italian, and its authentic communicative environment, could enhance learners' pragmatic awareness. A secondary significant implication pertains to Italian language instructors, particularly in relation to the teaching strategies they employ in the classroom and the curriculum design they develop.


Acknowledgements 

We would like to extend our gratitude to the students of our department, as well as to the Laboratory of Linguistic Analysis and Computational Processing of Romance Languages, for their invaluable assistance in developing a learner corpus. We would like to express our profound gratitude to our Italian colleagues for their invaluable contribution to the corpus annotation.



References

Aijmer, K. (2002). English discourse particles: Evidence from a corpus. John Benjamins.

Aijmer, K. (2004). Pragmatic markers in spoken interlanguage. Nordic Journal of English Studies, 3(1), 159–172.

Aijmer, K., & Simon-Vandenbergen, A.-M. (2003). Well in English, Swedish and Dutch. Linguistics, 41(6), 1123–1161.

Andersen, G. (2001). Pragmatic markers and sociolinguistic variation. John Benjamins.

Anthony, L. (2014). AntConc (Version 3.4.3) [Computer Software]. Tokyo, Japan: Waseda University.

Archer, D., Aijmer, K., & Wichmann, A. (2013). Pragmatics: An advanced resource book for students. Routledge.

Bellato, A., & Sartori, M. (2023). Verso la scoperta guidata: Congiunzioni e avverbi anaforici, coordinazione e giustapposizione. Italiano LinguaDue, 15(2), 500-515.

Biber, D. (1991). Variation across speech and writing. New York: Cambridge University Press.

Branciforti, G., & Duso, E. M. (2023). Frase o testo? Congiunzioni e avverbi anaforici nelle grammatiche scolastiche. Italiano LinguaDue, 15(2), 447-474.

Buysse, L. (2012). So as a multifunctional discourse marker in native and learner speech. Journal of Pragmatics, 44(13), 1764-1782.

Buysse, L. (2017). The pragmatic marker you know in learner Englishes. Journal of Pragmatics, 121, 40-57.

Buysse, L. (2020). ‘It was a bit stressy as well actually’. The pragmatic markers actually and in fact in spoken learner English. Journal of Pragmatics, 156, 28-40.

Carter, R., & McCarthy, M. (2006). Cambridge grammar of English: A comprehensive guide. Spoken and written English. Grammar and usage. Cambridge: Cambridge University Press.

Corsetti, C. R., & Perna, C. L. (2017). A corpus-based study of pragmatic markers at CEFR level B1. Letras de Hoje, 52(3), 302-309.

Culpeper, J., & Kytö, M. (1999). Modifying pragmatic force: Hedges in a corpus of Early Modern English dialogues. In A. Jucker, G. Fritz, & F. Lebsanft (Eds.), Historical dialogue analysis (pp. 293–312). John Benjamins.

De Cristofaro, E., Badan, L., & Belletti, A. (2024). Discourse markers in L1 and L2 Italian: A cartographic analysis of the sentence-internal position. Second Language Research, 40(3), 643-673.

Fiorentini, I., & Sansò, A. (2017). Intensifiers between grammar and pragmatics. In Exploring Intensification: Synchronic, diachronic and cross-linguistic perspectives (Vol. 189, pp. 173).

Florou, K. (2024). The effect of English as a second foreign language on learning Italian as a third foreign language: A learner corpus-based research in written speech. International Journal of Language and Literary Studies, 6(3), 407-419.

Georgakopoulou, A., & Goutsos, D. (1998). Conjunctions versus discourse markers in Greek: The interaction of frequency, position, and functions in context. Linguistics, 36(5), 887-918. https://doi.org/10.1515/ling.1998.36.5.887

Ghezzi, C. (2022). Vagueness markers in Italian: Age variation and pragmatic change. Milan: Franco Angeli.

Ghezzi, C., & Molinelli, P. (2014). Italian guarda, prego, dai: Pragmatic markers and the left and right periphery. In K. Beeching & U. Detges (Eds.), Discourse Functions at the Left and Right Periphery (pp. 117-150). Brill.

Gilquin, G. (2008). Hesitation markers among EFL learners: Pragmatic deficiency or difference. Pragmatics and corpus linguistics: A mutualistic entente (Vol. 2, pp. 119-149).

González, M. (2005). Pragmatic markers and discourse coherence relations in English and Catalan oral narrative. Discourse Studies, 7(1), 53-86.

Hasselgren, A. (1994). Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International Journal of Applied Linguistics, 4(2), 237-258.

Huschová, P. (2021). Modalized speech acts in a spoken learner corpus: The case of and. Topics in Linguistics, 22(1), 27-37.

Kevoe-Feldman, H., Robinson, J. D., & Mandelbaum, J. (2011). Extending the notion of pragmatic completion: The case of the responsive compound action unit. Journal of Pragmatics, 43(15), 3844-3859.

Knight, D., & Adolphs, S. (2007). Multi-modal corpus pragmatics: The case of active listenership. In J. Romero-Trillo (Ed.), Corpus and pragmatics: A mutualistic entente (pp. 175–190). Mouton de Gruyter.

Kytö, M., & Walker, T. (2006). Guide to a Corpus of English Dialogues 1560–1760. Studia Anglistica Upsaliensia 130. Acta Universitatis Upsaliensis.

Mauranen, A. (1994). They’re a little bit different…Observations on hedges in academic talk. In K. Aijmer & A.-B. Stenström (Eds.), Discourse patterns in spoken and written corpora (pp. 173–197). John Benjamins.

Neary-Sundquist, C. (2013). The development of cohesion in a learner corpus. Studies in Second Language Learning and Teaching, 3(1), 109-130.

O’Keeffe, A. (2018). Corpus-based function-to-form approaches. In Methods in pragmatics (pp. 587).

Pan, Y. (2022). Intensification for discursive evaluation: A corpus-pragmatic view. Text & Talk, 42(3), 391-417.

Paradis, C. (2003). Between epistemic modality and degree: The case of really. Topics in English Linguistics, 44, 191-222.

Redeker, G. (1990). Ideational and pragmatic markers of discourse structure. Journal of Pragmatics, 14(3), 367-381.

Rossini Favretti, R. (2000). Progettazione e costruzione di un corpus di italiano scritto: CORIS/CODIS. In R. Rossini Favretti (Ed.), Linguistica e informatica. Multimedialità, corpora e percorsi di apprendimento (pp. 39-56). Bulzoni.

Rossini Favretti, R., Tamburini, F., & De Santis, C. (2002). A corpus of written Italian: A defined and a dynamic model. In A. Wilson, P. Rayson, & T. McEnery (Eds.), A Rainbow of Corpora: Corpus Linguistics and the Languages of the World (pp. 39-56). Lincom-Europa.

Simon-Vandenbergen, A.-M., & Aijmer, K. (2002/2003). The expectation marker of course. Languages in Contrast, 41(1), 13–43.

Stenström, A.-B. (1990a). Pauses in monologue and dialogue. In J. Svartvik (Ed.), The London-Lund Corpus of Spoken English: Description and research (pp. 211–252). Lund University Press.

Stenström, A.-B. (1990b). Lexical items peculiar to spoken language. In J. Svartvik (Ed.), The London-Lund Corpus of Spoken English: Description and research (pp. 137–175). Lund University Press.

Stenström, A.-B., & Andersen, G. (1996). More trends in teenage talk: A corpus-based investigation of the discourse items cos and innit. In C.E. Percy, C.F. Meyer, & I. Lancashire (Eds.), Synchronic corpus linguistics (pp. 177–190). Rodopi.

Svartvik, J. (1980). ‘Well’ in conversation. In S. Greenbaum, G. Leech, J. Svartvik, & V. Adams (Eds.), Studies in English linguistics for Randolph Quirk (pp. 167–177). Longman.

Tamburini, F. (2022). I corpora del FICLIT, Università di Bologna: CORIS/CODIS, BoLC e DiaCORIS. In Corpora e Studi Linguistici. Atti del LIV Congresso Internazionale di Studi della Società di Linguistica Italiana (pp. 189-197). Società di Linguistica Italiana.

Trillo, J. R. (2002). The pragmatic fossilization of discourse markers in non-native speakers of English. Journal of Pragmatics, 34(6), 769-784.

Van Bogaert, J. (2009). A reassessment of the syntactic classification of pragmatic expressions: The positions of you know and I think with special attention to you know as a marker of metalinguistic awareness. In A. Renouf & A. Kehoe (Eds.), Corpus linguistics: Refinements and reassessments. Amsterdam: Rodopi.

Vaughan, E., & Clancy, B. (2013). Small corpora and pragmatics. In Yearbook of corpus linguistics and pragmatics 2013: New domains and methodologies (pp. 53-73). Dordrecht: Springer Netherlands.

Vyatkina, N., & Cunningham, D. J. (2015). Learner corpora and pragmatics. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 281-306). Cambridge: Cambridge University Press.



Authors:

Dr Katerina Florou

Assistant Professor of Corpus Linguistics

Department of Italian Language and Literature

National and Kapodistrian University of Athens

GREECE

Email: katiflorou@gmail.com kathyflorou@ill.uoa.gr

Orcid ID: https://orcid.org/0009-0001-3062-5728 


Dimitris Bilianos

Researcher in Computational Linguistics

Department of Italian Language and Literature

National and Kapodistrian University of Athens

GREECE

Email: dbilianos@ill.uoa.gr 

Orcid ID: https://orcid.org/0009-0009-6819-9726