The effects of human-like social cues on social responses towards text-based conversational agents—a meta-analysis – Nature

Humanities and Social Sciences Communications volume 12, Article number: 1322 (2025)
Humanizing chatbots through social cues is a common strategy to increase user acceptance. However, whether and in which circumstances this strategy is generally effective is still unclear. This meta-analysis thus examines the effect of text-based chatbots’ social cues on users’ social responses and the influence of potential moderators. It includes experimental studies that manipulate human-likeness using social cues and examine their effects on user responses, including attitude, perception, affect, rapport, trust, and behavior. A systematic search for published and unpublished research resulted in a final sample of 800 effect sizes from 199 datasets reported in 142 papers (N = 41,642). Meta-analytic random-effects models computed overall and for each outcome category yielded a small effect of human-likeness on social responses (g = 0.36, 95% CI [0.27, 0.44]). The results further suggested that human-like chatbot characteristics improve user responses to varying degrees and under different boundary conditions. The findings can guide practitioners in designing effective and ethically justifiable chatbots.
Advances in artificial intelligence (AI) have spurred the implementation of text-based chatbots across various sectors, including service, sales, counseling, and consulting, a development that predates the introduction of ChatGPT and other chatbots powered by large language models (LLMs) to a large audience (Jacobson and Gorea, 2023; Liu and Yao, 2023). Chatbots are text-based conversational agents that communicate with users through natural language (Shawar and Atwell, 2007). Unlike embodied agents like the social robot Pepper, text-based chatbots communicate exclusively over a “messaging-based interface” (Araujo, 2018, p. 184). Chatbots usually have little or no physical or virtual embodiment, as represented, for instance, by facial expressions, body movements, and gaze (Krämer et al., 2009). In contrast to voice-based agents such as Alexa, neither the user nor the chatbot can express themselves via different volumes and pitches. Thus, differences in cognitive workload (Le Bigot et al., 2004; Rzepka et al., 2022) and processing fluency (Zierau et al., 2023) emerge.
Using chatbots has now become an integral part of many people’s daily lives. ChatGPT, for example, has grown its user base to 400 million weekly users since its launch in 2022 (Reuters, 2025). AI-based chatbots assist users in everyday situations by helping them find information and carry out tasks (Anderl et al., 2024). Chatbots promise organizations cost savings while at the same time upholding a conversational human voice and customer engagement (Kelleher, 2009; Steinhoff et al., 2019). While organizations commonly employ chatbots to communicate with consumers, users do not always readily adopt them, suggesting that chatbots might not be meeting users’ expectations. This is illustrated by a recent study in which only 8% of customers surveyed had turned to a chatbot during their last customer service encounter. Moreover, only a quarter of those planned to reuse the chatbot (LoDolce and Brackenbury, 2023).
One strategy commonly applied to increase user satisfaction and adoption lies in equipping the chatbot with social cues, i.e., human-like characteristics such as a smiling avatar or a casual conversation style (Chaves and Gerosa, 2021). There is a large body of experimental research generally suggesting beneficial effects of human-likeness, for instance, on service satisfaction (Park et al., 2023), trust (Chen et al., 2023), and intention to use (Lee et al., 2020). However, positive effects are inconsistent across studies and outcomes (Araujo, 2018; Haugeland et al., 2022). Under certain circumstances, human-like characteristics can also harm user responses, e.g., due to the chatbot triggering uncanny emotions (Ciechanowski et al., 2019; Mori, 1970), when users enter an interaction in anger (Crolic et al., 2022), when the service interaction is flawed (Brendel et al., 2023), or when chatbot users are experienced (Gnewuch et al., 2022). This line of research has already been summarized in systematic reviews on the impact of social characteristics on user responses (Chaves and Gerosa, 2021; Rapp et al., 2021). Nevertheless, it is still open whether and when chatbots should be equipped with such cues. Therefore, the overall impact and practical implications of providing chatbots with human-like characteristics remain unclear. The current study addresses this gap by complementing the findings of recent systematic reviews with a meta-analysis of the effect of human-like social cues on users’ social responses.
Previous meta-analyses on text-based chatbots investigated their general effectiveness in the health domain (Abd-Alrazaq et al., 2020; Singh et al., 2023) or the antecedents of intention to use service chatbots using a technology acceptance approach (Gopinath and Kasilingam, 2023). Two meta-analyses on the impact of human-like features on user responses have been conducted in the related fields of human–robot interaction and marketing. Roesler et al. (2021a) found that anthropomorphic design features in physically embodied robots moderately improve user responses. Blut et al. (2021) found (perceived) anthropomorphism and intention to use service agents to be strongly correlated. While the authors also included studies on text-based service chatbots, these represented only eight percent of the included studies. Although these two meta-analyses provide the first evidence that human-like social cues generally increase user responses, it is unclear whether the findings can be transferred to disembodied, text-based chatbots.
Due to the conflicting findings on the effectiveness of social cues for users’ perceptions and attitudes towards using text-based chatbots, and since existing systematic reviews did not include meta-analyses, the current research aims to provide a quantitative synthesis. The objective is to investigate the overall effect of social cues on users’ social responses to text-based chatbots. In addition, previous research points to several factors related to the users, the agent, the context, and the method that potentially moderate this effect (Roesler et al., 2021a). Thus, this study investigates the conditions under which social cues potentially facilitate (or hinder) social responses to text-based chatbots. This meta-analysis draws on the media are social actors (MASA) paradigm as a suitable theoretical framework to investigate the impact of human-like social cues on users’ social responses towards text-based chatbots (Lombard and Xu, 2021). The findings will contribute to the fields of human–computer interaction (HCI) and human–machine communication (HMC), specifically to the research stream on text-based chatbots and the literature on the MASA framework by clarifying the extent to which, and the conditions under which, human-like social cues improve users’ responses to text-based chatbots.
A fitting theoretical approach for investigating the relationship between social cues and social responses to chatbots is the MASA paradigm (Lombard and Xu, 2021). It has been proposed as an extension of the Media Equation (Reeves and Nass, 1996) and the subsequent computers are social actors (CASA) paradigm (Nass and Moon, 2000), which posits that people respond to machines (including television sets, chatbots, etc.) as if these were genuine social actors. CASA proposes that people transfer social scripts they have developed in human–human interactions to their interactions with machines. This idea is closely connected to the concept of anthropomorphism, which describes the tendency of people to attribute human characteristics to non-human entities, such as chatbots (Epley et al., 2007).
Extending CASA, MASA emphasizes the impact of social cues of media agents on social responses. Media agents refer to technologies that exhibit “sufficient social cues to indicate the potential to be a source of social interaction” (Gambino et al., 2020, p. 73). Within the MASA paradigm, social cues are defined as “features salient to observers because of their potential as channels of useful information” (Fiore et al., 2013, p. 2). Social cues are posited to signal human-like characteristics, which in turn, influence social responses, i.e., outcomes that represent users’ attitudes, perceptions, positive and negative affective, relational, trust-related, and behavioral reactions towards a chatbot, a chatbot message, or an interaction with a chatbot (Huang and Wang, 2023; Krämer et al., 2009; Xu et al., 2023). Social cues like a colloquial conversation style, emoji use, and expressions signaling active listening are thus commonly employed to make the chatbot’s appearance and behavior more similar to human appearance and behavior, i.e., more human-like (Nass and Moon, 2000).
MASA suggests that the more social cues a technology presents, the more likely humans are to act socially towards it, and the more positive users’ responses will be (Lombard and Xu, 2021). In addition, the paradigm introduces a hierarchy of social cues by categorizing them according to their expected impact on social responses. Whereas primary cues (e.g., gaze and movement) are considered more salient and sufficient to elicit social responses, secondary cues (e.g., language use) are considered less salient and not as powerful in eliciting social responses (Lombard and Xu, 2021). Social cues can also be categorized on a more technical level of the design element they relate to (Feine et al., 2019). For instance, social cues can be visual (avatar-related, e.g., smile or chat-related, e.g., emoji use), verbal (related to content, e.g., small talk, or style, e.g., language use, speech style, message contingency), invisible (e.g., response times) (Feine et al., 2019), or related to the interaction (Zierau et al., 2020).
Within the scientific fields of HMC and HCI, a dedicated strand of literature deals with disembodied, text-based chatbots and the effects of their design features on user responses. While initial questionnaire studies focused on people’s motivations for (Brandtzaeg and Følstad, 2017, 2018) and experiences while using chatbots (Følstad and Brandtzaeg, 2020), a large body of experimental research on the impact of various design or interface characteristics on user engagement and acceptance has developed (e.g., Araujo, 2018; Beattie et al., 2020; Go and Sundar, 2019; Lee et al., 2020).
Prior experimental studies generally indicate positive effects of human-like social cues in chatbots. Positive effects of visual cues in counseling chatbots have been found on satisfaction and reuse intention (Park et al., 2023) as well as on trust and intention to comply with recommendations (Park et al., 2023). Chen et al. (2023) concluded that human-like appearance and conversational style increased feelings of social presence, trust, and satisfaction with the chatbot. In an experiment in the customer service context, Lee et al. (2020) found higher effects of mind perception on closeness when the chatbot used backchanneling and paralinguistic cues, increasing intention to use. Relatedly, in their experiment involving a dating chatbot, Scherer et al. (2020) found longer interaction times and conversation volumes when the chatbot employed phatic cues. Prior research has also shown positive effects of emoji use (Beattie et al., 2020), delayed response times (Holtgraves et al., 2007; Gnewuch et al., 2018), and verbal expressions of sympathy and empathy (Liu and Sundar, 2018). Positive effects have also emerged for human-like interaction mechanisms; e.g., free text (vs. button) interaction has been shown to increase humanness perceptions and social presence (Diederich et al., 2019).
On the contrary, several studies suggest that human-likeness effects are not consistently positive across outcomes. For instance, in his seminal experiment, Araujo (2018) found significant effects of human-like social cues on users’ perceived anthropomorphism and emotional connection with the company, but not on their feelings of social presence, attitude towards, and satisfaction with the company. Regarding cues relating to the interaction mechanism, Haugeland et al. (2022) and Klein et al. (2025) found reverse effects: structured interactions were favored over human-like free-text interactions.
Human-like chatbot characteristics can be a double-edged sword, potentially harming social responses in specific situations. For instance, Ciechanowski et al. (2019) showed in their experiment on the impact of human-likeness on affective responses that a more human-like chatbot triggered more uncanny emotions, negative affect, and more intense psychophysiological reactions compared to a simple text-based chatbot. A series of experiments by Crolic et al. (2022) yielded negative human-likeness effects on customer satisfaction and purchase intentions when the customer entered the chatbot interaction in an angry state. Relatedly, Brendel et al. (2023) concluded that human-like cues lead to user frustration in flawed service interactions. Previous chatbot experience might also hinder the effectiveness of social cues. For instance, Gnewuch et al. (2022) found that dynamically delayed (vs. instant) response times only improved user responses for novice chatbot users.
Some of these findings have been systematically brought together in literature reviews on the impact of social characteristics on user-related outcomes (Chaves and Gerosa, 2021) and the human side of chatbot interaction (Rapp et al., 2021). However, a quantitative synthesis is missing, which is why I ask the following research question:
RQ1: What is the overall effect of text-based chatbots’ social cues on users’ social responses?
Prior research suggests that the effectiveness of social cues is dependent on several factors. These factors include characteristics of the user (or, in a meta-analytic context, the sample), the chatbot agent employed in the study, the context in which the study was placed, and the methods used in the study (Blut et al., 2021; Roesler et al., 2021a).
The current research focuses on mean age, gender, and chatbot experience as potential sample-related moderators. On average, older adults consider new technologies to be more complicated (Scopelliti et al., 2005), prefer concise, abstract (vs. humanoid), and static agents to reduce distraction (Chattaraman et al., 2011), and show a lower tendency to anthropomorphize agents (Letheren et al., 2016). Thus, human-likeness effects on social responses may be smaller for older people. However, Straßmann et al. (2018) found that seniors showed higher use intention and bonding with human-like (vs. non-human-like) virtual agents while students showed the opposite. Regarding gender, women are likely to be more skeptical of new technologies than men (Scopelliti et al., 2005), potentially leading to larger human-likeness effects on social responses for men. On the contrary, prior research suggests that women appreciate a robot’s human–like interaction and behavior more (Pelau et al., 2021), suggesting larger human-likeness effects for them. Highly experienced chatbot users, familiar with technology and interface and used to being exposed to social cues, may experience smaller human-likeness effects due to their knowledge of underlying technologies (Gnewuch et al., 2022).
Whether agent characteristics moderate human-likeness effects on social responses is another question this meta-analysis aims to answer. Previous meta-analyses included the agent’s assigned gender as a potential moderator (Blut et al., 2021). Feine et al. (2020) noted that most text-based chatbots have gender-specific cues (e.g., names and avatars), often implying a female gender. These cues may lead users to apply human gender stereotypes to chatbots. For instance, Kim et al. (2021) found that users expect different kinds of tasks from female and male chatbots, e.g., having a chat, advising, and healthcare were expected to be carried out by female chatbots, while male chatbots were rather expected to do professional work. “Female” stereotypes relating to likability and helpfulness (Koenig, 2018) might enhance the effect of human-like cues. Besides agent gender, I focus on agent characteristics specific to text-based chatbots, i.e., their visual representation, implementation of human-likeness, interaction mechanism, and technological foundation. Visual social cues matter (Go and Sundar, 2019), which aligns with MASA’s proposition that cues that are more strongly connected to people’s perceptions of an agent’s socialness (i.e., primary cues like a human-like avatar) are more effective in eliciting social responses (Lombard and Xu, 2021). Therefore, social cues might be more effective with human-like avatars. As interactions with chatbots rely on written text, verbal (vs. other) cues might be more effective than visual, invisible, and interaction cues. In contrast, MASA’s differentiation between primary and secondary social cues might also suggest that secondary cues, such as language use, are not as efficient (Lombard and Xu, 2021). Structured conversational flows using predefined answer buttons may appear less human-like than unstructured free-text interactions (Diederich et al., 2019). 
Unstructured interactions may magnify the positive human-likeness effects, improving social responses. Last, compared to simple rule-based chatbots, more elaborate technological bases, e.g., LLMs, enable more complex and, thus, more human-like conversations (Abercrombie et al., 2023), which might magnify positive human-likeness effects.
Application context and task criticality might moderate human-likeness effects. Effects might be larger in hedonic contexts, where chatbot interactions are meant to provide pleasure and entertainment and focus on social aspects like relationships and emotions, than in functional, task-based contexts like customer service (Huang and Wang, 2023; Roesler et al., 2021a). Task criticality refers to whether the stakes for users are high (Blut et al., 2021). Stakes are high in insurance, banking, and mental health counseling, but generally lower for tasks like recipe or movie selection and chit-chat. Larger effects in critical contexts are plausible, as human-like cues might persuade people to use a chatbot more readily. However, the effects might also be larger in uncritical tasks, as users might be more relaxed, and human-likeness might contribute to a more natural interaction and a positive user experience. Previous research has shown that users were more willing to give out their credit card information to a machine compared to a human agent because the machine was perceived as more objective and secure (i.e., machine heuristic, Sundar and Kim, 2019). Thus, less human-like chatbots might be preferred for critical tasks because of higher perceived objectivity and trustworthiness.
The current research also includes, as potential moderators, method characteristics that have been shown to influence meta-analytic results and are specific to research on media agents. Student samples, being more homogeneous in their composition than non-student samples, can lead to larger effect sizes (Orsingher et al., 2010). Laboratory experiments provide greater control than online and field experiments, and letting participants have an actual interaction experience might be considered more realistic than showing prerecorded materials, potentially resulting in larger effect sizes (Greussing et al., 2022). Other moderators include the year of publication (as chatbot experience might increase with time), the sample country (i.e., whether the research was conducted in the US) (Blut et al., 2021), and the scientific field (i.e., whether the paper was published in a communication science outlet). I assessed potential publication bias by considering publication status and whether a paper has been peer-reviewed. Further, I included whether a study was preregistered, as non-preregistered studies could have larger effect sizes than preregistered ones (Van den Akker et al., 2023).
To consolidate previous, partly contradictory findings and to explore potential factors that additionally moderate the effects of social cues on social responses to text-based chatbots, I ask the following broad research question:
RQ2: What sample-, agent-, context-, and method-related factors moderate the effects of text-based chatbots’ social cues on users’ social responses?
Figure 1 presents the conceptual overview for this meta-analysis.
The conceptual overview.
This meta-analysis followed the preferred reporting items for systematic review and meta-analysis (PRISMA) framework (Page et al., 2021), with a preregistered PRISMA protocol (Moher et al., 2015; Shamseer et al., 2015) on the Open Science Framework (OSF): https://osf.io/9t2q8. I adapted the templates provided by Moreau and Gamble (2022) for reporting search queries, search results, extracted data, deviations from the protocol, and the call for unpublished work. The deviations from the protocol can be found in the Supplementary Table S2.
Supporting data and corresponding R code files are accessible at https://osf.io/cezmu/. Since a paper (e.g., a journal article, conference paper, or preprint) may contain the results of several experiments, I use the term dataset to distinguish between different data collections (i.e., studies) within a paper throughout the method and results sections (Van Berlo et al., 2023).
The eligibility criteria were developed following the population, intervention, comparison, and outcomes (PICO) framework (Richardson et al., 1995). The population included healthy adults, where healthy refers to the absence of disease or impairment. Studies with human-likeness interventions, i.e., manipulations of human-likeness using one or multiple (e.g., visual, verbal, invisible, interaction) social cues, were eligible (Feine et al., 2019; Lombard and Xu, 2021). Comparisons were made between human-like text-based chatbots and those with less human-like or machine-like characteristics. Studies that assess one or multiple outcomes relating to attitude, perception, positive and negative affect, rapport, trust, and behavior were eligible. These categories were derived from existing HMC and HCI literature, including conceptual articles (Lombard and Xu, 2021; Nass and Moon, 2000), (systematic) reviews (e.g., Abercrombie et al., 2023; Chaves and Gerosa, 2021; Oliveira et al., 2021; Rapp et al., 2021; Van Pinxteren et al., 2020), and meta-analyses (Blut et al., 2021; Roesler et al., 2021a). In addition, the categories were discussed with fellow HMC researchers with expertise in user responses to conversational agents and meta-analysis to ensure all relevant social responses were included. Table 1 displays the dependent variables, definitions, and example operationalizations.
A systematic literature search in the electronic databases Web of Science, ERIC, PsycInfo, Business Source Premier, EconLit, Academic Search Premier, IEEE Xplore Digital Library, ACM Digital Library, Open Access Theses and Dissertations (OATD), OpenDissertations, Networked Digital Library of Theses and Dissertations (NDLTD), and Google Scholar (first 500 search results) was carried out. All records I had access to by 15 September 2023 were considered. The terms used for the literature search included combinations of chatbot, human-likeness, empirical study, and synonyms. A specific query for each database was developed. Other search activities included scanning reference lists of included papers or prior reviews (i.e., Abercrombie et al., 2023; Chaves and Gerosa, 2021; Rapp et al., 2021; Van Pinxteren et al., 2020). Additionally, I sent requests for unpublished research to discipline-specific mailing lists and my professional network. The literature search terms can be found in Supplementary Table S3. An example search query is provided as Supplementary Information. The literature search and screening protocol, as well as the search queries for each database, can be accessed at https://osf.io/cezmu/.
After removing duplicates, the title, abstract, and keywords of the search results were screened manually using the free online tool Abstrackr (Wallace et al., 2012). Initially, two coders screened a sample containing the same 100 records (see inclusion and exclusion criteria in Supplementary Table S4). Cohen’s Kappa for the categories “excluded” vs. “not excluded” was computed. “Not excluded” records included records tagged as “included” and “uncertain” that would later be eligible for a full screening. A moderate Cohen’s Kappa of 0.463 resulted, which was considered sufficient based on the criteria proposed by Landis and Koch (1977). The coding results were thoroughly discussed among the coders, and disagreements were resolved by consensus. Thus, two coders rated all identified records. We included records (1) if they relied on empirical research, (2) on text-based chatbots, (3) with human-likeness as an independent variable, and (4) at least one social response as the dependent variable. We excluded records when retracted or when they referred to embodied robots, augmented reality avatars, or voice-based agents. Studies on children, conference prefaces, newspaper or blog articles, reviews, and qualitative papers were also excluded. Due to personal language constraints, we only included papers published in English or German. Then, I conducted the second screening of the search results, considering the full eligibility criteria: Records had to report survey or experimental research on samples of healthy adults. Records had to manipulate human-likeness using one or multiple social cues and compare human-like text-based chatbots with chatbots having no or fewer human-like or machine-like characteristics. Records needed to assess single-scale items or indices representing attitude, perception, positive and negative affect, rapport, trust, or behavioral outcomes.
I included published manuscripts, contributions in conference proceedings, dissertations, final theses, preprints, and unpublished manuscripts in English and German. Last, records had to report sufficient statistical results to allow the calculation of effect sizes. If possible, I obtained full papers for all records that fulfilled all inclusion criteria. The flow of papers through the screening process is shown in the Supplementary Fig. S1.
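The inter-coder agreement check described above can be illustrated with a minimal Python sketch. It computes Cohen's Kappa for two coders from the usual definition (observed agreement corrected for chance agreement) and maps the result to the Landis and Koch (1977) labels. The function names and the toy ratings are hypothetical and chosen for illustration only; they are not the screening data or the code used in this meta-analysis.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two coders who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both coders must rate the same non-empty set of items.")
    n = len(ratings_a)
    # Proportion of items on which both coders agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Verbal label for a Kappa value following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy ratings: 1 = "not excluded", 0 = "excluded".
coder_a = [1, 1, 0, 0]
coder_b = [1, 0, 0, 0]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3), landis_koch_label(kappa))
```

Applied to the reported value, `landis_koch_label(0.463)` returns "moderate", which matches the interpretation given in the text.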
After screening 4719 records, 510 full texts were assessed for eligibility, and 142 papers from 2007 to 2023 (Mdn = 2022, mode = 2023) formed the final sample (see Supplementary Information). All final papers were written in English and reported experimental research. Most were journal articles (59.9%), 27.5% were conference papers, 8.5% were theses, and 4.2% were preprints and unpublished manuscripts (two of the three manuscripts that were unpublished at the time of analysis were published afterwards). Papers came from the fields of HCI (26.1%), business (19%), information systems (14.8%), psychology (13.4%), communication (9.9%), and computer science (7%). About 10% of papers were located in an interdisciplinary context. Most papers were published at the time of submission (97.9%), of which 89.1% had been peer-reviewed.
Information for the computation of effect sizes and variances, e.g., Cohen’s d, means, and standard deviations, and regarding the moderators, was systematically extracted and coded by the author and a student assistant in Microsoft Excel. The moderator variables extracted and used in this meta-analysis are shown in Table 2.
First, I calculated the standardized mean difference Cohen’s d, and its variance for every effect identified in the extraction process (Lenhard and Lenhard, 2022; Wilson, 2023). As Cohen’s d has been shown to have a slight positive bias in small samples, Hedges’ g, its corrected, unbiased estimate, was used (Borenstein and Hedges, 2019). Hedges’ g can be interpreted like Cohen’s d: a Hedges’ g of <|0.2| denotes no effect, ≥|0.2| < |0.5| a small effect, ≥|0.5| < |0.8| a moderate effect, and ≥|0.8| a large effect (Cohen, 1988). A positive g indicates that human-likeness leads to a positive social response. The signs of effect sizes were adapted so that positive effect sizes consistently stand for beneficial outcomes. Most effect sizes could be calculated based on the means, standard deviations, and group sizes provided in the papers. Other frequently reported information included unstandardized or standardized regression coefficients, t, partial η², and F values. I calculated 800 effect sizes from 199 datasets, mostly automatically, with the help of an R script roughly following the approach by Roesler et al. (2021b). Less common effect size transformations were done manually, either based on the formulas provided by Wilson (2023) or using Psychometrica, an openly available effect size calculator (Lenhard and Lenhard, 2022). Effect sizes from comparisons using the same control group were averaged to avoid double-counting of participants (Higgins et al., 2024, section 23.3.4 How to include multiple groups from one study); comparisons of mean values from factorial trials were included separately (Higgins et al., 2024, section 23.3.6 Factorial trials). After calculating Cohen’s d and its variance for each effect, I transformed them to Hedges’ g and its variance for further analysis (Wilson, 2023).
Both the effect size data file, which contains all effect sizes as well as detailed information on how they were calculated, as well as the formulas used to calculate the effect sizes, are available on OSF: https://osf.io/cezmu/.
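To make the effect size computation concrete, the following Python sketch derives Cohen's d and its variance from group means, standard deviations, and group sizes, applies the small-sample correction to obtain Hedges' g, and labels the result with the thresholds stated above. It uses the common textbook formulas for two independent groups; these are an assumption and may differ in detail from the formulas in the OSF materials. All function names and the example numbers are hypothetical.

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d and its variance for two independent groups."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var_d


def hedges_g(d, var_d, n1, n2):
    """Apply the small-sample correction factor J to d and its variance."""
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return j * d, j**2 * var_d


def interpret(g):
    """Verbal label using the thresholds given in the text (Cohen, 1988)."""
    size = abs(g)
    if size < 0.2:
        return "no effect"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical example: human-like condition (group 1) vs. control (group 2).
d, var_d = cohens_d(m1=5.0, sd1=2.0, n1=50, m2=4.0, sd2=2.0, n2=50)
g, var_g = hedges_g(d, var_d, n1=50, n2=50)
print(round(g, 3), round(var_g, 4), interpret(g))
```

Because the correction factor J is always slightly below 1, g is marginally smaller in magnitude than d; in this example a d of exactly 0.5 becomes a g just under 0.5 and is therefore labeled small.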
The majority of dataset samples consisted of non-students (vs. students) (70.1%), directly interacted with a chatbot as stimulus (vs. were exposed to prerecorded materials) (64.8%), and participated in an online (vs. lab or field) experiment (84.3%). Many samples stemmed from the USA (45%), followed by China (16.8%), Germany (13.4%), the Netherlands (8.1%), and others (16.8%). The datasets comprised data from 41,642 participants. Participants were, on average, 31.4 years old; around half of them identified themselves as female (50.8%). Most participants indicated prior experience with chatbots (70%). A minority of datasets were collected in the context of a preregistered study (4.5%).
I assessed risk of bias using Nudelman and Otto’s (2020) Risk of Bias Utilized for Surveys Tool (ROBUST; this assessment was not preregistered). The tool consists of eight questions for eight topics answered by the author for each dataset, e.g., whether the sample size was sufficient, whether the paper provided the sample’s basic demographic information, and whether measurements were reliable. A risk of bias score was created by counting the “no” responses; higher numbers thus indicated a higher risk of bias (M = 2.79, Mdn = 3, range = 1–7).
To answer the first research question, I computed meta-analytic random-effects models for the human-likeness effects on the outcome variables. The data were analyzed using R (version 4.3.1) (R Core Team, 2023) and the metafor package (version 4.6-0) (Viechtbauer, 2010). I chose random-effects models as the datasets showed differences in employed methods and sample characteristics, which may lead to heterogeneity among the true effects. Heterogeneity was estimated using restricted maximum-likelihood estimation (REML) (Viechtbauer, 2010). To use all available effect size information and to account for potential non-independence induced by multiple effect sizes calculated on the same data and multiple datasets reported in the same paper, I conducted four-level meta-analyses (Hansen et al., 2022; Moeyaert et al., 2017). Before computing the models, I performed a systematic search for influential cases (Viechtbauer, 2010). The search yielded potentially influential effect sizes in all outcome categories, which I inspected in the effect sizes dataset and the papers. As I found no obvious errors or misbehavior, I included all effect sizes (N = 800). Figure 2 gives an overview of the results. The detailed results of the meta-analyses and the assessment of heterogeneity are given in Table 3.
Fig. 2: k = number of effect sizes. Estimate = Hedges’ g. 95% CI = 95% confidence interval. The size of the squares is proportional to the precision of the estimates.
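To make the idea of random-effects pooling concrete, the following Python sketch implements a deliberately simplified version. It is not the model used in this study: it treats all effect sizes as independent (two levels instead of four) and swaps in the closed-form DerSimonian-Laird estimator of the between-study variance in place of REML. The function name and the example data are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect estimate and Cochran's Q statistic.
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum_w - sum(w**2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each sampling variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return {"estimate": estimate, "se": se, "tau2": tau2, "q": q}


# Hypothetical Hedges' g values and sampling variances from five datasets.
result = random_effects_dl([0.10, 0.35, 0.60, 0.25, 0.80],
                           [0.02, 0.03, 0.05, 0.01, 0.04])
print({name: round(value, 3) for name, value in result.items()})
```

With dependent effect sizes, as in the present data, a multilevel model such as the one fitted with metafor is the appropriate choice; the sketch only shows how the between-study variance enters the weights.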
The first research question can be answered as follows: The analyses yielded a small positive overall human-likeness effect on users’ social responses. Human-like chatbot characteristics had a significant medium-sized positive effect on perception. Significant small to medium-sized positive effects emerged on rapport, positive affect, and trust, and a small effect on attitude. A very small significant positive effect was found on behavioral outcomes. No significant effect emerged on negative affect. A substantial amount of heterogeneity was present, as indicated by the significant Q tests for all outcome categories. Except for negative affect, I² values indicated that over 90% of the total variability in each category was due to the influence of factors not considered in the respective model. The prediction intervals also supported heterogeneity in outcomes. Across outcome categories, they indicated that human-like characteristics can have minor, moderate, and large true effects. Although the average results were estimated to be positive, the true outcome in some datasets may be negative.
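The two heterogeneity summaries mentioned here can be sketched as follows. The sketch assumes the classic two-level definitions: I² derived from Cochran's Q, and a prediction interval based on a normal approximation. In multilevel models such as those used in this study, I² is computed from the variance components instead, so the numbers are illustrative only. The standard error is roughly back-calculated from the reported overall confidence interval, and the τ² value is hypothetical.

```python
import math
from statistics import NormalDist


def i_squared(q, k):
    """Share of total variability (in %) not attributable to sampling error."""
    df = k - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


def prediction_interval(pooled, se, tau2, level=0.95):
    """Approximate range in which the true effect of a new study would fall."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(tau2 + se**2)
    return pooled - half_width, pooled + half_width


# Overall estimate from the text (g = 0.36); se and tau2 are assumptions.
low, high = prediction_interval(pooled=0.36, se=0.043, tau2=0.15)
print(round(low, 2), round(high, 2))
```

A prediction interval that spans zero, as in this hypothetical example, expresses the point made above: a positive average effect is compatible with negative true effects in some datasets.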
Although I took measures to counter publication bias, e.g., by sending out a call for unpublished research, the non-publication of studies with unfavorable or non-significant results might have affected the results of this meta-analysis (Vevea et al., 2019). Thus, I performed Egger’s regression tests for funnel plot asymmetry (Egger et al., 1997; Rodgers and Pustejovsky, 2021) and visual inspection of contour-enhanced funnel plots that indicate publication bias due to the exclusion of non-significant results (Peters et al., 2008). Egger’s regression test indicated asymmetric funnel plots for the overall model (p = 0.001), attitude (p = 0.030), and perception (p < 0.001) but not for positive affect (p = 0.944), negative affect (p = 0.080), rapport (p = 0.080), trust (p = 0.355), and behavioral outcomes (p = 0.862). Visual inspection of the overall funnel plot suggested a slight rightward bias; the estimates appeared to be rather over- than underestimated (Fig. 3).
Fig. 3: Plots for overall social responses and per subcategory, reference line = 0, white background: p > 0.10, dark gray background: 0.05 < p < 0.10, light gray background: 0.01 < p < 0.05, region outside of funnel: p < 0.01.
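The logic of Egger's regression test can be illustrated with a short Python sketch of its classic form (Egger et al., 1997): each effect is divided by its standard error and regressed on its precision, and an intercept far from zero points to funnel plot asymmetry. This is an assumption-laden simplification; it ignores the dependence among effect sizes that the variants by Rodgers and Pustejovsky (2021) address. The function name and data are hypothetical.

```python
import math


def egger_regression(effects, standard_errors):
    """Regress standardized effects on precision with ordinary least squares.

    Returns (intercept, slope, standard error of the intercept).
    Needs at least three effects with differing standard errors.
    """
    x = [1.0 / se for se in standard_errors]                    # precision
    y = [g / se for g, se in zip(effects, standard_errors)]     # standardized effect
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1.0 / n + x_mean**2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical data in which less precise studies report larger effects.
effects = [0.10, 0.25, 0.30, 0.45, 0.70]
standard_errors = [0.05, 0.10, 0.15, 0.20, 0.30]
intercept, slope, se_intercept = egger_regression(effects, standard_errors)
print(round(intercept, 2), round(intercept / se_intercept, 2))
```

The ratio of the intercept to its standard error is compared against a t distribution with k − 2 degrees of freedom to obtain the p value.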
Due to the substantial heterogeneity detected (Table 3), I investigated the extent to which selected sample-, agent-, context-, and method-related factors influenced the main effect estimates. I conducted moderator analyses using multilevel mixed-effects models, with heterogeneity estimated via REML (Viechtbauer, 2010). First, the average effect sizes for each moderator level were estimated. Next, a test of moderators was used to check if the effect sizes differed significantly. A significant F test indicates significant differences in the effect sizes between the moderator levels. Figures 4–7 contain visual overviews of the moderator results. I present the results of the subgroup analyses for the complete sample (N = 800) in Table 4. The full subgroup analysis results for each outcome category can be found in the Supplementary Tables S5–S11.
Fig. 4: The plots show the regression coefficients and the corresponding 95% confidence intervals for the influence of sample-related moderators on the overall and individual social responses. The size of the squares is proportional to the number of included effect sizes. No data were available to compute the moderating effect of prior use experience on negative affect.
Fig. 5: The plots show the human-likeness effects overall and for each social response (Hedges’ g) and the corresponding 95% confidence intervals for the agent-related moderator subgroups. The size of the squares is proportional to the number of included effect sizes.
Fig. 6: The plots show the human-likeness effects overall and for each social response (Hedges’ g) and the corresponding 95% confidence intervals for the context-related moderator subgroups. The size of the squares is proportional to the number of included effect sizes.
Fig. 7: The plots show the human-likeness effects overall and for each social response (Hedges’ g) and the corresponding 95% confidence intervals for the method-related moderator subgroups. The size of the squares is proportional to the number of included effect sizes. In the “negative affect” category, all papers were published, peer-reviewed, and not preregistered.
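The two-step procedure for a binary moderator (estimate the average effect per level, then test the difference) can be sketched in Python. The sketch is a simplification under stated assumptions: effect sizes are treated as independent, the between-study variance τ² is supplied rather than estimated via REML, and the difference is tested with a Wald-type statistic on one degree of freedom instead of the F test used in the study. The function name and data are hypothetical.

```python
def subgroup_moderator_test(effects, variances, levels, tau2=0.0):
    """Compare pooled effects between two moderator levels coded 0 and 1.

    Returns (mean at level 0, mean at level 1, difference b, Wald statistic).
    """
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # level -> [sum of weights, weighted sum]
    for g, v, level in zip(effects, variances, levels):
        weight = 1.0 / (v + tau2)
        totals[level][0] += weight
        totals[level][1] += weight * g

    mean0 = totals[0][1] / totals[0][0]
    mean1 = totals[1][1] / totals[1][0]
    b = mean1 - mean0                                   # moderator coefficient
    var_b = 1.0 / totals[0][0] + 1.0 / totals[1][0]     # variance of the difference
    return mean0, mean1, b, b**2 / var_b


# Hypothetical example: 0 = structured interaction, 1 = unstructured interaction.
effects = [0.15, 0.25, 0.20, 0.55, 0.70, 0.60]
variances = [0.02, 0.03, 0.02, 0.04, 0.03, 0.05]
levels = [0, 0, 0, 1, 1, 1]
mean0, mean1, b, q_m = subgroup_moderator_test(effects, variances, levels, tau2=0.01)
print(round(mean0, 2), round(mean1, 2), round(b, 2), round(q_m, 2))
```

Here b plays the role of the regression coefficients reported below: it is the difference between the average effect at the coded level and the average effect at the reference level.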
No significant moderator effects of sample age, sample gender, and previous chatbot use were present. Regarding the agent-related moderators, the interaction mechanism and the type of social cues the chatbot possessed moderated the human-likeness effect: The overall effect was significant when the interaction mechanism was unstructured. Similar effects were found for attitude, b = 0.54, F(1, 51) = 10.18, p = 0.002, perception, b = 0.38, F(1, 133) = 6.19, p = 0.014, positive affect, b = 0.61, F(1, 17) = 4.55, p = 0.048, and trust, b = 0.43, F(1, 26) = 4.65, p = 0.040. A larger overall effect was found when human-likeness was implemented by verbal (vs. other) cues. A similar effect emerged in the models for attitude, b = 0.20, F(1, 132) = 5.10, p = 0.026, positive affect, b = 0.38, F(1, 23) = 8.67, p = 0.007, rapport, b = 0.32, F(1, 93) = 13.89, p < 0.001, and behavioral outcomes, b = 0.24, F(1, 129) = 5.47, p = 0.021. Assigned chatbot gender, avatar, and technology did not moderate human-likeness effects.
Application context did not significantly moderate the overall effect, but task criticality did: The effect was larger in settings where the chatbot performed non-critical tasks.
The only method-related moderator that significantly influenced the overall human-likeness effect was the scientific field; the effect was larger when the paper was not located in the field of communication. While no other method-related moderators influenced the overall effect, some played a role in positive and negative affect and rapport. Sample type significantly moderated the effect on positive affect, b = 0.31, F(1, 20) = 4.42, p = 0.048, with the effect being larger in student samples. On average, the effect on negative affect was larger for laboratory or field (vs. online) studies, b = 1.25, F(1, 13) = 9.36, p = 0.009. However, only one effect size could be included for the laboratory/field category. The human-likeness effect on rapport was significantly smaller when the sample was US-based (vs. not US-based), indicated by the negative regression coefficient, b = −0.30, F(1, 47) = 4.08, p = 0.049. Additionally, the effect on rapport was larger when the study employed prerecorded materials (vs. interactions), b = 0.48, F(1, 56) = 12.57, p < 0.001. Publication status, year of publication, and whether the study was peer-reviewed or preregistered did not moderate the human-likeness effects.
In an exploratory moderator analysis, I investigated whether the study’s risk of bias moderated the human-likeness effects on our outcomes of interest. Risk of bias did not significantly moderate the overall effect, b = 0.05, F(1, 197) = 2.34, p = 0.128. However, it significantly positively influenced the effects on attitude, b = 0.12, F(1, 67) = 4.86, p = 0.031, positive affect, b = 0.18, F(1, 20) = 4.80, p = 0.040, and rapport, b = 0.14, F(1, 67) = 6.93, p = 0.011, i.e., the higher the risk of bias, the larger the effects.
Category-wise mixed-effects moderator models were run, simultaneously regressing the effect sizes on multiple moderators to validate the robustness of the results (Viechtbauer, 2010). Due to sample size considerations and to keep the results comparable, I calculated models including all moderators for which significant effects emerged in the subgroup analyses, including risk of bias (Blut et al. 2021). Multicollinearity among moderator variables was assessed using variance inflation factors (VIF). The highest VIF observed was 1.36, indicating acceptable levels of multicollinearity. The results of the meta-regression models are presented in Table 5.
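The variance inflation factor can be illustrated for the simplest case of two moderators, where VIF = 1 / (1 − r²) and r is their Pearson correlation. With more than two moderators, r² is replaced by the R² from regressing one moderator on all others; that general case is not covered here. The sketch below is an illustration with hypothetical function names and data, not the computation performed in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_moderators(x1, x2):
    """Variance inflation factor when a model contains exactly two moderators."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r**2)


# Hypothetical dummy-coded moderators for eight effect sizes.
verbal_cues = [1, 1, 0, 0, 1, 0, 1, 0]
unstructured = [1, 0, 0, 0, 1, 1, 1, 0]
print(round(vif_two_moderators(verbal_cues, unstructured), 2))
```

A value of 1 means the moderators are uncorrelated; values close to 1, such as the maximum of 1.36 reported above, indicate that the coefficient variances are only mildly inflated.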
For the overall human-likeness effect, the meta-regression model largely replicated the results from the separate analyses. The effect of human-like chatbot characteristics was larger when chatbots employed verbal (vs. other) cues and when the interaction was unstructured (vs. structured). However, the effects of task criticality and scientific field vanished in the meta-regression model.
Regarding attitude, the moderator effect of the interaction mechanism persisted while the one of social cues became non-significant. The interaction mechanism also moderated the effect on perception. The effect on positive affect was merely moderated by the type of social cue in the meta-regression. Like in the subgroup analyses, the study setting significantly influenced the human-likeness effect on negative affect. Regarding rapport, the manipulation of social cues and stimulus type had a significant influence; the effect of country turned non-significant. No significant moderator effects emerged for trust or behavioral outcomes, as indicated by the non-significant Tests of Moderators. The second research question can be answered as follows: while most moderators investigated did not influence human-likeness effects, the type of social cues the chatbot is equipped with, and its interaction mechanism significantly moderated the effects of social cues on social responses. The findings also suggest that the effects are dependent on task criticality, although this effect became non-significant when other moderators were included in the model.
This meta-analysis investigated the effects of human-likeness in text-based conversational agents on a broad range of social responses and the influence of moderators related to sample, agent, context, and method. Analysis of 800 effect sizes from 199 experimental datasets yielded a small positive overall effect of chatbots’ human-like social cues on users’ social responses (RQ1). This effect varied depending on the social response category and specific moderators, in particular, whether a chatbot used verbal (vs. other) social cues to convey human-likeness, whether the interaction was unstructured (vs. structured), and whether the task performed by the chatbot was non-critical (vs. critical) (RQ2).
The main finding aligns with previous meta-analyses on the impact of human-likeness in intelligent media agents on human-related outcomes (Blut et al., 2021; Roesler et al., 2021a). However, in contrast to the moderate overall human-likeness effect for social robots found by Roesler et al. (2021a) and the strong correlation between perceived anthropomorphism and intention to use service robots found by Blut et al. (2021), the effect for text-based conversational agents appears to be somewhat smaller. This might be because variations in text-based chatbots’ social cues might not be as salient to users as those of more embodied agents that can leverage the potential of voice and movement. For instance, in their meta-analysis on the effectiveness of social cues in human–robot interaction, Xu et al. (2023) showed that facial, kinetic, and haptic cues have a greater influence on social presence and trust as perceived by users than verbal cues.
The size of the positive effect varied for the different outcome categories, being moderate for perception, small to moderate for outcomes related to positive affect, rapport, and trust, small for attitudinal outcomes, very small for behavioral outcomes, and not significant for negative affect. This suggests that social cues are moderately effective in improving perceptual, affective, and trust-related outcomes but ineffective in improving how people (intend to) act and in improving their negative emotions. The relative difference between the larger human-likeness effects on perceptional, affective, and trust-related outcomes and the small human-likeness effect on behavior aligns with prior meta-analytic research on AI-based agents (Huang and Wang, 2023; Roesler et al., 2021a). Perceptions of the agent and the interaction, trust, and positive feelings appear to be elicited quite easily using social cues, possibly reflected by the large number of effect sizes in these categories. Behavioral intentions and actual behavior appear harder to change with human-likeness as factors connected to the functioning of the chatbot might have superseded the effects of social cues. The findings further suggest that human-likeness is less effective in mitigating negative emotions like frustration and stress than the literature suggests (Benner et al., 2021; De Sá Siqueira et al., 2023). There is first evidence that human-likeness might increase frustration in erroneous chatbot interactions (Brendel et al., 2023) and decrease users’ evaluation of a firm and purchase intention when they are already angry when contacting a chatbot (Crolic et al., 2022). However, future research should include other negative emotions, such as stress and feelings of eeriness, as well as the factors determining the relationship between human-likeness and negative affect.
The moderator analyses revealed several interesting findings. Surprisingly, but in line with prior meta-analyses (Blut et al., 2021; Huang and Wang, 2023), sample characteristics did not influence the human-likeness effects on our outcomes of interest.
Strikingly, gender-specific chatbot cues did not moderate human-likeness effects, contrasting with earlier meta-analyses suggesting stronger effects for female-gendered chatbots (Blut et al., 2021). As female cues in chatbots are very common and the current meta-analysis includes more recent papers (median year of publication: 2022), users might have gotten used to them. Thus, human-likeness effects might be similar across gender cues. Future research might disentangle gender effects by answering how users of different gender identities respond to chatbots with different gender cues. The chatbot’s task should also be considered, as it might influence gender-specific stereotypes. Notably, many chatbots lacked gender-specific cues, yet participants might still have inferred gender (Abercrombie et al., 2023).
Human-likeness effects in studies that equipped their human-like chatbot with only human-like visual, invisible, or interaction cues were, on average, smaller than effects from studies that manipulated human-likeness using verbal cues. This trend also emerged for attitude, positive affect, rapport, and behavioral outcomes. The potential of social cues appears to be more fully utilized in interactions where the user communicates with the chatbot via free text (overall and in the categories attitude, perception, positive affect, and trust). A reason might be that free text interaction comes more naturally to participants because it is more similar to a chat interaction with a human than the interaction via clicking on predefined answer options (Jain et al., 2018).
The results further suggest that task criticality (i.e., whether stakes for users are high) matters. Specifically, the overall human-likeness effect was larger for uncritical tasks (i.e., when stakes were low). This aligns with findings from meta-analyses on more embodied agents that concluded that perceived anthropomorphism plays a larger role in non-critical services (Blut et al., 2021) and that human-likeness is more effective in social contexts, including entertainment, but also therapy and education (Roesler et al., 2021a). However, as this moderator effect vanished when other moderators were included in the model, it must be interpreted with a grain of salt.
The sample’s composition appeared to influence effect sizes; specifically, student samples tended to yield larger effects (in the positive affect category) because they are likely more homogenous regarding socio-economic background, resulting in less variance than more diverse general population samples (Orsingher et al., 2010). The effect on rapport was larger in studies that used prerecorded materials. This finding is puzzling but might be explained by the advantage of vignette studies that participants can inspect the stimulus materials more closely (Abendschein et al., 2021).
At the same time, the fact that many of the included factors, including demographic characteristics of the sample and the application context, did not moderate human-likeness effects but generalized across moderator levels presents an interesting finding in itself.
This meta-analysis contributes to the research field of HMC, particularly the literature around human-likeness, the CASA paradigm, and the emerging stream on the MASA paradigm. The findings support the CASA and MASA paradigms, suggesting that social cues elicit social responses (Lombard and Xu, 2021; Nass and Moon, 2000). By showing that verbal cues in text-based chatbots are at least as effective in eliciting social responses as other types of social cues, the current study challenges the proposition that verbal cues are secondary social cues and per se less effective in eliciting social responses (Lombard and Xu, 2021; Xu et al., 2023). Future experimental studies should systematically answer whether a hierarchy of social cues exists that applies to all media agents alike or if such a hierarchy differs between different degrees of embodiment or modalities.
The study also revealed that social cues do not improve all social responses to the same extent, as illustrated by the discrepancy between moderate effects on perceptions and very small effects on behavioral outcomes. This finding challenges the CASA and MASA paradigms by signaling that “not all social responses are indicative of mindlessness” (Lee, 2024, p. 2). While for attitude, perception, trust, positive affect, and rapport, the idea of mindless responses is supported, conscious, systematic cognitive processes might be activated regarding users’ intention to use a chatbot. Future research should avoid seeing social responses as a single entity and instead systematically investigate differences in the effect of a social cue on different social responses (Lee, 2024). Also, perceptions of the agent and the interaction, trust, and positive feelings toward the agent have been shown to mediate human-likeness effects on behavioral outcomes (Gopinath and Kasilingam, 2023). The analyses to investigate questions like these (e.g., using structural equation modeling) were unfortunately not in the scope of this meta-analysis but should be conducted in the future.
The analyses revealed important boundary conditions for CASA and MASA. The findings indicate that the interaction should be unstructured and as free as possible to leverage the benefits of social cues. In addition, human-likeness might work differently for text-based chatbots, where social cues concentrate on language cues. When few social cues are available to users, it becomes increasingly difficult to distinguish whether textual information is coming from an AI/an AI-based chatbot or a human (Dou et al., 2022; Gunser et al., 2022). This issue is particularly relevant at the time of writing due to the introduction of LLM-based open-domain chatbots like OpenAI’s ChatGPT (OpenAI, 2023). It is therefore important to systematically research the negative consequences of chatbot anthropomorphism on societal issues, e.g., the spread of misinformation and disinformation and the replication of biases and stereotypes (Abercrombie et al., 2023).
This meta-analysis also uncovered which social responses are often investigated in research on human-like chatbots. Most effect sizes emerged for perception, including scales from the Godspeed questionnaire, and rapport, including scales to assess social presence. Rapport, trust, and affective outcomes comprised the fewest effect sizes. The behavioral outcomes category also included intentions, which is probably why the number of effect sizes is comparable to the attitude category. Scales assessing self-reports are generally easier to research than actual behavior (Roesler et al., 2021a). However, they are also susceptible to biases arising from social desirability or the tendency to agree to statements (Wetzel and Greiff, 2018).
While the study setting was not a significant moderator, there was a discrepancy between the large number of online studies and the relatively small number of laboratory and field experiments. In the last few years, online samples have been established as a convenient and fast way of data collection, likely fueled by the COVID-19 pandemic. I recommend that researchers go into the field again and investigate the impact of real chatbots on real users. Finally, only a minority of studies were preregistered (4.5%), indicating room for improvement in open science practices.
The findings indicate that social cues can improve social responses overall to a small to moderate extent. This especially applies to perceptions, trust, positive affect, and rapport (i.e., user-chatbot relationship outcomes). Social cues are less relevant for changing users’ behaviors and behavioral intentions as well as for alleviating users’ negative emotions. In light of effectiveness, practitioners are thus generally recommended to keep implementing human-like chatbots as prior research has shown that good relationships between users and organizations can increase crucial marketing outcomes like word of mouth, customer loyalty, and continuity expectation (Verma et al., 2016). However, suppose the primary goal is to enhance usage-related outcomes for which the smallest effect was found. In that case, practitioners are advised to carefully weigh the costs and benefits of equipping the chatbot with human-like cues. Nonetheless, even small effects of subtle variations in the design of chatbots deployed to a large audience might cumulatively lead to favorable outcomes (Funder and Ozer, 2019).
The results of the moderator analyses also have important implications for chatbot designers. Employing visual, identity, or other non-verbal cues alone might not be as effective for improving user responses; the chatbot needs to communicate in a human-like manner. The results suggest that practitioners can better leverage the potential of human-like features when the chatbot’s interaction mechanism is free text. However, previous research has shown that a chatbot needs to function reliably to fully realize the benefits of a more human-like chatbot interaction (Klein et al., 2025). With LLM-supported chatbots that function smoothly, human-like interactions could become possible like never before. The findings also provide the first evidence that human-likeness can be applied more effectively in chatbots that perform non-critical, low-risk tasks.
Many designers endow chatbots with female gender cues (what Erscoi et al. (2023) refer to as “humanization via feminization” (p. 21)), on the assumption that this enhances the user experience (Gnewuch et al., 2022). However, coding chatbots as female might reinforce stereotypes that associate women with warmth, subservience, and a higher aptitude for secretarial and emotional work (Erscoi et al., 2023). In the current study, female gender cues did not increase the effectiveness of human-likeness, suggesting that the presumed benefits of female gender cues may not outweigh their negative societal consequences.
It is important to note that human-like design does not always align with users’ expectations; e.g., human-likeness might signal that the chatbot possesses competencies it does not actually have (Shanahan, 2024). Some researchers have thus recommended avoiding using social cues that signal, e.g., personality, altogether (Abercrombie et al., 2023). In addition, as this meta-analysis has shown, human-likeness can lead to increased trust in or credibility of information. Prior research has shown that information provided by conversational agents like chatbots is perceived as more credible than information from a web page, regardless of whether the information is accurate (Anderl et al., 2024). Especially for text-based chatbots, where social cues are restricted to subtle verbal cues, concerns emerge because users could more easily believe they are interacting with another human. Hence, human-like chatbots can lead users to share their (personal) data with chatbots and, thus, with the individuals and organizations behind them (Singh et al., 2023). These risks have to be considered when designing chatbots. Clear and transparent design, e.g., through labels and explanations, and improving users’ AI literacy might help reduce and adjust overly high expectations and unintended data exposure (Sundar and Liao, 2023).
This meta-analysis has several limitations. The final sample only included papers written in English, which is why particularly conference papers and student theses written in other languages might be missing. However, the papers included comprised samples from various countries. Another limitation refers to the screening process, where the interrater reliability coefficient indicated moderate agreement between the coders. Although classified as moderate according to established criteria (Landis and Koch, 1977; Schaer, 2012), the magnitude of Cohen’s Kappa might be seen as insufficient, particularly in more critical areas like medical studies (McHugh, 2012). To improve the initial reliability, the coding results were discussed among coders, and disagreements were resolved by consensus. Thus, I am confident that this limitation does not alter the overall conclusions of this meta-analysis.
Heterogeneity was generally present in the analyses. High degrees of heterogeneity can complicate the interpretation of meta-analytic findings because the individual study findings might not be comparable. Potential reasons for heterogeneous effect sizes include grouping various (single and combinations of) social cues that can signal different things (e.g., empathy, humor, personality, responsiveness) into one independent variable and combining several constructs in one subcategory. In addition, crucial HMC constructs like perceived human-likeness and social presence have often been operationalized differently (see Ischen et al., 2023 for an overview of human-likeness measures). In the current study, random-effects models have been employed to consider heterogeneity (Viechtbauer, 2007). In addition, heterogeneity was assessed and explored by conducting subgroup and meta-regression analyses, which showed that selected factors moderate human-likeness effects and, thus, explain some variance in effect sizes.
Certain factors related to risk of bias limit the validity of the study’s conclusions: About 7% of eligible papers did not report sufficient information to calculate effect sizes and had to be excluded beforehand, which could have reduced the final set of papers and effect sizes. Risk of bias, as assessed through the ROBUST criteria (Nudelman and Otto, 2020), was only a significant moderator in the rapport category. Furthermore, neither publication status nor whether it was a peer-reviewed publication—two indicators of a study’s quality—moderated the effects of human likeness.
I also want to point out that meta-analytic moderator analyses are generally based on observational data (i.e., paper- and dataset-related characteristics) and, therefore, do not allow conclusions about causality (Viechtbauer, 2007). Although moderators were carefully selected based on theory and previous empirical research, unobserved factors might have been confounded with one or several moderators and might have distorted the findings. As the number of effect sizes was low in some categories (e.g., positive and negative affect and trust), either because the dependent variable was not investigated often or because the respective moderator information was not reported, statistical power might not have been sufficient to detect significant moderator effects in some cases. Additionally, I could not perform some planned moderator analyses due to insufficient data.
The stimulus chatbots in the included studies were merely text-based. However, recent technological advances have produced various other agents with which users interact in their everyday lives, e.g., voice-based assistants and virtual reality avatars (Lu et al., 2024). Whether and how human-likeness works differently in voice-based or virtual agents that do not mainly communicate via text but at the same time are not as embodied as physical robots has yet to be explored. To this end, future studies should systematically compare text-based agents and other types of agents concerning the effectiveness of social cues.
This meta-analysis identified a small positive effect of human-likeness on user responses toward text-based chatbots, indicating that social cues generally enhance social responses. However, these cues appear less impactful compared to more embodied agents. Consistent with prior research, human-likeness effects varied by outcome, showing a moderate impact on perception, a small to moderate impact on rapport, trust, and positive affect, a small impact on attitude, a very small impact on behavioral outcomes, and no impact on negative affect. The results suggest that human-likeness is more effective in unstructured chatbot interactions, when the chatbot communicates in a human-like manner and handles non-critical tasks. The results challenge MASA’s proposition that secondary cues, such as language use, are less effective in eliciting social responses than primary cues, such as a human-like picture. As human-like text-based chatbots can lead to greater self-disclosure and trust in information, the findings offer guidance for designing both effective and ethically responsible chatbot interactions.
The datasets generated and analysed during the current study are available in the Open Science Framework (OSF) repository, https://osf.io/cezmu/.
Abd-Alrazaq AA, Rababeh A, Alajlani M, Bewick BM, Househ M (2020) Effectiveness and safety of using chatbots to improve mental health: Systematic review and meta-analysis. J Med Internet Res 22(7):16021. https://doi.org/10.2196/16021
Abendschein B, Edwards C, Edwards A (2021) The influence of agent and message type on perceptions of social support in human-machine communication. Commun Res Rep. 38(5):304–314. https://doi.org/10.1080/08824096.2021.1966405
Abercrombie G, Cercas Curry A, Dinkar T, Rieser V, Talat Z (2023) Mirages. On anthropomorphism in dialogue systems. In: Bouamor H, Pino J, Bali K (eds) Proceedings conference on empirical methods in natural language processing, Singapore, 6–10 December 2023. Association for Computational Linguistics, pp 4776–4790
Anderl C, Klein SH, Ehrhardt N, Utz S (2024) Einfluss psychologischer Faktoren auf die KI-Nutzung und -Wahrnehmung. In: Hug T, Missomelius P, Ortner H (eds) Künstliche Intelligenz im Diskurs: Interdisziplinäre Perspektiven zur Gegenwart und Zukunft von KI-Anwendungen. Innsbruck Univ Press, pp 43–57
Anderl C, Klein SH, Sarigül B, Schneider FM, Han J, Fiedler PL, Utz S (2024) Conversational presentation mode increases credibility judgements during information search with ChatGPT. Sci Rep 14(1):17127. https://doi.org/10.1038/s41598-024-67829-6
Araujo T (2018) Living up to the chatbot hype: the influence of anthropomorphic design cues and communicative agency framing on conversational agent and company perceptions. Comput Hum Behav 85:183–189. https://doi.org/10.1016/j.chb.2018.03.051
Beattie A, Edwards AP, Edwards C (2020) A bot and a smile: interpersonal impressions of chatbots and humans using emoji in computer-mediated communication. Commun Stud 71(3):409–427. https://doi.org/10.1080/10510974.2020.1725082
Benner D, Elshan E, Schöbel S, Janson A (2021) What do you mean? A review on recovery strategies to overcome conversational breakdowns of conversational agents. In: International Conference on Information Systems (ICIS), Austin, Texas, pp 12–15
Blut M, Wang C, Wünderlich NV, Brock C (2021) Understanding anthropomorphism in service provision: a meta-analysis of physical robots, chatbots, and other AI. J Acad Mark Sci 49:632–658. https://doi.org/10.1007/s11747-020-00762-y
Borenstein M (2019) Heterogeneity in meta-analysis. In: Cooper H, Hedges LV, Valentine JC (eds) The handbook of research synthesis and meta-analysis, 3rd edn. Russell Sage Foundation, New York, pp 453–468
Borenstein M, Hedges LV (2019) Effect sizes for continuous data. In: Cooper H, Hedges LV, Valentine JC (eds) The handbook of research synthesis and meta-analysis, 3rd edn. Russell Sage Foundation, New York, pp 207–243
Brandtzaeg PB, Følstad A (2017) Why people use chatbots. In: Kompatsiaris Y (ed) Internet Science. INSCI 2017 Lecture Notes in Computer Science, Vol 10673. Springer, Cham, pp 377–392
Brandtzaeg PB, Følstad A (2018) Chatbots: changing user needs and motivations. Interactions 25(5):38–43. https://doi.org/10.1145/3236669
Brendel AB, Hildebrandt F, Dennis AR, Riquel J (2023) The paradoxical role of humanness in aggression toward conversational agents. J Manag Inf Syst 40(3):883–913. https://doi.org/10.1080/07421222.2023.2229127
Chattaraman V, Kwon W, Gilbert JE, In Shim S (2011) Virtual agents in e‐commerce: representational characteristics for seniors. J Res Interact Mark 5(4):276–297. https://doi.org/10.1108/17505931111191492
Chaves AP, Gerosa MA (2021) How should my chatbot interact? A survey on social characteristics in human-chatbot interaction design. Int J Hum–Comput Interact 37(8):729–758. https://doi.org/10.1080/10447318.2020.1841438
Chen J, Guo F, Ren Z, Li M, Ham J (2023) Effects of anthropomorphic design cues of chatbots on users’ perception and visual behaviors. Int J Hum–Comput Interact 40(14):3636–3654. https://doi.org/10.1080/10447318.2023.2193514
Ciechanowski L, Przegalinska A, Magnuski M, Gloor P (2019) In the shades of the uncanny valley: an experimental study of human–chatbot interaction. Future Gener Comput Syst 92:539–548. https://doi.org/10.1016/j.future.2018.01.055
Cohen J (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Routledge, New York
Crolic C, Thomaz F, Hadi R, Stephen AT (2022) Blame the bot: anthropomorphism and anger in customer–chatbot interactions. J Mark 86(1):132–148. https://doi.org/10.1177/00222429211045687
De Sá Siqueira MA, Müller BCN, Bosse T (2023) When do we accept mistakes from chatbots? The impact of human-like communication on user experience in chatbots that make mistakes. Int J Hum–Comput Interact 40(11):2862–2872. https://doi.org/10.1080/10447318.2023.2175158
Diederich S, Brendel AB, Lichtenberg S, Kolbe L (2019) Design for fast request fulfillment or natural interaction? Insights from an experiment with a conversational agent. Paper presented at the 27th European Conference on Information Systems (ECIS), Stockholm & Uppsala, Sweden, 8–14 June 2019
Dou Y, Forbes M, Koncel-Kedziorski R, Smith NA, Choi Y (2022) Is GPT-3 text indistinguishable from human text? Scarecrow: a framework for scrutinizing machine text. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th annual meeting of the Association for Computational Linguistics (Long Papers), Dublin, Ireland, May 2022. Vol 1. Association for Computational Linguistics, pp 7250–7274
Egger M, Davey Smith G, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ (Clin Res Ed) 315(7109):629–634. https://doi.org/10.1136/bmj.315.7109.629
Epley N, Waytz A, Cacioppo JT (2007) On seeing human: a three-factor theory of anthropomorphism. Psychol Rev 114(4):864–886. https://doi.org/10.1037/0033-295X.114.4.864
Erscoi L, Kleinherenbrink A, Guest O (2023) Pygmalion displacement: when humanising AI dehumanises women. OSF. https://doi.org/10.31235/osf.io/jqxb6
Feine J, Gnewuch U, Morana S, Maedche A (2019) A taxonomy of social cues for conversational agents. Int J Hum–Comput Stud 132:138–161. https://doi.org/10.1016/j.ijhcs.2019.07.009
Feine J, Gnewuch U, Morana S, Maedche A (2020) Gender bias in chatbot design. In: Følstad A, Araujo T, Papadopoulos S, Law EL-C, Granmo O-C, Luger E, Brandtzaeg PB (eds) Chatbot research and design. CONVERSATIONS 2019, Lecture notes in Computer Science, vol 11970. Springer, Cham, pp 79–93
Fiore SM, Wiltshire TJ, Lobato EJ-C, Jentsch FG, Huang WH, Axelrod B (2013) Toward understanding social cues and signals in human-robot interaction: effects of robot gaze and proxemic behavior. Front Psychol 4(859):2–15. https://doi.org/10.3389/fpsyg.2013.00859
Fishbach A, Labroo AA (2007) Be better or be merry: How mood affects self-control. J Pers Soc Psychol 93:158–173. https://doi.org/10.1037/0022-3514.93.2.158
Følstad A, Brandtzaeg PB (2020) Users’ experiences with chatbots: findings from a questionnaire study. Qual User Exp 5(1). https://doi.org/10.1007/s41233-020-00033-2
Funder DC, Ozer DJ (2019) Evaluating effect size in psychological research: sense and nonsense. Adv Methods Pract Psychol Sci 2(2):156–168. https://doi.org/10.1177/2515245919847202
Gambino A, Fox J, Ratan R (2020) Building a stronger CASA: extending the computers are social actors paradigm. Hum-Mach Commun 1:71–86. https://doi.org/10.30658/hmc.1.5
Gnewuch U, Morana S, Adam MTP, Maedche A (2018) Faster is not always better: understanding the effect of dynamic response delays in human-chatbot interaction. In: Paper presented at 26th European Conference on Information Systems (ECIS), Portsmouth, UK, 23–28 June 2018
Gnewuch U, Morana S, Adam MTP, Maedche A (2022) Opposing effects of response time in human-chatbot interaction. Bus Inf Syst Eng 64:773–791. https://doi.org/10.1007/s12599-022-00755-x
Go E, Sundar SS (2019) Humanizing chatbots: the effects of visual, identity and conversational cues on humanness perceptions. Comput Hum Behav 97:304–316. https://doi.org/10.1016/j.chb.2019.01.020
Gopinath K, Kasilingam D (2023) Antecedents of intention to use chatbots in service encounters: a meta-analytic review. Int J Consum Stud 47(6):2367–2395. https://doi.org/10.1111/ijcs.12933
Greussing E, Gaiser F, Klein SH, Straßmann C, Ischen C, Eimler S, Frehmann K, Gieselmann M, Knorr C, Lermann Henestrosa A, Räder A, Utz S (2022) Researching interactions between humans and machines: methodological challenges. Publizistik 67(4):531–554. https://doi.org/10.1007/s11616-022-00759-3
Gunser VE, Gottschling S, Brucker B, Richter S, Çakir D, Gerjets P (2022) The pure poet: how good is the subjective credibility and stylistic quality of literary short texts written by an artificial intelligence tool as compared to texts written by human authors? Proceedings of the first workshop on Intelligent and Interactive Writing Assistants (In2Writing 2022), Dublin, Ireland, May 2022. Association for Computational Linguistics, pp 60–61
Hansen C, Steinmetz H, Block J (2022) How to conduct a meta-analysis in eight steps: a practical guide. Manag Rev Q 72(1):1–19. https://doi.org/10.1007/s11301-021-00247-4
Haugeland IKF, Følstad A, Taylor C, Bjørkli CA (2022) Understanding the user experience of customer service chatbots: An experimental study of chatbot interaction design. Int J Hum-Comput St 161:102788. https://doi.org/10.1016/j.ijhcs.2022.102788
Higgins JPT, Eldridge S, Li T (eds) (2024) Including variants on randomized trials. In: Cochrane Handbook for Systematic Reviews of Interventions version 6.5 (updated August 2024). Cochrane. https://training.cochrane.org/handbook/current/chapter-23. Accessed 26 Sep 2024
Holtgraves TM, Ross SJ, Weywadt CR, Han TL (2007) Perceiving artificial social agents. Comput Hum Behav 23(5):2163–2174. https://doi.org/10.1016/j.chb.2006.02.017
Huang G, Wang S (2023) Is artificial intelligence more persuasive than humans? A meta-analysis. J Commun 73(6):552–562. https://doi.org/10.1093/joc/jqad024
Ischen C, Smit E, Wang E (2023) Assessing human-likeness perceptions: Measurement scales of conversational agents. Paper presented at CONVERSATIONS 2023—the 7th International Workshop on Chatbot Research and Design, Oslo, Norway, 22–23 November 2023
Jacobson J, Gorea I (2023) Human–machine communication in retail. In: Guzman A, McEwen R, Jones S (eds) The Sage handbook of human–machine communication. SAGE Publications Ltd., Thousand Oaks, California, pp 532–539
Jain M, Kumar P, Kota R, Patel SN (2018) Evaluating and informing the design of chatbots. Proceedings of Designing Interactive Systems Conference, Hong Kong, China, 9–13 June 2018. Association for Computing Machinery, New York, pp 895–906
Kelleher T (2009) Conversational voice, communicated commitment, and public relations outcomes in interactive online communication. J Commun 59(1):172–188. https://doi.org/10.1111/j.1460-2466.2008.01410.x
Kim S, Lee SY, Lee J (2021) Male, female, or robot?: Effects of task type and user gender on expected gender of chatbots. J Korea Multimedia Soc 24(2):320–327. https://doi.org/10.9717/KMMS.2020.24.2.320
Klein SH, Papies D, Utz S (2025) How interaction mechanism and error responses influence users’ responses to customer service chatbots. Int J Hum-Comput Interact 41(7):4300–4318. https://doi.org/10.1080/10447318.2024.2351707
Koenig AM (2018) Comparing prescriptive and descriptive gender stereotypes about children, adults, and the elderly. Front Psychol 9. https://doi.org/10.3389/fpsyg.2018.01086
Krämer NC, Bente G, Eschenburg F, Troitzsch H (2009) Embodied conversational agents. Soc Psychol 40(1):26–36. https://doi.org/10.1027/1864-9335.40.1.26
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174. https://doi.org/10.2307/2529310
Le Bigot L, Jamet E, Rouet JF (2004) Searching information with a natural language dialogue system: a comparison of spoken vs. written modalities. Appl Erg 35(6):557–564. https://doi.org/10.1016/j.apergo.2004.06.001
Lee E-J (2024) Minding the source: toward an integrative theory of human–machine communication. Hum Commun Res 50(2):184–193. https://doi.org/10.1093/hcr/hqad034
Lee JD, See KA (2004) Trust in automation: designing for appropriate reliance. Hum Factors 46(1):50–80. https://doi.org/10.1518/hfes.46.1.50_30392
Lee S, Lee N, Sah YJ (2020) Perceiving a mind in a chatbot: effect of mind perception and social cues on co-presence, closeness, and intention to use. Int J Hum-Comput Interact 36(10):930–940. https://doi.org/10.1080/10447318.2019.1699748
Lenhard W, Lenhard A (2022) Computation of effect sizes. Psychometrica. https://doi.org/10.13140/RG.2.2.17823.92329. Accessed 26 Sep 2024
Letheren K, Kuhn K-AL, Lings I, Pope NKLL (2016) Individual difference factors related to anthropomorphic tendency. Eur J Mark 50(5/6):973–1002. https://doi.org/10.1108/EJM-05-2014-0291
Liu B, Sundar SS (2018) Should machines express sympathy and empathy? Experiments with a health advice chatbot. Cyberpsychol Behav Soc Netw 21(10):625–636. https://doi.org/10.1089/cyber.2018.0110
Liu W, Yao MZ (2023) Human–machine communication in marketing and advertising. In: Guzman A, McEwen R, Jones S (eds) The SAGE handbook of human–machine communication. SAGE Publications Ltd, Thousand Oaks, California, pp 524–531
LoDolce M, Brackenbury J (2023, June 15) Gartner survey reveals only 8% of customers used a chatbot during their most recent customer service interaction. Gartner. https://www.gartner.com/en/newsroom/press-releases/2023-06-15-gartner-survey-reveals-only-8-percent-of-customers-used-a-chatbot-during-their-most-recent-customer-service-interaction. Accessed 12 May 2025
Lombard M, Xu K (2021) Social responses to media technologies in the 21st century: the media are social actors paradigm. Hum-Mach Commun 2:29–55. https://doi.org/10.30658/hmc.2.2
Lu X, Shi Z, Zhang X, Cao L, Liang S (2024) Exploring the field of virtual avatar: a bibliometric and content analysis. Int J Hum-Comput Interact:1–17. https://doi.org/10.1080/10447318.2024.2400405
McHugh ML (2012) Interrater reliability: the Kappa statistic. Biochem Med 22(3):276
Moeyaert M, Ugille M, Natasha Beretvas S, Ferron J, Bunuan R, Van den Noortgate W (2017) Methods for dealing with multiple outcomes in meta-analysis: a comparison between averaging effect sizes, robust variance estimation and multilevel meta-analysis. Int J Soc Res Methodol 20(6):559–572. https://doi.org/10.1080/13645579.2016.1252189
Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA (2015) Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Syst Rev 4:1. https://doi.org/10.1186/2046-4053-4-1
Mori M (1970) Bukimi no tani—The uncanny valley: (K. F. MacDorman & T. Minato, Trans.). Energy 7:33–35
Nass C, Moon Y (2000) Machines and mindlessness: social responses to computers. J Soc Issues 56(1):81–103. https://doi.org/10.1111/0022-4537.00153
Nudelman G, Otto K (2020) The development of a new generic risk-of-bias measure for systematic reviews of surveys. Methodology 16(4):278–298. https://doi.org/10.5964/meth.4329
Oliveira R, Arriaga P, Santos FP, Mascarenhas S, Paiva A (2021) Towards prosocial design: a scoping review of the use of robots and virtual agents to trigger prosocial behaviour. Comput Hum Behav 114:106547. https://doi.org/10.1016/j.chb.2020.106547
OpenAI (2023) Documentation. https://platform.openai.com/docs/introduction. Accessed 26 Sep 2024
Orsingher C, Valentini S, de Angelis M (2010) A meta-analysis of satisfaction with complaint handling in services. J Acad Mark Sci 38(2):169–186. https://doi.org/10.1007/s11747-009-0155-z
Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, Moher D (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372:n71. https://doi.org/10.1136/bmj.n71
Park G, Chung J, Lee S (2023) Effect of AI chatbot emotional disclosure on user satisfaction and reuse intention for mental health counseling: a serial mediation model. Curr Psychol 42:28663–28673. https://doi.org/10.1007/s12144-022-03932-z
Park G, Lee S, Chung J (2023) Do anthropomorphic chatbots increase counseling satisfaction and reuse intention? The moderated mediation of social rapport and social anxiety. Cyberpsychol Behav Soc Netw 26(5):357–365. https://doi.org/10.1089/cyber.2023.0008
Pelau C, Niculescu M, Bojescu I (2021) Gender specific preferences towards anthropomorphic AI devices and robots. In: Pamfilie R, Dinu V, Tǎchiciu L, Pleșea D, Vasiliu C (eds) 7th BASIQ international conference on new trends in sustainable business and consumption, Foggia, Italy, 3–5 June 2021. ASE, Bucharest, pp 784–792
Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L (2008) Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J Clin Epidemiol 61(10):991–996. https://doi.org/10.1016/j.jclinepi.2007.11.010
R Core Team (2023) R (Version 4.3.1) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/. Accessed 26 Sep 2024
Rapp A, Curti L, Boldi A (2021) The human side of human-chatbot interaction: a systematic literature review of ten years of research on text-based chatbots. Int J Hum-Comput St 151:102630. https://doi.org/10.1016/j.ijhcs.2021.102630
Reeves B, Nass CI (1996) The media equation: how people treat computers, television, and new media like real people and places. CSLI Publications, Stanford, California
Reuters (2025, February 20) OpenAI’s weekly active users surpass 400 million. Reuters. https://www.reuters.com/technology/artificial-intelligence/openais-weekly-active-users-surpass-400-million-2025-02-20/. Accessed 9 May 2025
Richardson WS, Wilson MC, Nishikawa J, Hayward RS (1995) The well-built clinical question: a key to evidence-based decisions. ACP J Club 123(3):A12–A13
Rodgers MA, Pustejovsky JE (2021) Evaluating meta-analytic methods to detect selective reporting in the presence of dependent effect sizes. Psychol Methods 26(2):141–160. https://doi.org/10.1037/met0000300
Roesler E, Manzey D, Onnasch L (2021a) A meta-analysis on the effectiveness of anthropomorphism in human-robot interaction. Sci Robot 6(58):5425. https://doi.org/10.1126/scirobotics.abj5425
Roesler E, Manzey D, Onnasch L (2021b) Same same, but different—a meta-analysis regarding the consequences of anthropomorphism in human-robot interaction. Data_analysis.R. OSF. https://doi.org/10.17605/OSF.IO/EGTK6
Rzepka C, Berger B, Hess T (2022) Voice assistant vs. chatbot—examining the fit between conversational agents’ interaction modalities and information search tasks. Inf Syst Front 24(3):839–856. https://doi.org/10.1007/s10796-021-10226-5
Scopelliti M, Giuliani MV, Fornara F (2005) Robots in a domestic setting: a psychological approach. Univers Access Inf Soc 4(2):146–155. https://doi.org/10.1007/s10209-005-0118-1
Schaer P (2012) Better than their reputation? On the reliability of relevance assessments with students. In: Catarci T, Forner P, Hiemstra D, Peñas A, Santucci G (eds) Information access evaluation. Multilinguality, multimodality, and visual analytics. Springer, Berlin, Heidelberg, pp 124–135
Scherer A, Fischer PM, Schmitt B, Egli J (2020) Wow, that’s great! The effect of phatic cues in chatbot conversations. European Marketing Academy Conference (EMAC), Budapest, Hungary (online), 26–29 May, 2020
Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA (2015) Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ 350:7647. https://doi.org/10.1136/bmj.g7647
Shanahan M (2024) Talking about large language models. Commun ACM 67(2):68–79. https://doi.org/10.1145/3624724
Shawar BA, Atwell E (2007) Chatbots: are they really useful? J Lang Technol Comput Linguist 22(1):29–49. https://doi.org/10.21248/jlcl.22.2007.88
Singh B, Olds T, Brinsley J, Dumuid D, Virgara R, Matricciani L, Watson A, Szeto K, Eglitis E, Miatke A, Simpson CE, Vandelanotte C, Maher C (2023) Systematic review and meta-analysis of the effectiveness of chatbots on lifestyle behaviours. Npj Digit Med 6(1):1–10. https://doi.org/10.1038/s41746-023-00856-1
Steinhoff L, Arli D, Weaven S, Kozlenkova IV (2019) Online relationship marketing. J Acad Mark Sci 47(3):369–393. https://doi.org/10.1007/S11747-018-0621-6
Straßmann C, Rosenthal-von der Pütten AM, Krämer NC (2018) With or against each other? The influence of a virtual agent’s (non)cooperative behavior on user’s cooperation behavior in the Prisoners’ Dilemma. Adv Hum-Comput Interact 2018:2589542. https://doi.org/10.1155/2018/2589542
Sundar SS, Kim J (2019) Machine heuristic: when we trust computers more than humans with our personal information. In: CHI conference on human factors in computing systems, Association for Computing Machinery, New York, pp 1–9
Sundar SS, Liao M (2023) Calling BS on ChatGPT: reflections on AI as a communication source. Journal Commun Monogr 25(2):165–180. https://doi.org/10.1177/15226379231167135
Van Berlo ZMC, Meijers MHCC, Eelen J, Voorveld HAM, Eisend M (2023) When the medium is the message: a meta-analysis of creative media advertising effects. J Advert 53(2):278–295. https://doi.org/10.1080/00913367.2023.2186986
Van den Akker OR, Van Assen MALM, Bakker M, Elsherif M, Wong TK, Wicherts JM (2023) Preregistration in practice: a comparison of preregistered and non-preregistered studies in psychology. Behav Res Methods 56:5424–5433. https://doi.org/10.3758/s13428-023-02277-0
Van Pinxteren MM, Pluymaekers M, Lemmink JGAM (2020) Human-like communication in conversational agents: a literature review and research agenda. J Serv Manag 31(2):203–225. https://doi.org/10.1108/JOSM-06-2019-0175
Verma V, Sharma D, Sheth J (2016) Does relationship marketing matter in online retailing? A meta-analytic approach. J Acad Mark Sci 44(2):206–217. https://doi.org/10.1007/s11747-015-0429-6
Vevea JL, Coburn K, Sutton A (2019) Publication bias. In: Borenstein M, Cooper H, Hedges LV, Valentine JC (eds) The handbook of research synthesis and meta-analysis, 3rd edn. Russell Sage Foundation, New York, pp 383–429
Viechtbauer W (2007) Accounting for heterogeneity via random-effects models and moderator analyses in meta-analysis. J Psychol 215(2):104–121. https://doi.org/10.1027/0044-3409.215.2.104
Viechtbauer W (2010) Conducting meta-analyses in R with the metafor package. J Stat Softw 36(3). https://doi.org/10.18637/jss.v036.i03
Wallace BC, Small K, Brodley CE, Lau J, Trikalinos TA (2012) Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In: Paper presented at 2nd ACM SIGHIT International Health Informatics Symposium, Miami, Florida, 28–30 January, 2012. Association for Computing Machinery, New York, pp 819–824
Wetzel E, Greiff S (2018) The world beyond rating scales: why we should think more carefully about the response format in questionnaires. Eur J Psychol Assess 34(1):1–5. https://doi.org/10.1027/1015-5759/a000469
Wilson DB (2023) Practical meta-analysis effect size calculator (Version 2023.11.27) [Computer software]. https://www.campbellcollaboration.org/research-resources/effect-size-calculator.html. Accessed 26 Sep 2024
Wirtz J, Patterson PG, Kunz WH, Gruber T, Lu VN, Paluch S, Martins A (2018) Brave new world: service robots in the frontline. J Serv Manag 29(5):907–931. https://doi.org/10.1108/JOSM-04-2018-0119
Xu K, Chen M, You L (2023) The hitchhiker’s guide to a credible and socially present robot: two meta-analyses of the power of social cues in human-robot interaction. Int J Soc Robot 15:269–295. https://doi.org/10.1007/s12369-022-00961-3
Zierau N, Elshan E, Visini C, Janson A (2020) A review of the empirical literature on conversational agents and future research directions. In: ICIS 2020 Proceedings, India (online), 13–16 December 2020
Zierau N, Hildebrand C, Bergner A, Busquet F, Schmitt A, Marco Leimeister J (2023) Voice bots on the frontline: voice-based interfaces enhance flow-like consumer experiences & boost service outcomes. J Acad Mark Sci 51(4):823–842. https://doi.org/10.1007/s11747-022-00868-5
The author would like to thank Margot van der Goot, Zeph van Berlo, and the members of the UvA PhD Club for their valuable feedback on the protocol, Anne Bucher for her assistance in coding the records, Nora Wickelmaier for her statistical support, and Sonja Utz, Dominik Papies, and Angelica Henestrosa for their helpful input at different stages of the project.
Open Access funding enabled and organized by Projekt DEAL.
Leibniz-Institut für Wissensmedien, Tübingen, Germany
Stefanie Helene Klein
SHK conceptualized and designed the study, collected and analyzed the data, interpreted the results, and prepared the manuscript.
Correspondence to Stefanie Helene Klein.
The author declares no competing interests.
This article does not contain any studies with human participants performed by the author.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Klein, S.H. The effects of human-like social cues on social responses towards text-based conversational agents—a meta-analysis. Humanit Soc Sci Commun 12, 1322 (2025). https://doi.org/10.1057/s41599-025-05618-w
DOI: https://doi.org/10.1057/s41599-025-05618-w
Humanities and Social Sciences Communications (Humanit Soc Sci Commun)
ISSN 2662-9992 (online)
© 2026 Springer Nature Limited