Summary

This research compared L1-L2 English and L1-L2 Brazilian Portuguese to determine to what extent a foreign accent affects both rhythm metrics and prosodic-acoustic parameters, as well as the choice of the target voice in a voice lineup.

Abstract

This research aims to examine both the prosodic-acoustic features and the perceptual correlates of foreign-accented English and foreign-accented Brazilian Portuguese, and to check how the speakers' productions of foreign and native accents correlate with the listeners' perception. We conducted a speech production procedure with a group of American speakers of L2 Brazilian Portuguese and a group of Brazilian speakers of L2 English, and a speech perception procedure in which we performed voice lineups for both languages. For the speech production statistical analysis, we ran Generalized Additive Models to evaluate the effect of the language groups on each class (metric or prosodic-acoustic) of features, controlled for the smoothing effect of the covariate(s) of the opposite class. For the speech perception statistical analysis, we ran a Kruskal-Wallis test and a post-hoc Dunn's test to evaluate the effect of the voices of the lineups on the scores judged by the listeners. We additionally conducted acoustic (voice) similarity tests based on Cosine and Euclidean distances. Results showed significant acoustic differences between the language groups in terms of variability of f0, duration, and voice quality. For the lineups, the results indicated that prosodic features of f0, intensity, and voice quality correlated with the listeners' perceived judgments.

Introduction

Accent is a salient and dynamic aspect of communication and fluency, both in the native language (L1) and in a foreign language (L2)1. A foreign accent represents the phonetic features of a speaker's L2 production of a target language, and it can change over time in response to the speaker's L2 experience, speaking style, input quality, and exposure, among other variables. A foreign accent can be quantified as a (scalar) degree of difference between the L2 speech produced by a foreign speaker and a local or reference accent of the target language2,3,4,5.

This research aims to examine both the prosodic-acoustic features and the perceptual correlates of foreign-accented English and foreign-accented Brazilian Portuguese (BP), as well as to check to what extent the speakers' productions of foreign and native accents correlate with the listeners' perception. Prior research in the forensic field has demonstrated the robustness of vowels and consonants in foreign accent identification, whether as features stable over the long-term analysis of an individual (the Lo case6) or as supporting high speaker-identification accuracy (the Lindbergh case7). However, the exploration of prosodic-acoustic features based on duration, fundamental frequency (f0, i.e., the acoustic correlate of pitch), intensity, and voice quality (VQ) has gained increasing attention8,9. Thus, the choice of prosodic-acoustic features in this study represents a promising avenue in the forensic phonetics field8,10,11,12,13.

The present research is supported by studies dedicated to foreign accents as a form of voice disguise in forensic cases14,15, as well as to the preparation of voice lineups for speaker recognition16,17. For instance, speech rate played an important role in the identification of German, Italian, Japanese, and Brazilian speakers of English18,19,20,21,22. Besides speech rate, long-term spectral and f0 features challenge even proficient L2 speakers familiar with the target language, because memory, attention, and emotion are taxed at the brain and cognitive level, which is reflected in the speaker's phonetic performance during long speech turns23. The view of foreign accents in the forensic field is that what really sounds like a foreign accent depends much more on the listener's unfamiliarity than on a none-to-extreme degree of foreign-accented speech24.

In the perceptual domain, a common forensic tool used since the mid-1990s for recognizing criminals is the voice lineup (auditory recognition), which is analogous to the visual lineup used to identify a perpetrator at a crime scene16,25. In a voice lineup, the suspect's voice is presented alongside foils (voices similar in sociolinguistic aspects such as age, sex, geographical location, dialect, and cultural status) for identification by an earwitness. The success or failure of a voice lineup depends on the number of voice samples and on the sample durations25,26. Furthermore, for real-world samples, audio quality consistently impacts the accuracy of voice recognition: poor audio quality can distort the unique characteristics of a voice27. In the case of voice similarity, fine phonetic detail based on f0 can confuse the listener during voice recognition28,29. Such acoustic features extend beyond f0 and encompass elements of duration, as well as spectral features of intensity and VQ30. This view of multiple prosodic features is crucial in the context of forensic voice comparison to ensure accurate speaker identification9,14,15,29,31.

In summary, studies in forensic phonetics have shown some variation regarding foreign accent identification over the last decades. On the one hand, a foreign accent does not seem to affect the process of identifying a speaker32,33 (especially if the speaker is unfamiliar with the target foreign accent34). On the other hand, there are findings in the opposite direction12,34,35.

Protocol

This work received approval from a human research ethics committee. Furthermore, informed consent was obtained from all participants involved in this study to use and publish their data.

1. Speech production

NOTE: We collected speech from a reading task in both 'L1 English-L2 BP', produced by Group 1: The American English (from the U.S.A.) Speakers (AmE-S), and 'L1 BP-L2 English', produced by Group 2: The Brazilian Speakers (Bra-S). See Figure 1 for a flowchart of speech production.

Figure 1: Schematic flowchart for speech production.

  1. Participants
    1. Determine the number of participants, the language (L1 English, L2 BP; L1 BP, L2 English), the sex (female or male), the age (mean, standard deviation), the group characteristics (professionals or undergraduate students), and the L2 proficiency level (advanced or well-advanced) for each group.
      NOTE: For the present research, both AmE-S and Bra-S were considered L2 proficient (both groups were qualified as B2-C136). The AmE-S had lived for two years in Brazil when the procedures were conducted. The Bra-S had lived for more than two years in the U.S.A. when the procedures were conducted. While living abroad, both groups used to speak their L2 for studying and working purposes at least 6 days a week for ~4-5 h a day.
    2. Allocate the participants to a comfortable and quiet room and present the reading material for each group.
  2. Data collection
    NOTE: Speech data must be collected from a reading task in the following languages: L1 English and L2 BP; L1 BP and L2 English. Let the participants read the texts beforehand if necessary.
    1. Recording procedures
      1. Record the speech data in a quiet place with appropriate acoustic conditions.
      2. Use a digital voice recorder (see the Table of Materials)37 and a unidirectional, electromagnetically isolated cardioid microphone (see the Table of Materials)38.
      3. Record the audio data in '.wav' form.
      4. Set up the sampling rate at 48 kHz and the quantization rate at 16 bits.
        NOTE: The audio format and the configuration for the sampling and quantization rates described in steps 1.2.1.3 and 1.2.1.4 are applied to ensure high quality and noise reduction to preserve the spectral features used for later acoustic analysis.
  3. Acoustic analysis
    NOTE: Divide acoustic analysis procedures into three steps: forced-alignment, realignment, and acoustic feature extraction.
    1. Write the linguistic transcription (in a '.txt' file) for each audio file.
    2. Tag the pair of '.txt'/'.wav' files with the same name (i.e., 'my_file.wav'/ 'my_file.txt').
      NOTE: To enhance the performance of the procedure outlined in section 1.3.7, it is highly recommended that the initial three characters of the '.txt'/'.wav' file tags represent the Language, Dialect, or Accent, while the fourth to sixth characters denote the Sex (e.g., EL1FEM for English L1 Female). From the seventh character onward, the user should indicate the speaker number (e.g., 001 for the first speaker). Consequently, the first '.txt'/'.wav' pair is tagged EL1FEM001.
    3. Create a folder for each L1-L2 language.
      NOTE: A folder for L1-L2 English and a folder for L1-L2 BP.
    4. Certify that all file pairs of the same language are in the same folder.
    5. Conduct the forced alignment.
      1. Access the web interface of Munich Automatic Segmentation (MAUS) forced aligner (webMAUS)39 at https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface/Pipeline.
      2. Drag and drop each pair of .wav / .txt files from the folder to the dashed rectangle in Files (or click inside the rectangle, Figure 2A).
      3. Click the Upload button to upload the files into the aligner (see red arrow in Figure 2A).
      4. Select the following options in the Service options menu (Figure 2B): G2P-MAUS-PHO2SYL for Pipeline name; English (US) (for Language) if L1-L2 English data; Italian (IT) (for Language) if L1-L2 BP data.
        NOTE: We chose 'Italian' for the BP data because webMAUS does not provide pretrained acoustic models for BP forced alignment. The phonetic literature indicates that Italian phonology has a symmetric seven-vowel inventory comparable to that of BP40, as well as consonantal acoustic similarities41,42.
      5. Keep the default options for 'Output format' and 'Keep everything'.
      6. Check the Run option box for accepting the terms of usage (see green arrow in Figure 2C).
      7. Click the Run Web Service button to run the uploaded files in the aligner.
        NOTE: For each audio file, MAUS forced aligner returns a Praat TextGrid object (a Praat pre-formatted '.txt' file containing the annotation of words, phonological syllables, and phones based on the linguistic transcription extracted from the '.txt' file described in step 1.3.1).
      8. Click the Download as ZIP-File button to download the TextGrid files as a zipped file (Figure 2C).
        NOTE: Make sure that the zipped TextGrid files are downloaded in the same folder as the audio files.
      9. Extract the TextGrid files for later realignment in the phonetic analysis software43.
    6. Conduct the realignment.
      1. Access and download the script for Praat VVUnitAligner44 from https://github.com/leonidasjr/VVunitAlignerCode_webMAUS/blob/main/VVunitAligner.praat.
      2. Certify that all file pairs of the same language and the VVUnitAligner script are in the same folder.
        NOTE: A folder for the L1-L2 English files and the VVunitAligner, and a folder for the L1-L2 BP files and the VVunitAligner.
      3. Open the phonetic analysis software.
      4. Click Praat | Open Praat script… to call the script from the object window.
      5. Click the Run button once.
        NOTE: A form called Phonetic syllable alignment containing the settings for using the script will pop up on the screen (Figure 3A).
      6. Click the Language button to choose from 'English (US),' 'Portuguese (BR),' 'French (FR),' or 'Spanish (ES)' languages.
      7. Click the Chunk segmentation button to choose from 'Automatic,' 'Forced (manual),' or 'None' segmentation procedure.
      8. Check the Save TextGrid files option to automatically save the new TextGrid files.
      9. Click the Ok | Run buttons to realign the phonetic units from step 1.3.5.7.
        NOTE: For each audio file, VVUnitAligner will generate a new TextGrid file for section 1.3.7 (Figure 3B).
    7. Conduct the automatic extraction of the acoustic features.
      1. Access and download the script SpeechRhythmExtractor45 from https://github.com/leonidasjr/SpeechRhythmCode/blob/main/SpeechRhythmExtractor.praat for automatic extraction of the prosodic-acoustic features.
      2. Create a new folder and put SpeechRhythmExtractor along with all pairs of audio/TextGrid files of all languages.
      3. Open the phonetic analysis software.
      4. Click Praat | Open Praat script… to call the script from the object window.
      5. Click the Run button only once.
        NOTE: A form containing the script settings will pop up on the screen. In the boxes of the output '.txt' file names, rename the files accordingly or leave the default names.
      6. Check the voice quality parameters option to save the Output file VQ for voice quality (Figure 3C).
        NOTE: This second output file (the Output file VQ) contains the parameters of the difference between the 1st and the 2nd harmonics (H1-H2) and the Cepstral Peak Prominence (CPP)9.
      7. Check the Linguistic target option to choose from the labels 'Language,' 'Dialect,' or 'Accent' (Figure 3C).
      8. Check the Unit option to choose the f0 features in Hz or in Semitones (Figure 3C).
      9. Set the minimum and maximum f0 thresholds in the values for F0 threshold (Figure 3C).
        NOTE: Unless the research has specific purposes or pre-set specific audio features, it is strongly recommended to leave the parameters of step 1.3.7.9 with the default values.
      10. Click the Ok | Run buttons for the automatic extraction of the acoustic features.
        NOTE: The script SpeechRhythmExtractor returns a tab-delimited '.txt' file (Output file/ Output file VQ) containing the acoustic features extracted from the speakers.
  4. Statistical analysis
    1. Upload the spreadsheet containing the acoustic features into the R46 environment (or any statistical software/environment of choice).
    2. Perform non-parametric statistics with Generalized Additive Models (GAMs).
      1. Perform GAMs in R.
      2. Type the following commands and press Enter.
        library(mgcv)
        model = gam(<feature under analysis> ~ <language group> + s(<metric covariate>, by = <language group>) + <other metric/prosodic-acoustic features>, data = <data frame>)
        NOTE: We decided to perform the statistics of the protocol in the R programming language because of its increasing popularity among phoneticians (and linguists) in the academic community; R has been largely used in phonetic fieldwork research47. Keep in mind that step 1.4.2.2 contains pseudo-code; write the code according to the research variables. A minimal worked example follows.
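        The following is a minimal worked sketch of the pseudo-code in step 1.4.2.2, assuming a hypothetical feature file named 'Output.txt' with the columns 'speechRate' (a prosodic-acoustic feature), 'language' (the group factor), and 'sdSyll' (a rhythm-metric covariate); adapt the file and column names to the research variables.
        library(mgcv)
        # Read the tab-delimited feature file returned in step 1.3.7.10 (assumed file name)
        df <- read.delim("Output.txt")
        # The grouping variable must be a factor for per-group smooths
        df$language <- as.factor(df$language)
        # Prosodic-acoustic feature as a function of the language group,
        # controlled for a smooth of a metric covariate fitted per group
        model <- gam(speechRate ~ language + s(sdSyll, by = language), data = df, method = "REML")
        summary(model)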

Figure 2: Screenshot of the phonetic alignment using the MAUS forced aligner. (A) The dashed rectangle is meant for dragging and dropping 'my_file.wav'/'my_file.txt' files, or clicking inside it to search for such files in the folder; the Upload button is indicated by the red arrow. (B) The uploaded files from panel A (see blue arrow), the pipeline to be used, the language for the pairs of files, the file format to be returned, and a 'true/false' button for keeping all files. (C) The checkbox for the Terms of Usage (see green arrow), the Run Web Service button, and the Results (TextGrid files to be downloaded).

Figure 3: Screenshot of the realignment procedure. (A) Input settings form for the realignment procedure. (B) Partial waveform, broadband spectrogram with f0 contour (blue), and six tiers segmented (and labeled) as tier 1: units from vowel onset to the next vowel onset (V_to_V); tier 2: units of vowel (V), consonant (C), and pause (#); tier 3: phonic representations of the V_to_V units; tier 4: some words from the text; tier 5: some chunks (CH) of speech from the text; tier 6: tonal tier containing the highest (H) and the lowest (L) tone of each speech chunk produced by a female AmE-S. (C) Input settings for the automatic extraction of the acoustic features.

2. Speech perception

NOTE: We carried out four voice lineups in English with American listeners and four lineups in BP with Brazilian listeners. See Figure 4 for a flowchart for speech perception.

Figure 4: Schematic flowchart for speech perception.

  1. Participants
    1. For each group, choose participants different from those who participated in the speech production protocol.
      NOTE: Two groups of participants were selected for this part of the protocol: Group 1: The American English (from the U.S.A.) Listeners (AmE-L), and Group 2: The Brazilian Listeners (Bra-L). For the present research, both AmE-L and Bra-L were considered L2 proficient (both groups were qualified as B2-C136). The AmE-L had lived for two years in Brazil when the procedures were conducted. The Bra-L had lived for more than two years in the U.S.A. when the procedures were conducted. While living abroad, both groups used to speak their L2 for studying and working purposes (at least six days a week for about 4-5 h a day).
    2. Determine the number of participants, the language (L1 English, L2 BP; L1 BP, L2 English), the sex (female or male), the age (mean, standard deviation), the group characteristics (professionals or undergraduate students) and the L2 proficiency level for each group.
  2. The voice lineups
    NOTE: Divide the voice lineups' procedures into two different steps: preparing and running the voice lineups.
    1. Prepare four voice lineups for English and four for BP.
      1. Get audio files from the speakers of section 1: Speech production.
      2. Certify that the audio files of each language factor are in separate folders.
      3. Randomly choose six voice chunks in L1 English or L1 BP.
        NOTE: Six voices in the lineup represent one target voice and five foils.
      4. Choose a voice chunk in L2 English or L2 BP from one of the speakers included in step 2.2.1.3.
        NOTE: The voice chunk in step 2.2.1.4 is the reference voice. Chunks must be approximately 20 s long8.
      5. Access and download the script for Praat CreateLineup48 from https://github.com/pabarbosa/prosody-scripts-CreateLineUp.
      6. Certify that the L2 reference voice, the L1 foils, and the L1 target voice are in the same folder before running the CreateLineup script (Figure 5).
      7. Open the phonetic analysis software.
      8. From the object window, click Praat | Open Praat script… to call the script.
      9. Click Run | Run.
        NOTE: The script returns a file in the following order: (the L2 reference voice) + (the L1 target voice and the foils randomly distributed) (Figure 5).
    2. Running the voice lineups
      1. Create an online space to host the lineups on any platform of choice (e.g., SurveyMonkey; see the Table of Materials) for conducting the voice lineups remotely.
      2. Access the online space link.
      3. Upload the files returned from the CreateLineup script to the platform.
      4. Run the procedure before the participants do, to test every step.
        NOTE: It is recommended to access the link beforehand and run the lineups to check that everything works properly.
  3. Statistical analysis
    1. Upload the spreadsheet containing the scores of the listeners' judgments into the R environment (or any statistical software/environment of choice).
      1. Perform the Kruskal-Wallis test in R.
      2. Type the following commands and press Enter.
        model = kruskal.test(<judgments> ~ <voice lineup>, data = <data frame>)
    2. Perform a post-hoc Dunn's test.
      1. Perform Dunn's test in R.
      2. Type the following commands and press Enter.
        library(FSA)
        model = dunnTest(<judgments> ~ <voice lineup>, data = <data frame>, method = "bonferroni")
        NOTE: The code in steps 2.3.1.2 and 2.3.2.2 is pseudo-code (see the NOTE in step 1.4.2.2); write it according to the research variables. A minimal worked example follows.
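        The following is a minimal worked sketch of steps 2.3.1.2 and 2.3.2.2, assuming a hypothetical data frame 'df' with a numeric column 'judgments' (the listeners' scores) and a column 'voice' identifying the target and foil voices of a lineup.
        # The grouping variable must be a factor
        df$voice <- as.factor(df$voice)
        # Omnibus test of the voice effect on the perceived-similarity scores
        kruskal.test(judgments ~ voice, data = df)
        # Pairwise post-hoc comparisons with Bonferroni correction
        library(FSA)
        dunnTest(judgments ~ voice, data = df, method = "bonferroni")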
  4. Acoustic similarity analysis
    1. Select the lineups (cf. NOTE in step 2.2.1.3) that presented non-significant differences between the target and any of the foils.
    2. Repeat procedures of steps 1.3.1 to 1.3.7.5, and steps 1.3.7.8 to 1.3.7.10.
    3. Access and download the script for Python49, AcousticSimilarity_cosine_euclidean50 from https://github.com/leonidasjr/AcousticSimilarity/blob/main/AcousticSmilarity_cosine_euclidean.py.
      NOTE: The script returns three matrices (in '.txt' and '.csv'): one for Cosine similarity51,52, one for Euclidean distance52,53, and one for Transformed Euclidean distance values, as well as a pairwise comparison between the target voice and each foil.
    4. Certify that the script is downloaded in the same folder as the lineup dataset.
    5. Click the Open file… button to call the script.
    6. Click Run | Run Without Debugging buttons.
      NOTE: The second Run button may be tagged as Run or Run Without Debugging or Run Script. They all execute the same commands. It simply depends on the Python environment used.
    7. Perform voice similarity tests based on acoustic features.
      NOTE: Cosine similarity (the complement of cosine distance) is a technique applied in Artificial Intelligence (AI), particularly in machine learning for automatic speech recognition (ASR) systems. It is a measure of similarity between zero and one: a cosine similarity close to one means that two voices are quite likely to be similar, while a cosine similarity close to zero means that the voices are quite likely to be dissimilar52. The Euclidean distance, often referred to as Euclidean similarity, is also widely used in AI, machine learning, and ASR. It represents the straight-line distance between two points in Euclidean space53, i.e., the closer the points (shorter distances), the more similar the voices. For a clearer understanding of the reported results of both techniques, we transformed the raw Euclidean distance scores into values from zero (less voice similarity) to one (more voice similarity)54. A minimal worked sketch of both measures follows.
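      The following is a minimal sketch in R of the two measures described above, assuming two hypothetical prosodic-acoustic feature vectors (the same features in the same order) for the target voice and one foil; the 0-1 rescaling of the Euclidean distance shown here is one possible transformation, not necessarily the one implemented in the script.
      # Illustrative feature vectors (e.g., f0 median, f0 SD, H1-H2, CPP)
      target <- c(210.5, 21.3, 14.2, 0.48)
      foil <- c(198.7, 26.1, 12.9, 0.55)
      # Cosine similarity: values close to 1 indicate similar voices
      cos_sim <- sum(target * foil) / (sqrt(sum(target^2)) * sqrt(sum(foil^2)))
      # Euclidean distance: shorter distances indicate more similar voices
      euc_dist <- sqrt(sum((target - foil)^2))
      # One possible transformation to a 0 (dissimilar) to 1 (similar) scale
      euc_sim <- 1 / (1 + euc_dist)
      c(cosine = cos_sim, euclidean = euc_dist, transformed = euc_sim)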

Figure 5: Directory setup for speech perception. Lineup folders. Each folder contains six L1 voices (the target and five foils), the L2 reference voice, the CreateLineup script, and the voice lineup audio file (returned after running the script).

Results

Results for speech production
In this section, we describe the performance of the statistically significant prosodic-acoustic features and rhythm metrics. The prosodic features were speech, articulation, and pause rates, which are related to duration, and shimmer, which is related to voice quality. The rhythm metrics were the standard deviation (SD) of syllable duration, the SD of vocalic or consonantal duration, and the variation coefficient of syllable duration (see the Supplem...

Discussion

The current protocol presents a novelty in the field of (forensic) phonetics. It is divided into two phases: one based on production (acoustic analysis) and one based on perception (judgment analysis). The production phase comprises data preparation, forced alignment, realignment, and automatic extraction of the prosodic-acoustic features, in addition to the statistics. This protocol connects the stage of data collection to the data analysis in a faster and more efficient way than the traditional protocols based on ...

Disclosures

The authors have no conflicts of interest to declare.

Acknowledgements

This study was supported by the National Council for Scientific and Technological Development - CNPq, grant no. 307010/2022-8 for the first author, and grant no. 302194/2019-3 for the second author. The authors would like to express their sincere gratitude to the participants of this research for their generous cooperation and invaluable contributions.

Materials

Name | Company | Catalog Number | Comments
CreateLineup | Personal collection | # | Praat script for voice lineup preparation
Dell I3 (with solid-state drive, SSD) | Dell | # | Laptop computer
Praat | Paul Boersma & David Weenink | # | Software for phonetic analysis
Python 3 | Python Software Foundation | # | Interpreted, high-level, general-purpose programming language
R | The R Project for Statistical Computing | # | Programming language for statistical computing
Shure Beta SM7B | Shure | # | Microphone
SpeechRhythmExtractor | Personal collection | # | Praat script for automatic extraction of acoustic features
SurveyMonkey | SurveyMonkey Inc. | # | Suite of free customizable surveys, with back-end programs for data analysis, sample selection, debiasing, and data representation
Tascam DR-100 MKII | Tascam | # | Digital voice recorder
The Munich Automatic Segmentation System (MAUS) | University of Munich | # | Forced aligner of audio (.wav) and linguistic transcription (.txt) files
VVUnitAligner | Personal collection | # | Praat script for automatic realignment and post-processing of phonetic units

References

  1. Moyer, A. Foreign Accent: The Phenomenon of Non-native Speech. (2013).
  2. Munro, M., Derwing, T. Foreign accent, comprehensibility and intelligibility, redux. J Second Lang Pronunciation. 6 (3), 283-309 (2020).
  3. Levis, J. Intelligibility, Oral Communication, and the Teaching of Pronunciation. (2018).
  4. Munro, M. Applying Phonetics: Speech Science in Everyday Life. (2022).
  5. Gut, U., Muller, C. Speaker Classification. 75-87 (2007).
  6. Rogers, H. Foreign accent in voice discrimination: a case study. Forensic Linguistics. 5 (2), 203-208 (1998).
  7. Solan, L., Tiersma, P. Hearing Voices: Speaker Identification in Court. Hastings Law Journal. 54, 373-436 (2003).
  8. Alcaraz, J. The long-term average spectrum in forensic phonetics: From collation to discrimination of speakers. Estudios de Fonética Experimental / Journal of Experimental Phonetics. 32, 87-110 (2023).
  9. Silva, L., Barbosa, P. A. Voice disguise and foreign accent: Prosodic aspects of English produced by Brazilian Portuguese speakers. Estudios de Fonética Experimental / Journal of Experimental Phonetics. 32, 195-226 (2023).
  10. Munro, M., Derwing, T. Modeling perceptions of the accentedness and comprehensibility of L2 speech. Studies in Second Language Acquisition. 23 (4), 451-468 (2001).
  11. Keating, P., Esposito, C. Linguistic Voice Quality. Working Papers in Phonetics. 85-91 (2007).
  12. Niebuhr, O., Skarnitzl, R., Tylečková, L. The acoustic fingerprint of a charismatic voice - Initial evidence from correlations between long-term spectral features and listener ratings. Proceedings of Speech Prosody. 359-363 (2018).
  13. San Segundo, E. International survey on voice quality: Forensic practitioners versus voice therapists. Estudios de Fonética Experimental. 29, 8-34 (2021).
  14. Farrús, M. Fusing prosodic and acoustic information for speaker recognition. International Journal of Speech, Language and the Law. 16 (1), 169 (2009).
  15. Farrús, M. Voice disguise in automatic speaker recognition. ACM Computing Surveys. 51 (4), 1-2 (2018).
  16. Nolan, F. A recent voice parade. The International Journal of Speech, Language and the Law. 10 (2), 277-291 (2003).
  17. McDougall, K., Bernardasci, C., Dipino, D., Garassino, D., Negrinelli, S., Pellegrino, E., Schmid, S. Ear-catching versus eye-catching? Some developments and current challenges in earwitness identification evidence. Speaker Individuality in Phonetics and Speech Sciences. 33-56 (2021).
  18. Gut, U. Rhythm in L2 speech. Speech and Language Technology. 14 (15), 83-94 (2012).
  19. Urbani, M. Pitch Range in L1/L2 English. An Analysis of F0 using LTD and Linguistic Measures. Coop. (2012).
  20. Gonzales, A., Ishihara, S., Tsurutani, C. Perception modeling of native and foreign-accented Japanese speech based on prosodic features of pitch accent. J Acoust Soc Am. 133 (5), 3572 (2013).
  21. Silva, L., Barbosa, P. A. Speech rhythm of English as L2: an investigation of prosodic variables on the production of Brazilian Portuguese speakers. J Speech Sci. 8 (2), 37-57 (2019).
  22. Silva, L., Barbosa, P. A. Foreign accent and L2 speech rhythm of English: a pilot study based on metric and prosodic parameters. 1, 41-50 (2023).
  23. Costa, A. El cerebro bilingüe: La neurociencia del lenguaje [The bilingual brain: The neuroscience of language]. (2017).
  24. Eriksson, A. Tutorial on forensic speech science: Part I. Forensic Phonetics. (2005).
  25. Harvey, M., Giroux, M., Price, H. Lineup size influences voice identification accuracy. Applied Cognitive Psychology. 37 (5), 42-89 (2023).
  26. Pautz, N., et al. Identifying unfamiliar voices: Examining the system variables of sample duration and parade size. Q J Exp Psychol (Hove). 76 (12), 2804-2822 (2023).
  27. McDougall, K., Nolan, F., Hudson, T. Telephone transmission and earwitnesses: Performance on voice parades controlled for voice similarity. Phonetica. 72 (4), 257-272 (2015).
  28. Nolan, F., McDougall, K., Hudson, T. Some acoustic correlates of perceived (dis)similarity between same-accent voices. Proceedings of the International Congress of Phonetic Sciences (ICPhS 2011). 1506-1509 (2011).
  29. Sheoran, S., Mahna, D. Voice identification and speech recognition: an arena of voice acoustics. Eur Chem Bull. 12 (5), 50-60 (2023).
  30. Hudson, T., McDougall, K., Hughes, V., Knight, R.-A., Setter, J. Forensic phonetics. The Cambridge Handbook of Phonetics. 631-656 (2021).
  31. Eriksson, A., Llamas, C., Watt, D. The disguised voice: Imitating accents or speech styles and impersonating individuals. Language and Identities. 86-96 (2010).
  32. Köster, O., Schiller, N. Different influences of the native language of a listener on speaker recognition. Forensic Linguistics. 4 (1), 18-27 (1997).
  33. Köster, O., Schiller, N., Künzel, H. The influence of native-language background on speaker recognition. 306-309 (1995).
  34. Thompson, C. P. A language effect in voice identification. Applied Cognitive Psychology. 1, 121-131 (1987).
  35. San Segundo, E., Univaso, P., Gurlekian, J. Sistema multiparamétrico para la comparación forense de hablantes [A multiparametric system for forensic speaker comparison]. Estudios de Fonética Experimental. 28, 13-45 (2019).
  36. Council of Europe. Common European Framework of Reference for Languages: Learning, Teaching, Assessment. (2001).
  37. How to use Tascam DR-100 MKII: Getting started. Available from: https://www.youtube.com/watch?v=O2E72uV9fWc (2018).
  38. Getting the most from your Shure SM58 microphone. Available from: https://www.youtube.com/watch?v=wweNufW7EXA (2020).
  39. Multilingual processing of speech via web services. Computer Speech & Language. Available from: https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface/WebMAUSBasic (2017).
  40. Escudero, P., Boersma, P., Rauber, A., Bion, R. A cross-dialect acoustic description of vowels: Brazilian and European Portuguese. J Acoust Soc Am. 126 (3), 1379-1393 (2009).
  41. Stevens, M., Hajek, J. Post-aspiration in standard Italian: some first cross-regional acoustic evidence. 1557-1560 (2011).
  42. Barbosa, P., Madureira, S. Manual de Fonética Acústica Experimental: aplicações a dados do português [Manual of experimental acoustic phonetics: applications to Portuguese data]. (2015).
  43. Praat: Doing phonetics by computer (Version 6.1.38) [Computer program]. Available from: https://www.praat.org/ (1992-2021).
  44. VVUnitAligner [Computer program]. Available from: https://github.com/leonidasjr/VVunitAlignerCode_webMAUS (2022).
  45. SpeechRhythmExtractor [Computer program]. Available from: https://github.com/leonidasjr/VowelCode (2019-2023).
  46. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Available from: https://www.R-project.org/ (2023).
  47. Pigoli, D., Hadjipantelis, P. Z., Coleman, J. Z., Aston, J. The statistical analysis of acoustic phonetic data. Journal of the Royal Statistical Society. 67 (5), 1103-1145 (2018).
  48. CreateLineup [Computer program]. Available from: https://github.com/pabarbosa/prosody-scripts-CreateLineUp (2021).
  49. Van Rossum, G., Drake, F. L. Python 3 Reference Manual. (2009).
  50. AcousticSimilarity_cosine_euclidean [Computer program]. Available from: https://github.com/leonidasjr/AcousticSimilarity/blob/main/AcousticSmilarity_cosine_euclidean.py (2024).
  51. Gerlach, L., McDougall, K., Kelly, F., Alexander, A. Automatic assessment of voice similarity within and across speaker groups with different accents. 3785-3789 (2023).
  52. Gahman, N., Elangovan, V. A comparison of document similarity algorithms. International Journal of Artificial Intelligence and Applications. 14 (2), 41-50 (2023).
  53. San Segundo, E., Tsanas, A., Gómez-Vilda, P. Euclidean distances as measures of speaker similarity including identical twin pairs: A forensic investigation using source and filter voice characteristics. Forensic Science International. 270, 25-38 (2017).
  54. Singh, M. K., Singh, N., Singh, A. K. Speaker's voice characteristics and similarity measurement using Euclidean distances. 317-322 (2019).
  55. Darling-White, M., Banks, W. Speech rate varies with sentence length in typically developing children. J Speech Lang Hear Res. 64 (6), 2385-2391 (2021).
  56. Barbosa, P. A. Incursões em torno do ritmo da fala [Explorations of speech rhythm]. (2006).
  57. Golestani, N., Pallier, C. Anatomical correlates of foreign speech sound production. Cereb Cortex. 17 (4), 929-934 (2007).
  58. Trouvain, J., Fauth, C., Möbius, B. Breath and non-breath pauses in fluent and disfluent phases of German and French L1 and L2 read speech. Proceedings of Speech Prosody. 31-35 (2016).
  59. Waaning, J. The Lombard effect: the effects of noise exposure and being instructed to speak clearly on speech acoustic parameters [Master's thesis]. (2021).
  60. Villegas, J., Perkins, J., Wilson, I. Effects of task and language nativeness on the Lombard effect and on its onset and offset timing. J Acoust Soc Am. 149 (3), 1855 (2021).
  61. Marcoux, K., Ernestus, M. Differences between native and non-native Lombard speech in terms of pitch range. Proceedings of the 23rd International Congress on Acoustics (ICA 2019). 5713-5720 (2019).
  62. Marcoux, K., Ernestus, M. Pitch in native and non-native Lombard speech. Proceedings of the International Congress of Phonetic Sciences (ICPhS 2019). 2605-2609 (2019).
  63. Marcoux, K., Ernestus, M. Acoustic characteristics of non-native Lombard speech in the DELNN corpus. Journal of Phonetics. 102, 1-25 (2024).
  64. Gil, J., San Segundo, E., Garayzábal, M., Jiménez, M., Reigosa, M. La cualidad de voz en fonética judicial [Voice quality in judicial phonetics]. Lingüística Forense: la Lingüística en el ámbito legal y policial. 154-199 (2014).
  65. Gil, J., San Segundo, E., Penas-báñez, M. A. El disimulo de la cualidad de voz en fonética judicial: Estudio perceptivo de la hiponasalidad [Voice quality disguise in judicial phonetics: A perceptual study of hyponasality]. Panorama de la fonética española actual. 321-366 (2013).
  66. Passeti, R., Madureira, S., Barbosa, P. What can voice line-ups tell us about voice similarity? Proceedings of the International Congress of Phonetic Sciences (ICPhS 2023). 3765-3769 (2023).
  67. Nolan, F. The DyViS database: Style-controlled recordings of 100 homogeneous speakers for forensic phonetic research. International Journal of Speech Language and the Law. 16 (1), 31-57 (2009).
  68. Carroll, D. Psychology of Language. (1994).
  69. Ortega-Llebaria, M., Silva, L., Nagao, J. Macro- and micro-rhythm in L2 English: Exploration and Refinement of Measures. Proceedings of the International Congress of Phonetic Sciences (ICPhS 2023). 1582-1586 (2023).
  70. Fernández-Trinidad, M. Hacia la aplicabilidad de la cualidad de la voz en fonética judicial [Towards the applicability of voice quality in judicial phonetics]. Loquens. 9 (1-2), 1-11 (2022).
  71. Brouwer, S. The role of foreign accent and short-term exposure in speech-in-speech recognition. Atten Percep Psychophys. 81, 2053-2062 (2019).
  72. Love, R., Wright, D. Specifying challenges in transcribing covert recordings: Implications for forensic transcription. Front Commun. 6, 1-14 (2021).
  73. Meer, P. Automatic alignment for New Englishes: Applying state-of-the-art aligners to Trinidadian English. J Acoust Soc Am. 147 (4), 2283-2294 (2020).
  74. McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., Sonderegger, M. Montreal Forced Aligner: trainable text-speech alignment using Kaldi. 498-502 (2017).
  75. Montreal Forced Aligner: MFA Tutorial. Available from: https://zenodo.org/records/7591607 (2023).
