The Influence of Reference Corpus Size on WordSmith Tools Keywords Extraction

Authors

  • Tony Berber Sardinha

Keywords:

WordSmith Tools, KeyWords, Corpus Linguistics, reference corpus size.

Abstract

A KeyWords analysis (using WordSmith Tools) identifies the lexical items that reveal the main lexical sets in a text or corpus. Such an analysis requires that the corpus the researcher intends to describe (the study corpus) be compared with a reference corpus. This paper presents a mathematical method for determining the influence of reference corpus size on the number of key words extracted by the program. The results reveal that a reference corpus at least five times as large as the study corpus yields a number of key words that is statistically equivalent to the number obtained with larger reference corpora, which suggests five times the size of the study corpus as the minimum order of magnitude for reference corpora.
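The comparison of a study corpus against a reference corpus can be illustrated with a minimal sketch. It is not the WordSmith Tools implementation: it assumes the two-term form of the log-likelihood statistic commonly used for corpus frequency comparison, a cut-off of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), and pre-tokenized input. The function names `log_likelihood` and `key_words` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood for one word.

    a: frequency in the study corpus, b: frequency in the reference corpus,
    c: size of the study corpus, d: size of the reference corpus (in tokens).
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def key_words(study_tokens, reference_tokens, threshold=3.84):
    """Return the words that are unusually frequent in the study corpus.

    Assumption: only positive key words are kept, i.e. words whose relative
    frequency is higher in the study corpus than in the reference corpus.
    """
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    n_study, n_reference = len(study_tokens), len(reference_tokens)
    keys = []
    for word, a in study.items():
        b = reference[word]  # Counter returns 0 for unseen words
        more_frequent = a / n_study > b / n_reference
        if more_frequent and log_likelihood(a, b, n_study, n_reference) >= threshold:
            keys.append(word)
    return sorted(keys)


# Toy data: the reference corpus is five times the size of the study corpus.
study = ["corpus"] * 10 + ["the"] * 10
reference = ["the"] * 50 + ["a"] * 50
print(len(reference) / len(study))
print(key_words(study, reference))
```

Under these assumptions, the effect of reference corpus size could be explored by calling `key_words` repeatedly with reference samples of increasing size and counting the words returned each time.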

Author Biography

Tony Berber Sardinha

Tony Berber Sardinha is Associate Professor in both the Postgraduate Program in Applied Linguistics and the Linguistics Department, Catholic University of São Paulo, and a researcher with the Brazilian National Research Council (CNPq). His main interest is in the area of Corpus Linguistics.

Section

Papers