Exploiting Corpora for Extracting and Describing Specialized Lexicon
Towards a Solid and Sustained Methodology
DOI:
https://doi.org/10.11606/issn.2236-4242.v33i1p69-104
Keywords:
Specialized Lexicon Extraction, Methodology, Corpora, Concordances, Collocations
Abstract
The use of corpora for specialized lexicon extraction is a common and widely accepted method for building lexical resources. However, the methodologies used to achieve this are not openly discussed, which makes it difficult to compare approaches and to determine which are robust. To fill this gap, this paper presents and discusses a detailed methodology for extracting specialized lexicon from corpora, combining linguistic and statistical approaches. The proposed method uses specialized and monitor corpora and comprises i) frequency information analysis; ii) extraction of concordances and collocations; and iii) textual organization information, accounting for the extraction of core single and multiword expressions and of salient semantic relations. Our goal is to determine a solid and accurate list of potential specialized lexical units that will allow for a swifter final validation and maximize the informational value of the interaction with the experts.
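As a rough illustration of the three kinds of evidence named in the abstract, the sketch below compares word frequencies between a specialized and a monitor corpus, then pulls concordance lines and window-based collocates for a candidate term. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the abstract does not name a scoring measure, so the smoothed relative-frequency ratio, the regex tokenizer, the window size and all function names here are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive lowercase word tokenizer (assumption); a real pipeline would
    # use a language-aware tokenizer and likely lemmatization.
    return re.findall(r"\w+", text.lower())


def relative_freq_ratio(specialized, monitor, smoothing=1.0):
    """Rank words by how much more frequent they are in the specialized
    corpus than in the monitor corpus (illustrative measure only)."""
    spec, mon = Counter(specialized), Counter(monitor)
    n_spec, n_mon = len(specialized), len(monitor)
    scores = {}
    for word, count in spec.items():
        spec_rate = count / n_spec
        # The smoothing constant keeps words absent from the monitor
        # corpus from causing a division by zero.
        mon_rate = (mon[word] + smoothing) / (n_mon + smoothing)
        scores[word] = spec_rate / mon_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def concordance(tokens, node, window=3):
    """Return (left context, node, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, node, window=3):
    """Count words co-occurring with the node within the window."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    # Toy texts standing in for real corpora.
    spec_tokens = tokenize("the lexical unit is validated and the lexical unit is described")
    mon_tokens = tokenize("the cat is on the mat and the dog is in the garden")
    print(relative_freq_ratio(spec_tokens, mon_tokens)[:3])
    print(concordance(spec_tokens, "unit", window=2))
    print(collocates(spec_tokens, "unit", window=2).most_common(3))
```

In practice the ranked list would only be a starting point: candidates with a high ratio would still be inspected through their concordance lines and collocates before being proposed to experts for validation.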
License
Copyright (c) 2020 Linha D'Água
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The Editorial Board authorizes free access to and distribution of published content, provided that the source is cited, that is, granting credit to the authors and Linha D'Água and preserving the full text. The author is allowed to place the final version (postprint / editor’s PDF) in an institutional/thematic repository or personal page (site, blog), immediately after publication, provided that it is available for open access and comes without any embargo period. Full reference should be made to the first publication in Linha D'Água. Access to the paper should at least be aligned with the access the journal offers.
As a legal entity, the University of São Paulo at Ribeirão Preto School of Philosophy, Sciences and Languages owns and holds the copyright deriving from the publication. To use the papers, Paidéia adopts the Creative Commons Licence, CC BY-NC non-commercial attribution. This licence permits access, download, print, share, reuse and distribution of papers, provided that this is for non-commercial use and that the source is cited, giving due authorship credit to Linha D'Água. In these cases, neither authors nor editors need any permission.
Partial reproduction of other publications
Citations of more than 500 words, or reproductions of one or more figures, tables or other illustrations, should be accompanied by written permission from the copyright owner of the original work with a view to reproduction in Linha D'Água. This permission has to be addressed to the author of the submitted manuscript. Secondarily obtained rights will not be transferred under any circumstance.
Funding data
Fundação para a Ciência e a Tecnologia, grant number PD/BD/128131/2016