Abstract: |
In 2001, the EU defined Corporate Social Responsibility (CSR) as “a concept whereby companies integrate social and environmental concerns in their business operations and in their interaction with their stakeholders on a voluntary basis. Being socially responsible means not only fulfilling legal expectations, but also going beyond compliance and investing more into human capital, the environment, and the relations with stakeholders”. Following this definition, the pillars of CSR are environmental, social, and economic sustainability, and they must be communicated to society through appropriate reports. Sentiment analysis (SA) is a fundamental sub-area of natural language processing for studying communication and classifying opinions and emotions as negative or positive. Measuring sentiment is a task fraught with pitfalls related to the context of analysis, the methods, and the language. Lexicon-based techniques are less time- and resource-intensive than others because they rely on pre-built polarized dictionaries that are either domain-specific or general-purpose. Two of the main obstacles are the lack of language resources for languages other than English and the domain dependence of polarity classification, since the meaning of a word depends on its context. No domain-specific resources exist for investigating sentiment in the strategic communication of CSR, either in English or in other languages. Our contribution is therefore placed within the constantly evolving sustainability framework and in a methodological setting characterized by limits and challenges. The innovative feature of our work lies in three aspects: 1) the investigation of an unmapped domain by means of a domain corpus-based approach and the building of a customized lexicon from a general pre-constructed dictionary; 2) the application to the Italian language; 3) the assessment of performance improvements through machine learning. 
More specifically, we use the corpus of a baseline sample of the social reports of Italian listed companies that closed the financial year on December 31, 2021, to develop an algorithm for building a customized CSR lexicon that extends Italian general lexicons through a multi-stage model combining text analysis with social network analysis (SNA). From a machine learning perspective, we divided our data collection into five random samples: one was used as a training set for the implementation, and four were used as test sets. The process revealed a notable increase in performance metrics across all samples. |