Textual analysis of financial disclosure: from machine learning to deep learning


This paper provides a scoping review of the literature on textual analysis in accounting, with a focus on developments over the last decade. By textual analysis we mean the application of natural language processing to textual data in order to measure disclosure tone and readability, to determine similarities or differences across disclosure venues, to assess forward-looking statements, and to detect topics. We review key contributions on the models that prior literature has developed to analyze textual data in accounting, which are based on machine learning and, more recently, on deep learning. Finally, we outline areas in which our understanding is still limited and discuss opportunities for future research.

