Analyzing Indian Linguistic Corpora for Financial Literacy Data

My research question explores: "How does the availability of financial literacy materials in local languages affect financial literacy outcomes in India?"

Jun 07, 2025

Ihita Ghosh

Student, Barnard College

Over the past two weeks, I have been conducting quantitative analysis on linguistic datasets covering the 22 recognized languages of India. Using AI, I extracted the percentage of financial literacy terms present in each language's corpus. My initial plan was to compare these percentages directly with the financial literacy rates of people in those language groups. However, I noticed distinct trends related to the scripts the languages are written in, the zones where they are spoken, and the types of areas where they are primarily used. As a result, I decided to visualize this information to understand it better.
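To make the "percentage of financial literacy terms" measure concrete, here is a minimal sketch of one way such a share could be computed with plain term matching. It is not the AI-based extraction described above. The term list, the function name `financial_term_share`, and the whitespace tokenization are all assumptions for illustration; real Indic-language text would likely need a language-aware tokenizer.

```python
import string

# Hypothetical term list: the post does not say which terms were counted.
FINANCIAL_TERMS = {"bank", "loan", "interest", "savings", "insurance"}

# Punctuation to trim from token edges, including the Devanagari danda.
EDGE_PUNCTUATION = string.punctuation + "\u0964"


def financial_term_share(text, terms=FINANCIAL_TERMS):
    """Return the percentage of tokens in `text` that appear in `terms`."""
    # Assumption: splitting on whitespace is good enough for a sketch.
    tokens = [tok.strip(EDGE_PUNCTUATION) for tok in text.lower().split()]
    tokens = [tok for tok in tokens if tok]
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in terms)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    sample = "The bank gave a loan with low interest."
    print(f"{financial_term_share(sample):.1f}% of tokens are financial terms")
```

Running the same function over each language's corpus would give one percentage per language, which is the kind of figure the charts below compare.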

The financial literacy terms present in recognized Indian language corpora

Comparing languages written in different scripts

Comparing languages spoken in different geographical regions

Comparing languages spoken in different community types

Visualizing the data allowed me to pinpoint variables that will be worth exploring once I reach the qualitative part of my project.

In the coming weeks, I look forward to running a linear regression against financial literacy outcomes, learning about the cultural and socioeconomic factors that affect financial literacy, and exploring the implications of this study. Even taking into account the limitations of my study, such as the accuracy of AI, I believe this methodology can still open doors in computational linguistics research. I am excited to see where my analysis takes me!
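As a rough sketch of what the planned regression step might look like, the snippet below fits a simple least-squares line with one predictor (the share of financial terms in a corpus) and one outcome (a financial literacy rate). The function name `fit_line` is hypothetical, and the numbers in the usage example are made-up placeholders chosen only to check the arithmetic. They are not data or results from this study.

```python
def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Placeholder values only, NOT study data.
    term_share = [1.0, 2.0, 3.0, 4.0]
    literacy_rate = [3.0, 5.0, 7.0, 9.0]
    slope, intercept = fit_line(term_share, literacy_rate)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

A single-predictor fit like this leaves out the cultural and socioeconomic factors mentioned above, so it is best read as a starting point and not as the full analysis.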
