Big data in Education: Literacy

Jamiil Touré Ali

September 25th, 2020

Your educational attainment does not tell whether or not you are literate. Being able to read, listen, speak, and write is not sufficient either to define someone as literate. Yet data are being collected to gauge the literacy rate in the world. The UNESCO Institute for Statistics (UIS) frames literacy with the question “Can [Name] read and write a simple sentence in [Language(s)]”, which is also part of the questionnaire used to collect data for assessing literacy. Being literate or illiterate definitely does not mean belonging to a specific community of language speakers or being able to speak a particular language. What does it mean, then, to be literate? And how could big data in education contribute to literacy?

To date, illiteracy is still one of the most pressing problems on our planet. Recent statistics show positive trends in literacy rates among youth (15-24) and adults (UNICEF, 2019). However, the number of illiterate people is still alarming, with millions of children and youth still out of school and more than half of those in school not meeting minimum proficiency standards in reading and numeracy at the end of 2019 (UN, 2020). And yet, we are living in a digital era of big data.

Big data in Education

Advances in new technologies have revolutionized education tremendously. The last decade has witnessed the use of big data in education. In particular, big data storage technologies and analytics tools have helped to collect online data that can produce the following insights into the education system, for the benefit of its two main actors (teachers and students):

  • Improved instruction: Learning is not a one-size-fits-all process, and human beings learn differently. Teachers can now use big data to improve the learning experience of their students and guide them through tailored content that targets their weaknesses. For instance, at the University of Central Florida, instructors used the adaptive learning platform Realizeit to help students fill the gaps in their math knowledge and identify a solid learning path in order to boost the course pass rate (builtin, 2020).

  • Matching students to programs: With the boom of new technologies, we now have enormous amounts of data about different university curricula. Some universities now use this information to match students with their featured programs once the students have filled out the necessary information. For example, Naviance is a platform that helps students explore and understand the possibilities for furthering their education while matching them to the colleges and majors that fit them best.

  • Matching students to employment: As with matching students to study programs, big data analytics solutions help map students’ suitability for employment. For instance, the data-driven application 12Twenty has helped advisors and administrators at the UCLA Anderson School of Management raise the student response rate of their job survey to over 99% (builtin, 2020).

  • Transparent education financing: Working out how to finance one’s education used to be a problem. Nowadays, with the data acquired from various university curricula, you can obtain at no cost an estimate of the amount needed to finance your education, for instance by using an online calculator.

  • Efficient system administration: An education system includes human resources (teachers and students) and study resources (books, libraries, furniture, etc.). Panorama Education is an example of a big data company that supports efficient system administration in a variety of school districts in the US (builtin, 2020).

Big data in education means more control over the different aspects of education that affect students, teachers, and the administration, in order to achieve better results or statistics and, most of all, better educated students. It also means getting an education through e-learning platforms (self-teaching) and gamification. As people get educated in their own ways and new technologies redesign how we learn and teach, it is clear that the future of education hinges a lot on new technology. However, an educated person is not necessarily a literate person, even though literacy has much to gain from education. Literacy is intimately related to languages, and the advancement of big data in education may help envision a solution to illiteracy around the world. Every year since 1967, on the 8th of September, UNESCO has reminded us of this fight through the celebration of International Literacy Day.

Literacy in the technological era 

Last year’s International Literacy Day (ILD) celebrations focused on Literacy and Multilingualism, highlighting the need to preserve the global world in which we are all living, and it is becoming clearer to us that no dialect should be left behind. A 2005 UNESCO report on the importance of mother tongue-based schooling for educational quality clearly states that the common belief that bilingualism causes confusion, and that the first language must be pushed aside so that the second language can be learned, is a myth. Our challenge then becomes to unlock the potential of our education regardless of the language or dialect and to reduce the illiteracy rate. This is probably one of the drivers behind the ILD 2019 theme on the global inclusion of languages. And the attainment of such goals falls within the scope of research.

Natural Language Processing (NLP) applied to big data in education could help research ease the literacy burden on our education. However, the current state of the art in NLP is heavily biased toward high-resource languages such as English, Spanish, Chinese, Arabic, and French, among others. This inequality, which is due to the digital language divide, makes digitization impossible for some indigenous languages, hence the need to resort to a second language in some countries. As a result, of the approximately seven thousand languages existing on the globe, thousands are dying (SIL, 2020). NLP can help us with language development, but first it needs the language to be digitized (UN, 2020). Translation Commons, through its Language Digitization Initiative, provides a guide to bring your language online.

For ILD 2020, the theme is Literacy teaching and learning in the COVID-19 crisis and beyond, and it aims at highlighting literacy learning from a lifelong learning perspective. In line with this theme, making big data in education benefit literacy would require digitizing indigenous languages, so that researchers and teachers can provide digital content in those languages, promoting language inclusion in the world, and securing funding for internet access to support literacy teaching and learning from a lifelong perspective.


Big data usage in education is becoming more and more pervasive. This has the benefit of improving some education statistics, yet problems like illiteracy still persist. For big data in education to be a worldwide solution, we might need to take into account several conditions, such as access to the internet, knowledge of how to use technological devices, and the acknowledgment that literacy is not about learning a specific language. And as we recognize these few conditions among many others, the International Literacy Day (ILD) themes of 2019 and 2020 might be a good framework for continuing to combat illiteracy through multilingualism and through literacy teaching and learning during the COVID-19 crisis, leaving no one behind.


About the author

Jamiil Toure

Design Engineer in Electricity and Master's graduate in Mathematical Sciences with good knowledge of programming languages such as Python and R for data science, machine learning, and deep learning tasks and projects.