DEVELOPMENT OF A MACHINE LEARNING ALGORITHM TO PREDICT AUTHOR’S AGE FROM TEXT

Authors

  • Asogwa D.C Department of Computer Science, Faculty of Physical Sciences, Nnamdi Azikiwe University Awka, Anambra State, Nigeria
  • Anigbogu S.O Department of Computer Science, Faculty of Physical Sciences, Nnamdi Azikiwe University Awka, Anambra State, Nigeria
  • Anigbogu G.N Nwafor Orizu College of Education, Nsugbe, Nigeria
  • Efozia F.N Prototype Engineering Development Institute (PEDI), Ilesha, Osun State, Nigeria

DOI:

https://doi.org/10.29121/granthaalayah.v7.i10.2019.408

Keywords:

Author Profiling, Machine Learning, Binary Classification, Natural Language Processing

Abstract [English]

Author's age prediction is the task of determining an author's age by studying the texts they have written. Predicting an author's age can shed light on the trends, opinions, and social and political views of different age groups. Marketers, for example, use such predictions to promote a product or service to an age group according to the interests and opinions that group expresses. Methods in natural language processing have made it possible to predict an author's age from text by examining variation in linguistic characteristics, and many machine learning algorithms have been applied to the task. However, social network text poses numerous challenges for computational linguists, while performance-driven machine learning techniques face challenges of their own in realistic scenarios. This work developed a model that predicts an author's age from text with a machine learning algorithm (Naïve Bayes) using three types of features: content-based, style-based, and topic-based. The trained model achieved a prediction accuracy of 80%.
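The abstract names Naïve Bayes over content-, style- and topic-based features, and the keywords frame the task as binary classification. The paper's exact features, age-group boundaries and data are not given here, so the following is only a minimal illustrative sketch under stated assumptions: a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, two hypothetical age groups labelled "young" and "older", lowercased word unigrams as content features, and a few hypothetical style tokens (exclamation marks, all-caps words, a coarse average-word-length bucket). Topic-based features are left out, the function and class names are invented for illustration, and the toy texts are made up; the sketch does not reproduce the reported 80% accuracy.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Turn a text into a bag of feature tokens (hypothetical feature set).

    Content features: lowercased word unigrams, prefixed "w=".
    Style features (illustrative assumptions, not the paper's list):
    one token per "!", one per all-caps word, and an average-word-length bucket.
    Topic-based features are not modelled in this sketch.
    """
    words = re.findall(r"[a-z']+", text.lower())
    tokens = ["w=" + w for w in words]
    tokens += ["style=exclaim"] * text.count("!")
    caps = sum(1 for w in re.findall(r"[A-Za-z]+", text) if len(w) > 1 and w.isupper())
    tokens += ["style=allcaps"] * caps
    if words:
        avg_len = sum(len(w) for w in words) / len(words)
        tokens.append("style=avglen_long" if avg_len >= 5 else "style=avglen_short")
    return tokens


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class, for priors
        self.token_counts = defaultdict(Counter)   # token frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = extract_features(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def log_posterior(self, text, label):
        """Unnormalised log P(label | text) = log prior + sum of log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_tokens[label] + len(self.vocab)
        for tok in extract_features(text):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Invented toy data for illustration only.
    train_texts = [
        "omg this game is sooo lit!!! cant wait for school to end",
        "lol my homework is killing me, see u at the party!!",
        "I reviewed the quarterly mortgage statement with my accountant yesterday.",
        "Our grandchildren visited this weekend; the garden looked wonderful.",
    ]
    train_labels = ["young", "young", "older", "older"]
    model = MultinomialNaiveBayes().fit(train_texts, train_labels)
    print(model.predict("cant wait for the party lol!!"))  # young
```

Encoding style cues as pseudo-tokens lets a single multinomial model weigh content and style evidence together. A production system would more likely build a proper feature matrix and use a library implementation such as scikit-learn's `MultinomialNB`, then measure accuracy on held-out data rather than on the training texts.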

Published

2019-10-31

How to Cite

Asogwa, D. C., Anigbogu, S. O., Anigbogu, G. N., & Efozia, F. N. (2019). DEVELOPMENT OF A MACHINE LEARNING ALGORITHM TO PREDICT AUTHOR’S AGE FROM TEXT. International Journal of Research -GRANTHAALAYAH, 7(10), 380–389. https://doi.org/10.29121/granthaalayah.v7.i10.2019.408