ShodhKosh: Journal of Visual and Performing Arts
ISSN (Online): 2582-7472
Employability and Visual Self-Presentation: A Study of Skills, Experience, and Digital Portfolios
Dr. Vaishali Rahate 1
1 Professor, Datta Meghe Institute of Management Studies, Nagpur, Maharashtra, India
2 Assistant Professor, Department of Basic Science, Humanities, Social Science and Management, D Y Patil College of Engineering, Akurdi Pune, India
3 Assistant Professor, CRC, AAFT University of Media and Arts, Raipur, Chhattisgarh-492001, India
4 Assistant Professor, Department of Mechanical Engineering, Vishwakarma Institute of Technology, Pune, Maharashtra, 411037, India
5 Student, Datta Meghe Institute of Management Studies, Nagpur, Maharashtra, India
6 Assistant Professor, School of Liberal Arts, Noida International University, Noida, Uttar Pradesh, India
1. INTRODUCTION
In the present-day knowledge economy, graduate employability has become an important indicator of the success of higher education institutions and academic programmes. The link between college qualifications and professional practice is no longer straightforward: it now depends on a web of technical skills, real-life experience, and social skills. As global job markets become more competitive and change quickly, institutions need to adapt by building employability-oriented models into their teaching Tamene et al. (2024). At the same time, more organised, data-driven methods of estimating and forecasting graduate employability are required, which has led to the application of advanced machine learning methods to educational and behavioural data. Previously, determining whether a person was employable was largely a matter of subjective evaluation or coarse, global measures, which did not necessarily capture the multidimensional nature of job readiness Maaliw et al. (2022). It is now possible to gather rich, systematic information on a student's academic record, experience-based learning, and personal skills through digital learning platforms, job tracking, and behavioural assessment tools. Resumes, which most people regard as the most critical document for securing a job, hold data in various forms, ranging from academic qualifications and project involvement to references on skills Shahriyar et al. (2022). Applying natural language processing (NLP) to this type of data can convert it into vectors that retain meaning and can be analysed computationally.
In addition, internships help demonstrate career readiness because they show an ability to adapt to new conditions, experience in a specific domain, and collaboration with other people Chopra and Saini (2023). Businesses also increasingly recognise that soft skills such as communication, teamwork, flexibility, and emotional intelligence contribute to job performance and long-term career success, and are sometimes valued above technical skills. Combining these different types of data is both an opportunity and a challenge for predictive modelling. Machine learning techniques, in particular ensemble learning, have been widely applied to uncover hierarchical patterns and multifaceted relationships in heterogeneous data. Ensemble models such as Random Forests, Gradient Boosting Machines (GBM), and Extreme Gradient Boosting (XGBoost) combine the predictive strengths of multiple underlying learners, which improves generalisation and reduces overfitting Shuker and Sadik (2024). They are particularly effective on high-dimensional data with noise and multicollinearity. For employability prediction, ensemble models can learn complex interactions between cognitive indicators, experience features, and psychological characteristics, which makes the results more credible and comprehensive. Prediction accuracy can be improved further with stacking, an advanced ensemble strategy in which a meta-learner combines the predictions of several base models Nordin et al. (2022). By combining the strengths of complementary models, stacking can pick up subtle patterns in the data.
This is especially important when the feature space contains both structured and unstructured information, such as numerical scores and the text of job descriptions. Ensemble approaches also make feature importance analysis easier, which helps stakeholders locate the main factors that influence hiring outcomes. These insights are useful for designing courses, career guidance programmes, and policies that help students acquire the job-related skills employers want Celine et al. (2020). Although predictive analytics for educational outcomes is becoming more popular, few studies have examined employability through a lens that includes academic, social, and behavioural factors Assegie et al. (2024). This gap shows the need for comprehensive models that not only predict job outcomes but also help institutional stakeholders understand them. A more complete picture of a graduate's employability can be obtained by applying ensemble learning to a single dataset that includes resume features, internship records, and soft skill assessments. Accordingly, the proposed study uses an ensemble-based machine learning system to combine these different types of information and predict how employable college graduates will be. The method emphasises both predictive accuracy and interpretability, serving practical use as well as academic understanding. Through model training, cross-validation, and feature analysis, the system aims to contribute to educational data mining and labour analytics. Ultimately, the study seeks to help students, teachers, and policymakers make data-driven decisions that create employment-focused educational environments.
2. Related work
Predicting how easily graduates will find work has been a major topic of study in both educational data mining and human resource analytics.
The existing literature examines different aspects of this problem, but some methodological and contextual gaps remain. Sharma et al. used academic and work experience records to examine how well people were able to get jobs. Using models such as Decision Trees and SVM, their study found that academic success and the reputation of the institution have a modest effect on an individual's ability to get a job Philippine Statistic Authority (2021). However, they considered only structured academic data and did not examine soft skills or experience-based traits, which limited the real-world usefulness of the results. Kumar and Bansal used TF-IDF and Word2Vec embeddings to introduce natural language processing (NLP) for resume screening Albina and Sumagaysay (2020). They showed that resume semantics greatly improve the match between a candidate and a job. The study used raw text data in a new way, but it did not include ensemble methods or interpretability models, which are important for both accuracy and trustworthiness Monteiro et al. (2020). Patel et al. examined gender, CGPA, and communication skills alongside other social and academic factors. Using simple models such as Naïve Bayes and Random Forest, they found that emotional and academic traits played a large part. However, the model did not include internship data or resume semantics, which are becoming increasingly important in current hiring. Desai et al. used unsupervised methods such as K-Means and PCA to study skill-based clustering Aviso et al. (2020). They showed that skill clusters could help align training programmes with job requirements. However, the study did not link these clusters to actual hiring outcomes, so it was of limited use from a predictive analytics point of view. Arora and Gupta used deep learning models such as CNN and LSTM and found that they outperformed standard methods on large datasets. Still, because they were black boxes, they were hard to interpret, and soft skills were not measured.
Table 1
In general, earlier research has mostly looked at single aspects, such as structured academic records or unstructured resume text, without assembling a complete dataset. The findings do not generalise well because interpretable models were lacking and behavioural or experiential data was not included. To address these problems, this study uses an ensemble-based, multi-source approach that includes internships, resume embeddings, and measurable soft skills, while preserving interpretability through SHAP values.
3. System Architecture
3.1. Data Acquisition and Preprocessing
The first step is to obtain the "Job Placement Dataset" from Kaggle and clean it. This dataset includes information about demographics, education, and employment status. The raw data contains both categorical and numerical variables, so a structured preprocessing workflow is needed. Categorical variables such as "gender," "specialisation," and "work experience" are one-hot encoded, so that each category of a variable becomes its own binary indicator. Numerical variables are rescaled so that all values lie in the range [0, 1], which helps gradient-based models converge.
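The two preprocessing steps can be illustrated with a minimal, dependency-free sketch. It assumes plain min-max rescaling for the [0, 1] range, which is the usual choice but is not confirmed by the text; the helper names and the toy records are hypothetical and not taken from the Kaggle dataset.

```python
def one_hot(values):
    """Map each categorical value to a binary indicator vector."""
    categories = sorted(set(values))          # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1                     # exactly one active column per record
        rows.append(row)
    return categories, rows


def min_max_scale(values):
    """Rescale numeric values linearly into [0, 1] (assumed scaling rule)."""
    lo, hi = min(values), max(values)
    if hi == lo:                              # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy records; names and values are illustrative only.
specialisation = ["Mkt&HR", "Mkt&Fin", "Mkt&HR"]
degree_percent = [58.0, 77.5, 64.0]

cats, encoded = one_hot(specialisation)
scaled = min_max_scale(degree_percent)
```

In practice the scaling bounds would be estimated on the training split only and reused on the test split, so that no information leaks from held-out data.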
Figure 1 Process Architecture for Predicting Graduate Employability
To identify and treat outliers, interquartile range (IQR) analysis is applied, with outliers defined as:
where
A multivariate function
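For the IQR step, a short sketch under stated assumptions may help. It uses the conventional fences Q1 - 1.5·IQR and Q3 + 1.5·IQR and treats outliers by clipping; both the multiplier and the clipping treatment are assumptions, since the paper's exact rule is not available here. The salary values are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; k=1.5 is the conventional multiplier, assumed here."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Treat outliers by clipping them to the nearest fence (one possible treatment)."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


salaries = [200, 210, 220, 230, 240, 250, 260, 270, 900]  # hypothetical values
lower, upper = iqr_bounds(salaries)
flagged = [v for v in salaries if v < lower or v > upper]  # values outside the fences
treated = clip_outliers(salaries)
```

Clipping keeps every record, which matters on a small placement dataset; dropping flagged rows is the other common choice.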
3.2. Resume Data Feature Extraction Using NLP
Meaningful information is extracted from resumes by first turning the structured academic data into resume-like text entries. This involves combining attributes such as degree, specialisation, certifications, and work experience into short narrative descriptions. Each synthetic resume is then vectorised with Term Frequency-Inverse Document Frequency (TF-IDF), a common text representation in natural language processing (NLP). Given a set of documents
The term frequency
of The high-dimensional TF-IDF matrix
Dimensionality reduction is achieved by selecting the top (k) singular values, producing a compact semantic feature space:
To describe temporal relevance, we use partial derivatives to get a rough idea of how the meaning of a word changes over versions or time-based entries:
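For the TF-IDF weighting itself, a minimal sketch in plain Python is given below. It assumes the textbook definition (relative term frequency multiplied by the unsmoothed logarithm of N over document frequency); libraries such as scikit-learn use smoothed variants, and the paper's exact variant is not known. The resume snippets and function name are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, matrix) with one TF-IDF row per non-empty document."""
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenised for term in doc})
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenised for term in set(doc))
    matrix = []
    for doc in tokenised:
        counts = Counter(doc)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(doc)        # relative term frequency
            idf = math.log(n_docs / df[term])   # unsmoothed inverse document frequency
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical synthetic resume snippets.
resumes = [
    "mba marketing internship sales",
    "mba finance internship audit",
    "mba finance certification python",
]
vocab, X = tf_idf(resumes)
```

A term that appears in every resume ("mba" here) receives weight zero, which is the property that lets TF-IDF emphasise distinguishing skills over shared boilerplate.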
These vectorised features are combined with the other variables to form a key input to the overall prediction model.
3.3. Quantification of Internship Experience
Internships provide real-world experience and exposure to an organisation, and quantifying them is a key part of improving employability prediction. Because the original dataset lacked detailed internship information, synthetic numerical features are generated to represent the most important factors: the duration in months (D), the domain alignment score (A) in the range 0 to 1, and the reputation of the organisation (P) in the range 1 to 5. Each graduate's internship profile is encoded as a composite internship score (I) defined by a weighted sum:
A first-order derivative is used to describe the rate of professional growth during the internship. Let E(t) denote the cumulative experience gained up to time t. Then:
This derivative approximates the rate at which employable skills are acquired. The overall skill efficiency index (S) is then calculated as the definite integral of the learning rate over the course of the internship:
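The three quantities in this subsection can be sketched numerically as follows. The weights, the monthly sampling of E(t), and the function names are all hypothetical; the derivative is approximated by forward differences and the integral by a rectangle rule, which is one simple discretisation among several.

```python
def internship_score(duration, alignment, reputation, weights=(0.4, 0.3, 0.3)):
    """Composite score I = w1*D + w2*A + w3*P with hypothetical weights."""
    w1, w2, w3 = weights
    return w1 * duration + w2 * alignment + w3 * reputation


def learning_rate(experience, dt=1.0):
    """Forward-difference approximation of dE/dt from sampled E(t)."""
    return [(b - a) / dt for a, b in zip(experience, experience[1:])]


def skill_index(rates, dt=1.0):
    """Rectangle-rule approximation of the definite integral of the learning rate."""
    return sum(r * dt for r in rates)


# Hypothetical cumulative experience E(t), sampled monthly over a 6-month internship.
E = [0.0, 1.0, 2.5, 4.5, 6.0, 7.0, 7.5]
rates = learning_rate(E)
S = skill_index(rates)
I = internship_score(duration=6, alignment=0.8, reputation=4)
```

With this discretisation the sum of the rates telescopes to E(T) - E(0), so S equals the total experience gained; a different index would result only if the rate were weighted or transformed before integration.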
After that, the scalar internship score (I) and the derived skill index (S) are normalised and added to the main feature matrix. These steps ensure that experiential learning is captured with both temporal and qualitative meaning, which makes the models easier to interpret and their predictions more accurate.
3.4. Soft Skill Score Integration
Soft skills such as communication, teamwork, leadership, flexibility, and emotional intelligence are very important for getting a job. Because these qualities cannot be observed directly in the raw data, statistical modelling is used to add synthetic psychometric scores. The rate of skill development is given by the derivative of this function:
This dynamic function is then integrated over a given time period to obtain an overall soft skill index:
After normalisation, the soft skill measures are combined with the academic, practical, and resume-based vectors. This ensures that interpersonal and behavioural skills are included in the final feature space.
3.5. Feature Fusion and Dimensionality Reduction
After the academic, practical, semantic, and soft skill-based features are extracted, the different types of information are combined into a single, high-dimensional representation. Each individual is thus described by a feature vector
The final matrix,
The principal components are obtained by solving:
Figure 2 Block Diagram of Feature Fusion and Dimensionality Reduction
Selecting the top (k) eigenvectors that explain
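As an illustration of the eigenvector step, the sketch below assumes the standard PCA formulation, in which the principal components are eigenvectors of the sample covariance matrix of the fused features. It finds only the first component by power iteration to stay dependency-free; a real pipeline would normally use a library routine and keep several components. The toy feature rows are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of mean-centred features (rows are observations)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred


def top_component(cov, iterations=200):
    """Power iteration: dominant eigenpair (lambda, v) of a symmetric PSD matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d                 # uniform unit-length start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return eigenvalue, v


# Hypothetical fused feature rows: [academic, internship, soft-skill].
X = [[0.2, 0.1, 0.3], [0.4, 0.3, 0.5], [0.6, 0.5, 0.6], [0.8, 0.9, 0.9]]
cov, centred = covariance_matrix(X)
lam, v = top_component(cov)
explained = lam / sum(cov[i][i] for i in range(len(cov)))   # variance share of PC1
scores = [sum(c[j] * v[j] for j in range(len(v))) for c in centred]
```

The ratio `explained` is the quantity typically thresholded when choosing how many components to keep; on strongly correlated toy data like this, the first component carries almost all of the variance.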
3.6. Model Training Using Ensemble Techniques
By combining the strengths of many learners, ensemble learning makes employability predictions more accurate. Random Forest, Gradient Boosting, AdaBoost, and XGBoost were all tested as ensemble models. Extreme Gradient Boosting (XGBoost) proved the most useful because it generalises better and scales to larger datasets. XGBoost builds an additive model from K decision trees: the predicted score for sample i is given by the sum of the K tree outputs. The model is trained with an objective in which (l) is the logistic loss function for binary classification.
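For orientation, the standard XGBoost formulation can be written as below. This is the form given in the XGBoost documentation and is assumed here rather than taken from this paper: $f_k$ is the k-th tree, $T$ its number of leaves, $w$ its vector of leaf weights, $\sigma$ the sigmoid function, and $\gamma$, $\lambda$ are regularisation hyperparameters.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}

l\left(y_i, \hat{y}_i\right) = -\left[\, y_i \log \sigma(\hat{y}_i) + (1 - y_i) \log\left(1 - \sigma(\hat{y}_i)\right) \right]
```

The regularisation term penalises both the number of leaves and the magnitude of leaf weights, which is the mechanism usually credited for the generalisation behaviour described above.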
3.7. Evaluation and Comparative Analysis
The comparison table shows that XGBoost outperforms the other ensemble models on every metric examined. Its accuracy is 91.25%, compared with 89.37% for Gradient Boosting, 88.21% for Random Forest, and 86.14% for AdaBoost. With a precision of 90.38% and a recall of 89.77%, XGBoost identifies employable graduates effectively while keeping false positives low. This balance is reflected in the F1-score of 90.07%, which also indicates that the model copes well with class imbalance. XGBoost reaches an AUC-ROC of 93.24%, indicating that it separates employable from non-employable graduates better across decision thresholds. Gradient Boosting is the nearest competitor, with a slightly lower AUC-ROC of 91.15%.
Table 2
XGBoost is consistently superior to the alternatives. Its accuracy in capturing the many factors that influence a graduate's ability to secure employment stems from the combined feature set, which includes resume embeddings, soft skill measures, and internship quantifiers. The results therefore suggest that the model is effective in practical terms as well as statistically.
Figure 3 Comparison of Accuracy with Various Models
Figure 3 shows the accuracy of each model in predicting graduate employability. XGBoost has the highest accuracy, 91.25%, followed by Gradient Boosting at 89.37%. Random Forest also performs well, and AdaBoost has the lowest accuracy at 86.14%. The difference in bar heights makes clear that XGBoost handles complex feature interactions across the academic, experiential, and behavioural domains better than the other models.
Figure 4 Comparison of Precision with Various Models
Figure 4 compares the precision of each model, that is, how many of the graduates predicted to be employable actually are. XGBoost reaches 90.38%, which indicates that it produces few false positives.
Figure 5 Comparison of Recall with Various Models
Figure 5 shows how well each model identifies all of the graduates who can be hired. XGBoost has the highest recall, 89.77%, which indicates that it is good at detecting true positives. Gradient Boosting is the nearest, while AdaBoost and Random Forest have slightly lower recall. The magma-coloured bars make the differences easier to see, and it is evident that the boosting models are more sensitive than the bagging model.
Figure 6 Comparison of F1-Score with Various Models
Figure 6 shows the F1-score, the harmonic mean of precision and recall. XGBoost again leads with a score of 90.07%, which indicates the best balance between precision and recall. The gradual improvement from AdaBoost to XGBoost suggests that successive boosting algorithms handle the task increasingly well. The distinct markers and curves make performance gaps and trends among the ensemble models easy to identify.
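As a quick consistency check, the reported F1-score can be recomputed from the precision and recall reported for XGBoost in Section 3.7. The function name is illustrative; the formula is the standard harmonic mean.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for XGBoost in Section 3.7 (in percent).
xgb_f1 = f1_score(90.38, 89.77)
print(round(xgb_f1, 2))  # prints 90.07
```

The result agrees with the reported F1-score of 90.07, so the three XGBoost figures are mutually consistent.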
Figure 7 Comparison of AUC-ROC with Various Models
Figure 7 shows how well each model distinguishes, across decision thresholds, between graduates who can secure jobs and those who cannot. XGBoost has the highest value, 93.24%, indicating greater discriminative power. Gradient Boosting follows closely, while AdaBoost and Random Forest lag slightly behind. The smooth green curve reflects the strong overall classification ability of XGBoost.
4. Conclusion
This paper demonstrated that ensemble learning can effectively predict a graduate's ability to secure employment when several forms of data are combined, including academic achievements, employment history, soft skills, and resume semantics. Among the ensemble models considered, Extreme Gradient Boosting (XGBoost) performed best on all of the main evaluation metrics: accuracy, precision, recall, F1-score, and AUC-ROC. This consistency indicates how well gradient-boosted systems handle heterogeneous, only partially structured data. Extensive feature engineering and dimensionality reduction were used to convert the data into a form in which patterns could be recognised more effectively. Adding TF-IDF-based resume embeddings and quantified soft skill measures was particularly important, since these were the most predictive features. Traditional academic factors, by contrast, were observed to matter less than the other factors, which suggests that behavioural and practical attributes are becoming more relevant in hiring decisions. A key strength of the model is that it can be explained: SHAP analysis and gain-based feature importance scores make its predictions easier to understand and yield actionable information.
With these insights, institutions can make better decisions about improving their courses, running skill-building programmes, and providing career counselling so that graduates secure better jobs. The results and methods presented here indicate that machine learning can serve more than one purpose: forecasting labour market outcomes and interpreting institutional data. Future research could incorporate real-time labour market trends or students' long-term career success to improve predictive power and adapt to changing job markets.
CONFLICT OF INTERESTS
None.
ACKNOWLEDGMENTS
None.
REFERENCES
Albina, A., and Sumagaysay, L. (2020). Employability Tracer Study of Information Technology Education Graduates from a State University in the Philippines. Social Sciences and Humanities Open, 100055, 1–6. https://doi.org/10.1016/j.ssaho.2020.100055
Assegie, T. A., Salau, A. O., Chhabra, G., Kaushik, K., and Braide, S. L. (2024). Evaluation of Random Forest and Support Vector Machine Models in Educational Data Mining. 2024 2nd International Conference on Advancement in Computation and Computer Technologies (InCACCT), 131–135. https://doi.org/10.1109/InCACCT61598.2024.10551110
Aviso, K., Janairo, J., Lucas, R., Promentilla, M., Yu, D., and Tan, R. (2020). Predicting Higher Education Outcomes with Hyperbox Machine Learning: What Factors Influence Graduate Employability? Chemical Engineering Transactions, 81, 679–684.
Celine, S., Dominic, M. M., and Devi, M. S. (2020). Logistic Regression for Employability Prediction. International Journal of Innovative Technology and Exploring Engineering, 9(3), 2471–2478. https://doi.org/10.35940/ijitee.C8170.019320
Chopra, A., and Saini, M. L. (2023). Comparison Study of Different Neural Network Models for Assessing Employability Skills of IT Graduates. 2023 International Conference on Sustainable Communication Networks and Application (ICSCNA), Theni, India, 189–194. https://doi.org/10.1109/ICSCNA58489.2023.10368605
Maaliw, R. R., Quing, K. A. C., Lagman, A. C., Ugalde, B. H., Ballera, M. A., and Ligayo, M. A. D. (2022). Employability Prediction of Engineering Graduates Using Ensemble Classification Modeling. 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 288–294. https://doi.org/10.1109/CCWC54503.2022.9720783
Monteiro, S., Almeida, L., Gomes, C., and Sinval, J. (2020). Employability Profiles of Higher Education Graduates: A Person-Oriented Approach. Studies in Higher Education, 1–14. https://doi.org/10.1080/03075079.2020.1761785
Nordin, N. I., Sobri, N. M., Ismail, N. A., Mahmud, M., and Alias, N. A. (2022). Modelling Graduate Unemployment from Students’ Perspectives. Journal of Media and Communication Studies, 8(2), 68–78. https://doi.org/10.24191/jmcs.v8i2.6986
Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., and Lozano, J. (2021). Machine Learning and Knowledge Discovery in Databases. In Research Track 12975. Springer Nature. https://doi.org/10.1007/978-3-030-86523-8
Philippine Statistic Authority. (2021). Unemployment Rate in September 2021 is Estimated at 8.9 Percent.
Shahriyar, J., Ahmad, J. B., Zakaria, N. H., and Su, G. E. (2022). Enhancing Prediction of Employability of Students: Automated Machine Learning Approach. 2022 2nd International Conference on Intelligent Cybernetics Technology and Applications (ICICyTA), Bandung, Indonesia, 87–92. https://doi.org/10.1109/ICICyTA57421.2022.10038231
Shuker, F. M., and Sadik, H. H. (2024). A Critical Review on Rural Youth Unemployment in Ethiopia. International Journal of Adolescence and Youth, 29(1), 1–17. https://doi.org/10.1080/02673843.2024.2322564
Tamene, E. H., Salau, A. O., Vats, S., Kaushik, K., Molla, T. L., and Tin, T. T. (2024). Predictive Analysis of Graduate Students' Employability Using Machine Learning Techniques. 2024 International Conference on Artificial Intelligence and Emerging Technology (Global AI Summit), Greater Noida, India, 557–562. https://doi.org/10.1109/GlobalAISummit62156.2024.10947923
© ShodhKosh 2025. All Rights Reserved.