
Performance Prediction and Analysis using Decision Tree Algorithms

A Literature Review from 2011 to 2014 on Student’s Academic Performance Prediction and Analysis using Decision Tree Algorithms
Abstract: The success of any educational institute depends on the success of its students. Predicting and analysing student performance is essential for improving outcomes such as final grades and attendance. Such prediction helps teachers identify weak students and improve their scores. Data mining techniques such as classification and clustering are used for this analysis. This paper reviews studies that applied decision tree and related classification algorithms (ID3, J48/C4.5, random tree, Multilayer Perceptron, rule-based classifiers and random forest) to student performance prediction and analysis. The reviewed studies use the WEKA tool for evaluation, with either the percentage split method or the cross-validation method. The main objective of this analysis is to improve student performance.
Keywords: EDM, Decision tree, J48, random tree, ID3, Multilayer Perceptron, CART, IB1.
I. INTRODUCTION
A. Data Mining and Educational Data Mining (EDM)
Data mining is the process of extracting useful information and patterns from large amounts of data. It is used to solve problems by analysing data held in databases. [1]
Educational Data Mining (EDM) is concerned with developing methods for exploring the different types of data that come from educational settings, and with using those methods to understand students better. Its main uses include predicting student performance and studying how students learn in order to suggest improvements to current educational practice. [2]
B. Student Performance Prediction and Analysis
In student performance prediction, we predict the unknown value of a variable that describes the student. In the education sector, the values most often predicted are a student’s performance, marks, knowledge or score. Student performance prediction is a very popular application of data mining in education. Different techniques and models are applied to it, such as decision trees, neural networks, rule-based systems and Bayesian networks. The analysis helps to predict a student’s success in a course and a student’s final grade on the basis of features taken from logged data. [2][3]
This paper is organized as follows. Section II presents work related to student performance prediction and analysis. Section III presents a comparative study of the surveyed work. Section IV presents the conclusion, and Section V discusses future scope.
II. RELATED WORK
This section surveys the literature on student performance prediction and analysis using decision tree algorithms, with a view to the improvements needed in students’ grades or scores.
Brijesh Kumar Baradwaj and Saurabh Pal [5] (2011) examined student performance through internal marks and final results. Their data set covered 50 students from the MCA department of VBS Purvanchal University, Uttar Pradesh. Information such as previous semester marks, attendance, assignment marks and class test marks was taken from the students’ existing database. They used decision tree algorithms for performance prediction and analysis. The study is intended to help faculty members improve students’ scores in future examinations.
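ID3-style decision trees choose the attribute to split on by information gain, that is, the drop in class entropy after the split. The following is a minimal sketch of that calculation, not the authors' implementation; the attribute names and the four student records are hypothetical and invented purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical records, invented for illustration (not data from the study).
students = [
    {"attendance": "good", "class_test": "good", "result": "pass"},
    {"attendance": "good", "class_test": "poor", "result": "pass"},
    {"attendance": "poor", "class_test": "good", "result": "fail"},
    {"attendance": "poor", "class_test": "poor", "result": "fail"},
]

gains = {a: information_gain(students, a, "result") for a in ("attendance", "class_test")}
best_attribute = max(gains, key=gains.get)  # attribute an ID3-style tree would split on first
```

In this toy data attendance separates the two classes perfectly, so it has the larger gain and would become the root of the tree.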
R. R. Kabra and R. S. Bichkar [11] (Dec. 2011) collected data on 346 first-year engineering students from S.G.R. College of Engineering and Management, Maharashtra. Evaluation used the J48 algorithm with 10-fold cross validation, giving an accuracy of 60.46%. The model was successful in identifying students who are likely to fail, so it can help to improve student performance.
Surjeet Kumar Yadav and Saurabh Pal [6] (2012) analysed 90 students of the engineering department (session 2010) at VBS Purvanchal University, Uttar Pradesh. The ID3, C4.5 and CART decision tree algorithms were evaluated using 10-fold cross validation. C4.5 achieved a higher accuracy (67.7778%) than ID3 and CART. The true positive rate for the class Fail was high (0.786) for ID3 and C4.5, meaning that these models correctly identify most of the students who fail. The study is useful for identifying students who need special attention from teachers.
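The true positive rate for a class is the share of students who really belong to that class and whom the model also assigns to it. A minimal sketch of the calculation follows; the label lists are hypothetical and were chosen only so that the rate rounds to 0.786, they are not the study's confusion matrix.

```python
def true_positive_rate(actual, predicted, positive):
    """Fraction of truly-positive cases that the model also labelled positive."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    return true_pos / (true_pos + false_neg)


# Hypothetical outcome: 14 students actually failed, 11 of them were predicted "Fail".
actual = ["Fail"] * 14 + ["Pass"] * 6
predicted = ["Fail"] * 11 + ["Pass"] * 3 + ["Pass"] * 5 + ["Fail"] * 1

tpr_fail = true_positive_rate(actual, predicted, "Fail")
print(round(tpr_fail, 3))  # 0.786
```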
Manpreet Singh Bhullar and Amritpal Kaur [10] (2012) used a data set of 1892 students from various colleges for student performance prediction and evaluation. The J48 algorithm was evaluated using 10-fold cross validation and achieved a success rate of 77.74%. The model helps to identify weak students so that teachers can support them before they fail.
Mrinal Pandey and Vivek Kumar Sharma [4] (Jan. 2013) compared the J48, Simple CART, REPTree and NB tree algorithms for predicting the performance of engineering students. They used data on 524 students for 10-fold cross validation and 178 students for the percentage split method. J48 achieved the highest accuracy with both methods: 80.15% using 10-fold cross validation and 82.58% using percentage split. J48 therefore outperformed the other algorithms in both cases, and the authors consider it useful for teachers in improving the performance of weak students.
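The two test options differ in how the records are divided. Percentage split holds out one fixed portion for testing, while k-fold cross validation rotates the test portion so that every record is tested exactly once. The sketch below illustrates the idea in plain Python; it is not WEKA's implementation, and the function names, the 66% split and the 100-record size are assumptions made for the example.

```python
import random


def percentage_split(n_records, train_percent, seed=0):
    """Shuffle record indices once and cut them into one train part and one test part."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = n_records * train_percent // 100
    return indices[:cut], indices[cut:]


def k_fold_splits(n_records, k=10, seed=0):
    """Yield (train, test) index lists; every record is used for testing exactly once."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train, test = percentage_split(100, 66)   # the 66% figure is an assumed example value
splits = list(k_fold_splits(100, k=10))   # ten (train, test) pairs
```

With cross validation the reported accuracy is computed over the ten test folds together, so it uses all of the data for testing; percentage split gives a single estimate that depends on which records land in the held-out part.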
Anuja Priyam, Abhijeet, Rahul Gupta, Anju Rathee and Saurabh Srivastava [12] (June 2013) compared the ID3, C4.5 and CART decision tree algorithms on student data, using 10-fold cross validation for evaluation. The CART algorithm achieved the highest accuracy (56.2500%). The true positive rate for the class Fail was high (0.786) for ID3 and C4.5, meaning that these models correctly identify most of the students who fail. The model can therefore help teachers to reduce failure rates.
Ramanathan L, Saksham Dhanda and Suresh Kumar D [14] (June-July 2013) analysed data on 50 students. They evaluated Naive Bayes, J48 and a proposed algorithm, Weighted ID3 (WID3). WID3 achieved a higher accuracy (93%) than J48 and Naive Bayes. In future, user-friendly software could be built using WID3, which would be very helpful for teachers.
Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao [7] (September 2013) analysed a data set of 182 students using the ID3 and C4.5 decision tree algorithms. In a bulk evaluation on 173 students, both algorithms had the same accuracy of 75.145%; in a singular evaluation on 9 students, both had an accuracy of 77.778%. Over all 182 students the accuracy was approximately 75.27%.
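The overall figure is a pooled accuracy: the correct predictions from both evaluations divided by all 182 students, which is not the same as averaging the two percentages. The sketch below works this through. The counts of correct predictions (130 and 7) are an assumption inferred from the reported percentages, not numbers quoted from the study.

```python
# Correct-prediction counts are inferred from the reported percentages
# (130/173 is about 75.145%, 7/9 is about 77.778%); they are an assumption.
bulk_correct, bulk_total = 130, 173
single_correct, single_total = 7, 9

bulk_accuracy = 100 * bulk_correct / bulk_total
single_accuracy = 100 * single_correct / single_total
pooled_accuracy = 100 * (bulk_correct + single_correct) / (bulk_total + single_total)

print(f"{bulk_accuracy:.3f} {single_accuracy:.3f} {pooled_accuracy:.2f}")
# 75.145 77.778 75.27
```

Because the bulk evaluation contributes 173 of the 182 students, the pooled value stays close to the bulk accuracy.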
Mrs. M.S. Mythili and Dr. A.R. Mohamed Shanavas [9] (Jan. 2014) compared the J48, Random Forest, Multilayer Perceptron, IB1 and decision tree algorithms using a data set of 260 students from various schools, with 10-fold cross validation for evaluation. Random Forest had the highest accuracy (89.23%) and the shortest execution time of all the algorithms. The study is intended to be helpful for educational institutions.
Jyoti Namdeo and Naveenkumar Jayakumar [13] (Feb. 2014) collected data on 51 students from the MCA 2007 batch. The classification algorithms evaluated were Naive Bayes, Multilayer Perceptron, J48 and Random Forest. They were trained on the 2007 batch data and tested on the 2008 batch data. Evaluation was performed on the training set, with cross validation, with percentage split and on the 2008 test data. On the 2008 data Naive Bayes had the highest accuracy of the four algorithms (31.57%), but this accuracy is too low to meet the requirement.
Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad [8] (September 2014) analysed 399 student records using Naive Bayes, rule-based and J48 decision tree algorithms. They used the cross validation and percentage split methods for evaluation. Cross validation was performed with 3, 5 and 10 folds, and percentage split used training:testing ratios of 10:90, 20:80, 30:70, 40:60, 50:50, 60:40, 70:30, 80:20 and 90:10. Of the three classification algorithms, the rule-based and J48 decision tree algorithms had the highest accuracy (68.8%).
III. COMPARATIVE STUDY OF SURVEY
Comparison of survey work based on different parameters

| Paper Name | Year of Publication | Size of Data Set (No. of students) | Algorithms Used | Test Options Used | Algorithm with Higher Accuracy | Accuracy (%) |
|---|---|---|---|---|---|---|
| Performance Prediction of Engineering Students using Decision Trees | Dec. 2011 | 346 | J48 | Cross Validation | J48 | 60.46 |
| Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification | 2012 | 90 | ID3, C4.5, CART | Cross Validation | C4.5 | 67.7778 |
| Use of Data Mining in Education Sector | 2012 | 1892 | J48 | Cross Validation | J48 | 77.74 |
| A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction | Jan. 2013 | 524 | J48, Simple CART, REPTree, NB tree | Cross Validation | J48 | 80.15 |
| A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction | Jan. 2013 | 178 | J48, Simple CART, REPTree, NB tree | Percentage Split | J48 | 82.58 |
| Comparative Analysis of Decision Tree Classification Algorithms | June 2013 | Not stated | ID3, C4.5, CART | Cross Validation | CART | 56.2500 |
| Predicting Students’ Performance using Modified ID3 Algorithm | June-July 2013 | 50 | Naive Bayes, J48, Weighted ID3 | Not stated | Weighted ID3 | 93 |
| Predicting Students Performance using ID3 and C4.5 Classification Algorithms | September 2013 | 173 (bulk evaluation) | ID3, C4.5 | Cross Validation | ID3, C4.5 | 75.145 |
| Predicting Students Performance using ID3 and C4.5 Classification Algorithms | September 2013 | 9 (singular evaluation) | ID3, C4.5 | Cross Validation | ID3, C4.5 | 77.778 |
| An Analysis of students’ performance using classification algorithms | Jan. 2014 | 260 | J48, Random Forest, Multilayer Perceptron, IB1 | Cross Validation | Random Forest | 89.23 |
| Predicting Students Performance Using Data Mining Technique with Rough Set Theory Concepts | Feb. 2014 | 51 | J48, Random Forest, Multilayer Perceptron, Naive Bayes | Training, Cross Validation, Percentage Split, Test | Naive Bayes | 31.57 |
| First Semester Computer Science Students’ Academic Performances Analysis by Using Data Mining Classification Algorithms | September 2014 | 399 | Naive Bayes, J48, Rule Based | Cross Validation, Percentage Split | J48 | 68.8 |
IV. CONCLUSION
The importance of educational data mining (EDM) is growing as the need to predict and analyse student performance, in order to improve academic results, increases. As described above, various authors have applied different classification algorithms (J48, random forest, Multilayer Perceptron, Naive Bayes, rule-based classifiers, IB1, REPTree, NB tree and CART) to different data sets. Some authors compared algorithms to find the best one on the basis of accuracy. The survey in this paper shows that the J48/C4.5 decision tree algorithm is the one most often reported as the most accurate across the different data sets. In the surveyed studies J48 performed well on data sets ranging from 90 to 1892 students, which helps to explain why it is so widely used among decision tree algorithms.
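The claim about J48/C4.5 can be checked with a simple tally of the best-performing algorithm in each study of the comparative table in Section III. The sketch below transcribes those entries; the short study labels are added here for readability, and where a study reports two accuracies the higher one is listed. J48 is WEKA's implementation of C4.5, so the two names are counted as one family.

```python
from collections import Counter

# (study label, best-performing algorithm(s), reported accuracy in %),
# transcribed from the comparative table in Section III.
results = [
    ("Kabra and Bichkar, 2011", ["J48"], 60.46),
    ("Yadav and Pal, 2012", ["C4.5"], 67.7778),
    ("Bhullar and Kaur, 2012", ["J48"], 77.74),
    ("Pandey and Sharma, 2013", ["J48"], 82.58),
    ("Priyam et al., 2013", ["CART"], 56.25),
    ("Ramanathan et al., 2013", ["Weighted ID3"], 93.0),
    ("Adhatrao et al., 2013", ["ID3", "C4.5"], 77.778),  # tie between ID3 and C4.5
    ("Mythili and Shanavas, 2014", ["Random Forest"], 89.23),
    ("Namdeo and Jayakumar, 2014", ["Naive Bayes"], 31.57),
    ("Aziz et al., 2014", ["J48"], 68.8),
]

FAMILY = {"J48": "J48/C4.5", "C4.5": "J48/C4.5"}

wins = Counter()
for _, winners, _ in results:
    # Count each algorithm family at most once per study.
    for family in {FAMILY.get(name, name) for name in winners}:
        wins[family] += 1

top_family, top_count = wins.most_common(1)[0]
print(top_family, top_count, "of", len(results))  # J48/C4.5 6 of 10
```

A tally like this only counts how often an algorithm came out ahead within a study. It does not compare accuracies across studies, which used different data sets and different sets of competing algorithms.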
The survey in Section II should be helpful to researchers working on student performance prediction and analysis using decision tree algorithms.
V. FUTURE WORK
Students’ academic performance is the main contributor to the growth of any educational institute. If students perform well academically, the institution grows faster. It is now necessary to focus on students’ results, so there is wide scope in this field. Student performance prediction and analysis is used to improve student performance, and decision tree algorithms are the main tools for this purpose. Researchers have done a lot of work in this field, evaluating either a single algorithm or a comparison of three or four algorithms.
In future, researchers can extend this work by comparing a larger number of algorithms on larger data sets.
ACKNOWLEDGMENT
First of all, I express my sincerest gratitude to Almighty God, who always supports me in my endeavors.
I would like to thank Prof. Neena Madan for their encouragement and support. Then, I would like to thank my family and my friends. I am thankful to all those who helped me in one way or the other at every stage of my work.
REFERENCES
Nikita Jain, Vishal Srivastava, “Data mining techniques: A survey paper”, IJRET: International Journal of Research in Engineering and Technology, Volume: 02 Issue: 11, Nov-2013.
Mrs. M.S. Mythili, Dr. A.R.Mohamed Shanavas, “An Analysis of students’ performance using classification algorithms”, IOSR Journal of Computer Engineering, Volume 16, Issue 1, January 2014.
Dr. Mohd Maqsood Ali, “Role of data mining in education sector”, International Journal of Computer Science and Mobile Computing Vol. 2, Issue. 4, April 2013.
Mrinal Pandey, Vivek Kumar Sharma, “A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction”, International Journal of Computer Applications Volume 61, No.13, January 2013.
Brijesh Kumar Baradwaj, Saurabh Pal, “Mining Educational Data to Analyze Students Performance”, International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011.
Surjeet Kumar Yadav, Saurabh Pal, “Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification”, World of Computer Science and Information Technology Journal Vol. 2, No. 2, 2012.
Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha and Vipul Honrao, “Predicting Students Performance using ID3 and C4.5 Classification Algorithms”, International Journal of Data Mining

Skills Of Self Reflection And Communication

What did I learn from the meeting with the Project Mentor, including the presentation I gave to my Project Mentor?
Writing the research project was an interesting part of my life. As this was my first research of this nature, I encountered a lot of challenges. I was not really sure how and where to start writing the project. The first meeting I had with my Project Mentor gave me the confidence to write it.
I learnt a lot from my Mentor, who guided me throughout the preparation of the research project.
Firstly, I had to match the skills I possess to the topic I had chosen. By doing so, I learnt how to evaluate the knowledge, skills and experience required to prepare the research project, and so developed skills of reflection.
The Mentor gave me an in-depth explanation of the requirements of my chosen topic. With this understanding, I was able to come up with research questions designed to meet the research objectives.
Time management was an important element. I learnt time management skills by setting targets and information requirements at every stage of my RAP. This was very helpful because I knew exactly what kind of information I was looking for at every stage.
My Mentor gave me an overview of what research is all about, and I learnt many research techniques from him. For example, since I was using secondary data, I learnt the need to diversify my sources so that the information I used to prepare the research project was reliable.
Finally, my Mentor gave me tips on report writing, report structure and preparing a presentation in Microsoft PowerPoint. The presentation skills included preparation, developing a presentation style, structuring an effective presentation and developing as a presenter.
How has undertaking the RAP helped me in my accountancy studies and/or current employment role?
Doing the RAP was very helpful to me. I acquired new skills and new knowledge, and it gave me a thorough revision of what I had already learnt in my ACCA studies. The whole research project was quite interesting because it gave me an insight into how the knowledge I have acquired in my accountancy studies can be applied in practice. Participating in this programme will be very helpful when I take the final ACCA papers. The RAP will help me in the following areas:
Ratio analysis: accounting techniques
Doing the RAP was a good revision of the knowledge I acquired in my earlier studies. I also came to understand thoroughly the calculation and interpretation of financial ratios.
I used a number of books offering different explanations, and benefited far more than from the single book I used in my earlier ACCA studies. The books were the BPP and London College of Accountancy manuals for F4 and P4, and Financial Accounting and Reporting by Jamie Elliot and Barry Elliot. Having done this RAP, I am looking forward to studying P3- Business Analysis and P4- Advanced Financial Management.
My skills in calculating and interpreting financial ratios have greatly improved, which will be of value in real-life situations. I really enjoyed the ratio analysis because I was able to move beyond just observing the movement in ratios. I have gained the skills to uncover what is causing that movement, using the financial ratios as snapshots. Doing the RAP has put me in a better position to read and interpret financial statements, to understand what has happened and to assess the future impact of the results.
Business Analysis
The RAP has tremendously improved my knowledge of business analysis because it required me to understand different business analysis techniques. I therefore read a lot of books, including the BPP and Kaplan manuals for ACCA paper P3- Business Analysis and Competitive Strategy by Michael E. Porter (2004), to gain the techniques I needed to prepare this RAP. I complemented this by visiting different websites, which showed me how to look for the relevant information.
I gained detailed knowledge of how to evaluate a business using different models such as SWOT (strengths, weaknesses, opportunities and threats), PEST (political, economic, social and technological factors) and Porter’s Five Forces.
Because of the depth of knowledge I gained by doing the RAP, I have the confidence and zeal to look forward to studying ACCA paper P3- Business Analysis.
One of the fascinating elements of doing the RAP is understanding how analysis is done. I have learnt that I first have to analyse the structure of the industry in which the company operates in order to determine its competitiveness and attractiveness. I then analyse the strengths, weaknesses, opportunities and threats of the company in the face of these industry forces. I also learnt the importance of analysing the financial ratios in conjunction with these factors.
Professional ethics
My decision making on ethical matters has greatly improved since I did the RAP. The ACCA Professional Ethics module (a requirement before undertaking the RAP) enhanced my decision making on ethical matters because I had to deal with real-life scenarios. This knowledge will be helpful when I study the compulsory ACCA papers P1- Professional Accountant and P7- Advanced Audit and Assurance. Following many corporate scandals, the Professional Accountant paper will make me stand out on the job market.
How well do I think I have answered my research questions?
I collected data with the aim of answering the research project’s questions. I systematically drafted the research objectives to be met. These research questions influenced the type of information to be collected and the research approach.
Having drafted clear and precise research questions, I then collected and evaluated information directed at those questions. As a result I collected high-quality information and saved a lot of time, which allowed me to finish the project as planned. Doing the ACCA Professional Ethics module was of tremendous importance because it made me act objectively throughout the RAP, for example by referencing any information I used that was not mine.
The combination of the research techniques I learnt from my Mentor and the Professional Ethics module enabled me to answer the research questions fairly well. It was not possible to answer them thoroughly because of the following limitations:
I was not able to include all the relevant information I would have liked in the report because I could not conduct interviews with key personnel at PepsiCo Inc. Interviews would have provided information that is not publicly available.
The subjectivity of some issues led me to make subjective interpretations and judgements, especially where I lacked vital information.
The research techniques deployed, e.g. ratios, SWOT analysis and Porter’s Five Forces, have their own limitations, as stated in the main report. For example, I could not use these models to measure customer satisfaction or the Net Promoter Score (NPS), which is the percentage of customers who are promoters minus the percentage who are detractors.
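As an illustration of the NPS calculation mentioned in the last limitation, the following minimal sketch uses the usual convention that, on a 0 to 10 scale, ratings of 9 or 10 count as promoters and ratings of 0 to 6 as detractors. The survey ratings are hypothetical and invented for the example; no such survey data was available for this project.

```python
def net_promoter_score(ratings):
    """NPS = % promoters (9-10) minus % detractors (0-6) on a 0-10 scale."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses, invented for illustration.
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
nps = net_promoter_score(ratings)
print(nps)  # 20.0
```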
I approached the research project systematically so that I did not lose track of what I wanted to achieve. I carried out the research gradually, in such a way that each stage was the input to the next. For example, to analyse PepsiCo Inc’s financial performance, I used the following approach to answer the research questions:
Identification of the techniques to be applied: financial ratios, SWOT etc.
Industry analysis using Porter’s Five Forces
Company analysis using SWOT
Calculation of the relevant ratios for each of the respective years, analysed in conjunction with SWOT and Porter’s Five Forces
Interpretation of PepsiCo Inc’s results
Making a conclusion
Relevant recommendations were made, where possible, to improve on the current situation.
How have I demonstrated my interpersonal and communication skills during the work?
Preparing the RAP demanded more than just reading books. The RAP questions required a wide range of information sources, including face-to-face meetings with my Mentor. By drawing on a wide range of sources I ensured that the information was complete and representative. Synthesising the information to prepare a project report of the required standard called for skills in report writing and effective presentation.
The interpersonal and communication skills demonstrated during the RAP were as follows:
Ability to produce the report
My report sets out my findings, analysis and recommendations. In writing it, I was able to demonstrate the level of knowledge and skills required to produce a RAP that meets your requirements.
I was able to demonstrate this knowledge and these skills because I read different books that enhanced them, for example ‘How to Write Dissertations and Project Reports’ by Kathleen McMillan and Jonathan Weyers (August 2007). The RAP helped me demonstrate the ability to communicate the report’s findings by presenting them in a meaningful way.
Presentation of the project to my Mentor
As it was my first time, making a presentation posed a lot of challenges. I found it difficult to arrange the information for the presentation and to stand confidently and present before my Mentor. I read a book called ‘Brilliant Presentation’ by Richard Hall, which helped with my presentation skills.
Computer skills
I demonstrated my computer skills through the computer programmes I used to present my RAP. These include MS Word, MS PowerPoint and MS Excel, along with typing. I improved my computer skills by reading books such as MS Office 2007- All in one desk, by Peter Weverka.
