December 17, 2019


University and college admission is a complex decision process that goes beyond simply matching test scores against admission requirements. For an aspiring graduate student, choosing which universities to apply to is a difficult problem, and students often wonder whether their profile is good enough for a given university. This project addresses the problem with a recommender system built on several classification algorithms. Various models were trained on the collected data set, and the system suggests one best-matching university together with several universities that have similar properties, so as to maximize the student's chances of getting an admit from that list. Classification algorithms have also been used to predict a student's chance of acceptance at any individual university.
To predict the best university for a particular student, his or her GPA, GRE (Verbal and Quant) scores, and TOEFL score were used as attributes for classification. K-nearest neighbors was used to predict the best university, and K-means clustering was used to find similar universities. Support vector machine and random forest classifiers were used to predict a student's admission chance at a specific university.
Keywords: college admission, aspiring graduate student, recommender system, classification algorithm


1.1 .  Problem Statement

Every year, thousands of college graduates from all around the world apply to master's and PhD programs at US universities. Applying to US universities is not an easy task; it involves many steps and procedures. Choosing the right universities or colleges is another hurdle students have to face. Many students apply to universities where they have little chance of acceptance. For students from poor economic backgrounds this leads to frustration and anxiety, because the overall cost of applying is hard to afford and the money spent on unsuccessful applications is lost. Application fees for top-level US universities range from $70 to $90. In addition, sending GRE scores to a university costs $27, and sending a TOEFL score to a university costs $19. These figures show that students throw away a lot of hard work and hard-earned money if they are rejected by the universities they applied to.
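To make these figures concrete, the short sketch below adds up the per-university costs quoted above. The fee values come from the text; the function name and the count of eight universities are hypothetical and chosen only for illustration.

```python
# Per-university costs quoted in the text (USD).
APPLICATION_FEE_RANGE = (70, 90)  # application fee, top-level universities
GRE_REPORT_FEE = 27               # sending GRE scores to one university
TOEFL_REPORT_FEE = 19             # sending a TOEFL score to one university


def application_cost_range(num_universities):
    """Return (low, high) total cost in USD for applying to num_universities."""
    low_fee, high_fee = APPLICATION_FEE_RANGE
    per_university_low = low_fee + GRE_REPORT_FEE + TOEFL_REPORT_FEE
    per_university_high = high_fee + GRE_REPORT_FEE + TOEFL_REPORT_FEE
    return (num_universities * per_university_low,
            num_universities * per_university_high)


# Eight universities is a hypothetical count, used only as an example.
low, high = application_cost_range(8)
print(f"Applying to 8 universities costs between ${low} and ${high}")
```

Under these numbers a single application costs between $116 and $136, so every poorly chosen university is a noticeable loss.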

What if there were a system that could guide students, recommend a list of best-suited universities, and predict their admission chances at those universities according to their profile and scores? This is the idea behind the ‘University Recommendation and Admission Prediction System’.

1.2 .  Purpose of the Project

Researching universities and colleges, one part of the university application process, is itself an arduous and lengthy task. This remains a major problem for students and has not been solved so far. There are recognized sites that filter universities and colleges by location, tuition fees, major, and degree, but none of them uses machine learning algorithms to address the issue.

Hence, we carried out this research project to address that issue, to some extent, using data mining techniques.

1.3 .  Significance of the Project

The university application process is itself a tedious task, and students need a great deal of effort and determination to complete it. The application packet for US universities and colleges consists of the essential components listed below. Students have to work on many things when they prepare their applications. It would definitely be easier for them if they were relieved of the step of selecting the best-suited universities and colleges. This would encourage them to work vigorously on the other application components, so that their candidacy is strong enough to be selected.

List of University Application Components:

  1. Application Form
  2. Application Fee
  3. Attested Transcripts
  4. Financial Documents
  5. GRE Score Report
  6. TOEFL Score Report
  7. Statement of Purpose
  8. Letters of Recommendation
  9. Supporting Materials

1.4 .  Limitations and Delimitations

The results of this project do not apply to college graduates of every major. Because the data set contained limited information, the system cannot predict and recommend universities for students of every major. Nevertheless, the statistical data mining techniques used in this project are applicable to all majors. If a university has insufficient data for the major chosen by the student, the system reports to the user that there is insufficient data for prediction.

1.5 .  Objectives

The main objectives of this project are listed below:

  1. Learning data mining algorithms and implementing them on real data sets.
  2. Building an efficient university research site for students who plan to apply to master's programs in various disciplines.
  3. Recommending the most suitable universities to students based on their GRE, GPA, and TOEFL scores, and predicting their admission probability.

1.6 .  Definitions of Terms

Data Mining. Frawley et al. (1991) defined data mining as the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.

Algorithm. A process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.

K-Nearest Neighbor. In pattern recognition, the k-nearest neighbors algorithm is a nonparametric method used for classification and regression.
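As an illustration of how k-nearest neighbors could map a student profile to a university, here is a minimal sketch using only the Python standard library. The training profiles, the university names, and the helper names `scale` and `knn_predict` are all hypothetical; the GPA range assumes a 4.0 scale, and the project's real data set and feature handling may differ.

```python
import math
from collections import Counter

# Hypothetical admitted-student profiles:
# (GPA, GRE Verbal, GRE Quant, TOEFL) -> university that admitted them.
TRAINING = [
    ((3.9, 165, 168, 115), "University A"),
    ((3.8, 162, 167, 112), "University A"),
    ((3.7, 160, 166, 110), "University A"),
    ((3.3, 152, 158, 98), "University B"),
    ((3.2, 150, 157, 95), "University B"),
    ((3.1, 149, 155, 92), "University B"),
]

# Score ranges used to rescale every feature to [0, 1], so that TOEFL
# (0-120) does not dominate GPA (assumed 0-4) in the distance.
RANGES = [(0.0, 4.0), (130, 170), (130, 170), (0, 120)]


def scale(profile):
    """Rescale each score to the [0, 1] interval."""
    return [(value - lo) / (hi - lo) for value, (lo, hi) in zip(profile, RANGES)]


def knn_predict(profile, k=3):
    """Return the most common university among the k closest profiles."""
    query = scale(profile)
    distances = sorted(
        (math.dist(query, scale(known)), university)
        for known, university in TRAINING
    )
    votes = Counter(university for _, university in distances[:k])
    return votes.most_common(1)[0][0]


print(knn_predict((3.75, 161, 166, 111)))  # University A
```

Rescaling before measuring distance is a design choice made here, not something stated in the project description; without it, the features with the widest numeric range would dominate the neighbor search.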

Support Vector Machine. In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.

Random Forest. Random forest is a machine learning algorithm based on decision trees. It belongs to the class of ML algorithms that perform ‘ensemble’ classification, combining the votes of many trees into one prediction.
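The ensemble idea can be sketched with a much-reduced stand-in for a random forest: each "tree" below is a one-split decision stump trained on a bootstrap sample and a randomly chosen feature, and the admission chance is the fraction of stumps voting "admit". A real random forest grows deeper trees, and the applicant history, labels, and function names here are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows.

    rows is a list of (features, label) pairs. Returns
    (feature, threshold, label_if_at_or_below, label_if_above).
    """
    fallback = majority([label for _, label in rows])
    # If no threshold splits the sample, predict the overall majority.
    best_errors = len(rows) + 1
    best_stump = (feature, float("inf"), fallback, fallback)
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [label for x, label in rows if x[feature] <= threshold]
        right = [label for x, label in rows if x[feature] > threshold]
        if not right:
            continue
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if errors < best_errors:
            best_errors = errors
            best_stump = (feature, threshold, left_label, right_label)
    return best_stump


def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label


def fit_forest(rows, n_trees, rng):
    """Train n_trees stumps, each on its own bootstrap sample and feature."""
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        feature = rng.randrange(n_features)        # random feature per stump
        forest.append(fit_stump(sample, feature))
    return forest


def admission_chance(forest, x):
    """Fraction of stumps that vote 1 (admit)."""
    return sum(predict_stump(stump, x) for stump in forest) / len(forest)


# Hypothetical past applicants to one university:
# (GPA, GRE Verbal, GRE Quant, TOEFL), label 1 = admitted, 0 = rejected.
HISTORY = [
    ((3.9, 165, 168, 115), 1), ((3.8, 162, 167, 112), 1),
    ((3.7, 160, 166, 110), 1), ((3.6, 158, 165, 108), 1),
    ((3.2, 150, 157, 95), 0), ((3.1, 149, 155, 92), 0),
    ((3.0, 148, 154, 90), 0), ((2.9, 146, 152, 88), 0),
]

forest = fit_forest(HISTORY, n_trees=25, rng=random.Random(0))
strong = admission_chance(forest, (3.95, 166, 169, 117))
weak = admission_chance(forest, (2.8, 145, 150, 85))
print(f"strong profile: {strong:.2f}, weak profile: {weak:.2f}")
```

Reporting the vote fraction rather than a hard yes/no is what lets an ensemble express an admission chance instead of a single verdict.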

K-Means Algorithm. The K-means algorithm starts by randomly selecting K initial centroids, where K is a user-defined number of desired clusters. Each point is then assigned to its closest centroid, and the points assigned to a centroid form a cluster. The centroids are then recomputed as the means of their clusters, and the two steps repeat until the assignments stop changing.
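The K-means steps described above can be sketched in a few lines of standard-library Python. The university feature vectors (scaled mean GPA and mean GRE of admitted students), the names `U1` to `U6`, and the function `k_means` are hypothetical, and the grouping found can depend on the random initial centroids.

```python
import math
import random


def k_means(points, k, rng, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Step 1: pick k data points at random as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical universities described by the scaled (mean GPA, mean GRE)
# of their admitted students.
universities = {
    "U1": (0.95, 0.97), "U2": (0.92, 0.94), "U3": (0.90, 0.95),
    "U4": (0.78, 0.85), "U5": (0.75, 0.88), "U6": (0.80, 0.86),
}

labels, centroids = k_means(list(universities.values()), k=2, rng=random.Random(0))
for name, label in zip(universities, labels):
    print(name, "-> cluster", label)
```

Universities that end up in the same cluster as the best match are natural candidates for the "similar universities" part of a recommendation.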

1.7 .  Summary

The purpose of this research project is to compare data mining techniques in the analysis of students' GPA, GRE, AWA, and TOEFL scores, in order to find the best-suited university for a student and to predict the student's chance of acceptance. This research will contribute to the meager body of research on the effectiveness of data mining techniques applied in higher education. It will motivate other researchers to work on data mining techniques and algorithms for solving various issues related to higher education.

Authors: Er. Kshitiz Shrestha, Er. Pabin Raj Luitel, Er. Sabin Silwal
