Datasets
pyKT currently includes the following datasets:
Dataset | question ID | skill ID | answering results | answering duration | answer submission time |
---|---|---|---|---|---|
Statics2011 | ✔ | ✔ | ✔ | | |
ASSISTments2009 | ✔ | ✔ | ✔ | | |
ASSISTments2012 | ✔ | ✔ | ✔ | ✔ | ✔ |
ASSISTments2015 | ✔ | ✔ | | | |
ASSISTments2017 | ✔ | ✔ | ✔ | ✔ | ✔ |
Algebra2005 | ✔ | ✔ | ✔ | ✔ | |
Bridge2006 | ✔ | ✔ | ✔ | ✔ | |
Ednet | ✔ | ✔ | ✔ | ✔ | ✔ |
NIPS34 | ✔ | ✔ | ✔ | ✔ | |
POJ | ✔ | ✔ | ✔ | | |

Statics2011
This dataset was collected from an engineering statics course taught at Carnegie Mellon University during Fall 2011. A unique question is constructed by concatenating the problem name and the step name. The dataset has 194,947 interactions, 333 students, and 1,224 questions.
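The concatenation step can be sketched as follows. This is a minimal illustration, not pyKT's actual preprocessing code: the function name `build_question_ids`, the `----` separator, and the sample rows are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def build_question_ids(
    steps: List[Tuple[str, str]], sep: str = "----"
) -> Tuple[List[int], Dict[str, int]]:
    """Map (problem name, step name) pairs to integer question IDs.

    The separator is a hypothetical choice; any string that cannot occur
    inside a problem or step name would work.
    """
    key_to_id: Dict[str, int] = {}
    ids: List[int] = []
    for problem_name, step_name in steps:
        # The concatenated string is the unique question key.
        key = f"{problem_name}{sep}{step_name}"
        if key not in key_to_id:
            key_to_id[key] = len(key_to_id)
        ids.append(key_to_id[key])
    return ids, key_to_id


# Hypothetical interaction rows: (problem name, step name).
rows = [("P1", "s1"), ("P1", "s2"), ("P2", "s1"), ("P1", "s1")]
question_ids, vocab = build_question_ids(rows)
print(question_ids)
```

Because the key combines both names, the same step name under two different problems maps to two different questions, while a repeated (problem, step) pair reuses the existing ID.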
ASSISTments2009
This dataset consists of math exercises collected from the free online tutoring platform ASSISTments in the 2009-2010 school year. It contains 346,860 interactions, 4,217 students, and 26,688 questions. It is widely used and has been the standard benchmark for KT methods over the last decade.
ASSISTments2012
This is the ASSISTments data for the 2012-2013 school year, with affect predictions. The dataset consists of 2,541,201 interactions, 27,066 students, and 45,716 questions.
https://sites.google.com/site/assistmentsdata/datasets/2012-13-school-data-with-affect
ASSISTments2015
Similar to ASSISTments2009, this dataset was collected from the ASSISTments platform in 2015. It includes 708,631 interactions on 100 distinct KCs from 19,917 students. This dataset has the largest number of students among the ASSISTments datasets.
https://sites.google.com/site/assistmentsdata/datasets/2015-assistments-skill-builder-data
ASSISTments2017
This dataset is from the 2017 data mining competition. It consists of 942,816 interactions, 686 students, and 102 questions.
https://sites.google.com/view/assistmentsdatamining/dataset?authuser=0
Algebra2005
This dataset is from the KDD Cup 2010 EDM Challenge and contains detailed step-level responses to Algebra questions from 13-14 year old students. Unique questions are constructed in the same way as in Statics2011, yielding 809,694 interactions, 574 students, 210,710 questions, and 112 KCs.
Bridge2006
This dataset is also from the KDD Cup 2010 EDM Challenge and the unique question construction is similar to the process used in Statics2011. There are 3,679,199 interactions, 1,146 students, 207,856 questions and 493 KCs in the dataset.
Ednet
This large-scale hierarchical student activity dataset was collected by Santa, an artificial intelligence tutoring system. It contains 131,317,236 interactions from 784,309 students, making it the largest public interactive education dataset released so far.
NIPS34
This dataset is from Tasks 3 & 4 of the NeurIPS 2020 Education Challenge. It contains students’ answers to multiple-choice diagnostic math questions and was collected from the Eedi platform. For each question, we use the leaf nodes of the subject tree as its KCs, yielding 1,382,727 interactions, 948 questions, and 57 KCs.
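Selecting leaf nodes as KCs can be illustrated with a small sketch. The data layout here is an assumption for illustration only (a `parent_of` mapping from subject ID to parent subject ID, and a list of subject IDs tagged on one question); it is not the actual Eedi schema or pyKT's preprocessing code, and `leaf_subjects` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def leaf_subjects(
    subject_ids: List[int], parent_of: Dict[int, Optional[int]]
) -> List[int]:
    """Return the tagged subjects that are not the parent of another tagged subject."""
    tagged = set(subject_ids)
    # Every subject that acts as a parent of some tagged subject.
    parents = {parent_of.get(s) for s in tagged}
    return sorted(s for s in tagged if s not in parents)


# Hypothetical subject tree: 1 is the root, 2 is under 1, 3 and 4 are under 2.
parent_of = {1: None, 2: 1, 3: 2, 4: 2}

# A question tagged with a root-to-leaf path of subjects.
question_subjects = [1, 2, 3]
kcs = leaf_subjects(question_subjects, parent_of)
print(kcs)
```

Under these assumptions, only the deepest subject on each tagged path is kept, so a question tagged with a full path from the root contributes just its most specific subject as a KC.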
POJ
This dataset consists of programming exercises collected from the Peking University online coding practice platform. It was originally scraped by Pandey and Srivastava. In total, it has 996,240 interactions, 22,916 students, and 2,750 questions.
https://drive.google.com/drive/folders/1LRljqWfODwTYRMPw6wEJ_mMt1KZ4xBDk