pykt.preprocess package
Submodules
pykt.preprocess.algebra2005_preprocess module
pykt.preprocess.assist2009_preprocess module
pykt.preprocess.assist2012_preprocess module
pykt.preprocess.assist2015_preprocess module
pykt.preprocess.assist2017_preprocess module
pykt.preprocess.bridge2algebra2006_preprocess module
pykt.preprocess.data_proprocess module
pykt.preprocess.ednet_preprocess module
pykt.preprocess.junyi2015_preprocess module
pykt.preprocess.nips_task34_preprocess module
- pykt.preprocess.nips_task34_preprocess.get_user_inters(df)[source]
Convert the merged df into per-user interaction sequences.
- Parameters
df (_type_) – the merged df
- Returns
user_inters
- Return type
List
- pykt.preprocess.nips_task34_preprocess.load_nips_data(primary_data_path, meta_data_dir, task_name)[source]
Load the competition data. The data can be downloaded from https://competitions.codalab.org/competitions/25449; the accompanying documentation is available at https://arxiv.org/abs/2007.12061.
- Parameters
primary_data_path (_type_) – primary data path
meta_data_dir (_type_) – metadata directory
task_name (_type_) – task_1_2 or task_3_4
- Returns
the merged df
- Return type
dataframe
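A minimal usage sketch combining the two functions documented in this module; the file and directory paths below are placeholders (not taken from the documentation above), so adjust them to your local layout:

    from pykt.preprocess.nips_task34_preprocess import load_nips_data, get_user_inters

    # Placeholder paths for the data downloaded from
    # https://competitions.codalab.org/competitions/25449
    primary_data_path = "data/nips_task34/train_task_3_4.csv"  # assumed file name
    meta_data_dir = "data/nips_task34/metadata"                # assumed layout

    # Merge the primary data with the metadata for task 3/4 ...
    df = load_nips_data(primary_data_path, meta_data_dir, task_name="task_3_4")
    # ... then convert the merged df into per-user interaction sequences (a List).
    user_inters = get_user_inters(df)
    print(len(user_inters))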
pykt.preprocess.poj_preprocess module
pykt.preprocess.slepemapy_preprocess module
pykt.preprocess.split_datasets module
- pykt.preprocess.split_datasets.generate_question_sequences(df, effective_keys, window=True, min_seq_len=3, maxlen=200, pad_val=-1)[source]
- pykt.preprocess.split_datasets.generate_sequences(df, effective_keys, min_seq_len=3, maxlen=200, pad_val=-1)[source]
- pykt.preprocess.split_datasets.generate_window_sequences(df, effective_keys, maxlen=200, pad_val=-1)[source]
- pykt.preprocess.split_datasets.main(dname, fname, dataset_name, configf, min_seq_len=3, maxlen=200, kfold=5)[source]
Main function for splitting the dataset.
- Parameters
dname (str) – data folder path
fname (str) – the data file used to split; it needs 6 columns (NA indicates the dataset has no corresponding info), formatted as:
uid,seqlen: 50121,4
question ids: NA
concept ids: 7014,7014,7014,7014
responses: 0,1,1,1
timestamps: NA
cost times: NA
dataset_name (str) – dataset name
configf (str) – the dataconfig file path
min_seq_len (int, optional) – the minimum sequence length; sequences shorter than this value will be filtered out. Defaults to 3.
maxlen (int, optional) – the maximum sequence length. Defaults to 200.
kfold (int, optional) – the number of folds to split into. Defaults to 5.
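A hedged sketch of calling the split entry point; every path and name below is a placeholder, and whether fname should be a bare file name or a full path is an assumption not stated in the docstring above:

    from pykt.preprocess.split_datasets import main as split_main

    # All paths/names are placeholders for your local layout.
    split_main(
        dname="data/assist2009",              # data folder path
        fname="data.txt",                     # the 6-field data file described above (assumed name)
        dataset_name="assist2009",
        configf="configs/data_config.json",   # dataconfig file path (assumed location)
        min_seq_len=3,
        maxlen=200,
        kfold=5,
    )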
pykt.preprocess.split_datasets_que module
- pykt.preprocess.split_datasets_que.generate_sequences(df, effective_keys, min_seq_len=3, maxlen=200, pad_val=-1)[source]
- pykt.preprocess.split_datasets_que.generate_window_sequences(df, effective_keys, maxlen=200, pad_val=-1)[source]
- pykt.preprocess.split_datasets_que.main(dname, fname, dataset_name, configf, min_seq_len=3, maxlen=200, kfold=5)[source]
Main function for splitting the dataset.
- Parameters
dname (str) – data folder path
fname (str) – the data file used to split; it needs 6 columns (NA indicates the dataset has no corresponding info), formatted as:
uid,seqlen: 50121,4
question ids: NA
concept ids: 7014,7014,7014,7014
responses: 0,1,1,1
timestamps: NA
cost times: NA
dataset_name (str) – dataset name
configf (str) – the dataconfig file path
min_seq_len (int, optional) – the minimum sequence length; sequences shorter than this value will be filtered out. Defaults to 3.
maxlen (int, optional) – the maximum sequence length. Defaults to 200.
kfold (int, optional) – the number of folds to split into. Defaults to 5.
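The question-level variant appears to take the same arguments; a matching sketch with the same placeholder layout (the question-level interpretation is inferred from the module name only):

    from pykt.preprocess.split_datasets_que import main as split_que_main

    # Same placeholder layout as the split_datasets sketch above.
    split_que_main(
        dname="data/assist2009",
        fname="data.txt",
        dataset_name="assist2009",
        configf="configs/data_config.json",
        min_seq_len=3,
        maxlen=200,
        kfold=5,
    )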
pykt.preprocess.statics2011_preprocess module
pykt.preprocess.utils module
- pykt.preprocess.utils.concept_to_question(df)[source]
Convert df from concept to question.
- Parameters
df (_type_) – df containing concepts
- Returns
df containing questions
- Return type
_type_
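A minimal sketch of using concept_to_question; the input path is hypothetical, and the assumption that the df follows pykt's interim column layout is not stated in the docstring above:

    import pandas as pd
    from pykt.preprocess.utils import concept_to_question

    # Hypothetical interim file; the exact column names expected by
    # concept_to_question are not documented above and are assumed here.
    df = pd.read_csv("data/assist2009/interim.csv")
    df_q = concept_to_question(df)  # df with concept info -> df with question info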