pykt.preprocess package

Submodules

pykt.preprocess.algebra2005_preprocess module

pykt.preprocess.algebra2005_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.assist2009_preprocess module

pykt.preprocess.assist2009_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.assist2012_preprocess module

pykt.preprocess.assist2012_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.assist2015_preprocess module

pykt.preprocess.assist2015_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.assist2017_preprocess module

pykt.preprocess.assist2017_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.bridge2algebra2006_preprocess module

pykt.preprocess.bridge2algebra2006_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.data_proprocess module

pykt.preprocess.data_proprocess.process_raw_data(dataset_name, dname2paths)[source]

pykt.preprocess.ednet_preprocess module

pykt.preprocess.ednet_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.junyi2015_preprocess module

pykt.preprocess.junyi2015_preprocess.load_q2c(qname)[source]
pykt.preprocess.junyi2015_preprocess.read_data_from_csv(read_file, write_file, dq2c)[source]

pykt.preprocess.nips_task34_preprocess module

pykt.preprocess.nips_task34_preprocess.get_user_inters(df)[source]

Convert a dataframe into per-user interaction sequences

Parameters

df (_type_) – the merged dataframe

Returns

user_inters

Return type

List
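The grouping step can be sketched with pandas (a minimal illustration, not pykt's exact implementation; the column names `user_id`, `question_id`, and `correct` are assumptions, and the real function operates on the merged challenge dataframe):

```python
import pandas as pd

def get_user_inters_sketch(df):
    """Group a merged interaction dataframe into one sequence per user.

    Assumed columns: user_id, question_id, correct (not pykt's exact schema).
    """
    user_inters = []
    for uid, group in df.groupby("user_id", sort=False):
        qids = ",".join(str(q) for q in group["question_id"])
        responses = ",".join(str(r) for r in group["correct"])
        # one entry per user: a "uid,seqlen" header plus the flattened sequences
        user_inters.append([f"{uid},{len(group)}", qids, responses])
    return user_inters

df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "question_id": [10, 11, 12],
    "correct": [1, 0, 1],
})
print(get_user_inters_sketch(df))  # → [['1,2', '10,11', '1,0'], ['2,1', '12', '1']]
```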

pykt.preprocess.nips_task34_preprocess.load_nips_data(primary_data_path, meta_data_dir, task_name)[source]

The data can be downloaded from https://competitions.codalab.org/competitions/25449; the accompanying documentation can be downloaded from https://arxiv.org/abs/2007.12061.

Parameters
  • primary_data_path (_type_) – primary data path

  • meta_data_dir (_type_) – metadata dir

  • task_name (_type_) – task_1_2 or task_3_4

Returns

the merged dataframe

Return type

dataframe
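The merge of the primary answer records with the metadata files can be sketched as a left join (a hedged sketch only: the column names `UserId`, `QuestionId`, `IsCorrect`, and `SubjectId` are assumptions about the challenge data, and the real function also handles file loading and the task_1_2/task_3_4 split):

```python
import pandas as pd

def load_nips_data_sketch(primary_df, question_meta_df):
    """Merge primary answer records with question metadata on QuestionId.

    Assumed schemas; the real loader reads CSVs from primary_data_path
    and meta_data_dir before merging.
    """
    return primary_df.merge(question_meta_df, on="QuestionId", how="left")

primary = pd.DataFrame({"UserId": [1, 2], "QuestionId": [5, 6], "IsCorrect": [1, 0]})
meta = pd.DataFrame({"QuestionId": [5, 6], "SubjectId": ["[3]", "[4]"]})
merged = load_nips_data_sketch(primary, meta)
print(list(merged.columns))  # → ['UserId', 'QuestionId', 'IsCorrect', 'SubjectId']
```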

pykt.preprocess.nips_task34_preprocess.read_data_from_csv(primary_data_path, meta_data_dir, task_name, write_file)[source]

pykt.preprocess.poj_preprocess module

pykt.preprocess.poj_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.slepemapy_preprocess module

pykt.preprocess.slepemapy_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.split_datasets module

pykt.preprocess.split_datasets.KFold_split(df, k=5)[source]
pykt.preprocess.split_datasets.add_qidx(dcur, global_qidx)[source]
pykt.preprocess.split_datasets.calStatistics(df, stares, key)[source]
pykt.preprocess.split_datasets.expand_question(dcur, global_qidx, pad_val=-1)[source]
pykt.preprocess.split_datasets.extend_multi_concepts(df, effective_keys)[source]
pykt.preprocess.split_datasets.generate_question_sequences(df, effective_keys, window=True, min_seq_len=3, maxlen=200, pad_val=-1)[source]
pykt.preprocess.split_datasets.generate_sequences(df, effective_keys, min_seq_len=3, maxlen=200, pad_val=-1)[source]
pykt.preprocess.split_datasets.generate_window_sequences(df, effective_keys, maxlen=200, pad_val=-1)[source]
pykt.preprocess.split_datasets.get_inter_qidx(df)[source]

Add a global id to each interaction
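The id assignment can be sketched as a running counter across all sequences (an illustrative sketch; the real function takes a dataframe rather than the list of per-user sequence lengths used here):

```python
def add_global_interaction_ids(seq_lens):
    """Assign a running global id to every interaction across all sequences.

    seq_lens: list of per-user sequence lengths (illustrative input shape,
    not pykt's actual dataframe-based signature).
    """
    ids, nxt = [], 0
    for n in seq_lens:
        ids.append(list(range(nxt, nxt + n)))
        nxt += n
    return ids

print(add_global_interaction_ids([3, 2]))  # → [[0, 1, 2], [3, 4]]
```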

pykt.preprocess.split_datasets.get_max_concepts(df)[source]
pykt.preprocess.split_datasets.id_mapping(df)[source]
pykt.preprocess.split_datasets.main(dname, fname, dataset_name, configf, min_seq_len=3, maxlen=200, kfold=5)[source]

Main function for splitting a dataset into sequences and folds

Parameters
  • dname (str) – data folder path

  • fname (str) – the data file used to split; it needs 6 lines per student, in this format (NA indicates the dataset has no corresponding info):
    uid,seqlen: 50121,4
    question ids: NA
    concept ids: 7014,7014,7014,7014
    responses: 0,1,1,1
    timestamps: NA
    cost times: NA

  • dataset_name (str) – dataset name

  • configf (str) – the dataconfig file path

  • min_seq_len (int, optional) – the minimum sequence length; sequences shorter than this value are filtered out. Defaults to 3.

  • maxlen (int, optional) – the maximum sequence length. Defaults to 200.

  • kfold (int, optional) – the number of folds to split into. Defaults to 5.
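Reading the 6-lines-per-student file format described above can be sketched as follows (a hedged sketch of what `read_data` may do; function name and returned dict keys are illustrative, not pykt's actual API):

```python
import os
import tempfile

def read_split_file(path, min_seq_len=3):
    """Parse the 6-lines-per-student format and drop sequences shorter
    than min_seq_len (an illustrative sketch, not pykt's actual parser)."""
    students = []
    with open(path) as f:
        lines = [ln.strip() for ln in f]
    for i in range(0, len(lines), 6):
        uid, seqlen = lines[i].split(",")
        if int(seqlen) < min_seq_len:
            continue  # filter out sequences shorter than min_seq_len
        students.append({
            "uid": uid,
            "questions": lines[i + 1],
            "concepts": lines[i + 2],
            "responses": lines[i + 3],
            "timestamps": lines[i + 4],
            "cost_times": lines[i + 5],
        })
    return students

# two students; the second has seqlen 2 and is filtered out
sample = "\n".join([
    "50121,4", "NA", "7014,7014,7014,7014", "0,1,1,1", "NA", "NA",
    "7,2", "NA", "1,2", "0,1", "NA", "NA",
])
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write(sample)
students = read_split_file(path)
print(len(students))  # → 1
```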

pykt.preprocess.split_datasets.read_data(fname, min_seq_len=3, response_set=[0, 1])[source]
pykt.preprocess.split_datasets.save_dcur(row, effective_keys)[source]
pykt.preprocess.split_datasets.save_id2idx(dkeyid2idx, save_path)[source]
pykt.preprocess.split_datasets.train_test_split(df, test_ratio=0.2)[source]
pykt.preprocess.split_datasets.write_config(dataset_name, dkeyid2idx, effective_keys, configf, dpath, k=5, min_seq_len=3, maxlen=200, flag=False, other_config={})[source]

pykt.preprocess.split_datasets_que module

pykt.preprocess.split_datasets_que.generate_sequences(df, effective_keys, min_seq_len=3, maxlen=200, pad_val=-1)[source]
pykt.preprocess.split_datasets_que.generate_window_sequences(df, effective_keys, maxlen=200, pad_val=-1)[source]
pykt.preprocess.split_datasets_que.id_mapping_que(df)[source]
pykt.preprocess.split_datasets_que.main(dname, fname, dataset_name, configf, min_seq_len=3, maxlen=200, kfold=5)[source]

Main function for splitting a dataset into sequences and folds

Parameters
  • dname (str) – data folder path

  • fname (str) – the data file used to split; it needs 6 lines per student, in this format (NA indicates the dataset has no corresponding info):
    uid,seqlen: 50121,4
    question ids: NA
    concept ids: 7014,7014,7014,7014
    responses: 0,1,1,1
    timestamps: NA
    cost times: NA

  • dataset_name (str) – dataset name

  • configf (str) – the dataconfig file path

  • min_seq_len (int, optional) – the minimum sequence length; sequences shorter than this value are filtered out. Defaults to 3.

  • maxlen (int, optional) – the maximum sequence length. Defaults to 200.

  • kfold (int, optional) – the number of folds to split into. Defaults to 5.

pykt.preprocess.split_datasets_que.save_id2idx(dkeyid2idx, save_path)[source]

pykt.preprocess.statics2011_preprocess module

pykt.preprocess.statics2011_preprocess.change2timestamp(t)[source]
pykt.preprocess.statics2011_preprocess.read_data_from_csv(read_file, write_file)[source]

pykt.preprocess.utils module

pykt.preprocess.utils.change2timestamp(t, hasf=True)[source]
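Converting a time string to a millisecond timestamp can be sketched with the standard library (the exact format strings are assumptions; `hasf` here toggles whether the input carries fractional seconds, which is one plausible reading of the parameter):

```python
from datetime import datetime

def change2timestamp_sketch(t, hasf=True):
    """Convert a time string to a millisecond timestamp.

    hasf toggles parsing of fractional seconds (format strings are
    assumptions, not pykt's exact ones).
    """
    fmt = "%Y-%m-%d %H:%M:%S.%f" if hasf else "%Y-%m-%d %H:%M:%S"
    return int(datetime.strptime(t, fmt).timestamp() * 1000)

# one second apart → 1000 ms difference
print(change2timestamp_sketch("2011-09-01 10:00:01.0")
      - change2timestamp_sketch("2011-09-01 10:00:00.0"))  # → 1000
```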
pykt.preprocess.utils.concept_to_question(df)[source]

Convert a dataframe from concept level to question level

Parameters

df (_type_) – dataframe containing concept-level rows

Returns

dataframe containing question-level rows

Return type

_type_
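If each question has previously been expanded into one row per concept (as `extend_multi_concepts` suggests), the reverse conversion can be sketched as a deduplication (a hedged sketch only; the column names are illustrative and the real function may aggregate concept ids rather than simply dropping rows):

```python
import pandas as pd

def concept_to_question_sketch(df):
    """Collapse concept-level rows back to one row per (user, question).

    Assumed columns: user_id, question_id, concept_id, correct
    (not pykt's exact schema).
    """
    return df.drop_duplicates(subset=["user_id", "question_id"]).reset_index(drop=True)

df = pd.DataFrame({
    "user_id": [1, 1, 1],
    "question_id": [10, 10, 11],   # question 10 spans two concepts
    "concept_id": [100, 101, 102],
    "correct": [1, 1, 0],
})
out = concept_to_question_sketch(df)
print(len(out))  # → 2
```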

pykt.preprocess.utils.format_list2str(input_list)[source]
pykt.preprocess.utils.get_df_from_row(row)[source]
pykt.preprocess.utils.one_row_concept_to_question(row)[source]

Convert one row from concept to question

Parameters

row (_type_) – _description_

Returns

_description_

Return type

_type_

pykt.preprocess.utils.replace_text(text)[source]
pykt.preprocess.utils.sta_infos(df, keys, stares, split_str='_')[source]
pykt.preprocess.utils.write_txt(file, data)[source]

Module contents