pykt.utils package

Submodules

pykt.utils.utils module

pykt.utils.utils.debug_print(text, fuc_name='')[source]

Print text together with a function name.

Parameters
  • text (str) – the text to print

  • fuc_name (str, optional) – the function name to print with the text. Defaults to ''.

pykt.utils.utils.get_now_time()[source]

Return the current time as a string in the format %Y-%m-%d %H:%M:%S.

Returns

the current time

Return type

str
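The format string above can be tried out with the standard library. This is a minimal sketch, not pykt's implementation: the helper name now_time_string and the use of local time are assumptions made for illustration.

```python
from datetime import datetime

# Format documented for get_now_time.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def now_time_string() -> str:
    """Hypothetical helper: return the current local time as a formatted string."""
    return datetime.now().strftime(TIME_FORMAT)


stamp = now_time_string()
# Parse the string back to confirm that it matches the format.
parsed = datetime.strptime(stamp, TIME_FORMAT)
print(stamp)
```

A string in this format is always 19 characters long and sorts chronologically as plain text, which is convenient for log lines.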

pykt.utils.utils.set_seed(seed)[source]

Set the global random seed.

Parameters

seed (int) – random seed
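The source does not list which generators set_seed covers. The sketch below shows one common way to seed everything at once, assuming the generators of interest are Python's random module plus NumPy and PyTorch when they are installed. The name seed_everything is hypothetical.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Hypothetical helper: seed every random number generator that is available."""
    # PYTHONHASHSEED only affects interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch.
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # Safe to call without a GPU.
    except ImportError:
        pass  # PyTorch is optional in this sketch.


# Seeding twice with the same value reproduces the same sequence.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```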

pykt.utils.wandb_utils module

class pykt.utils.wandb_utils.WandbUtils(user, project_name, use_cache=False, print_details=True, cache_dir='results/wandb_result')[source]

Bases: object

wandb utils

Example: wandb_api = WandbUtils(user='tabchen', project_name='pykt_iekt_pred'). After this call, self.sweep_dict is {'mx2tvwfy': ['mx2tvwfy']}.

check_sweep_by_model_dataset_name(dataset_name, model_name, emb_type='qid', metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False, stop=False, n_jobs=5)[source]
check_sweep_by_pattern(sweep_pattern, metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False, stop=False, n_jobs=5)[source]

Check sweeps by pattern

Parameters
  • sweep_pattern (str) – check the sweeps whose names start with sweep_pattern

  • metric (str, optional) – the metric to check. Defaults to validauc.

  • metric_type (str, optional) – the type of metric, max or min. Defaults to max.

  • min_run_num (int, optional) – the minimum number of runs to check. Defaults to 200.

  • patience (int, optional) – the patience to stop. Defaults to 50.

  • force_check_df (bool, optional) – always check the df. Defaults to False.

Returns

a list of dicts, each of the form {'id': id, 'state': state, 'df': df, 'num_run': num_run}. state is 'RUNNING', 'CANCELED' or 'FINISHED'; df is the df of the sweep; num_run is the number of runs in the sweep. A num_run of -1 means the sweep is finished; to save time, it is not checked again.

Return type

list
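Selecting sweeps by name prefix can be sketched as below. This is an illustration only: it assumes a mapping from sweep name to a list of sweep ids, and the sweep names and ids in the example are made up.

```python
def match_sweeps_by_prefix(sweep_dict, sweep_pattern):
    """Hypothetical helper: keep the sweeps whose names start with sweep_pattern."""
    return {
        name: ids
        for name, ids in sweep_dict.items()
        if name.startswith(sweep_pattern)
    }


# Made-up sweep names and ids, for illustration only.
example_sweeps = {
    "assist2015_dkt_qid_0": ["aaa111"],
    "assist2015_dkt_qid_1": ["bbb222"],
    "assist2015_sakt_qid_0": ["ccc333"],
}
matched = match_sweeps_by_prefix(example_sweeps, "assist2015_dkt")
print(sorted(matched))
```

With prefix matching, one pattern can address all folds of a dataset and model pair, provided that the sweep names share a common stem.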

check_sweep_early_stop(id, input_type='sweep_name', metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False)[source]

Check sweep early stop

Parameters
  • id (str) – the sweep name or sweep id.

  • input_type (str, optional) – the type of id. Defaults to sweep_name.

  • metric (str, optional) – the metric to check. Defaults to validauc.

  • metric_type (str, optional) – the type of metric, max or min. Defaults to max. Support for metric_type='min' is still a todo.

  • min_run_num (int, optional) – the minimum number of runs to check. Defaults to 200.

  • patience (int, optional) – the patience to stop. Defaults to 50.

  • force_check_df (bool, optional) – always check the df. Defaults to False.

Returns

a dict of the form {'state': state, 'df': df, 'num_run': num_run}. state is 'RUNNING', 'CANCELED' or 'FINISHED'; df is the df of the sweep; num_run is the number of runs in the sweep. A num_run of -1 means the sweep is finished; to save time, it is not checked again.

Return type

dict
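The source does not spell out how min_run_num and patience interact. The sketch below shows one plausible reading: a sweep may stop once at least min_run_num runs have finished and the best score has not improved for patience consecutive runs. The function find_stop_index is hypothetical and is not pykt's get_stop_index.

```python
def find_stop_index(scores, metric_type="max", min_run_num=200, patience=50):
    """Hypothetical helper: return the index of the run that triggers a stop.

    Returns None when no stop is triggered.
    """
    if metric_type not in ("max", "min"):
        raise ValueError("metric_type must be 'max' or 'min'")
    best = None
    best_index = -1
    for i, score in enumerate(scores):
        if best is None:
            improved = True
        elif metric_type == "max":
            improved = score > best
        else:
            improved = score < best
        if improved:
            best, best_index = score, i
        # Stop only after enough runs and enough runs without improvement.
        elif i + 1 >= min_run_num and i - best_index >= patience:
            return i
    return None


# Made-up validation scores; the best value is at index 2.
scores = [0.70, 0.72, 0.75, 0.74, 0.73, 0.74, 0.75, 0.71]
print(find_stop_index(scores, min_run_num=5, patience=3))
```

In this example the stop is triggered at index 5, the third run after the best score at index 2. With the default thresholds of 200 and 50, a sequence this short never triggers a stop.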

check_sweep_list(sweep_key_list, metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False, stop=False, n_jobs=5)[source]
extract_best_models(df, dataset_name, model_name, emb_type='qid', eval_test=True, fpath='./seedwandb/predict.yaml', CONFIG_FILE='../configs/best_model.json', wandb_key='', pred_dir='pred_wandbs', launch_file='start_predict.sh', generate_all=False)[source]

Extract the models that perform best on the validation data, for testing.

Parameters
  • df – dataframe of best results in each fold

  • dataset_name – dataset_name

  • model_name – model_name

  • emb_type – embedding_type, default:qid

  • eval_test – evaluating on testing set, default:True

  • fpath – the yaml template for prediction in wandb, default: “./seedwandb/predict.yaml”

  • CONFIG_FILE – the config template for generating the prediction file, default: "../configs/best_model.json"

  • wandb_key – the key of the wandb account

  • pred_dir – the directory of prediction yaml files, default: "pred_wandbs"

  • launch_file – the launch file for starting the wandb prediction, default: "start_predict.sh"

  • generate_all – whether to start all the files in the pred_dir directory, default:False

Returns

the launch file (e.g., “start_predict.sh”) for wandb prediction of the best models in each fold
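The df argument holds the best result of each fold. The sketch below shows how such a per-fold selection could be computed from plain records. The field names fold, validauc and model_save_path are assumptions made for illustration, and the real method works on a dataframe rather than a list of dicts.

```python
def best_run_per_fold(rows, metric="validauc"):
    """Hypothetical helper: keep the row with the highest metric in each fold."""
    best = {}
    for row in rows:
        fold = row["fold"]
        if fold not in best or row[metric] > best[fold][metric]:
            best[fold] = row
    # Return the winners ordered by fold number.
    return [best[fold] for fold in sorted(best)]


# Made-up runs for two folds.
runs = [
    {"fold": 0, "validauc": 0.81, "model_save_path": "saved/a"},
    {"fold": 0, "validauc": 0.83, "model_save_path": "saved/b"},
    {"fold": 1, "validauc": 0.80, "model_save_path": "saved/c"},
    {"fold": 1, "validauc": 0.79, "model_save_path": "saved/d"},
]
winners = best_run_per_fold(runs)
print([row["model_save_path"] for row in winners])
```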

extract_prediction_results(dataset_name, model_name, emb_type='qid', print_std=True)[source]

Calculate the results on the testing data for the model that performs best on the validation set.

Parameters
  • dataset_name – dataset_name

  • model_name – model_name

  • emb_type – embedding_type, default:qid

  • print_std – print the standard deviation results or not, default:True

Returns

the average auc and acc over the 5 folds, and the corresponding standard deviations
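Averaging a metric over folds and reporting its spread can be sketched with the standard library. The source does not say whether the population or the sample standard deviation is reported, so the use of pstdev here is an assumption. The function name and the fold values are made up.

```python
from statistics import mean, pstdev


def summarize_folds(fold_results, metrics=("auc", "acc")):
    """Hypothetical helper: return {metric: (mean, population std)} over folds."""
    summary = {}
    for metric in metrics:
        values = [fold[metric] for fold in fold_results]
        summary[metric] = (mean(values), pstdev(values))
    return summary


# Made-up test results for 5 folds.
folds = [
    {"auc": 0.80, "acc": 0.70},
    {"auc": 0.82, "acc": 0.72},
    {"auc": 0.81, "acc": 0.71},
    {"auc": 0.79, "acc": 0.69},
    {"auc": 0.83, "acc": 0.73},
]
summary = summarize_folds(folds)
for metric, (avg, std) in summary.items():
    print(f"{metric}: {avg:.4f} +/- {std:.4f}")
```

Replacing pstdev with stdev gives the sample standard deviation, which is slightly larger for a small number of folds.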

generate_sweep(wandb_key, pred_dir, sweep_shell, ftarget, generate_all)[source]
generate_wandb(dataset_name, model_name, emb_type, fpath, ftarget, model_path)[source]
get_all_fold_name(dataset_name, model_name, emb_type='qid')[source]
get_best_run(dataset_name, model_name, emb_type='qid', metric='validauc', metric_type='max', min_run_num=200, patience=50, save_dir='results/wandb_result', n_jobs=5, force_reget=False, k=5)[source]
get_df(id, input_type='sweep_name', drop_duplicate=False, drop_na=True, only_finish=True)[source]

Get the result of one sweep

Parameters
  • id (str) – the sweep name or sweep id.

  • input_type (str, optional) – the type of id. Defaults to sweep_name.

Returns

the result of the sweep as a dataframe

Return type

pd.DataFrame

get_df_by_model_dataset_name(dataset_name, model_name, emb_type='qid', drop_duplicate=False, drop_na=True, only_finish=True, n_jobs=5)[source]
get_model_run_time(dataset_name, model_name, emb_type='qid', metric='validauc', metric_type='max', min_run_num=200, patience=50, save_dir='results/wandb_result', n_jobs=5)[source]

Get the average run time, in seconds, of one sweep

get_multi_df(id_list=[], input_type='sweep_name', drop_duplicate=False, drop_na=True, only_finish=True, n_jobs=5)[source]

Get the results of multiple sweeps

Parameters
  • id_list (list) – the list of sweep names or sweep ids.

  • input_type (str, optional) – the type of id. Defaults to sweep_name.

Returns

the results of the given sweeps

Return type

_type_

get_multi_df_by_pattern(sweep_pattern, drop_duplicate=False, drop_na=True, only_finish=True, n_jobs=5)[source]
get_stop_index(df, metric='validauc', metric_type='max', min_run_num=200, patience=50)[source]
get_sweep_dict()[source]

Get sweep dict

get_sweep_info(id, input_type='sweep_name')[source]

Get sweep run status

Parameters
  • id (str) – the sweep name or sweep id.

  • input_type (str, optional) – the type of id. Defaults to sweep_name.

Returns

the state of sweep. ‘RUNNING’, ‘CANCELED’ or ‘FINISHED’

Return type

str

get_sweep_info_by_pattern(sweep_pattern, n_jobs=5, return_df=False)[source]
get_sweep_run_num(id, input_type='sweep_name')[source]

Get the number of runs in a sweep

Parameters
  • id (str) – the sweep name or sweep id.

  • input_type (str, optional) – the type of id. Defaults to sweep_name.

Returns

the number of runs in the sweep

Return type

int

get_sweep_status(id, input_type='sweep_name')[source]

Get sweep run status

Parameters
  • id (str) – the sweep name or sweep id.

  • input_type (str, optional) – the type of id. Defaults to sweep_name.

Returns

the state of sweep. ‘RUNNING’, ‘CANCELED’ or ‘FINISHED’

Return type

str

stop_sweep(cmd)[source]
write_config(dataset_name, dconfig, CONFIG_FILE)[source]
pykt.utils.wandb_utils.get_runs_result(runs)[source]

Module contents