pykt.utils package
Submodules
pykt.utils.utils module
- pykt.utils.utils.debug_print(text, fuc_name='')[source]
Print text together with the name of the calling function.
- Parameters
text (str) – the text to print
fuc_name (str, optional) – the function name to print with the text. Defaults to "".
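As a rough illustration of the behaviour described above, the following stand-in prints a message prefixed with a function name. The helper name debug_print_sketch and the "name - text" layout are assumptions made for this example; the library's actual output format may differ.

```python
def debug_print_sketch(text, fuc_name=""):
    """Hypothetical stand-in for pykt.utils.utils.debug_print.

    The "name - text" layout is an assumption, not the library's format.
    """
    # Prefix the text with the function name only when one is given.
    message = f"{fuc_name} - {text}" if fuc_name else text
    print(message)
    return message


debug_print_sketch("loading dataset", fuc_name="train")
```

Returning the message as well as printing it is a choice made here so the sketch is easy to check; the documented function only describes printing.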
pykt.utils.wandb_utils module
- class pykt.utils.wandb_utils.WandbUtils(user, project_name, use_cache=False, print_details=True, cache_dir='results/wandb_result')[source]
Bases: object
Utilities for working with wandb sweeps.
Example: wandb_api = WandbUtils(user='tabchen', project_name='pykt_iekt_pred')
After this call, self.sweep_dict is {'mx2tvwfy': ['mx2tvwfy']}.
- check_sweep_by_model_dataset_name(dataset_name, model_name, emb_type='qid', metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False, stop=False, n_jobs=5)[source]
- check_sweep_by_pattern(sweep_pattern, metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False, stop=False, n_jobs=5)[source]
Check the sweeps whose names match a pattern.
- Parameters
sweep_pattern (str) – check the sweeps whose names start with sweep_pattern.
metric (str, optional) – the metric to check. Defaults to validauc.
metric_type (str, optional) – the type of metric, max or min. Defaults to max.
min_run_num (int, optional) – the minimum number of runs to check. Defaults to 200.
patience (int, optional) – the patience to stop. Defaults to 50.
force_check_df (bool, optional) – always check the df. Defaults to False.
- Returns
a list of dicts, each of the form {"id": id, "state": state, "df": df, "num_run": num_run}. state is 'RUNNING', 'CANCELED' or 'FINISHED'; df is the df of the sweep; num_run is the number of runs in the sweep. A num_run of -1 means the sweep is finished and, to save time, will not be checked again.
- Return type
list
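The two ends of this call can be sketched without touching wandb: how a prefix pattern selects sweep names, and how the returned list can be grouped by state. The helpers select_sweeps_by_prefix and split_by_state are hypothetical, the sweep names and ids are made up, and treating the keys of sweep_dict as sweep names is an assumption based on the class example.

```python
def select_sweeps_by_prefix(sweep_dict, sweep_pattern):
    """Return the sweep names that start with sweep_pattern (hypothetical helper)."""
    return [name for name in sweep_dict if name.startswith(sweep_pattern)]


def split_by_state(results):
    """Group sweep ids by the 'state' value of each result dict (hypothetical helper)."""
    grouped = {}
    for item in results:
        grouped.setdefault(item["state"], []).append(item["id"])
    return grouped


# Mock data shaped like the documented structures; every value is made up.
sweep_dict = {
    "assist2015_dkt_qid_0": ["id0"],
    "assist2015_dkt_qid_1": ["id1"],
    "algebra2005_dkt_qid_0": ["id2"],
}
results = [
    {"id": "id0", "state": "FINISHED", "df": None, "num_run": -1},
    {"id": "id1", "state": "RUNNING", "df": None, "num_run": 120},
]

print(select_sweeps_by_prefix(sweep_dict, "assist2015_dkt"))
print(split_by_state(results))
```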
- check_sweep_early_stop(id, input_type='sweep_name', metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False)[source]
Check whether a sweep can be stopped early.
- Parameters
id (str) – the sweep name or sweep id.
input_type (str, optional) – the type of id. Defaults to sweep_name.
metric (str, optional) – the metric to check. Defaults to validauc.
metric_type (str, optional) – the type of metric, max or min. Defaults to max. Support for metric_type == 'min' is marked as todo.
min_run_num (int, optional) – the minimum number of runs to check. Defaults to 200.
patience (int, optional) – the patience to stop. Defaults to 50.
force_check_df (bool, optional) – always check the df. Defaults to False.
- Returns
a dict of the form {"state": state, "df": df, "num_run": num_run}. state is 'RUNNING', 'CANCELED' or 'FINISHED'; df is the df of the sweep; num_run is the number of runs in the sweep. A num_run of -1 means the sweep is finished and, to save time, will not be checked again.
- Return type
dict
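The documentation does not spell out the stopping rule, so the sketch below shows one common reading of min_run_num and patience: stop once enough runs exist and the best metric has not improved during the last patience runs. The function should_stop and its rule are assumptions for illustration, not the library's implementation; the 'min' branch is included only to make the sketch complete.

```python
def should_stop(metric_values, metric_type="max", min_run_num=200, patience=50):
    """Decide whether a sweep can stop early (hypothetical rule).

    metric_values: the metric of each run, in the order the runs finished.
    """
    # Never stop before the minimum number of runs has been reached.
    if len(metric_values) < min_run_num:
        return False
    pick = max if metric_type == "max" else min
    best = pick(metric_values)
    # Index of the first run that reached the best value.
    best_index = metric_values.index(best)
    runs_since_best = len(metric_values) - 1 - best_index
    return runs_since_best >= patience


# Made-up validation AUC values for seven runs.
values = [0.70, 0.80, 0.75, 0.75, 0.75, 0.75, 0.75]
print(should_stop(values, min_run_num=5, patience=3))
```

With the defaults of 200 and 50, a sweep under this reading would need at least 200 runs, the last 50 of which brought no improvement.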
- check_sweep_list(sweep_key_list, metric='validauc', metric_type='max', min_run_num=200, patience=50, force_check_df=False, stop=False, n_jobs=5)[source]
- extract_best_models(df, dataset_name, model_name, emb_type='qid', eval_test=True, fpath='./seedwandb/predict.yaml', CONFIG_FILE='../configs/best_model.json', wandb_key='', pred_dir='pred_wandbs', launch_file='start_predict.sh', generate_all=False)[source]
Extract the models that perform best on the validation data, so that they can be evaluated on the testing data.
- Parameters
df – dataframe of the best results in each fold
dataset_name – the name of the dataset
model_name – the name of the model
emb_type – the embedding type, default: qid
eval_test – whether to evaluate on the testing set, default: True
fpath – the yaml template for prediction in wandb, default: "./seedwandb/predict.yaml"
CONFIG_FILE – the config template for generating the prediction file, default: "../configs/best_model.json"
wandb_key – the key of the wandb account
pred_dir – the directory of the prediction yaml files, default: "pred_wandbs"
launch_file – the launch file that starts the wandb prediction, default: "start_predict.sh"
generate_all – whether to start all the files in the pred_dir directory, default: False
- Returns
the launch file (e.g., "start_predict.sh") for wandb prediction of the best models in each fold
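The df argument holds the best result of each fold. A minimal sketch of how such a per-fold table could be derived from raw run records is shown below, using plain dicts instead of a dataframe. The helper best_run_per_fold, the column names fold and model_save_path, and all values are assumptions made for this example.

```python
def best_run_per_fold(runs, metric="validauc"):
    """Keep, for each fold, the run with the highest metric (hypothetical helper)."""
    best = {}
    for run in runs:
        fold = run["fold"]
        if fold not in best or run[metric] > best[fold][metric]:
            best[fold] = run
    # Return the winners ordered by fold index.
    return [best[fold] for fold in sorted(best)]


# Made-up run records; real sweeps would have many more columns.
runs = [
    {"fold": 0, "validauc": 0.801, "model_save_path": "saved_model/run_a"},
    {"fold": 0, "validauc": 0.815, "model_save_path": "saved_model/run_b"},
    {"fold": 1, "validauc": 0.809, "model_save_path": "saved_model/run_c"},
]

for run in best_run_per_fold(runs):
    print(run["fold"], run["model_save_path"])
```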
- extract_prediction_results(dataset_name, model_name, emb_type='qid', print_std=True)[source]
Calculate the results on the testing data for the model that performs best on the validation set.
- Parameters
dataset_name – the name of the dataset
model_name – the name of the model
emb_type – the embedding type, default: qid
print_std – whether to print the standard deviations, default: True
- Returns
the average auc and acc over the 5 folds and the corresponding standard deviations
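The aggregation described here amounts to a mean and a standard deviation per metric across folds. The sketch below uses the standard library only; summarize_folds is a hypothetical helper, the fold values are made up, and the use of the population standard deviation (pstdev) rather than the sample one is an assumption.

```python
from statistics import mean, pstdev


def summarize_folds(fold_results, metrics=("auc", "acc")):
    """Average each metric over the folds and report its standard deviation."""
    summary = {}
    for name in metrics:
        values = [result[name] for result in fold_results]
        summary[name] = {"mean": mean(values), "std": pstdev(values)}
    return summary


# Made-up test results for five folds.
fold_results = [
    {"auc": 0.80, "acc": 0.70},
    {"auc": 0.82, "acc": 0.70},
    {"auc": 0.81, "acc": 0.70},
    {"auc": 0.79, "acc": 0.70},
    {"auc": 0.83, "acc": 0.70},
]

for name, stats in summarize_folds(fold_results).items():
    print(f"{name}: {stats['mean']:.4f} +/- {stats['std']:.4f}")
```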
- get_best_run(dataset_name, model_name, emb_type='qid', metric='validauc', metric_type='max', min_run_num=200, patience=50, save_dir='results/wandb_result', n_jobs=5, force_reget=False, k=5)[source]
- get_df(id, input_type='sweep_name', drop_duplicate=False, drop_na=True, only_finish=True)[source]
Get the result of one sweep.
- Parameters
id (str) – the sweep name or sweep id.
input_type (str, optional) – the type of id. Defaults to sweep_name.
- Returns
the df of the sweep
- Return type
pd.DataFrame
- get_df_by_model_dataset_name(dataset_name, model_name, emb_type='qid', drop_duplicate=False, drop_na=True, only_finish=True, n_jobs=5)[source]
- get_model_run_time(dataset_name, model_name, emb_type='qid', metric='validauc', metric_type='max', min_run_num=200, patience=50, save_dir='results/wandb_result', n_jobs=5)[source]
Get the average run time, in seconds, of the runs in one sweep.
- get_multi_df(id_list=[], input_type='sweep_name', drop_duplicate=False, drop_na=True, only_finish=True, n_jobs=5)[source]
Get the results of multiple sweeps.
- Parameters
id_list (list) – the list of sweep names or sweep ids.
input_type (str, optional) – the type of id. Defaults to sweep_name.
- Returns
_description_
- Return type
_type_
- get_multi_df_by_pattern(sweep_pattern, drop_duplicate=False, drop_na=True, only_finish=True, n_jobs=5)[source]
- get_sweep_info(id, input_type='sweep_name')[source]
Get the run status of a sweep.
- Parameters
id (str) – the sweep name or sweep id.
input_type (str, optional) – the type of id. Defaults to sweep_name.
- Returns
the state of the sweep: 'RUNNING', 'CANCELED' or 'FINISHED'
- Return type
str
- get_sweep_run_num(id, input_type='sweep_name')[source]
Get the number of runs in a sweep.
- Parameters
id (str) – the sweep name or sweep id.
input_type (str, optional) – the type of id. Defaults to sweep_name.
- Returns
the number of runs in the sweep
- Return type
int