aepo.mbr package

Submodules

aepo.mbr.mbr_engine module

aepo.mbr.mbr_engine.run_mbr(sample_dir: str, matrix_dir: str, dmbr_dir: str, num_instructions: int, num_responses: int, sim: str, use_matrix_cache: bool, diverse_k: int, diversity_penalty: float) → DataFrame
Parameters:
  • sample_dir (str) – the directory of the samples.

  • matrix_dir (str) – the directory of the similarity matrices.

  • dmbr_dir (str) – the directory of the diverse MBR results.

  • num_instructions (int) – the number of instructions.

  • num_responses (int) – the number of responses per instruction.

  • sim (str) – the similarity model to use.

  • use_matrix_cache (bool) – whether to reuse cached similarity matrices.

  • diverse_k (int) – the number of responses to select by diverse MBR.

  • diversity_penalty (float) – the diversity penalty lambda.

Returns:

the diverse MBR results.

Return type:

pd.DataFrame

Run the diverse MBR algorithm. See https://github.com/CyberAgentAILab/diverse-mbr for more details on diverse MBR.
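
A minimal usage sketch based on the signature above; the directory paths and numeric values are illustrative, and sentbert is the similarity model used for AEPO (see aepo.mbr.utility_func.load_similarity below):

    from aepo.mbr.mbr_engine import run_mbr

    dmbr_df = run_mbr(
        sample_dir="outputs/samples",    # directory containing the sampled responses
        matrix_dir="outputs/matrices",   # directory for the similarity matrices
        dmbr_dir="outputs/dmbr",         # directory for the diverse MBR results
        num_instructions=100,            # number of instructions
        num_responses=32,                # responses sampled per instruction
        sim="sentbert",                  # similarity model
        use_matrix_cache=True,           # reuse cached similarity matrices
        diverse_k=4,                     # responses selected by diverse MBR
        diversity_penalty=0.5,           # diversity penalty lambda
    )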

aepo.mbr.reward_engine module

aepo.mbr.reward_engine.run_reward(reward_model_id: str, instructions: List[str], output_dir: str, output_filename: str, num_instructions: int, num_annotations: int, west_of_n: bool, dmbr_result: DataFrame) → DataFrame
Parameters:
  • reward_model_id (str) – the Hugging Face Hub repository name of the reward model.

  • instructions (List[str]) – the list of instructions.

  • output_dir (str) – the output directory.

  • output_filename (str) – the output filename.

  • num_instructions (int) – the number of instructions.

  • num_annotations (int) – the number of annotations available per instruction.

  • west_of_n (bool) – if True, output only the best and the worst of the N samples instead of all annotated responses.

  • dmbr_result (pd.DataFrame) – the diverse MBR results.

Returns:

the annotation-efficient preference optimization dataset.

Return type:

pd.DataFrame

Annotate the reward for the responses selected by diverse MBR.
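
A minimal sketch continuing from the run_mbr example above (dmbr_df is the DataFrame it returned); the reward model id, paths, and counts are illustrative rather than package defaults:

    from aepo.mbr.reward_engine import run_reward

    instructions = ["Explain MBR decoding.", "Summarize the article below. ..."]

    aepo_df = run_reward(
        reward_model_id="OpenAssistant/reward-model-deberta-v3-large-v2",
        instructions=instructions,
        output_dir="outputs/aepo",
        output_filename="aepo_dataset.csv",
        num_instructions=len(instructions),
        num_annotations=8,               # reward annotations available per instruction
        west_of_n=True,                  # keep only the best and worst of the N samples
        dmbr_result=dmbr_df,             # diverse MBR results from run_mbr
    )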

aepo.mbr.reward_model module

class aepo.mbr.reward_model.Eurus(reward_model_id)

Bases: RewardModel

Eurus reward model.

get_reward(question, answer)
class aepo.mbr.reward_model.GPTRewardModel(model_path)

Bases: Module

forward(input_ids=None, past_key_values=None, attention_mask=None, position_ids=None)

input_ids and attention_mask have shape torch.Size([bs, seq_len]). Returns scores, a list of length bs.

get_device()
class aepo.mbr.reward_model.OASST(reward_model_id)

Bases: RewardModel

OpenAssistant reward model.

get_reward(question, answer)
class aepo.mbr.reward_model.PairLM(reward_model_id)

Bases: RewardModel

PairLM reward model.

get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
get_reward(question, answer)
get_rewards(question, answers)
get_winratio(question, answer, compared_answers)
tokenize_pair(sources: List[str], candidate1s: List[str], candidate2s: List[str], source_max_length=1224, candidate_max_length=412)
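
A sketch of pairwise scoring with PairLM. The checkpoint id llm-blender/PairRM and the exact semantics of the return values are assumptions inferred from the method names and signatures above:

    from aepo.mbr.reward_model import PairLM

    pairlm = PairLM("llm-blender/PairRM")  # assumed checkpoint id

    # Preference score of answer over compared_answer (higher means preferred).
    score = pairlm.get_pairwise_reward(
        question="What is MBR decoding?",
        answer="Minimum Bayes risk decoding selects the candidate with ...",
        compared_answer="It is a sampling temperature.",
    )

    # Fraction of comparisons that answer wins against the compared answers
    # (assumed semantics of get_winratio).
    ratio = pairlm.get_winratio(
        "What is MBR decoding?",
        "Minimum Bayes risk decoding selects the candidate with ...",
        ["a baseline response", "another baseline response"],
    )
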
class aepo.mbr.reward_model.RewardModel(reward_model_id)

Bases: object

Base class for reward models.

get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
get_pairwise_rewards(question: str, answers: List[str]) → ndarray
get_reward(question: str, answer: str) → float
get_rewards(question: str, answers: List[str]) → List[float]
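
A minimal sketch of the base API, obtained through load_reward_model below; the repository id is illustrative, and whether it is supported depends on the models implemented in this module:

    from aepo.mbr.reward_model import load_reward_model

    rm = load_reward_model("OpenAssistant/reward-model-deberta-v3-large-v2")
    r = rm.get_reward("What is MBR decoding?", "A decision rule that ...")        # float
    rs = rm.get_rewards("What is MBR decoding?", ["candidate 1", "candidate 2"])  # List[float]
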
class aepo.mbr.reward_model.StanfordNLP(reward_model_id)

Bases: RewardModel

StanfordNLP reward model.

get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
get_reward(question, answer)
class aepo.mbr.reward_model.Starling(reward_model_id)

Bases: RewardModel

Starling reward model.

get_reward(question, answer)
get_reward_(samples)

samples: List[str]

get_rewards(question, answers)
aepo.mbr.reward_model.load_reward_model(reward_model_id: str)

Load a reward model by its Hugging Face Hub repository name. Currently only the reward models implemented in this module are supported. To add a new reward model, implement it as a subclass of RewardModel, as sketched below.
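
A hedged sketch of the extension point: a hypothetical subclass following the pattern of Eurus and OASST above. The scoring body is a placeholder, not a real reward computation:

    from aepo.mbr.reward_model import RewardModel

    class MyRewardModel(RewardModel):
        """Hypothetical reward model, shown for illustration only."""

        def get_reward(self, question: str, answer: str) -> float:
            # Placeholder: load and query the model identified by the
            # reward_model_id passed to the constructor.
            return float(len(answer))  # dummy score, not a real reward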

aepo.mbr.utility_func module

aepo.mbr.utility_func.load_similarity(sim: str) → callable
Parameters:

sim (str) – the name of the similarity function to load.

Returns:

the similarity function.

Return type:

callable

Load the similarity function to be used for diverse MBR. For AEPO, we use Sentence-BERT (sentbert).
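
A minimal sketch of loading the similarity function; beyond returning a callable, its exact signature is not documented here, so only the loading step is shown:

    from aepo.mbr.utility_func import load_similarity

    sim_fn = load_similarity("sentbert")  # Sentence-BERT similarity for diverse MBR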

Module contents