aepo.mbr package

Submodules

aepo.mbr.mbr_engine module

aepo.mbr.mbr_engine.run_mbr(sample_dir: str, matrix_dir: str, dmbr_dir: str, num_instructions: int, num_responses: int, sim: str, use_matrix_cache: bool, diverse_k: int, diversity_penalty: float) → DataFrame
Parameters:
  • sample_dir (str) – the directory of the samples.

  • matrix_dir (str) – the directory of the similarity matrices.

  • dmbr_dir (str) – the directory of the diverse MBR results.

  • num_instructions (int) – the number of instructions.

  • num_responses (int) – the number of responses per instruction.

  • sim (str) – the similarity model to use.

  • use_matrix_cache (bool) – whether to reuse cached similarity matrices.

  • diverse_k (int) – the number of responses to select by diverse MBR.

  • diversity_penalty (float) – the diversity penalty lambda.

Returns:

the diverse MBR results.

Return type:

pd.DataFrame

Run the diverse MBR algorithm. See https://github.com/CyberAgentAILab/diverse-mbr for more details on diverse MBR.
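
A minimal usage sketch based on the signature above; the directory paths and numeric values are illustrative, and sentbert is the similarity model used for AEPO (see aepo.mbr.utility_func.load_similarity below):

    from aepo.mbr.mbr_engine import run_mbr

    dmbr_df = run_mbr(
        sample_dir="outputs/samples",    # directory containing the sampled responses
        matrix_dir="outputs/matrices",   # directory for the similarity matrices
        dmbr_dir="outputs/dmbr",         # directory for the diverse MBR results
        num_instructions=100,            # number of instructions
        num_responses=32,                # responses sampled per instruction
        sim="sentbert",                  # similarity model
        use_matrix_cache=True,           # reuse cached similarity matrices
        diverse_k=4,                     # responses selected by diverse MBR
        diversity_penalty=0.5,           # diversity penalty lambda
    )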

aepo.mbr.reward_engine module

aepo.mbr.reward_engine.run_reward(reward_model_id: str, instructions: List[str], output_dir: str, output_filename: str, num_instructions: int, num_annotations: int, west_of_n: bool, dmbr_result: DataFrame) → DataFrame
Parameters:
  • reward_model_id (str) – the Hugging Face Hub repository name of the reward model.

  • instructions (List[str]) – the list of instructions.

  • output_dir (str) – the output directory.

  • output_filename (str) – the output filename.

  • num_instructions (int) – the number of instructions.

  • num_annotations (int) – the number of annotations available per instruction.

  • west_of_n (bool) – if True, output only the best and the worst of the N samples instead of all annotated responses.

  • dmbr_result (pd.DataFrame) – the diverse MBR results.

Returns:

the annotation-efficient preference optimization dataset.

Return type:

pd.DataFrame

Annotate the reward for the responses selected by diverse MBR.
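
A minimal sketch continuing from the run_mbr example above (dmbr_df is the DataFrame it returned); the reward model id, paths, and counts are illustrative rather than package defaults:

    from aepo.mbr.reward_engine import run_reward

    instructions = ["Explain MBR decoding.", "Summarize the article below. ..."]

    aepo_df = run_reward(
        reward_model_id="OpenAssistant/reward-model-deberta-v3-large-v2",
        instructions=instructions,
        output_dir="outputs/aepo",
        output_filename="aepo_dataset.csv",
        num_instructions=len(instructions),
        num_annotations=8,               # reward annotations available per instruction
        west_of_n=True,                  # keep only the best and worst of the N samples
        dmbr_result=dmbr_df,             # diverse MBR results from run_mbr
    )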

aepo.mbr.reward_model module

class aepo.mbr.reward_model.Eurus(reward_model_id)

Bases: RewardModel

Eurus reward model.

get_reward(question, answer)
class aepo.mbr.reward_model.GPTRewardModel(model_path)

Bases: Module

forward(input_ids=None, past_key_values=None, attention_mask=None, position_ids=None)

input_ids and attention_mask have shape torch.Size([bs, seq_len]). Returns scores, a list of length bs.

get_device()
class aepo.mbr.reward_model.OASST(reward_model_id)

Bases: RewardModel

OpenAssistant reward model.

get_reward(question, answer)
class aepo.mbr.reward_model.PairLM(reward_model_id)

Bases: RewardModel

PairLM reward model.

get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
get_reward(question, answer)
get_rewards(question, answers)
get_winratio(question, answer, compared_answers)
tokenize_pair(sources: List[str], candidate1s: List[str], candidate2s: List[str], source_max_length=1224, candidate_max_length=412)
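
A sketch of pairwise scoring with PairLM. The checkpoint id llm-blender/PairRM and the exact semantics of the return values are assumptions inferred from the method names and signatures above:

    from aepo.mbr.reward_model import PairLM

    pairlm = PairLM("llm-blender/PairRM")  # assumed checkpoint id

    # Preference score of answer over compared_answer (higher means preferred).
    score = pairlm.get_pairwise_reward(
        question="What is MBR decoding?",
        answer="Minimum Bayes risk decoding selects the candidate with ...",
        compared_answer="It is a sampling temperature.",
    )

    # Fraction of comparisons that answer wins against the compared answers
    # (assumed semantics of get_winratio).
    ratio = pairlm.get_winratio(
        "What is MBR decoding?",
        "Minimum Bayes risk decoding selects the candidate with ...",
        ["a baseline response", "another baseline response"],
    )
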
class aepo.mbr.reward_model.RewardModel(reward_model_id)

Bases: object

Base class for reward models.

get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
get_pairwise_rewards(question: str, answers: List[str]) → ndarray
get_reward(question: str, answer: str) → float
get_rewards(question: str, answers: List[str]) → List[float]
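
A minimal sketch of the base API, obtained through load_reward_model below; the repository id is illustrative, and whether it is supported depends on the models implemented in this module:

    from aepo.mbr.reward_model import load_reward_model

    rm = load_reward_model("OpenAssistant/reward-model-deberta-v3-large-v2")
    r = rm.get_reward("What is MBR decoding?", "A decision rule that ...")        # float
    rs = rm.get_rewards("What is MBR decoding?", ["candidate 1", "candidate 2"])  # List[float]
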
class aepo.mbr.reward_model.StanfordNLP(reward_model_id)

Bases: RewardModel

StanfordNLP reward model.

get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
get_reward(question, answer)
class aepo.mbr.reward_model.Starling(reward_model_id)

Bases: RewardModel

Starling reward model.

get_reward(question, answer)
get_reward_(samples)

samples: List[str]

get_rewards(question, answers)
aepo.mbr.reward_model.load_reward_model(reward_model_id: str)

Load a reward model by its Hugging Face Hub repository name. Currently only the reward models implemented in this module are supported. To add a new reward model, implement it as a subclass of RewardModel, as sketched below.
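
A hedged sketch of the extension point: a hypothetical subclass following the pattern of Eurus and OASST above. The scoring body is a placeholder, not a real reward computation:

    from aepo.mbr.reward_model import RewardModel

    class MyRewardModel(RewardModel):
        """Hypothetical reward model, shown for illustration only."""

        def get_reward(self, question: str, answer: str) -> float:
            # Placeholder: load and query the model identified by the
            # reward_model_id passed to the constructor.
            return float(len(answer))  # dummy score, not a real reward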

aepo.mbr.utility_func module

aepo.mbr.utility_func.load_similarity(sim: str) → callable
Parameters:

sim (str) – the name of the similarity function to load.

Returns:

the similarity function.

Return type:

callable

Load the similarity function to be used for diverse MBR. For AEPO, we use Sentence-BERT (sentbert).
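
A minimal sketch of loading the similarity function; beyond returning a callable, its exact signature is not documented here, so only the loading step is shown:

    from aepo.mbr.utility_func import load_similarity

    sim_fn = load_similarity("sentbert")  # Sentence-BERT similarity for diverse MBR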

Module contents