aepo.mbr package
Submodules
aepo.mbr.mbr_engine module
- aepo.mbr.mbr_engine.run_mbr(sample_dir: str, matrix_dir: str, dmbr_dir: str, num_instructions: int, num_responses: int, sim: str, use_matrix_cache: bool, diverse_k: int, diversity_penalty: float) → DataFrame
- Parameters:
sample_dir (str) – the directory of the samples.
matrix_dir (str) – the directory of the similarity matrices.
dmbr_dir (str) – the directory of the diverse MBR results.
num_instructions (int) – the number of instructions.
num_responses (int) – the number of responses per instruction.
sim (str) – the similarity model to use.
use_matrix_cache (bool) – the flag to use the matrix cache.
diverse_k (int) – the number of responses to select with diverse MBR.
diversity_penalty (float) – the diversity penalty lambda.
- Returns:
the diverse MBR results.
- Return type:
pd.DataFrame
Run the diverse MBR algorithm. See https://github.com/CyberAgentAILab/diverse-mbr for more details on diverse MBR.
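A minimal usage sketch, assuming the sampled responses already exist under sample_dir; the directory names and counts below are placeholders, not defaults:

```python
from aepo.mbr.mbr_engine import run_mbr

dmbr_df = run_mbr(
    sample_dir="samples/",      # directory containing the generated samples
    matrix_dir="matrix/",       # where similarity matrices are written/cached
    dmbr_dir="dmbr/",           # where the diverse MBR results are written
    num_instructions=100,
    num_responses=32,
    sim="sentbert",             # similarity model, see aepo.mbr.utility_func.load_similarity
    use_matrix_cache=True,      # reuse cached similarity matrices when available
    diverse_k=4,                # responses to select per instruction
    diversity_penalty=1.0,      # diversity penalty lambda
)
print(dmbr_df.head())
```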
aepo.mbr.reward_engine module
- aepo.mbr.reward_engine.run_reward(reward_model_id: str, instructions: List[str], output_dir: str, output_filename: str, num_instructions: int, num_annotations: int, west_of_n: bool, dmbr_result: DataFrame) → DataFrame
- Parameters:
reward_model_id (str) – the Huggingface Hub’s repository name of the reward model.
instructions (List[str]) – the list of instructions.
output_dir (str) – the output directory.
output_filename (str) – the output filename.
num_instructions (int) – the number of instructions.
num_annotations (int) – the number of annotations available per instruction.
west_of_n (bool) – if true, output the best and worst of N samples instead of all annotated responses.
dmbr_result (pd.DataFrame) – the diverse MBR results.
- Returns:
the annotation-efficient preference optimization dataset.
- Return type:
pd.DataFrame
Annotate the reward for the responses selected by diverse MBR.
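A hedged sketch of annotating the diverse MBR selections with a reward model; the reward model id and prompts are placeholders, and dmbr_df is the DataFrame returned by run_mbr above:

```python
from aepo.mbr.reward_engine import run_reward

# The instructions must match the prompts used when generating the samples;
# two placeholder prompts are shown here for illustration.
instructions = [
    "Explain minimum Bayes risk decoding in one paragraph.",
    "Summarize the benefits of preference optimization.",
]

aepo_df = run_reward(
    reward_model_id="OpenAssistant/reward-model-deberta-v3-large-v2",  # placeholder repository name
    instructions=instructions,
    output_dir="dmbr/",
    output_filename="aepo_dataset.csv",
    num_instructions=len(instructions),
    num_annotations=2,      # reward annotations available per instruction
    west_of_n=True,         # keep only the best and worst annotated responses
    dmbr_result=dmbr_df,    # DataFrame returned by run_mbr
)
```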
aepo.mbr.reward_model module
- class aepo.mbr.reward_model.Eurus(reward_model_id)
Bases: RewardModel
Eurus reward model.
- get_reward(question, answer)
- class aepo.mbr.reward_model.GPTRewardModel(model_path)
Bases: Module
- forward(input_ids=None, past_key_values=None, attention_mask=None, position_ids=None)
input_ids and attention_mask have shape torch.Size([bs, seq_len]); returns scores as a list of length bs.
- get_device()
- class aepo.mbr.reward_model.OASST(reward_model_id)
Bases: RewardModel
OpenAssistant reward model.
- get_reward(question, answer)
- class aepo.mbr.reward_model.PairLM(reward_model_id)
Bases: RewardModel
PairLM reward model.
- get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
- get_reward(question, answer)
- get_rewards(question, answers)
- get_winratio(question, answer, compared_answers)
- tokenize_pair(sources: List[str], candidate1s: List[str], candidate2s: List[str], source_max_length=1224, candidate_max_length=412)
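A small sketch of pairwise scoring with PairLM; the repository name below is a placeholder for whichever pairwise reward model you load:

```python
from aepo.mbr.reward_model import PairLM

pairlm = PairLM("llm-blender/PairRM")  # placeholder repository name

question = "What is minimum Bayes risk decoding?"
answer_a = "It selects the candidate with the highest expected utility against the other candidates."
answer_b = "It is a decoding method."

# Pairwise reward of answer_a compared against answer_b for this question.
score = pairlm.get_pairwise_reward(question, answer_a, answer_b)

# Win ratio of answer_a against a list of alternative answers.
win_ratio = pairlm.get_winratio(question, answer_a, [answer_b])
```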
- class aepo.mbr.reward_model.RewardModel(reward_model_id)
Bases: object
Base class for reward models.
- get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
- get_pairwise_rewards(question: str, answers: List[str]) → ndarray
- get_reward(question: str, answer: str) → float
- get_rewards(question: str, answers: List[str]) → List[float]
- class aepo.mbr.reward_model.StanfordNLP(reward_model_id)
Bases: RewardModel
StanfordNLP reward model.
- get_pairwise_reward(question: str, answer: str, compared_answer: str) → float
- get_reward(question, answer)
- class aepo.mbr.reward_model.Starling(reward_model_id)
Bases: RewardModel
- get_reward(question, answer)
- get_reward_(samples)
samples: List[str]
- get_rewards(question, answers)
- aepo.mbr.reward_model.load_reward_model(reward_model_id: str)
Currently only the reward models implemented in this module (Eurus, OASST, PairLM, StanfordNLP, and Starling) are supported. To add a new reward model, implement it as a subclass of RewardModel, as sketched below.
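A minimal sketch of a custom subclass; the length-based scoring is purely illustrative and not part of the package:

```python
from typing import List
from aepo.mbr.reward_model import RewardModel, load_reward_model

class LengthReward(RewardModel):
    """Hypothetical reward model that scores an answer by its length (illustration only)."""

    def get_reward(self, question: str, answer: str) -> float:
        # Replace with a call to a real reward model.
        return float(len(answer))

    def get_rewards(self, question: str, answers: List[str]) -> List[float]:
        return [self.get_reward(question, answer) for answer in answers]

# Built-in models are selected by their Hugging Face repository name, e.g.:
# reward_model = load_reward_model("OpenAssistant/reward-model-deberta-v3-large-v2")
# rewards = reward_model.get_rewards(question, answers)
```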
aepo.mbr.utility_func module
- aepo.mbr.utility_func.load_similarity(sim: str) → callable
- Parameters:
sim (str) – the name of the similarity function to load.
- Returns:
the similarity function.
- Return type:
callable
Load the similarity function to be used for diverse MBR. For the purpose of AEPO, we use Sentence BERT (sentbert).
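A minimal loading sketch; since the signature of the returned callable is not documented here, it is only loaded, not invoked:

```python
from aepo.mbr.utility_func import load_similarity

# "sentbert" selects the Sentence BERT similarity used for AEPO.
similarity_fn = load_similarity("sentbert")

# run_mbr takes the similarity name via its `sim` argument and presumably
# resolves it with this function internally (see aepo.mbr.mbr_engine.run_mbr).
```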