AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising



Peinan Zhang♠︎, Hiroki Ouchi◇,♠︎

♠︎CyberAgent, ◇Nara Institute of Science and Technology

NAACL 2025

AdTEC

The first public dataset designed for evaluating the quality of ad texts based on real-world AdOps workflows.

Task Details

Task Description

The goal is to predict the overall quality of an ad text with binary labels: acceptable / unacceptable.

Background

Because most ad delivery platforms impose text length restrictions, minor grammatical errors are tolerated in order to enhance readability and engage customers within the limited space. However, excessive compression can mislead customers, and such poor-quality ads should be detected before delivery to avoid negative impacts on the advertiser.
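Acceptability is scored with accuracy and F1 over the two labels. As a reference, here is a minimal pure-Python sketch of both metrics; the label strings are illustrative, not the dataset's actual label encoding:

```python
def accuracy(gold, pred):
    """Fraction of predictions that exactly match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def f1(gold, pred, positive="unacceptable"):
    """F1 for the chosen positive class (harmonic mean of precision and recall).

    The positive-class name is an assumption for illustration.
    """
    tp = sum(g == p == positive for g, p in zip(gold, pred))
    fp = sum(p == positive and g != positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = ["acceptable", "unacceptable", "acceptable", "unacceptable"]
pred = ["acceptable", "unacceptable", "unacceptable", "unacceptable"]
print(accuracy(gold, pred))  # 0.75
print(f1(gold, pred))        # 0.8
```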

Dataset Card

Task Name | Task Setup | Input | Labels | Metrics | #Train | #Dev | #Test
--- | --- | --- | --- | --- | --- | --- | ---
Ad Acceptability | Classification | Ad text | Acceptable/Unacceptable | Accuracy/F1 Score | 13,265 | 970 | 980
Ad Consistency | Classification | Ad text, LP text | Consistent/Inconsistent | Accuracy/F1 Score | 10,639 | 945 | 970
Ad Performance Estimation | Regression | Ad texts, Keywords, Industry | Quality Score ([0, 100]) | Pearson/Spearman Correlation | 125,087 | 965 | 965
Ad Aspect Recognition | Multi-label Classification | Ad text | Aspects (e.g., Brand, Price, Quality) | F1-micro/-macro Score | 1,856 | 465 | 410
Ad Similarity | Regression | Ad text pair | Similarity Score ([0, 1]) | Pearson/Spearman Correlation | 4,980 | 623 | 629
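A split can be loaded with standard tooling once the release files are downloaded. The sketch below parses a tiny in-memory TSV; the column names (`text`, `label`) and the TSV layout are assumptions for illustration only, so consult the actual dataset release for its real format:

```python
import csv
import io

# Hypothetical TSV layout for an acceptability split: one ad text and one
# binary label per row. Replace the string with an open() on the real file.
sample_tsv = (
    "text\tlabel\n"
    "Best running shoes, free shipping today\tacceptable\n"
    "Cheap!! shoes!!! buy now!!!\tunacceptable\n"
)

rows = list(csv.DictReader(io.StringIO(sample_tsv), delimiter="\t"))
texts = [r["text"] for r in rows]
labels = [r["label"] for r in rows]
print(len(rows), labels)  # 2 ['acceptable', 'unacceptable']
```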

Examples

Experiments

Evaluator | Ad Acceptability (Accuracy/F1) | Ad Consistency (Accuracy/F1) | Ad Perf. Est. (Pearson/Spearman) | A3 Recognition (F1-micro/-macro) | Ad Similarity (Pearson/Spearman)
--- | --- | --- | --- | --- | ---
TohokuBERT-Base | 0.685/0.691 | 0.757/0.504 | 0.437/0.454 | 0.753/0.629 | 0.769/0.803
TohokuBERT-Large | 0.711/0.688 | **0.767/0.552** | **0.480/0.497** | 0.774/**0.694** | 0.773/0.807
WasedaBERT-Base | 0.615/0.639 | 0.725/0.388 | 0.444/0.454 | 0.641/0.442 | 0.749/0.797
WasedaBERT-Large | 0.598/0.637 | 0.755/0.474 | 0.445/0.457 | 0.663/0.517 | 0.740/0.800
XLM-RoBERTa-Base | 0.694/0.677 | 0.743/0.465 | 0.425/0.439 | 0.730/0.542 | 0.846/0.870
XLM-RoBERTa-Large | 0.705/0.690 | 0.758/0.519 | 0.453/0.457 | 0.778/**0.677** | **0.878/0.878**
CALM2-7B | 0.520/0.115 | 0.381/0.472 | 0.006/0.013 | 0.154/0.042 | 0.036/0.036
ELYZA-7B | 0.352/0.520 | 0.628/0.771 | 0.003/0.046 | 0.196/0.044 | 0.015/-0.004
GPT-3.5 | 0.369/0.489 | 0.528/0.570 | -0.013/-0.022 | 0.255/0.064 | 0.389/0.385
GPT-4 | 0.325/0.433 | 0.583/0.612 | 0.028/0.073 | 0.417/0.113 | 0.776/0.811
ELYZA-7B (Fine-tuned) | 0.638/0.638 | 0.692/0.694 | 0.240/0.235 | 0.379/0.280 | 0.684/0.740
Human | **0.732/0.790** | 0.703/**0.807** | 0.564/0.538 | 0.699/0.765 | –
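The two regression tasks are scored with Pearson and Spearman correlation between predicted and gold scores. A minimal dependency-free sketch of both (Spearman is just Pearson computed over ranks; this simplified ranking does not average tied values, which libraries such as SciPy do handle):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson over the ranks of the values.

    Minimal sketch: ties are not averaged.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

# Illustrative predicted vs. gold quality scores (not dataset values).
pred_scores = [10.0, 20.0, 35.0, 40.0]
gold_scores = [12.0, 18.0, 30.0, 50.0]
print(pearson(pred_scores, gold_scores))
print(spearman(pred_scores, gold_scores))  # 1.0 (same ordering)
```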

Discussion

BibTeX

@inproceedings{zhang2025adtec,
  title={{AdTEC}: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising},
  author={Peinan Zhang and Yusuke Sakai and Masato Mita and Hiroki Ouchi and Taro Watanabe},
  booktitle={Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)},
  year={2025},
  publisher={Association for Computational Linguistics},
  eprint={2408.05906},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.05906},
}

Copyright © 2025 Peinan Zhang. All rights reserved.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.