Accepted to NAACL 2025

AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising

Peinan Zhang ♠︎, Yusuke Sakai ◇, Masato Mita ♠︎, Hiroki Ouchi ♠︎◇, Taro Watanabe ◇

♠︎ CyberAgent ◇ Nara Institute of Science and Technology


Abstract

As the fluency of ad texts automatically generated by natural language generation technologies continues to improve, there is an increasing demand to assess the quality of these creatives in real-world settings. We propose AdTEC, the first public benchmark for evaluating ad texts from multiple perspectives within practical advertising operations. Our contributions are as follows: (i) defining five tasks for evaluating the quality of ad texts, and constructing a Japanese dataset based on the practical operational experience of advertising agencies, which is typically maintained in-house; (ii) validating the performance of existing pre-trained language models (PLMs) and human evaluators on this dataset; and (iii) analyzing the characteristics of the benchmark and identifying its remaining challenges. Our results show that while PLMs already achieve a practical level of performance on several tasks, humans still outperform them on certain tasks, indicating that there remains significant room for improvement in this area.

Overview

Key Contributions

1. First Public AdOps Dataset

Constructing the first public benchmark for ad text evaluation based on practical, real-world advertising operations (AdOps).

2. Comprehensive Benchmarking

Validating the performance of various PLMs and human evaluators, establishing strong baselines for future work.

3. In-depth Analysis & Challenges

Analyzing the dataset's characteristics and identifying key challenges to guide future research in the advertising NLP domain.

Tasks

Task Description

The goal of the Ad Acceptability task is to predict the overall quality of an ad text with one of two binary labels: `acceptable` or `unacceptable`.

Background

Because most ad delivery platforms impose strict text-length restrictions, ad texts are often compressed, and minor grammatical errors are tolerated as long as readability is maintained. However, excessive compression can mislead customers, and such poor-quality ads should be detected before delivery to avoid negative impacts on the advertiser.

[Figure: examples of acceptable and unacceptable ad texts in the Ad Acceptability task]
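Concretely, the task reduces to binary sequence classification. The sketch below shows one way to score an ad text with a PLM; the model choice, label order, and fine-tuning setup are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of the Ad Acceptability task as binary sequence
# classification. Model choice and label order (1 = acceptable) are
# assumptions, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # any Japanese-capable PLM could be used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
# NOTE: the classification head is randomly initialized here; fine-tune on
# the benchmark's train split (e.g., with transformers.Trainer) before use.

inputs = tokenizer("期間限定セール実施中 今すぐチェック", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
label = logits.argmax(dim=-1).item()
print("acceptable" if label == 1 else "unacceptable")
```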

Dataset Statistics

Statistics of the data included in the AdTEC benchmark:

| Task | Train | Dev | Test |
| --- | ---: | ---: | ---: |
| Ad Acceptability | 13,265 | 970 | 980 |
| Ad Consistency | 10,635 | 945 | 970 |
| Ad Performance Estimation | 125,087 | 965 | 965 |
| A3 Recognition | 1,856 | 465 | 410 |
| Ad Similarity | 4,980 | 623 | 629 |
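As a sanity check, one can load each task from the Hugging Face Hub and compare the split sizes with the table above. The dataset ID and configuration names in this sketch are hypothetical placeholders; consult the Hub page for the actual identifiers.

```python
# Hypothetical dataset ID and config names -- check the Hugging Face Hub
# page for the actual identifiers before running.
from datasets import load_dataset

CONFIGS = ["ad-acceptability", "ad-consistency",
           "ad-performance-estimation", "a3-recognition", "ad-similarity"]

for config in CONFIGS:
    ds = load_dataset("cyberagent/AdTEC", config)
    print(config, {split: ds[split].num_rows for split in ds})
```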


Citation

Please cite our paper when you use this dataset.

@inproceedings{zhang2025adtec,
  title={{AdTEC}: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising},
  author={Peinan Zhang and Yusuke Sakai and Masato Mita and Hiroki Ouchi and Taro Watanabe},
  booktitle={Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)},
  year={2025},
  publisher={Association for Computational Linguistics},
  eprint={2408.05906},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.05906},
}

Frequently Asked Questions

What is AdTEC?

The AdTEC dataset is the first public benchmark for evaluating the quality of ad text in search engine advertising from multiple, practical perspectives. It aims to provide evaluation criteria based on real-world advertising operations, which have been lacking in previous research.

Is the dataset available in languages other than Japanese?

Currently, the dataset consists of Japanese text only. However, the task design and evaluation framework proposed in this research are language-independent and can be applied to other languages.

Where can I obtain the dataset and code?

The dataset is available on the Hugging Face Hub, and the associated code can be obtained from the GitHub repository.

Under what license is the dataset released?

The AdTEC dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, which requires attribution, restricts use to non-commercial purposes, and requires that any derivatives be shared under the same license.