@nlp_seminar · 757 subscribers
1.9K views
July 10, 2024
Excited to share that we have released RuBLiMP (Russian Benchmark of Linguistic Minimal Pairs), a new benchmark for evaluating Russian language models (LMs).

❓RuBLiMP consists of 45,000 minimal pairs covering 12 grammatical phenomena that are well represented in Russian linguistics, spanning morphology, syntax, and semantics. A minimal pair consists of a grammatical and an ungrammatical sentence (e.g., The cat is on the mat / *The cat are on the mat), and an LM is expected to assign the higher score to the grammatical one under a given scoring function.

Our approach lets us:
🔸 generate minimal pairs at scale from any text domain
🔸 estimate whether a grammatical sentence appears in the LM's pretraining corpus

💡 RuBLiMP can be used to evaluate the sensitivity of LMs to grammatical phenomena in Russian and to develop ranking and grammatical error detection methods.

🔸 Read more in our pre-print: https://arxiv.org/abs/2406.19232
🔸 HuggingFace: https://huggingface.co/datasets/RussianNLP/rublimp
🔸 GitHub: https://github.com/RussianNLP/RuBLiMP
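To make the minimal-pair protocol concrete, here is a small Python sketch. The post does not say which scoring function RuBLiMP uses; this assumes the common BLiMP-style setup where a sentence's score is its total log-probability, and it substitutes a hypothetical add-one-smoothed unigram model (`make_unigram_scorer`) for a real LM so the example runs with no dependencies. `minimal_pair_accuracy` and the toy corpus are illustrative names, not part of the RuBLiMP code or dataset.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple

# A minimal pair: (grammatical sentence, ungrammatical sentence).
MinimalPair = Tuple[str, str]


def minimal_pair_accuracy(pairs: Sequence[MinimalPair],
                          score: Callable[[str], float]) -> float:
    """Fraction of pairs where the grammatical sentence gets a strictly higher score."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def make_unigram_scorer(corpus: Sequence[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for an LM: add-one-smoothed unigram log-probability."""
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # reserve one slot for unseen tokens

    def score(sentence: str) -> float:
        # Sum of per-token log-probabilities = log-probability of the sentence.
        return sum(math.log((counts[tok] + 1) / (total + vocab))
                   for tok in sentence.lower().split())

    return score


# Toy "pretraining corpus" and minimal pairs (illustrative only).
corpus = ["the cat is on the mat", "the dog is in the house"]
scorer = make_unigram_scorer(corpus)
pairs = [
    ("the cat is on the mat", "the cat are on the mat"),
    ("the dog is in the house", "the dog are in the house"),
]
print(minimal_pair_accuracy(pairs, scorer))
```

Replacing the toy scorer with a real LM's sentence log-probability (for example, summed token log-probs from a causal model) leaves `minimal_pair_accuracy` unchanged. Ties are counted as errors, so a scorer that cannot tell the two sentences apart gets no credit.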