Stay up to date with the top arXiv papers through podcasts and videos. The show offers clear summaries, making academic research more accessible.
[QA] Do Large Language Model Benchmarks Test Reliability?
7 mins • Feb 6, 2025
Charts
- 80 (decreased by 42)
- 152 (decreased by 19)
- 196 (decreased by 49)
- 186 (decreased by 21)
- 123 (new)
Recent episodes
![](https://files.podcastos.com/shows/ygq7hi/jpeg256-0e828f29.jpg)
Feb 6, 2025
[QA] Do Large Language Model Benchmarks Test Reliability?
7 mins
Feb 6, 2025
Do Large Language Model Benchmarks Test Reliability?
9 mins
Feb 6, 2025
Detecting Strategic Deception Using Linear Probes
23 mins
Feb 5, 2025
[QA] Evaluation of Large Language Models via Coupled Token Generation
8 mins
Feb 5, 2025
Evaluation of Large Language Models via Coupled Token Generation
10 mins
![](https://files.podcastos.com/shows/ygq7hi/jpeg-6d55d43f.jpg)
Language
English
Country
United States