ML Research Hub
@DataScienceT
32.4K subscribers
266 views
0.8% of subscribers
March 27, 2026
📷 Photo
Score: 293
✨ Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

📝 Summary: CroBo is a visual state representation framework that learns what-is-where composition for robotics. It uses global-to-local reconstruction to encode scene element identities and spatial locations in a compact token. This enables tracking scene dynamics for sequential decision making.

🔹 Publication Date: Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13904
• PDF: https://arxiv.org/pdf/2603.13904
• Project Page: https://seokminlee-chris.github.io/CroBo-ProjectPage/
• GitHub: https://github.com/SeokminLee-Chris/CroBo

==================================
For more data science resources:
✓ https://t.me/DataScienceT

#Robotics #ComputerVision #SceneUnderstanding #AI #StateRepresentation
