GIST Extracts Semantic Topology from Point Clouds for Embodied AI Navigation
GIST builds semantically annotated topological maps from point clouds to enable semantic search, localization, zone classification and natural-language routing for embodied AI, outperforming baselines and reaching 80% verbal navigation success in a formative five-participant study.
GIST converts consumer-grade mobile point clouds into semantically annotated navigation topologies to solve spatial grounding in dense, quasi-static environments such as retail stores and hospitals (https://arxiv.org/abs/2604.15495).
The pipeline produces 2D occupancy maps, derives topological layouts, and adds a lightweight semantic layer through keyframe and semantic selection. That representation supports four tasks: intent-driven semantic search that can infer alternatives, one-shot localization with a 1.04 m top-5 mean translation error, floor-plan zone classification, and landmark-based instruction generation that outperforms sequence-based baselines in LLM evaluations (https://arxiv.org/abs/2604.15495).
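The step from occupancy map to routable topology can be illustrated with a toy example. The sketch below is not GIST's implementation: it assumes a tiny hand-written grid, treats every free cell as a graph node with 4-connected neighbors, and uses breadth-first search for routing. The names `GRID`, `LABELS`, `build_topology`, `shortest_route` and `route_between` are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical 2D occupancy grid: 1 = occupied (shelf, wall), 0 = free floor.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

# Hypothetical semantic layer: a label attached to a free cell (row, col).
LABELS = {"entrance": (0, 0), "pharmacy": (2, 0), "checkout": (2, 3)}


def build_topology(grid):
    """Return an adjacency dict over free cells (4-connected neighbors)."""
    rows, cols = len(grid), len(grid[0])
    graph = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue  # occupied cells are not nodes
            neighbors = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    neighbors.append((nr, nc))
            graph[(r, c)] = neighbors
    return graph


def shortest_route(graph, start, goal):
    """Breadth-first search; returns a list of cells or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def route_between(grid, labels, src, dst):
    """Route between two semantic labels over the derived topology."""
    return shortest_route(build_topology(grid), labels[src], labels[dst])


print(route_between(GRID, LABELS, "entrance", "pharmacy"))
```

In this toy grid the only gap in the middle row is at column 2, so every route between the top and bottom rows passes through that cell. A real system would likely use a much sparser graph (for example, aisle junctions rather than individual cells), but the lookup pattern is the same: resolve a label to a node, then search the graph.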
PaLM-E integrates multimodal models for robotic control yet relies on implicit representations that struggle with long-tail semantics and stale visual features in cluttered spaces (https://arxiv.org/abs/2303.03378); GIST's explicit topology addresses that weakness by grounding abstract knowledge in physical layouts, a connection mainstream generative AI coverage routinely omits.
RT-2 transfers web-scale knowledge to robotic actions (https://arxiv.org/abs/2307.15818) but lacks GIST's structured semantic map for human-AI verbal navigation, which delivered 80% success in a formative in-situ study (N=5) (https://arxiv.org/abs/2604.15495).
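To make the idea of landmark-based verbal navigation concrete, the sketch below turns a route into spoken-style steps anchored on named landmarks. It is an illustrative assumption, not the method from the GIST paper: waypoints are (x, y) points with x pointing east and y pointing north, turn direction comes from the sign of a 2D cross product, and the function names and landmark names are hypothetical.

```python
def turn_direction(prev, cur, nxt):
    """Classify the turn at `cur` as 'left', 'right' or 'straight'.

    Uses the sign of the 2D cross product of the incoming and outgoing
    headings (x east, y north). A full reversal is also collinear and is
    reported as 'straight'; this sketch does not handle U-turns.
    """
    ax, ay = cur[0] - prev[0], cur[1] - prev[1]
    bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
    cross = ax * by - ay * bx
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "straight"


def landmark_instructions(route, landmarks):
    """Turn a list of (x, y) waypoints into landmark-anchored instructions.

    `landmarks` maps a waypoint to a human-readable name (hypothetical data).
    """
    steps = []
    for i in range(1, len(route) - 1):
        name = landmarks.get(route[i])
        direction = turn_direction(route[i - 1], route[i], route[i + 1])
        if direction == "straight":
            if name is not None:
                steps.append(f"Continue straight past the {name}.")
            continue  # nothing worth saying at an unnamed straight waypoint
        if name is not None:
            steps.append(f"Turn {direction} at the {name}.")
        else:
            steps.append(f"Turn {direction}.")
    goal = landmarks.get(route[-1])
    steps.append(f"You have arrived at the {goal}." if goal else "You have arrived.")
    return steps


route = [(0, 0), (2, 0), (2, 3), (5, 3)]
landmarks = {(2, 0): "bakery", (2, 3): "pharmacy", (5, 3): "checkout"}
for step in landmark_instructions(route, landmarks):
    print(step)
```

Anchoring each turn on a visible landmark rather than on a distance or step count is the design choice this example is meant to show. How GIST selects landmarks and phrases instructions is described in the paper itself.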
AXIOM: GIST's topology layer closes a key gap by tying language-model knowledge to physical layouts, likely accelerating reliable deployment of assistive robots in real-world cluttered settings.
Sources (3)
- [1] Primary Source (https://arxiv.org/abs/2604.15495)
- [2] PaLM-E: An Embodied Multimodal Language Model (https://arxiv.org/abs/2303.03378)
- [3] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (https://arxiv.org/abs/2307.15818)