Santa Fe Institute

At SFI we will evaluate fundamental visual reasoning abilities in multimodal LLMs and develop novel neurosymbolic methods to advance those abilities. Our development of new methods will draw on previous work on modeling active visual perception and analogical reasoning, and on insights from the cognitive science of human visual reasoning, which show how humans build visual concepts on top of innate core knowledge systems and integrate those concepts with lower-level visual routines. This work will benefit from existing visual reasoning benchmarks as well as new datasets that we will design to adversarially test robustness on visual examples unlikely to resemble training data. Such adversarial testing has become common in evaluating text-only language models but is still largely lacking for multimodal LLMs.
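
As one concrete illustration of the kind of adversarial robustness test this implies, the minimal Python sketch below generates concept-preserving perturbations of a benchmark image and checks whether a multimodal model's answer stays correct across them. The query_model function is a hypothetical placeholder for whatever model API is under evaluation, and the specific perturbations are illustrative assumptions, not the project's actual dataset-construction method.

    from PIL import Image, ImageOps

    def make_variants(image):
        # Simple perturbations intended to preserve the underlying visual
        # concept while moving the image away from likely training data.
        # Each perturbation must be checked against the question so the
        # correct answer is unchanged (e.g., mirroring would invalidate
        # a question about left/right relations).
        return {
            "original": image,
            "mirrored": ImageOps.mirror(image),
            "inverted": ImageOps.invert(image.convert("RGB")),
            "rotated": image.rotate(90, expand=True),
        }

    def query_model(image, question):
        # Hypothetical placeholder for a call to whatever multimodal LLM
        # API is under evaluation; not a real library function.
        raise NotImplementedError

    def robustness_check(image_path, question, expected_answer):
        # Ask the same question about each variant; a robust model should
        # answer correctly on every concept-preserving perturbation.
        image = Image.open(image_path)
        return {
            name: query_model(variant, question).strip().lower()
                  == expected_answer.lower()
            for name, variant in make_variants(image).items()
        }

A model that succeeds on the original image but fails on such variants is likely relying on surface statistics of its training distribution rather than on the abstract visual concept the task is meant to probe.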