Introduction | GeoTransQuData

From questions to constructing valid answers

Geographic data is essential for understanding challenges such as climate change, urban development, and public health. Yet answering questions about these challenges with such data remains difficult. It typically requires expert knowledge of Geographic Information Systems (GIS), as well as the ability to select appropriate data sources and construct analytical workflows.

Recent advances in Artificial Intelligence, especially large language models (LLMs), suggest that natural language can become an interface to data. However, current systems are primarily designed to retrieve or summarize existing information. Moreover, their answers are of limited validity, they are prone to hallucinations, and they lack an understanding of the purposes underlying information use. As a result, they fail to address a central challenge of geographic analysis: answers often do not exist in advance; they must be constructed from data in a purposeful manner.

GeoTrAnsQData tackles this challenge by developing a new paradigm called transformative Geographic Question Answering (GeoQA). The key idea is to interpret a question as a request for a geo-analytical workflow that transforms maps into valid answer maps. Such a workflow specifies how available geodata sources can be transformed into an answer.

Constructing such workflows is non-trivial. For any given question, there are typically:

multiple possible data sources,
multiple alternative transformation paths, and
varying degrees of validity depending on the analytical purpose.
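
To make the first two points concrete, the following is a minimal sketch, not the project's actual method. It assumes that data sources and transformation steps can each be tagged with a simple content type, and it enumerates every chain of steps that leads from a source to the type of answer a question asks for. All names here (the sources, the operations, the type labels, and the function enumerate_workflows) are hypothetical and chosen only for illustration. Validity with respect to an analytical purpose is not modelled.

```python
from typing import NamedTuple


class Operation(NamedTuple):
    """A transformation step from one content type to another (hypothetical)."""
    name: str
    input_type: str
    output_type: str


# Hypothetical data sources, each tagged with the kind of content it holds.
SOURCES = {
    "census_tracts": "population_by_zone",
    "address_points": "point_locations",
}

# Hypothetical transformation steps between content types.
OPERATIONS = [
    Operation("areal_interpolation", "population_by_zone", "density_surface"),
    Operation("kernel_density", "point_locations", "density_surface"),
    Operation("zonal_statistics", "density_surface", "population_by_neighbourhood"),
    Operation("point_in_polygon_count", "point_locations", "population_by_neighbourhood"),
]


def enumerate_workflows(goal_type):
    """Return every source-to-goal chain as a list: [source, step, step, ...]."""
    workflows = []

    def extend(current_type, steps, visited):
        if current_type == goal_type:
            workflows.append(list(steps))
            return
        for op in OPERATIONS:
            # Follow only steps that accept the current type and avoid cycles.
            if op.input_type == current_type and op.output_type not in visited:
                steps.append(op.name)
                extend(op.output_type, steps, visited | {op.output_type})
                steps.pop()

    for source, source_type in SOURCES.items():
        extend(source_type, [source], {source_type})
    return workflows


for workflow in enumerate_workflows("population_by_neighbourhood"):
    print(" -> ".join(workflow))
```

Even in this toy setting, a single question type yields three candidate workflows drawn from two sources, which is why comparing and justifying alternatives requires an explicit model of the transformations.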

Current AI systems, including LLMs, lack explicit models of these transformations and therefore cannot reliably generate, compare, or justify alternative analytical solutions.

GeoTrAnsQData addresses this gap by developing:

conceptual transformation graphs that represent geo-analytical processes in a transparent and explainable way,
knowledge graph models of geodata sources that capture their provenance and analytical potential, and
AI methods that interpret natural language questions using formal grammars in order to synthesize candidate workflows.

This approach makes it possible not only to generate answers, but also to construct, compare, and justify multiple valid ways of answering a question. In doing so, the project enables a systematic exploration of the analytical potential of geodata repositories and helps to uncover hidden assumptions and biases in spatial analysis.

By going beyond retrieval-based systems and current foundation models, GeoTrAnsQData lays the groundwork for a new generation of AI that supports explainable, purpose-driven, and methodologically sound geographic analysis.