MARS: A Foundational Map Auto-Regressor

Abstract

Map generation tasks, characterized by extensive non-structured vectorized data (e.g., points, polylines, and polygons), pose significant challenges to common pixel-wise generative models. Prior works, which segment the image and then apply various vectorization post-processing steps, usually sacrifice accuracy. Motivated by the recent success of auto-regressive language modeling, we propose the first map foundational model, the Map Auto-Regressor (MARS), which is capable of generating both multi-polyline road networks and polygon buildings in a unified manner. We collected MAP-3M, by far the largest multi-class map dataset, to support robust training. Extensive benchmarks highlight the superior performance of MARS over prior work. In addition, building on auto-regressive, teacher-forcing-based training, we develop the “Chat with MARS” capability, which enables interactive human-in-the-loop map generation and correction.

Click 1 Demo

Start-of-sequence (SOS) chatting is particularly helpful when a test image is extremely blurry or out-of-domain: if the first vertex predicted by MARS is poor, errors accumulate through the auto-regressive decoding and the whole sequence may yield fewer detections. In such cases, SOS chatting can greatly improve prediction performance over the full image.
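The effect of supplying the first vertex can be illustrated with a minimal sketch of prefix-forced greedy decoding. Everything here is an assumption made for illustration: the `(x, y)` vertex representation, the `EOS` marker, and the `decode` and `toy_model` helpers are hypothetical and are not MARS's actual tokenization or API. The toy model simply stands in for the network so that the example runs.

```python
from typing import Callable, List, Tuple

Vertex = Tuple[int, int]      # (x, y) in pixels; the real token format is an assumption
EOS: Vertex = (-1, -1)        # hypothetical end-of-sequence marker


def decode(next_vertex: Callable[[List[Vertex]], Vertex],
           prefix: List[Vertex],
           max_len: int = 16) -> List[Vertex]:
    """Greedy auto-regressive decoding that continues from a forced prefix."""
    seq = list(prefix)
    while len(seq) < max_len:
        v = next_vertex(seq)
        if v == EOS:
            break
        seq.append(v)
    return seq


def toy_model(seq: List[Vertex]) -> Vertex:
    """Stand-in for the network: steps right by 10 pixels and stops once x >= 40."""
    if not seq:
        return (0, 0)         # the model's own first vertex, which may be poor
    x, y = seq[-1]
    return EOS if x >= 40 else (x + 10, y)


free_run = decode(toy_model, prefix=[])          # model chooses the first vertex
sos_run = decode(toy_model, prefix=[(20, 5)])    # user click supplies the first vertex
```

Because every later vertex is conditioned on the earlier ones, replacing only the first element of the sequence changes the entire continuation, which is the property SOS chatting relies on.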

Original Inference

After Click

Click 2 Demo

Mid-of-sequence (MOS) chatting aims to intercept MARS’s prediction sequence when it drifts from the desired trajectory, which is common in vectorized road generation. This is particularly helpful when certain predictions in an image need to be adjusted.
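One way to picture the interception is as a prefix edit: the vertices predicted before the drift are kept, the user's click replaces the first drifting vertex, and decoding would then resume from that prefix. The sketch below shows only the prefix construction. The helper name `mos_prefix` and the idea that the user indicates the drift position by index are assumptions for illustration, not the interface described by the authors.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]   # (x, y) in pixels; representation is an assumption


def mos_prefix(predicted: List[Vertex], drift_index: int, click: Vertex) -> List[Vertex]:
    """Keep the vertices before drift_index and put the user's click in place of the rest."""
    if not 0 <= drift_index <= len(predicted):
        raise ValueError("drift_index out of range")
    # Slicing copies, so the original prediction is left untouched.
    return predicted[:drift_index] + [click]


# Hypothetical road polyline that veers off after the second vertex.
road = [(0.0, 0.0), (10.0, 0.0), (20.0, 8.0), (30.0, 20.0)]
prefix = mos_prefix(road, drift_index=2, click=(20.0, 0.0))
```

The resulting `prefix` would be fed back to the decoder so that the remaining vertices are regenerated conditioned on the corrected point.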

Original Inference

After Click

Click 3 Demo

End-of-sequence (EOS) chatting aims to augment MARS’s prediction when objects are missing from the final predictions, which is common for various small map elements.
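EOS chatting can be sketched as reopening a finished sequence: the terminating token is dropped and the clicked vertex starts a new object, after which decoding would continue. The special tokens `SEP` and `EOS` and the helper `reopen_with_click` are hypothetical; the source does not specify how MARS delimits objects, so this is only one plausible layout.

```python
from typing import List, Tuple, Union

Vertex = Tuple[int, int]
Token = Union[Vertex, str]
SEP, EOS = "<sep>", "<eos>"    # hypothetical special tokens, not MARS's vocabulary


def reopen_with_click(tokens: List[Token], click: Vertex) -> List[Token]:
    """Drop a trailing EOS and start a new object at the clicked vertex."""
    body = tokens[:-1] if tokens and tokens[-1] == EOS else list(tokens)
    if body and body[-1] != SEP:
        body.append(SEP)       # close the last predicted object
    return body + [click]      # the click becomes the first vertex of a new object


# Two predicted objects followed by EOS; a small element near (7, 7) was missed.
finished = [(0, 0), (10, 0), SEP, (5, 5), (5, 9), EOS]
reopened = reopen_with_click(finished, click=(7, 7))
```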

Original Inference

After Click

Click Jitter Analysis

Robustness to Click Position Variations

The left image shows the ground-truth reference. The middle image shows the prediction with the correct click position. Use the slider to browse through predictions made with different click positions.
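A jitter study of this kind needs a set of perturbed click positions around the correct one. The sketch below generates such a set by placing points evenly on a circle around the click and clamping them to the image bounds. The circular sampling pattern, the radius, and the helper name `jittered_clicks` are assumptions for illustration; the source does not state how the click variations were produced.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def jittered_clicks(click: Point, radius: float, n: int,
                    width: int, height: int) -> List[Point]:
    """Return n points evenly spaced on a circle around the click, clamped to the image."""
    cx, cy = click
    points = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        x = min(max(cx + radius * math.cos(theta), 0.0), width - 1)
        y = min(max(cy + radius * math.sin(theta), 0.0), height - 1)
        points.append((x, y))
    return points


# Eight perturbed clicks, 5 pixels away from a hypothetical correct click at (50, 50).
variants = jittered_clicks((50.0, 50.0), radius=5.0, n=8, width=100, height=100)
```

Each variant would be used in place of the correct click, and the resulting predictions compared against the ground truth to see how quickly quality degrades with click error.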