MARS: A Foundational Map Auto-Regressor

Abstract

Map generation tasks, characterized by extensive unstructured vectorized data (e.g., points, polylines, and polygons), pose significant challenges to common pixel-wise generative models. Prior works, which segment the image and then apply various vectorization post-processing steps, usually sacrifice accuracy. Motivated by the recent success of auto-regressive language modeling, we propose the first foundational map model, Map Auto-Regressor (MARS), which is capable of generating both multi-polyline road networks and polygon buildings in a unified manner. To support robust training, we collected MAP-3M, by far the largest multi-class map dataset. Extensive benchmarks show that MARS outperforms prior works. Meanwhile, benefiting from auto-regressive teacher-forcing training, we develop the “Chat with MARS” capability, which enables interactive human-in-the-loop map generation and correction.
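The abstract states that MARS generates polylines and polygons in a unified manner, but this page does not spell out the sequence format. One common way to make mixed vector data auto-regressive is to quantize each coordinate into a discrete bin and flatten all map elements into a single token sequence. The sketch below illustrates that idea under assumed choices: the bin count `NUM_BINS`, the special tokens `SEP`, `EOS`, `ROAD`, and `BUILDING`, and the `serialize`/`deserialize` helpers are all hypothetical and are not the MARS tokenizer.

```python
from typing import List, Optional, Tuple

# Hypothetical vocabulary (NOT the MARS tokenizer): ids 0..NUM_BINS-1 are
# quantized coordinates, followed by special tokens.
NUM_BINS = 512
SEP = NUM_BINS           # separates two map elements
EOS = NUM_BINS + 1       # ends the whole sequence
ROAD = NUM_BINS + 2      # class token for a road polyline
BUILDING = NUM_BINS + 3  # class token for a building polygon

Point = Tuple[float, float]
Element = Tuple[int, List[Point]]  # (class token, vertices)


def quantize(v: float, size: float) -> int:
    """Map a pixel coordinate in [0, size] to one of NUM_BINS bins."""
    return min(max(int(v / size * NUM_BINS), 0), NUM_BINS - 1)


def dequantize(b: int, size: float) -> float:
    """Return the center of bin b in pixel units."""
    return (b + 0.5) * size / NUM_BINS


def serialize(elements: List[Element], size: float) -> List[int]:
    """Flatten polylines and polygons into one token sequence."""
    tokens: List[int] = []
    for i, (cls, verts) in enumerate(elements):
        if i > 0:
            tokens.append(SEP)
        tokens.append(cls)
        for x, y in verts:
            tokens += [quantize(x, size), quantize(y, size)]
    tokens.append(EOS)
    return tokens


def deserialize(tokens: List[int], size: float) -> List[Element]:
    """Invert serialize() up to quantization error."""
    elements: List[Element] = []
    cls: Optional[int] = None
    coords: List[int] = []
    for t in tokens:
        if t in (SEP, EOS):
            if cls is not None:
                pts = [(dequantize(coords[k], size), dequantize(coords[k + 1], size))
                       for k in range(0, len(coords) - 1, 2)]
                elements.append((cls, pts))
            cls, coords = None, []
            if t == EOS:
                break
        elif t in (ROAD, BUILDING):
            cls = t
        else:
            coords.append(t)
    return elements


tokens = serialize([(ROAD, [(0.0, 0.0), (100.0, 200.0)]),
                    (BUILDING, [(10.0, 10.0), (20.0, 10.0), (20.0, 20.0)])], size=512.0)
print(tokens)  # [514, 0, 0, 100, 200, 512, 515, 10, 10, 20, 10, 20, 20, 513]
```

Decoding `tokens` with `deserialize` returns the same two elements with every vertex snapped to its bin center, e.g. the road becomes `[(0.5, 0.5), (100.5, 200.5)]`. A single flat sequence is what lets one decoder handle both element types.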

Click 1 Demo

Start-of-sequence (SOS) chatting is particularly helpful when a test image is extremely blurry or out-of-domain. Because errors accumulate during auto-regressive decoding, an ill-conditioned first vertex predicted by MARS can cause the whole sequence to miss detections. In such cases, SOS chatting, in which a click supplies the starting point of the sequence, can greatly improve prediction performance over the full image.
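To make the SOS idea concrete, here is a minimal sketch assuming a greedy decoder that calls a `model` function on the tokens generated so far. The click is placed in the prefix before decoding, in the same way ground-truth tokens are fed during teacher-forcing training, so the model continues from the user's vertex instead of its own first guess. The `decode` and `sos_chat` names, the toy model, and the one-token-per-vertex encoding are illustrative assumptions, not MARS's interface.

```python
from typing import Callable, List, Sequence

EOS = 99  # hypothetical end-of-sequence token

# Hypothetical model interface: tokens so far -> next token (greedy).
NextToken = Callable[[Sequence[int]], int]


def decode(model: NextToken, prefix: Sequence[int], eos: int, max_len: int = 256) -> List[int]:
    """Greedy auto-regressive decoding that continues from a fixed prefix."""
    seq = list(prefix)
    while len(seq) < max_len:
        tok = model(seq)
        seq.append(tok)
        if tok == eos:
            break
    return seq


def sos_chat(model: NextToken, click_tokens: Sequence[int], eos: int) -> List[int]:
    """SOS chatting: the clicked vertex is forced as the first token(s)."""
    return decode(model, click_tokens, eos)


def toy_model(seq: Sequence[int]) -> int:
    """Toy road tracer: walks vertex ids upward and stops after vertex 5."""
    if not seq:
        return 4  # ill-conditioned first guess: vertices 0-3 will be missed
    last = seq[-1]
    return last + 1 if last < 5 else EOS


print(decode(toy_model, [], EOS))     # [4, 5, 99]: vertices 0-3 missed
print(sos_chat(toy_model, [0], EOS))  # [0, 1, 2, 3, 4, 5, 99]
```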

Original Inference

After Click


Click 2 Demo

Mid-of-sequence (MOS) chatting aims to intercept MARS’s prediction sequence when it drifts from the desired trajectory, a failure that is common in vectorized road generation. It is particularly helpful when specific predictions in an image need to be adjusted.
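A minimal MOS sketch, under assumptions similar to a greedy decoder over a hypothetical next-token function with one integer token per vertex: the predicted sequence is truncated at the first vertex where it drifts, the user's click replaces that vertex, and decoding resumes from the corrected prefix. The drift index is supplied by hand here; how MARS locates it is not described on this page, and `mos_chat` and the toy model are illustrative names only.

```python
from typing import Callable, List, Sequence

EOS = 99  # hypothetical end-of-sequence token
NextToken = Callable[[Sequence[int]], int]


def decode(model: NextToken, prefix: Sequence[int], eos: int, max_len: int = 256) -> List[int]:
    """Greedy auto-regressive decoding that continues from a fixed prefix."""
    seq = list(prefix)
    while len(seq) < max_len:
        seq.append(model(seq))
        if seq[-1] == eos:
            break
    return seq


def mos_chat(model: NextToken, predicted: Sequence[int], drift_index: int,
             click_tokens: Sequence[int], eos: int) -> List[int]:
    """MOS chatting: keep tokens before the drift, splice in the click, resume."""
    prefix = list(predicted[:drift_index]) + list(click_tokens)
    return decode(model, prefix, eos)


def toy_model(seq: Sequence[int]) -> int:
    """Toy road tracer that drifts from vertex 2 to vertex 7 on its own."""
    if not seq:
        return 0
    last = seq[-1]
    if last >= 5:
        return EOS
    return 7 if last == 2 else last + 1


original = decode(toy_model, [], EOS)
print(original)                                    # [0, 1, 2, 7, 99]
print(mos_chat(toy_model, original, 3, [3], EOS))  # [0, 1, 2, 3, 4, 5, 99]
```

Everything before the drift point is kept verbatim, so a single click corrects the trajectory without discarding the vertices the model already got right.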

Original Inference (2 panels)

After Click (2 panels)

Click 3 Demo

End-of-sequence (EOS) chatting aims to augment MARS’s prediction when objects are missing from the final output, which is common for various small map elements.
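EOS chatting can be sketched in a similar way, assuming a hypothetical separator token `SEP` between map elements: the final `EOS` is dropped, a separator and the clicked vertex are appended, and decoding continues so the model completes the missed element. The toy model, token values, and helper names below are illustrative assumptions, not MARS's actual format.

```python
from typing import Callable, List, Sequence

EOS, SEP = 99, 98  # hypothetical end-of-sequence and element-separator tokens
NextToken = Callable[[Sequence[int]], int]


def decode(model: NextToken, prefix: Sequence[int], eos: int, max_len: int = 256) -> List[int]:
    """Greedy auto-regressive decoding that continues from a fixed prefix."""
    seq = list(prefix)
    while len(seq) < max_len:
        seq.append(model(seq))
        if seq[-1] == eos:
            break
    return seq


def eos_chat(model: NextToken, predicted: Sequence[int], click_tokens: Sequence[int],
             eos: int, sep: int) -> List[int]:
    """EOS chatting: reopen a finished sequence and seed one more element."""
    body = list(predicted)
    if body and body[-1] == eos:
        body.pop()  # drop the model's own end-of-sequence decision
    return decode(model, body + [sep] + list(click_tokens), eos)


def toy_model(seq: Sequence[int]) -> int:
    """Toy tracer: ends an element after vertex 2 or 22 by emitting EOS."""
    if not seq:
        return 0
    last = seq[-1]
    if last in (2, 22):
        return EOS  # the model believes nothing else remains
    return last + 1


original = decode(toy_model, [], EOS)
print(original)  # [0, 1, 2, 99]: a second element starting at 20 is missed
print(eos_chat(toy_model, original, [20], EOS, SEP))  # [0, 1, 2, 98, 20, 21, 22, 99]
```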

Original Inference

After Click


Click Jitter Analysis

Robustness to Click Position Variations

The left image shows the ground-truth reference, the middle image shows the prediction with the correct click position, and the right panel shows predictions obtained with different click positions.
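A jitter study of this kind can be sketched as follows, assuming clicks are perturbed evenly on a circle of fixed radius around the correct position and a hypothetical `predict(click)` function returns a comparable prediction. The `agreement_rate` metric (the fraction of jittered clicks whose prediction matches the reference) is an illustrative choice; the page does not state how perturbations were sampled or scored.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def jitter_clicks(click: Point, radius: float, n: int) -> List[Point]:
    """Place n perturbed clicks evenly on a circle of the given radius (pixels)."""
    x, y = click
    return [
        (x + radius * math.cos(2 * math.pi * k / n),
         y + radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def agreement_rate(predict: Callable[[Point], object], click: Point,
                   radius: float, n: int) -> float:
    """Fraction of jittered clicks whose prediction equals the reference one."""
    reference = predict(click)
    outputs = [predict(c) for c in jitter_clicks(click, radius, n)]
    return sum(o == reference for o in outputs) / n


def toy_predict(c: Point) -> int:
    """Toy predictor (hypothetical): snaps the click's x to a 10-pixel cell."""
    return round(c[0] / 10)


print(agreement_rate(toy_predict, (10.0, 10.0), 2.0, 4))  # 1.0
print(agreement_rate(toy_predict, (10.0, 10.0), 6.0, 4))  # 0.5
```

Sweeping `radius` upward and plotting the agreement rate gives a simple robustness curve: a model that is insensitive to small click errors keeps the rate near 1.0 over a wider radius.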

Ground Truth (2 panels)

Correct Click Position (2 panels)

Click Position Variations (5 images)

Stroke Visualizations

Intersection Examples

Stroke visualizations for 14 intersection examples.

Roundabout Examples

Stroke visualizations for 11 roundabout examples.

Road Split Examples

Stroke visualizations for 11 road-split examples.

T-Junction Examples

Stroke visualization for one T-junction example.

Challenging Cases

Complex Intersection

Good Examples

Original Inference
After Click (5 panels)

Poor Examples

Original Inference
After Click (2 panels)

Occlusions

Good Examples

Original Inference
After Click (5 panels)

Poor Examples

Original Inference
After Click (2 panels)