Georeferencing frames means assigning real-world coordinates (a spatial reference) to image or video frames so each pixel — and each camera pose — is expressed in a common coordinate system. In indoor contexts (where GNSS/GPS is unavailable or weak), georeferencing is done by fusing computer vision (photogrammetry / Visual SLAM), sensor data (IMU, LiDAR), and control references (targets, surveyed control points, or known anchors). This lets you convert raw video or 360° imagery into accurate 2D floorplans, 3D meshes/point clouds, and maps that an indoor navigation engine can use.
Below is a step-by-step engineering pipeline for turning video frames into a georeferenced indoor map, followed by practical instructions for using that map for indoor wayfinding.
1) Core methods & building blocks (what's under the hood)
- Feature-based photogrammetry — detect keypoints (SIFT / ORB / AKAZE), match across frames, and compute relative poses. With many overlapping frames you run bundle adjustment to solve camera intrinsics/extrinsics and 3D point positions (sparse point cloud). This is the classic SfM (Structure from Motion) / photogrammetry approach.
- Visual Simultaneous Localization And Mapping (Visual SLAM) — a real-time pipeline for tracking camera pose while incrementally building a map. Modern Visual SLAM adds loop-closure detection, relocalization and scale recovery (using stereo/RGB-D or IMU). Libraries such as ORB-SLAM3 and RTAB-Map are reference implementations that support monocular, stereo and RGB-D setups and are used in robotics and mapping systems.
- Sensor fusion (VIO / LiDAR + vision) — fusing IMU data (Visual-Inertial Odometry) fixes scale and improves robustness during motion blur or in low-texture areas; LiDAR supplies metric depth and dense point clouds that reduce drift and improve alignment for large buildings.
- Control/anchor-based georeferencing — introduce known points (surveyed control points, floorplan tie-points, QR markers, BLE/Wi-Fi anchors, or surveyed LiDAR control) and compute a 7-DOF similarity transform (scale, rotation, translation) or a full affine transformation to align the locally built map to the building coordinate frame or GIS reference — essentially, map-to-world registration.
2) End-to-end technical pipeline — from video to georeferenced map
This is a practical, engineer-oriented pipeline. For each step I add common tools/algorithm choices.
Step 0 — Decide sensor & data capture method
- Choices: smartphone (RGB + IMU), 360° camera, RGB-D (Azure Kinect / RealSense), stereo camera, or LiDAR/multi-cam mobile mapping (NavVis, Matterport Pro).
- Tradeoffs: smartphone is cheap but noisy; RGB-D gives metric depth and easier scale; LiDAR/mobile mapping gives survey-grade point clouds for high-accuracy georeferencing.
Step 1 — Data acquisition & metadata capture
- Walk planned trajectories ensuring overlap and loop closures; capture IMU at device rate. For video: 30–60% frame overlap is a practical target (more overlap → better matches).
- If available, capture occasional GNSS fixes near building entrances to provide coarse geo-tags for exterior matching.
- Place a small set of surveyed control points (optical targets or AR markers) at known coordinates if you require metric geo-registration indoors.
Step 2 — Frame extraction & preprocessing
- Extract frames from video at 1–5 fps for mapping (denser for high-speed motion).
- Undistort frames with camera calibration (intrinsics). Use calibration tools (OpenCV, Kalibr) to compute focal length, principal point and lens distortion.
- Optionally run image enhancement (denoise, exposure normalization) for low-light indoor footage.
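As a concrete example, here is a minimal Python/OpenCV sketch of this step. The video path, sampling rate, and calibration values are illustrative assumptions; substitute the intrinsics and distortion coefficients produced by your own OpenCV/Kalibr calibration run.

```python
import os
import cv2
import numpy as np

# Illustrative calibration values -- replace with your own calibration output.
K = np.array([[1400.0,    0.0, 960.0],
              [   0.0, 1400.0, 540.0],
              [   0.0,    0.0,   1.0]])        # camera intrinsics
dist = np.array([0.05, -0.12, 0.0, 0.0, 0.0])  # lens distortion coefficients

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("walkthrough.mp4")      # hypothetical input video
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
step = max(1, round(fps / 2))                  # keep roughly 2 frames per second

idx = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % step == 0:
        # Remove lens distortion so downstream SfM sees a pinhole image
        cv2.imwrite(f"frames/{saved:06d}.png", cv2.undistort(frame, K, dist))
        saved += 1
    idx += 1
cap.release()
```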
Step 3 — Feature detection & matching
- Use ORB / AKAZE for speed, SIFT / SURF for more robust matching (note licensing: SIFT's patent has expired and it ships in mainline OpenCV, while SURF remains in the non-free contrib modules).
- Perform pairwise matching with ratio tests and geometric outlier rejection (RANSAC Essential/Fundamental matrix) to compute relative pose hypotheses.
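A minimal sketch of the matching stage with OpenCV, assuming the undistorted frames and intrinsics K from the previous step; the 0.75 ratio and RANSAC threshold are common defaults, not tuned values.

```python
import cv2
import numpy as np

K = np.array([[1400.0, 0.0, 960.0], [0.0, 1400.0, 540.0], [0.0, 0.0, 1.0]])

img1 = cv2.imread("frames/000100.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frames/000101.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Lowe ratio test: keep a match only if it is clearly better than the runner-up
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# RANSAC on the essential matrix rejects geometric outliers and yields a
# relative-pose hypothesis (translation is only up to scale for monocular).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                               prob=0.999, threshold=1.0)
n_inliers, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
```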
Step 4 — Pose graph & bundle adjustment (SfM stage)
- Build a pose graph where nodes = keyframes and edges = relative poses.
- Perform global bundle adjustment (sparse Levenberg–Marquardt) to optimize camera poses and 3D landmarks. Use toolkits like COLMAP, OpenMVG, or a SLAM backend (g2o / Ceres). This yields a consistent sparse 3D map and camera trajectory.
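For the offline SfM route, this whole stage can be driven from COLMAP's command-line interface. A sketch below, assuming the colmap binary is installed and on PATH and the undistorted frames sit in ./frames; for large scenes you would typically swap the exhaustive matcher for COLMAP's sequential or vocabulary-tree matcher.

```python
import os
import subprocess

os.makedirs("sparse", exist_ok=True)   # mapper requires an existing output dir

def colmap(*args):
    subprocess.run(["colmap", *args], check=True)

# Detect features, match them, then run incremental SfM; the mapper
# performs bundle adjustment as it grows the reconstruction.
colmap("feature_extractor", "--database_path", "colmap.db",
       "--image_path", "frames")
colmap("exhaustive_matcher", "--database_path", "colmap.db")
colmap("mapper", "--database_path", "colmap.db",
       "--image_path", "frames", "--output_path", "sparse")
# sparse/0/ now holds cameras, images (poses), and points3D
```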
Step 5 — Sensor fusion (optional but recommended)
- Fuse IMU (VIO) to recover metric scale and reduce drift; if you have depth (RGB-D or LiDAR), fuse for dense reconstruction (TSDF fusion or Poisson surface reconstruction). Libraries: ORB-SLAM3 (VIO), RTAB-Map (RGB-D/LiDAR integration), OpenVSLAM.
Step 6 — Dense reconstruction & meshing (if needed)
- From aligned camera poses + depth, create dense point cloud (MVS / depth fusion). Tools: OpenMVS, COLMAP dense, or LiDAR-based meshing for higher fidelity.
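If you have depth frames plus the camera trajectory from the previous steps, Open3D's TSDF integration is one convenient route to a dense mesh. The sketch below assumes PNG color/depth pairs with millimeter depth units, illustrative 640x480 intrinsics, and a list of 4x4 camera-to-world pose matrices; all names and values are placeholders.

```python
import numpy as np
import open3d as o3d

# Illustrative intrinsics; use your calibrated values.
intrinsic = o3d.camera.PinholeCameraIntrinsic(640, 480, 525.0, 525.0, 319.5, 239.5)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.01,          # 1 cm voxels
    sdf_trunc=0.04,
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

poses = [np.eye(4)]             # stand-in: replace with your recovered trajectory

for i, pose_c2w in enumerate(poses):
    color = o3d.io.read_image(f"color/{i:06d}.png")
    depth = o3d.io.read_image(f"depth/{i:06d}.png")
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=4.0,
        convert_rgb_to_intensity=False)
    # integrate() expects the world-to-camera extrinsic, hence the inverse
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose_c2w))

o3d.io.write_triangle_mesh("dense_mesh.ply", volume.extract_triangle_mesh())
```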
Step 7 — Georeferencing / registration to building coordinates
- Use control points (surveyed coordinates) or tie to an existing CAD/BIM floorplan: compute the best-fit similarity transform (7-parameter Helmert) or ICP (for point clouds) to align the SLAM/SfM map with the building coordinate system. Store transform as metadata (ECEF/local building grid). ArcGIS and GIS tools provide standard georeferencing utilities for rasters and point clouds.
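Here is a minimal NumPy implementation of the closed-form best-fit similarity (Umeyama's method, the 3D equivalent of a 7-parameter Helmert fit). Inputs are corresponding points: control-marker positions in the SLAM/SfM frame (src) and their surveyed building-frame coordinates (dst); the example coordinates are invented, and at least three non-collinear pairs are required.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares s, R, t with dst ~= s * R @ src + t.
    src, dst: (N, 3) arrays of corresponding points, N >= 3, non-collinear."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # guard against a reflection solution
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / (src_c ** 2).sum(axis=1).mean()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Example: markers measured in the SLAM frame vs their surveyed coordinates
src = np.array([[0.0, 0.0, 0.0], [2.1, 0.1, 0.0], [0.2, 3.0, 0.1], [2.0, 3.1, 1.4]])
dst = np.array([[10.0, 20.0, 0.0], [14.2, 20.3, 0.0], [10.3, 26.1, 0.2], [14.1, 26.3, 2.8]])
s, R, t = fit_similarity(src, dst)
aligned = (s * (R @ src.T)).T + t   # SLAM points expressed in building coordinates
```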
Step 8 — Export & packaging
- Export floorplans (2D schematics), geo-referenced point clouds (.las/.e57), textured meshes (.obj/.ply), and camera trajectory (poses in JSON). Include the transform/CRS metadata so other systems can locate the map in the same local coordinate frame.
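A sketch of what the packaged metadata might look like, reusing s, R, t from the registration step; the field names and CRS label here are hypothetical, not a standard schema.

```python
import json
import numpy as np

poses = [np.eye(4)]   # stand-in: replace with the recovered camera trajectory

package = {
    # Label the target reference frame explicitly (an EPSG code or a named
    # local building grid) so consumers know what the coordinates mean.
    "crs": "LOCAL:building-A-grid",
    "map_to_world": {"scale": float(s),
                     "rotation": R.tolist(),
                     "translation": t.tolist()},
    "trajectory": [{"frame": i, "pose_c2w": p.tolist()}
                   for i, p in enumerate(poses)],
    "assets": {"point_cloud": "cloud.e57", "mesh": "dense_mesh.ply"},
}
with open("map_package.json", "w") as f:
    json.dump(package, f, indent=2)
```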
3) Building an indoor wayfinding system from the georeferenced map
Once you have a georeferenced map and camera poses, follow this design to create wayfinding:
A. Map abstraction & graph generation
- Floorplan extraction — convert dense reconstruction into 2D floor polygons (walls, doors, obstructions). This can be automated by slicing point clouds by height and applying morphological processing + vectorization.
- Navigation graph — build a walkable graph (nodes = POIs, intersections; edges = corridors/stairs) and annotate with edge weights (distance, travel time, accessibility). Include multi-floor connectors (elevators, stairs) with floor transitions; a toy graph-construction sketch follows this list.
- Semantic enrichment — tag nodes with metadata: room numbers, POIs, amenities, safety exits, and AR anchor IDs.
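A toy navigation graph in NetworkX illustrating this structure; the node names, lengths, and accessibility flags are invented.

```python
import networkx as nx

G = nx.Graph()
# Nodes carry floor and semantic metadata
G.add_node("lobby",    floor=1, kind="poi", label="Main Lobby")
G.add_node("junction", floor=1, kind="junction")
G.add_node("stairs_1", floor=1, kind="stairs")
G.add_node("stairs_2", floor=2, kind="stairs")
G.add_node("elev_1",   floor=1, kind="elevator")
G.add_node("elev_2",   floor=2, kind="elevator")
G.add_node("room_201", floor=2, kind="poi", label="Room 201")

# Edges carry distance (meters) and an accessibility flag
G.add_edge("lobby",    "junction", length=12.0, accessible=True)
G.add_edge("junction", "stairs_1", length=6.0,  accessible=True)
G.add_edge("junction", "elev_1",   length=9.0,  accessible=True)
G.add_edge("stairs_1", "stairs_2", length=5.0,  accessible=False)  # floor transition
G.add_edge("elev_1",   "elev_2",   length=4.0,  accessible=True)   # floor transition
G.add_edge("stairs_2", "room_201", length=10.0, accessible=True)
G.add_edge("elev_2",   "room_201", length=12.0, accessible=True)
```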
B. Real-time localization for the user
Implement one or a hybrid of the following localization methods (ideally multi-modal fusion):
- Visual relocalization: compare live camera frames to the georeferenced image database (place recognition / feature matching / image retrieval) and compute the camera pose via PnP — robust for AR guidance (a PnP sketch follows this list).
- Visual-Inertial Odometry (VIO): track the device’s relative motion between relocalizations.
- Beacon/Radio positioning: BLE/Wi-Fi RTT/UWB provide coarse/fine priors and can be fused with VIO for bootstrapping & outage handling.
- LiDAR or depth-based localization: for robots or specialized devices, match live LiDAR scans to the global point cloud.
- Combine via an EKF / factor graph to produce the best estimate of the user pose.
(Visual relocalization is especially useful in AR smartphone apps because each matched image directly anchors AR overlays to a known camera pose.)
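A sketch of the pose-from-PnP step with OpenCV, using synthetic correspondences in place of real retrieval output; in production, pts3d comes from the georeferenced map and pts2d from features matched in the live frame.

```python
import cv2
import numpy as np

K = np.array([[1400.0, 0.0, 960.0], [0.0, 1400.0, 540.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Synthetic stand-ins: 3D map points (world frame) and the pixels where a
# live frame with a known ground-truth pose would observe them.
rng = np.random.default_rng(0)
pts3d = rng.uniform([-2, -1, 4], [2, 1, 8], size=(60, 3))
true_rvec = np.array([0.05, -0.10, 0.02])
true_tvec = np.array([0.30, 0.10, 0.50])
pts2d, _ = cv2.projectPoints(pts3d, true_rvec, true_tvec, K, dist)
pts2d = pts2d.reshape(-1, 2)

# RANSAC-PnP: robust camera pose from 2D-3D correspondences
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, dist,
                                             reprojectionError=3.0)
if ok:
    R_wc, _ = cv2.Rodrigues(rvec)          # world-to-camera rotation
    cam_pos = -R_wc.T @ tvec.reshape(3)    # camera position in the world frame
```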
C. Path planning & UX
- Run A* or Dijkstra on the navigation graph. For accessibility, apply cost penalties for stairs, narrow corridors, etc. (a routing sketch follows this list).
- For AR guidance: project the planned path into camera coordinates (using the live pose) and render wayfinding cues as arrows, breadcrumbs, or billboarded POI labels. For purely 2D apps, render a “blue dot” with a mini-map and turn-by-turn instructions.
- Implement continuous relocalization checkpoints every N meters or when visual confidence drops.
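Routing on the graph from section A, with a cost function that penalizes non-accessible edges; the penalty constant is arbitrary and just needs to be large enough to act as a soft prohibition.

```python
import networkx as nx

def route(G, src, dst, avoid_stairs=False):
    def cost(u, v, data):
        # Large additive penalty steers the search away from stairs etc.
        penalty = 1e6 if (avoid_stairs and not data.get("accessible", True)) else 0.0
        return data["length"] + penalty
    # Without a geometric heuristic, h = 0 makes A* behave like Dijkstra.
    return nx.astar_path(G, src, dst, heuristic=lambda a, b: 0.0, weight=cost)

print(route(G, "lobby", "room_201", avoid_stairs=True))
# -> routes via the elevator rather than the stairs
```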
D. Offline & drift recovery strategies
- Persist recent camera poses & visual descriptors locally for quick relocalization when network is absent.
- Periodically re-anchor to known control points (floor markers / QR tags / BLE beacons) to remove accumulated drift.
- Use semantic cues (recognizable signage, room labels) for human-in-the-loop corrections when automatic relocalization fails.
4) Accuracy considerations & best practices
- Control points matter: a few well-surveyed indoor control points reduce global alignment error dramatically. If you need sub-meter accuracy, include at least 3–5 surveyed anchors per floor.
- Ensure coverage & loop closures: plan capture paths that loop back and revisit areas to strengthen loop closure detection in SLAM and reduce drift.
- Use RGB-D or LiDAR where possible: depth sensors give metric scale and dense geometry — vital for robust map-to-world registration in feature-poor corridors.
- Lighting & texture: low-light or repeated textures (long white corridors) break feature matching. Supplement with active markers (AprilTags/ArUco; a detection sketch follows this list) or use depth sensors.
- Privacy & data governance: indoor imagery may capture people; implement blur/anonymization and follow organizational privacy rules.
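As an example of the active-marker option, a minimal ArUco detection and pose sketch. It assumes OpenCV >= 4.7 (the ArucoDetector API; older versions use cv2.aruco.detectMarkers instead), the illustrative intrinsics from earlier steps, and a printed marker of known physical size placed at a surveyed location.

```python
import cv2
import numpy as np

K = np.array([[1400.0, 0.0, 960.0], [0.0, 1400.0, 540.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)
MARKER_SIZE = 0.20   # printed edge length in meters (assumption)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

gray = cv2.imread("frames/000250.png", cv2.IMREAD_GRAYSCALE)
corners, ids, _ = detector.detectMarkers(gray)

if ids is not None:
    h = MARKER_SIZE / 2
    # Marker corners in the marker's own frame (top-left, then clockwise)
    obj = np.array([[-h,  h, 0], [h,  h, 0], [h, -h, 0], [-h, -h, 0]], float)
    img = corners[0].reshape(4, 2).astype(np.float64)
    ok, rvec, tvec = cv2.solvePnP(obj, img, K, dist)
    # rvec/tvec give the camera pose relative to this marker; chaining with
    # the marker's surveyed pose anchors the camera in building coordinates.
```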
5) Key platforms, products & open-source stacks
Below I group products so you can choose by use-case: survey-grade capture, developer SDKs for indoor positioning/wayfinding, and open-source SLAM stacks.
Survey / reality-capture (high-accuracy, often LiDAR)
- NavVis (M6, VLX, IndoorViewer) — mobile mapping hardware + digital twin platform; captures survey-grade point clouds and panoramic imagery and supports georegistration / export for downstream use. Great when you need enterprise-scale, accurate digital twins.
- Matterport (Pro3, LiDAR-enabled capture + cloud processing) — simplified capture-to-digital-twin workflow (from smartphone capture up to Pro cameras), strong for real estate, facilities, and quick 3D models plus floorplans.
Indoor positioning & wayfinding SDKs / platforms
- IndoorAtlas — sensor-fusion indoor positioning SDK (geomagnetic + sensors + calibration) plus mapping and wayfinding APIs; used in airports, hospitals for smartphone wayfinding.
- Pointr — commercial indoor location & wayfinding platform focused on accuracy and enterprise integration (retail, airports, hospitals).
- Mapwize / Mappedin — indoor mapping & wayfinding platforms that integrate maps with location backends and provide route UX for kiosks and mobile apps.
AR / vision SDKs (for visual relocalization & anchors)
- Vuforia Engine — image/object-based AR anchors, integration guides for HoloLens/Unity; useful for marker-based georeferencing and industrial AR overlays.
- ARKit / ARCore — platform-native AR with relocalization, cloud anchors, and Visual-Inertial tracking; both can be components of a visual relocalization + wayfinding solution (ARKit Location Anchors / ARCore Cloud Anchors patterns). (Platform docs are the canonical source for implementation patterns.)
Open-source SLAM & mapping stacks (developer & research)
- ORB-SLAM3 — state-of-the-art Visual/VIO SLAM supporting monocular/stereo/RGB-D and multi-map setups; excellent for research/prototyping.
- RTAB-Map — open-source, RGB-D/LiDAR-friendly SLAM with long-term mapping and ROS integration, suitable for robots and mobile mapping.
- OpenVSLAM / OpenMVG / COLMAP — tools for offline SfM / multiview reconstruction and relocalization pipelines. (COLMAP is great for batch SfM; OpenVSLAM for mobile-friendly relocalization.)
6) Example implementation scenarios (quick recipes)
Prototype (phone-only, low-cost)
- Use an Android phone (ARCore) or iPhone (ARKit) to record stabilized video + IMU.
- Extract frames, run OpenVSLAM / ORB-SLAM3 to get camera poses.
- Manually place 3–4 QR markers at known coordinates for map-to-world alignment.
- Use visual relocalization on-device for AR wayfinding overlays. (Good for pilot apps.)
Enterprise (survey-grade, robust wayfinding)
- Scan with NavVis VLX or Matterport Pro3 to get georegistered point clouds & panoramic imagery.
- Import the point cloud into a GIS/BIM tool; align to building coordinate system with survey control points.
- Use IndoorAtlas or Pointr for live positioning via smartphones, and Mapwize for route UX and kiosk integration.
7) Practical checklist & pitfalls to avoid
- Checklist before capture: camera calibrated, IMU logging enabled, control points marked, capture path planned with loops.
- Pitfalls: relying purely on monocular SLAM in long featureless corridors (drift), ignoring multi-floor transitions, not planning for privacy/anonymization, and skipping post-capture registration to a known coordinate frame (causes inconsistent maps).
- Measure success: validate by surveying 5–10 check points and computing the RMSE of the transformed map against the surveyed coordinates (a minimal sketch follows). For indoor wayfinding, aim for sub-2 m accuracy for casual navigation and sub-0.5 m for AR-guided tasks.
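A minimal check-point validation sketch; the CSV file names are placeholders, and each file holds N rows of x,y,z in the building frame.

```python
import numpy as np

surveyed = np.loadtxt("checkpoints_surveyed.csv", delimiter=",")  # ground truth
mapped = np.loadtxt("checkpoints_from_map.csv", delimiter=",")    # same points, read from the map

errors = np.linalg.norm(mapped - surveyed, axis=1)   # per-point 3D error (m)
rmse = np.sqrt(np.mean(errors ** 2))
print(f"RMSE: {rmse:.3f} m  |  worst point: {errors.max():.3f} m")
```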
8) Where to learn & next steps (resources)
- Read Visual SLAM literature surveys for algorithmic tradeoffs (e.g., visual localization under GNSS-denied conditions).
- Try ORB-SLAM3 on sample video to understand pose graphs and loop closures.