Getting Started with PostGIS: Spatial Databases for GIS Professionals

If you’ve been managing spatial data in shapefiles, juggling coordinate systems across desktop tools, or hitting the ceiling on what your current GIS stack can handle at scale, PostGIS is the upgrade you’ve been waiting for. It turns PostgreSQL into a full-blown spatial database, giving you the power to query, analyze, and manipulate geographic data using plain SQL. No proprietary formats. No per-seat licensing. Just fast, reliable spatial analysis that lives inside your database.

This guide is for GIS professionals who are comfortable with spatial concepts but want a practical on-ramp to PostGIS. By the end, you’ll have a working installation, understand the core data model, and be writing real spatial queries.


Why PostGIS Belongs in Your GIS Stack

Most GIS workflows still rely on desktop tools for heavy lifting and file-based formats for storage. That works until it doesn’t. The moment you need to share data across teams, run analysis at scale, or integrate spatial queries into a web application, files and desktop software become a bottleneck.

PostGIS solves this by treating geometry as a first-class data type inside a relational database. Your spatial data lives alongside your attribute data. You query it with SQL. You index it. You join it. You version control your schema. And you do all of this with a stack that integrates naturally with the rest of your infrastructure.

Here’s what that looks like in practice:

  • Scale: PostGIS handles millions of features without breaking a sweat. Spatial indexes (GiST and SP-GiST) keep queries fast even on large datasets.
  • Interoperability: PostGIS speaks WKT, WKB, GeoJSON, KML, and GML. Getting data in and out is never the hard part.
  • Analysis power: Distance calculations, buffer operations, intersection tests, topology checks, raster analysis, and routing are all built in.
  • Open source: PostGIS is free, well-documented, and backed by a large community. The OSGeo ecosystem is enormous.

Installing PostGIS

On Ubuntu/Debian

sudo apt update
sudo apt install postgresql postgresql-contrib postgis

Once PostgreSQL is running, create a database and enable the extension:

CREATE DATABASE gis_db;
\c gis_db
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology; -- optional, for topology support

Verify the installation:

SELECT PostGIS_Full_Version();

You should see the PostGIS version string, GEOS version, and PROJ version. If all three show up, you’re good.

On macOS

The easiest path is Postgres.app, which bundles PostGIS:

  1. Download Postgres.app
  2. Add the CLI tools to your PATH
  3. Open a terminal and run:
psql -c "CREATE DATABASE gis_db;"
psql -d gis_db -c "CREATE EXTENSION postgis;"

On Windows

Use the EnterpriseDB installer at postgresql.org. During installation, select StackBuilder after setup completes and install the PostGIS bundle from the Spatial Extensions category.

Using Docker (Recommended for Development)

docker run --name postgis \
  -e POSTGRES_PASSWORD=gispassword \
  -e POSTGRES_DB=gis_db \
  -p 5432:5432 \
  -d postgis/postgis:15-3.3

Connect with psql or any PostgreSQL client. The PostGIS extension is pre-installed.


The PostGIS Data Model

Geometry vs. Geography

PostGIS offers two spatial data types:

geometry uses a flat Cartesian coordinate system. Operations are fast, and you have full control over the coordinate reference system (CRS). This is the right choice for most analytical work, especially when your data is projected.

geography uses a spheroidal model of the earth. Distance and area calculations automatically account for the curvature of the earth. This is ideal for global datasets or when you need accurate real-world measurements without manually projecting your data.

-- geometry column using Web Mercator (EPSG:3857)
SELECT ST_Buffer(geom, 500) FROM parks;

-- geography column, distance in meters on the sphere
SELECT ST_Distance(location::geography, ST_MakePoint(-73.98, 40.75)::geography) 
FROM restaurants;

SRID: Spatial Reference ID

Every spatial column in PostGIS is associated with an SRID, which maps to a row in the spatial_ref_sys table. This table ships with thousands of coordinate reference systems.

Common SRIDs you’ll use constantly:

SRIDCRSUse Case
4326WGS 84 (lat/lon)GPS coordinates, web maps
3857Web MercatorTile-based web maps
32618UTM Zone 18NAccurate measurements in eastern North America
27700British National GridUK datasets

To check which SRID a geometry uses:

SELECT ST_SRID(geom) FROM my_table LIMIT 1;

To reproject from one CRS to another:

SELECT ST_Transform(geom, 32618) FROM my_table WHERE ST_SRID(geom) = 4326;

Loading Spatial Data into PostGIS

Using shp2pgsql

shp2pgsql is the standard tool for importing shapefiles. It ships with PostGIS.

# Generate SQL from shapefile and pipe directly to the database
shp2pgsql -s 4326 -I roads.shp public.roads | psql -d gis_db

# Or write to a file first
shp2pgsql -s 4326 roads.shp public.roads > roads.sql
psql -d gis_db -f roads.sql

Key flags:

  • -s [srid] sets the SRID
  • -I creates a spatial index automatically (always use this)
  • -a appends to an existing table instead of creating a new one
  • -t 2D strips Z/M values if you only need 2D geometry

Using GDAL/OGR

ogr2ogr is more flexible and handles dozens of formats:

# Import GeoJSON
ogr2ogr -f "PostgreSQL" PG:"dbname=gis_db user=postgres" input.geojson -nln my_layer

# Import a GeoPackage layer
ogr2ogr -f "PostgreSQL" PG:"dbname=gis_db" data.gpkg layer_name -nln target_table

# Import with reprojection
ogr2ogr -f "PostgreSQL" PG:"dbname=gis_db" input.shp -t_srs EPSG:3857 -nln projected_roads

Creating Tables Manually

Sometimes you want to define the schema yourself:

CREATE TABLE parcels (
    id          SERIAL PRIMARY KEY,
    parcel_id   VARCHAR(20) UNIQUE NOT NULL,
    owner       TEXT,
    land_use    TEXT,
    area_sqm    NUMERIC,
    geom        GEOMETRY(POLYGON, 4326)
);

-- Always create a spatial index
CREATE INDEX parcels_geom_idx ON parcels USING GIST (geom);

Core Spatial Functions

PostGIS has over 800 functions. You won’t need most of them most of the time. Here are the ones you’ll use in almost every project.

Measurement Functions

-- Area of a polygon in square meters (using geography for accuracy)
SELECT parcel_id, ST_Area(geom::geography) AS area_sqm 
FROM parcels;

-- Distance between two points in meters
SELECT 
    a.name AS site_a,
    b.name AS site_b,
    ST_Distance(a.geom::geography, b.geom::geography) AS distance_m
FROM sites a, sites b
WHERE a.id != b.id;

-- Length of a linestring in kilometers
SELECT road_name, ST_Length(geom::geography) / 1000 AS length_km 
FROM roads;

Spatial Relationships

These return TRUE or FALSE and are the backbone of spatial joins:

-- Which parcels intersect a flood zone?
SELECT p.parcel_id 
FROM parcels p
JOIN flood_zones fz ON ST_Intersects(p.geom, fz.geom)
WHERE fz.zone_type = '100yr';

-- Which schools are within 500m of a highway?
SELECT s.name
FROM schools s
JOIN highways h ON ST_DWithin(s.geom::geography, h.geom::geography, 500);

-- Which points are strictly inside a polygon?
SELECT poi.name
FROM points_of_interest poi
JOIN neighborhoods n ON ST_Within(poi.geom, n.geom)
WHERE n.name = 'Downtown';

Geometry Construction

-- Buffer a point by 1km
SELECT name, ST_Buffer(geom::geography, 1000)::geometry AS buffer 
FROM hospitals;

-- Union of all parcels in a neighborhood
SELECT ST_Union(geom) AS neighborhood_outline
FROM parcels
WHERE neighborhood = 'Midtown';

-- Intersection of two layers
SELECT ST_Intersection(a.geom, b.geom) AS clipped_geom
FROM layer_a a
JOIN layer_b b ON ST_Intersects(a.geom, b.geom);

-- Centroid of a polygon
SELECT parcel_id, ST_Centroid(geom) AS center
FROM parcels;

-- Convex hull around a set of points
SELECT ST_ConvexHull(ST_Collect(geom)) AS hull
FROM sample_points;

Format Conversion

-- Export geometry as GeoJSON for web apps
SELECT ST_AsGeoJSON(geom) FROM parks;

-- Export as WKT
SELECT ST_AsText(geom) FROM roads LIMIT 5;

-- Import from WKT
INSERT INTO points (name, geom)
VALUES ('HQ', ST_GeomFromText('POINT(-73.9857 40.7484)', 4326));

Spatial Indexing: The Part Most People Skip

A spatial index is the difference between a query that runs in 40 milliseconds and one that runs in 40 seconds. PostGIS uses GiST (Generalized Search Tree) indexes, which index the bounding box of each geometry.

Always create a spatial index on any geometry column you’ll query:

CREATE INDEX table_geom_idx ON my_table USING GIST (geom);

After bulk data loads, update statistics so the query planner uses the index correctly:

VACUUM ANALYZE my_table;

To verify your queries are using the index, use EXPLAIN:

EXPLAIN ANALYZE
SELECT * FROM parcels
WHERE ST_Intersects(geom, ST_MakeEnvelope(-74.01, 40.70, -73.97, 40.75, 4326));

Look for “Index Scan using” in the output. If you see “Seq Scan” on a large table, something is wrong.

The Bounding Box Operators

PostGIS spatial queries often use two-step filtering: first an index scan on bounding boxes, then an exact geometry test. The && operator tests bounding box overlap and is index-accelerated:

-- Fast: uses spatial index for bounding box pre-filter
SELECT * FROM parcels WHERE geom && ST_MakeEnvelope(-74.01, 40.70, -73.97, 40.75, 4326);

-- Exact intersection test (also uses index via ST_Intersects internals)
SELECT * FROM parcels 
WHERE ST_Intersects(geom, ST_MakeEnvelope(-74.01, 40.70, -73.97, 40.75, 4326));

A Practical Workflow: Site Suitability Analysis

Let’s put it together with a real use case. You want to find parcels that are:

  • Zoned for commercial use
  • Within 1km of a transit stop
  • Not in a flood zone
  • Larger than 2,000 square meters
WITH transit_buffers AS (
    -- Create 1km buffers around transit stops
    SELECT ST_Union(ST_Buffer(geom::geography, 1000)::geometry) AS buffer
    FROM transit_stops
),
flood_union AS (
    -- Union all flood zones into one geometry
    SELECT ST_Union(geom) AS flood_geom
    FROM flood_zones
)
SELECT 
    p.parcel_id,
    p.owner,
    ST_Area(p.geom::geography) AS area_sqm,
    ST_AsGeoJSON(p.geom) AS geometry
FROM parcels p
CROSS JOIN transit_buffers tb
CROSS JOIN flood_union fu
WHERE 
    p.land_use = 'commercial'
    AND ST_Intersects(p.geom, tb.buffer)
    AND NOT ST_Intersects(p.geom, fu.flood_geom)
    AND ST_Area(p.geom::geography) > 2000
ORDER BY area_sqm DESC;

This is the kind of multi-criteria overlay analysis that takes 10 minutes to set up in a desktop GIS tool and runs in under a second in PostGIS on a properly indexed dataset.


Connecting PostGIS to Your GIS Tools

QGIS

QGIS has native PostGIS support. In the Browser panel, right-click “PostgreSQL” and add a new connection. Once connected, you can drag layers directly into your map canvas, edit features in place, and run DB Manager queries.

For production data, always connect to PostGIS views or materialized views rather than editing base tables directly from QGIS.

GeoServer

GeoServer connects to PostGIS as a datastore and serves your spatial data as WMS, WFS, and WCS. Add a new PostGIS datastore in the GeoServer admin panel with your connection details, then publish individual tables or views as layers.

Use PostGIS views with pre-calculated fields (like ST_AsGeoJSON or ST_Transform) to control exactly what GeoServer serves without transformation overhead at request time.

Python (psycopg2 + GeoPandas)

import geopandas as gpd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/gis_db")

# Load a PostGIS layer into a GeoDataFrame
gdf = gpd.read_postgis(
    "SELECT parcel_id, owner, geom FROM parcels WHERE land_use = 'residential'",
    con=engine,
    geom_col="geom",
    crs="EPSG:4326"
)

# Write a GeoDataFrame back to PostGIS
gdf.to_postgis("parcels_updated", engine, if_exists="replace", index=False)

Common Pitfalls and How to Avoid Them

Mixing SRIDs in the same query. PostGIS will throw an error if you try to run a spatial operation on two geometries with different SRIDs. Always reproject to a common CRS before comparing.

-- Wrong: will error if geom columns have different SRIDs
SELECT ST_Intersects(a.geom, b.geom) FROM table_a a, table_b b;

-- Right: reproject b to match a's SRID
SELECT ST_Intersects(a.geom, ST_Transform(b.geom, ST_SRID(a.geom))) 
FROM table_a a, table_b b;

Using geography when you should use geometry. The geography type is more accurate for global data but slower. For data that’s already in a local projected CRS, use geometry and measure in the projection’s native units.

Forgetting to create a spatial index. If you load data with shp2pgsql and use the -I flag, you’re covered. If you create tables manually or bulk-insert data, remember to create the index explicitly.

Using ST_Distance in a WHERE clause without ST_DWithin. This is a common performance anti-pattern. ST_Distance computes exact distances and doesn’t benefit from spatial indexes. ST_DWithin uses the index to filter candidates first.

-- Slow: calculates exact distance for every row
WHERE ST_Distance(a.geom::geography, b.geom::geography) < 500;

-- Fast: uses spatial index
WHERE ST_DWithin(a.geom::geography, b.geom::geography, 500);

What to Learn Next

Once you’re comfortable with the fundamentals, these are the areas worth exploring next:

Topology via postgis_topology gives you topologically correct data models, useful for network analysis and boundary sharing between adjacent features.

Raster support via postgis_raster (formerly raster2pgsql) lets you store and query raster data alongside your vectors. Zonal statistics, raster-vector overlays, and map algebra all run in SQL.

pgRouting extends PostGIS with network routing algorithms. Shortest path, driving distance, and turn restrictions are all supported.

Partitioning and performance at scale requires understanding PostgreSQL table partitioning, parallel query execution, and how to tune PostGIS for read-heavy analytical workloads.

Change tracking and history patterns using temporal tables or triggers let you maintain a full audit log of spatial edits, which is essential for regulated datasets.


Resources Worth Bookmarking


PostGIS rewards the time you put into it. The learning curve coming from desktop GIS tools is real, but the payoff is a workflow that scales, integrates cleanly with modern infrastructure, and puts serious analytical horsepower directly in your database. Start with a dataset you know well, run some spatial queries, and let the speed of a properly indexed spatial database do the rest of the convincing.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *