postgis_helpers.PgSQL module

Wrap up PostgreSQL and PostGIS into a convenient class.

Examples

Create a database and import a shapefile:

>>> import postgis_helpers as pGIS
>>> db = pGIS.PostgreSQL("my_database_name")
>>> db.create()
>>> db.import_geodata("bike_lanes", "http://url.to.shapefile")
>>> bike_gdf = db.query_as_geo_df("select * from bike_lanes")
class postgis_helpers.PgSQL.PostgreSQL(working_db: str, un: str = 'postgres', pw: str = 'password1', host: str = 'localhost', port: int = 5432, sslmode: str = None, super_db: str = 'postgres', super_un: str = 'postgres', super_pw: str = 'password2', active_schema: str = 'public', verbosity: str = 'full', data_inbox: pathlib.Path = PosixPath('/home/docs/sql_data_io/inbox'), data_outbox: pathlib.Path = PosixPath('/home/docs/sql_data_io/outbox'))[source]

Bases: object

This class encapsulates interactions with a PostgreSQL database. It leverages psycopg2, sqlalchemy, and geoalchemy2 as needed. It stores connection information that includes:

  • database name
  • username & password
  • host & port
  • superusername & password
  • the SQL cluster’s master database
  • verbosity level, which controls how much gets printed out
all_databases_on_cluster_as_list() → list[source]

Get a list of all databases on this SQL cluster.

Returns:List of all databases on the cluster
Return type:list
all_spatial_tables_as_dict(schema: str = None) → dict[source]

Get a dictionary of all spatial tables in the database. Return value is formatted as: {table_name: epsg}

Returns:Dictionary with spatial table names as keys and EPSG codes as values.
Return type:dict
all_tables_as_list(schema: str = None) → list[source]

Get a list of all tables in the database. Optionally filter to a schema

Parameters:schema (str) – name of the schema to filter by
Returns:List of tables in the database
Return type:list
connection_details() → dict[source]

Return a dictionary that can be used to instantiate other database connections on the same SQL cluster.

Returns:Dictionary with all of the SQL cluster connection info
Return type:dict
db_create() → None[source]

Create this database if it doesn’t exist yet

db_delete() → None[source]

Delete this database (if it exists)

db_export_pgdump_file(*args, **kwargs)[source]
db_load_pgdump_file(*args, **kwargs)[source]
execute(query: str, autocommit: bool = False)[source]

Execute a query for a persistent result in the database. Use autocommit=True when creating and deleting databases.

Parameters:
  • query (str) – any valid SQL query string
  • autocommit (bool, optional) – flag that will execute against the super db/user, defaults to False
exists() → bool[source]

Does this database exist yet? Returns True or False

Returns:True or False if the database exists on the cluster
Return type:bool
export_all_shapefiles(output_folder: pathlib.Path) → None[source]

Save all spatial tables in the database to shapefile.

Parameters:output_folder (Path) – Folder path to write to
export_shapefile(*args, **kwargs)[source]
import_csv(*args, **kwargs)[source]
import_dataframe(dataframe: pandas.core.frame.DataFrame, table_name: str, if_exists: str = 'fail', schema: str = None) → None[source]

Import an in-memory pandas.DataFrame to the SQL database.

Enforce clean column names (without spaces, caps, or weird symbols).

Parameters:
  • dataframe (pd.DataFrame) – dataframe with data you want to save
  • table_name (str) – name of the table that will get created
  • if_exists (str, optional) – pandas argument to handle overwriting data, defaults to “fail”
import_geodata(table_name: str, data_path: pathlib.Path, src_epsg: Union[int, bool] = False, if_exists: str = 'fail', schema: str = None)[source]

Load geographic data into a geodataframe, then save to SQL.

Parameters:
  • table_name (str) – Name of the table you want to create
  • data_path (Path) – Path to the data. Anything accepted by Geopandas works here.
  • src_epsg (Union[int, bool], optional) – Manually declare the source EPSG if needed, defaults to False
  • if_exists (str, optional) – pandas argument to handle overwriting data, defaults to “replace”
import_geodataframe(gdf: geopandas.geodataframe.GeoDataFrame, table_name: str, src_epsg: Union[int, bool] = False, if_exists: str = 'replace', schema: str = None, uid_col: str = 'uid')[source]

Import an in-memory geopandas.GeoDataFrame to the SQL database.

Parameters:
  • gdf (gpd.GeoDataFrame) – geodataframe with data you want to save
  • table_name (str) – name of the table that will get created
  • src_epsg (Union[int, bool], optional) – The source EPSG code can be passed as an integer. By default this function will try to read the EPSG code directly, but some spatial data is funky and requires that you explicitly declare its projection. Defaults to False
  • if_exists (str, optional) – pandas argument to handle overwriting data, defaults to “replace”
make_geotable_from_query(query: str, new_table_name: str, geom_type: str, epsg: int, schema: str = None, uid_col: str = 'uid') → None[source]

TODO: docstring

make_hexagon_overlay(new_table_name: str, table_to_cover: str, desired_epsg: int, hexagon_size: float, schema: str = None) → None[source]

Create a new spatial hexagon grid covering another spatial table. EPSG must be specified for the hexagons, as well as the size in square KM.

Parameters:
  • new_table_name (str) – Name of the new table to create
  • table_to_cover (str) – Name of the existing table you want to cover
  • desired_epsg (int) – integer for EPSG you want the hexagons to be in
  • hexagon_size (float) – Size of the hexagons, 1 = 1 square KM
pgsql2shp(table_name: str, output_folder: pathlib.Path = None, extra_args: list = None) → pathlib.Path[source]

Use the command-line pgsql2shp utility.

TODO: check if pgsql2shp exists and exit early if not TODO: check if schema is supported

extra_args is a list of tuples, passed in as [(flag1, val1), (flag2, val2)]

For example: extra_args = [("-g", "custom_geom_column"), ("-b", "")]

For more info, see http://www.postgis.net/docs/manual-1.3/ch04.html#id436110

Parameters:
  • table_name (str) – name of the spatial table to dump
  • output_folder (Path, optional) – output folder, defaults to DATA_OUTBOX
  • extra_args (list, optional) – [description], defaults to None
Returns:

path to the newly created shapefile

Return type:

Path

query_as_df(query: str, super_uri: bool = False) → pandas.core.frame.DataFrame[source]

Query the database and get the result as a pandas.DataFrame

Parameters:
  • query (str) – any valid SQL query string
  • super_uri (bool, optional) – flag that will execute against the super db/user, defaults to False
Returns:

dataframe with the query result

Return type:

pd.DataFrame

query_as_geo_df(query: str, geom_col: str = 'geom') → geopandas.geodataframe.GeoDataFrame[source]

Query the database and get the result as a geopandas.GeoDataFrame

Parameters:
  • query (str) – any valid SQL query string
  • geom_col (str) – name of the column that holds the geometry, defaults to ‘geom’
Returns:

geodataframe with the query result

Return type:

gpd.GeoDataFrame

query_as_list(query: str, super_uri: bool = False) → list[source]

Query the database and get the result as a list

Parameters:
  • query (str) – any valid SQL query string
  • super_uri (bool, optional) – flag that will execute against the super db/user, defaults to False
Returns:

list with each item being a row from the query result

Return type:

list

query_as_single_item(query: str, super_uri: bool = False)[source]

Query the database and get the result as a SINGLETON. For when you want to transform [(True,)] into True

Parameters:
  • query (str) – any valid SQL query string
  • super_uri (bool, optional) – flag that will execute against the super db/user, defaults to False
Returns:

result from the query

Return type:

singleton

shp2pgsql(table_name: str, src_shapefile: pathlib.Path, new_epsg: int = None) → str[source]

TODO: Docstring TODO: add schema option

Parameters:
  • table_name (str) – [description]
  • src_shapefile (Path) – [description]
  • new_epsg (int, optional) – [description], defaults to None
Returns:

[description]

Return type:

str

table_add_or_nullify_column(table_name: str, column_name: str, column_type: str, schema: str = None) → None[source]

Add a new column to a table. Overwrite to NULL if it already exists.

Parameters:
  • table_name (str) – Name of the table
  • column_name (str) – Name of the new column
  • column_type (str) – Data type of the column. Must be valid in PgSQL
table_add_spatial_index(table_name: str, schema: str = None) → None[source]

Add a spatial index to the ‘geom’ column in the table.

Parameters:table_name (str) – Name of the table to make the index on
table_add_uid_column(table_name: str, schema: str = None, uid_col: str = 'uid') → None[source]

Add a serial primary key column named ‘uid’ to the table.

Parameters:table_name (str) – Name of the table to add a uid column to
table_columns_as_list(table_name: str, schema: str = None) → list[source]

Get a list of all columns in a table.

Parameters:table_name (str) – Name of the table
Returns:List of all columns in a table
Return type:list
table_delete(table_name: str, schema: str = None) → None[source]

Delete the table, cascade.

Parameters:table_name (str) – Name of the table you want to delete.
table_reproject_spatial_data(table_name: str, old_epsg: Union[int, str], new_epsg: Union[int, str], geom_type: str, schema: str = None) → None[source]

Transform spatial data from one EPSG into another EPSG.

This can also be used with the same old and new EPSG. This is useful when making a new geotable, as this SQL code will update the table’s entry in the geometry_columns table.

Parameters:
  • table_name (str) – name of the table
  • old_epsg (Union[int, str]) – Current EPSG of the data
  • new_epsg (Union[int, str]) – Desired new EPSG for the data
  • geom_type (str) – PostGIS-valid name of the geometry you’re transforming
table_spatialize_points(src_table: str, x_lon_col: str, y_lat_col: str, epsg: int, if_exists: str = 'replace', new_table: str = None, schema: str = None) → geopandas.geodataframe.GeoDataFrame[source]
timer()[source]

Decorator function that will record & report on how long it takes for another function to execute.

Parameters:func (function) – the function to be timed
transfer_data_to_another_db(table_name: str, other_postgresql_db, schema: str = None) → None[source]

Copy data from one SQL database to another.

Parameters:
  • table_name (str) – Name of the table to copy
  • other_postgresql_db (PostgreSQL) – PostgreSQL() object for target database
uri(super_uri: bool = False) → str[source]

Create a connection string URI for this database.

Parameters:super_uri (bool, optional) – Flag that will provide access to cluster root if True, defaults to False
Returns:Connection string URI for PostgreSQL
Return type:str
postgis_helpers.PgSQL.connect_via_uri(uri: str, verbosity: str = 'full', super_db: str = 'postgres', super_user: str = None, super_pw: str = None)[source]

Create a PostgreSQL object from a URI. Note that this process must make assumptions about the super-user of the database. Proceed with caution.

Parameters:
  • uri (str) – database connection string
  • verbosity (str, optional) – level of printout desired, defaults to “full”
  • super_db (str, optional) – name of the SQL cluster master DB, defaults to “postgres”
Returns:

PostgreSQL() object

Return type:

PostgreSQL