retriever.engines package¶
Submodules¶
retriever.engines.csvengine module¶
- class retriever.engines.csvengine.engine¶
Bases: Engine
Engine instance for writing data to a CSV file.
- abbreviation = 'csv'¶
- auto_column_number = 0¶
- create_db()¶
Override create_db since there is no database, just a CSV file
- create_table()¶
Create the table by creating an empty CSV file
- datatypes = {'auto': 'INTEGER', 'bigint': 'INTEGER', 'bool': 'INTEGER', 'char': 'TEXT', 'decimal': 'REAL', 'double': 'REAL', 'int': 'INTEGER'}¶
- disconnect()¶
Close the last file in the dataset
- disconnect_files()¶
Close each file after being written
- execute(statement, commit=True)¶
Write a line to the output file
- executemany(statement, values, commit=True)¶
Write a line to the output file
- format_insert_value(value, datatype)¶
Formats a value for an insert statement
- get_connection()¶
Gets the db connection.
- insert_limit = 1000¶
- insert_statement(values)¶
Returns a comma-delimited row of values
- name = 'CSV'¶
- required_opts = [('table_name', 'Format of table name', '{db}_{table}.csv'), ('data_dir', 'Install directory', '.')]¶
- table_exists(dbname, tablename)¶
Check to see if the data file currently exists
- table_names = []¶
- to_csv(sort=True, path=None, select_columns=None)¶
Export sorted version of CSV file
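As a sketch of the comma-delimited row that insert_statement describes, the standard csv module can render one row and handle quoting. The function name and behavior here are illustrative, not the engine's actual implementation:

```python
import csv
import io

def insert_row(values):
    """Render one row of values as a comma-delimited CSV line;
    csv.writer quotes any value that itself contains a comma."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(values)
    return buf.getvalue()

line = insert_row(["1", "Dipodomys merriami", "41.2"])
```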
retriever.engines.download_only module¶
- retriever.engines.download_only.dummy_method(self, *args, **kwargs)¶
Dummy method template to help with replacing Engine functions
- class retriever.engines.download_only.engine¶
Bases: Engine
Engine instance for downloading data files without importing them into a database.
- abbreviation = 'download'¶
- all_files = {}¶
- auto_create_table(table, url=None, filename=None, pk=None, make=True)¶
Download the file if it doesn’t exist
- create_db(*args, **kwargs)¶
Dummy method template to help with replacing Engine functions
- create_table(*args, **kwargs)¶
Dummy method template to help with replacing Engine functions
- final_cleanup()¶
Copies downloaded files to desired directory
- find_file(filename)¶
Checks for the given file and adds it to the list of all files
- get_connection()¶
Gets the db connection.
- insert_data_from_file(*args, **kwargs)¶
Dummy method template to help with replacing Engine functions
- insert_data_from_url(url)¶
Insert data from a web resource
- name = 'Download Only'¶
- register_files(filenames)¶
Identify a list of files to be moved by the download
When downloading archives with multiple files the engine needs to be informed of all of the file names so that it can move them.
- required_opts = [('path', 'File path to copy data files', './'), ('sub_dir', 'Install directory', '')]¶
- table_exists(dbname, tablename)¶
Checks if the file to be downloaded already exists
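The interplay of register_files and final_cleanup described above can be sketched as a small tracker that records every file an archive contains and then copies the ones that exist to the target directory. The class and method bodies are illustrative assumptions, not the engine's code:

```python
import os
import shutil
import tempfile

class DownloadTracker:
    """Minimal sketch: track registered files, then copy them on cleanup."""

    def __init__(self):
        self.all_files = {}

    def register_files(self, filenames):
        # Record every file name so multi-file archives can be moved later.
        for name in filenames:
            self.all_files[name] = None

    def final_cleanup(self, src_dir, dest_dir):
        # Copy each registered file that was actually downloaded.
        os.makedirs(dest_dir, exist_ok=True)
        for name in self.all_files:
            src = os.path.join(src_dir, name)
            if os.path.exists(src):
                shutil.copy(src, os.path.join(dest_dir, name))

# Demo: register two files, only one of which exists on disk.
src = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
with open(os.path.join(src, "a.csv"), "w") as f:
    f.write("x")
tracker = DownloadTracker()
tracker.register_files(["a.csv", "b.csv"])
tracker.final_cleanup(src, dest)
copied = sorted(os.listdir(dest))
```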
retriever.engines.jsonengine module¶
Engine for writing data to a JSON file
- class retriever.engines.jsonengine.engine¶
Bases: Engine
Engine instance for writing data to a JSON file.
- abbreviation = 'json'¶
- auto_column_number = 0¶
- create_db()¶
Override create_db since there is no database, just a JSON file
- create_table()¶
Create the table by creating an empty JSON file
- datatypes = {'auto': 'INTEGER', 'bigint': 'INTEGER', 'bool': 'INTEGER', 'char': 'TEXT', 'decimal': 'REAL', 'double': 'REAL', 'int': 'INTEGER'}¶
- disconnect()¶
Close out the JSON with a \n]} and close the file.
Close all the file objects that have been created. Re-write the files, stripping off the last comma, and then close with a \n]}.
- execute(statement, commit=True)¶
Write a line to the output file
- executemany(statement, values, commit=True)¶
Write a line to the output file
- format_insert_value(value, datatype)¶
Formats a value for an insert statement
- get_connection()¶
Gets the db connection.
- insert_limit = 1000¶
- insert_statement(values)¶
Return SQL statement to insert a set of values.
- name = 'JSON'¶
- required_opts = [('table_name', 'Format of table name', '{db}_{table}.json'), ('data_dir', 'Install directory', '.')]¶
- table_exists(dbname, tablename)¶
Check to see if the data file currently exists
- table_names = []¶
- to_csv(sort=True, path=None, select_columns=None)¶
Export table from json engine to CSV file
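The disconnect step described above (strip the trailing comma left after the last row, then terminate the file with a newline and "]}") can be sketched as follows. This is an illustrative rewrite-on-close, not the engine's actual code:

```python
import json
import os
import tempfile

def close_json_file(path):
    """Strip the dangling comma after the last inserted row and
    terminate the file so it parses as valid JSON."""
    with open(path) as f:
        text = f.read()
    text = text.rstrip().rstrip(",")
    with open(path, "w") as f:
        f.write(text + "\n]}")

# Demo: a file left mid-write, with a comma after the last row.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write('{"rows": [\n{"id": 1},\n{"id": 2},')
close_json_file(path)
with open(path) as f:
    data = json.load(f)
```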
retriever.engines.msaccess module¶
- class retriever.engines.msaccess.engine¶
Bases: Engine
Engine instance for Microsoft Access.
- abbreviation = 'msaccess'¶
- convert_data_type(datatype)¶
MS Access can’t handle complex Decimal types
- create_db()¶
MS Access doesn’t create databases.
- datatypes = {'auto': 'AUTOINCREMENT', 'bigint': 'INTEGER', 'bool': 'BIT', 'char': 'VARCHAR', 'decimal': 'NUMERIC', 'double': 'NUMERIC', 'int': 'INTEGER'}¶
- drop_statement(object_type, object_name)¶
Returns a drop table or database SQL statement.
- get_connection()¶
Gets the db connection.
- insert_data_from_file(filename)¶
Perform a bulk insert.
- insert_limit = 1000¶
- instructions = 'Create a database in Microsoft Access, close Access.\nThen select your database file using this dialog.'¶
- name = 'Microsoft Access'¶
- placeholder = '?'¶
- required_opts = [('file', 'Enter the filename of your Access database', 'access.mdb', 'Access databases (*.mdb, *.accdb)|*.mdb;*.accdb'), ('table_name', 'Format of table name', '[{db} {table}]'), ('data_dir', 'Install directory', '.')]¶
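The bracketed table-name format from required_opts exists because Access table names may contain spaces. A sketch of how such a name might be built and used in a drop statement; the exact SQL text of drop_statement is an assumption here:

```python
def access_table_name(db, table, fmt="[{db} {table}]"):
    """Format a bracketed Access table name; the default format
    string is the one shown in required_opts above."""
    return fmt.format(db=db, table=table)

def drop_statement(object_type, object_name):
    """Generic DROP statement of the kind drop_statement returns
    (exact wording is illustrative)."""
    return "DROP {} IF EXISTS {}".format(object_type, object_name)

stmt = drop_statement("TABLE", access_table_name("portal", "main"))
```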
retriever.engines.mysql module¶
- class retriever.engines.mysql.engine¶
Bases: Engine
Engine instance for MySQL.
- abbreviation = 'mysql'¶
- create_db_statement()¶
Return SQL statement to create a database.
- datatypes = {'auto': 'INT(5) NOT NULL AUTO_INCREMENT', 'bigint': 'BIGINT', 'bool': 'BOOL', 'char': ('TEXT', 'VARCHAR'), 'decimal': 'DECIMAL', 'double': 'DOUBLE', 'int': 'INT'}¶
- get_connection()¶
Get db connection. PyMySQL changed its default encoding from latin1 to utf8mb4 (https://github.com/PyMySQL/PyMySQL/pull/692/files). For PyMySQL to work well on CI infrastructure, connect with the preferred charset.
- insert_data_from_file(filename)¶
Call MySQL “LOAD DATA LOCAL INFILE” statement to perform a bulk insert.
- insert_limit = 1000¶
- lookup_encoding()¶
Convert a well-known encoding to MySQL syntax. MySQL has a unique way of representing encodings; for example, latin-1 becomes latin1 in MySQL. Please update the encoding lookup table if the required encoding is not present.
- max_int = 4294967295¶
- name = 'MySQL'¶
- placeholder = '%s'¶
- required_opts = [('user', 'Enter your MySQL username', 'root'), ('password', 'Enter your password', ''), ('host', 'Enter your MySQL host', 'localhost'), ('port', 'Enter your MySQL port', 3306), ('database_name', 'Format of database name', '{db}'), ('table_name', 'Format of table name', '{db}.{table}')]¶
- set_engine_encoding()¶
Set MySQL database encoding to match data encoding
- table_exists(dbname, tablename)¶
Check to see if the given table exists.
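The bulk insert described under insert_data_from_file builds a LOAD DATA LOCAL INFILE statement. A sketch of what such a statement might look like; the exact clauses (field enclosure, line terminator) are illustrative assumptions, not the engine's actual SQL:

```python
def load_data_statement(filename, table, delimiter=",", encoding="utf8mb4"):
    """Assemble a LOAD DATA LOCAL INFILE statement for a bulk insert,
    using the preferred utf8mb4 charset noted under get_connection."""
    return (
        "LOAD DATA LOCAL INFILE '{f}' INTO TABLE {t} "
        "CHARACTER SET {e} "
        "FIELDS TERMINATED BY '{d}' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    ).format(f=filename, t=table, e=encoding, d=delimiter)

stmt = load_data_statement("data.csv", "portal.main")
```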
retriever.engines.postgres module¶
- class retriever.engines.postgres.engine¶
Bases: Engine
Engine instance for PostgreSQL.
- abbreviation = 'postgres'¶
- auto_create_table(table, url=None, filename=None, pk=None, make=True)¶
Create a table automatically.
Overwrites the main Engine class. Identifies the type of table to create. For a raster or vector (GIS) dataset, create the table from the contents downloaded from the URL or from the contents in the filename. Otherwise, use the Engine function for a tabular table.
- create_db()¶
Create Engine database.
- create_db_statement()¶
In PostgreSQL, the equivalent of a SQL database is a schema.
- create_table()¶
Create a table and commit.
PostgreSQL needs to commit operations individually. Enable PostGIS extensions if a script has a non-tabular table.
- datatypes = {'auto': 'serial', 'bigint': 'bigint', 'bool': 'boolean', 'char': 'varchar', 'decimal': 'decimal', 'double': 'double precision', 'int': 'integer'}¶
- db_encoding = 'Latin1'¶
- drop_statement(object_type, object_name)¶
In PostgreSQL, the equivalent of a SQL database is a schema.
- format_insert_value(value, datatype)¶
Format value for an insert statement.
- get_connection()¶
Gets the db connection.
Please update the encoding lookup table if the required encoding is not present.
- insert_data_from_file(filename)¶
Use PostgreSQL’s “COPY FROM” statement to perform a bulk insert.
The current PostgreSQL engine's bulk insert supports only a comma delimiter.
- insert_limit = 1000¶
- insert_raster(path=None, srid=4326)¶
Import a raster into a PostGIS table.
Uses: raster2pgsql -Y -M -d -I -s <SRID> <PATH> <SCHEMA>.<DBTABLE> | psql -d <DATABASE>
The SQL produced by raster2pgsql is run as: psql -U postgres -d <gisdb> -f <elev>.sql
-Y uses COPY to insert data, -M runs VACUUM on the table, -d drops the table and recreates it before inserting the raster data.
- insert_statement(values)¶
Return SQL statement to insert a set of values.
- insert_vector(path=None, srid=4326)¶
Import Vector into Postgis Table
– Enable PostGIS (includes raster) CREATE EXTENSION postgis;
– Enable Topology CREATE EXTENSION postgis_topology;
– fuzzy matching needed for Tiger CREATE EXTENSION fuzzystrmatch;
– Enable US Tiger Geocoder CREATE EXTENSION postgis_tiger_geocoder; Uses shp2pgsql -I -s <SRID> <PATH/TO/SHAPEFILE> <SCHEMA>.<DBTABLE> | psql -U postgres -d <DBNAME>>
The sql processed by shp2pgsql is run as psql -U postgres -d <DBNAME>> shp2pgsql -c -D -s 4269 -i -I
- max_int = 2147483647¶
- name = 'PostgreSQL'¶
- placeholder = '%s'¶
- required_opts = [('user', 'Enter your PostgreSQL username', 'postgres'), ('password', 'Enter your password', ''), ('host', 'Enter your PostgreSQL host', 'localhost'), ('port', 'Enter your PostgreSQL port', 5432), ('database', 'Enter your PostgreSQL database name', 'postgres'), ('database_name', 'Format of schema name', '{db}'), ('table_name', 'Format of table name', '{db}.{table}')]¶
- spatial_support = True¶
- supported_raster(path, ext=None)¶
Return the supported GIS raster files from the path
Update the extensions after testing if a given raster type is supported by raster2pgsql.
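The raster2pgsql pipeline described under insert_raster can be sketched as simple command-line assembly. This only builds the command string for illustration; it does not run it:

```python
def raster2pgsql_command(path, schema, table, database, srid=4326):
    """Assemble the raster2pgsql pipeline described above:
    -Y use COPY, -M vacuum, -d drop and recreate, -I build a spatial index."""
    return (
        "raster2pgsql -Y -M -d -I -s {srid} {path} {schema}.{table} "
        "| psql -d {db}"
    ).format(srid=srid, path=path, schema=schema, table=table, db=database)

cmd = raster2pgsql_command("elev.tif", "public", "elev", "gisdb")
```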
retriever.engines.sqlite module¶
- class retriever.engines.sqlite.engine¶
Bases: Engine
Engine instance for SQLite.
- abbreviation = 'sqlite'¶
- create_db()¶
Don’t create database for SQLite
SQLite doesn’t create databases. Each database is a file and needs a separate connection. This overloads create_db to do nothing in this case.
- datatypes = {'auto': ('INTEGER', 'AUTOINCREMENT'), 'bigint': 'INTEGER', 'bool': 'INTEGER', 'char': 'TEXT', 'decimal': 'REAL', 'double': 'REAL', 'int': 'INTEGER'}¶
- fetch_tables(dataset, table_names)¶
Return sqlite dataset as list of pandas dataframe.
- get_bulk_insert_statement()¶
Get insert statement for bulk inserts
This places ?’s instead of the actual values so that executemany() can operate as designed
- get_connection()¶
Get db connection.
- insert_data_from_file(filename)¶
Perform a high speed bulk insert
Checks to see if a given file can be bulk inserted, and if so loads it in chunks and inserts those chunks into the database using executemany.
- insert_limit = 1000¶
- name = 'SQLite'¶
- placeholder = '?'¶
- required_opts = [('file', 'Enter the filename of your SQLite database', 'sqlite.db'), ('table_name', 'Format of table name', '{db}_{table}'), ('data_dir', 'Install directory', '.')]¶
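The get_bulk_insert_statement and insert_data_from_file behavior described above (a ?-placeholder INSERT fed to executemany in chunks) can be sketched end-to-end with the standard sqlite3 module. The function and table names are illustrative:

```python
import sqlite3

def get_bulk_insert_statement(table, column_count):
    """Build an INSERT with '?' placeholders so executemany() can
    bind each row, as described for the SQLite engine."""
    placeholders = ", ".join("?" for _ in range(column_count))
    return "INSERT INTO {t} VALUES ({p})".format(t=table, p=placeholders)

# Demo: bulk-insert one chunk of rows into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE portal_main (id INTEGER, species TEXT)")
rows = [(1, "DM"), (2, "DO"), (3, "PP")]
conn.executemany(get_bulk_insert_statement("portal_main", 2), rows)
count = conn.execute("SELECT COUNT(*) FROM portal_main").fetchone()[0]
```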
retriever.engines.xmlengine module¶
- class retriever.engines.xmlengine.engine¶
Bases: Engine
Engine instance for writing data to an XML file.
- abbreviation = 'xml'¶
- auto_column_number = 0¶
- create_db()¶
Override create_db since there is no database just an XML file.
- create_table()¶
Create the table by creating an empty XML file.
- datatypes = {'auto': 'INTEGER', 'bigint': 'INTEGER', 'bool': 'INTEGER', 'char': 'TEXT', 'decimal': 'REAL', 'double': 'REAL', 'int': 'INTEGER'}¶
- disconnect()¶
Close out the XML files.
Close all the file objects that have been created. Re-write the files, stripping off the last comma, and then close with a closing tag.
- execute(statement, commit=True)¶
Write a line to the output file.
- executemany(statement, values, commit=True)¶
Write a line to the output file.
- format_insert_value(value, datatype)¶
Format value for an insert statement.
- get_connection()¶
Get db connection.
- insert_limit = 1000¶
- insert_statement(values)¶
Create the insert statement.
Wrap each data value with its column name (key) using format_single_row, producing <key> value </key>.
- name = 'XML'¶
- required_opts = [('table_name', 'Format of table name', '{db}_{table}.xml'), ('data_dir', 'Install directory', '.')]¶
- table_names = []¶
- to_csv(sort=True, path=None, select_columns=None)¶
Export table from xml engine to CSV file.
- retriever.engines.xmlengine.format_single_row(keys, line_data)¶
Create an xml string from the keys and line_data values.
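The key/value wrapping that format_single_row describes can be sketched as follows. The enclosing <row> element and the function body are illustrative assumptions, not the module's actual output format:

```python
def format_row(keys, line_data):
    """Wrap each value in its column name as <key>value</key> and
    join the cells into a single row element (sketch)."""
    cells = "".join(
        "<{k}>{v}</{k}>".format(k=k, v=v) for k, v in zip(keys, line_data)
    )
    return "<row>{}</row>".format(cells)

row = format_row(["id", "species"], ["1", "DM"])
```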