Using the Socrata API
This tutorial explains how to use the Socrata API in Data Retriever. It covers both the command-line interface (CLI) commands and the equivalent Python interface.
Note
Currently, Data Retriever supports only tabular Socrata datasets. Tabular Socrata datasets of type map are not supported.
Command Line Interface
Listing the Socrata Datasets
The retriever ls -s command displays the Socrata datasets whose titles contain the provided keywords.
$ retriever ls -h
(gives listing options)
usage: retriever ls [-h] [-l L [L ...]] [-k K [K ...]] [-v V [V ...]]
[-s S [S ...]]
optional arguments:
-h, --help show this help message and exit
-l L [L ...] search datasets with specific license(s)
-k K [K ...] search datasets with keyword(s)
-v V [V ...] verbose list of specified dataset(s)
-s S [S ...] search socrata datasets with name(s)
Example
This example lists the names of the Socrata datasets whose titles contain the word fishing.
$ retriever ls -s fishing
Autocomplete suggestions : Total 34 results
[?] Select the dataset name: Recommended Fishing Rivers And Streams
> Recommended Fishing Rivers And Streams
Recommended Fishing Rivers And Streams API
Iowa Fishing Report
Recommended Fishing Rivers, Streams, Lakes and Ponds Map
Public Fishing Rights Parking Areas Map
Fishing Atlas
Cook County - Fishing Lakes
[ARCHIVED] Fishing License Sellers
Public Fishing Rights Parking Areas
Recommended Fishing Lakes and Ponds Map
Recommended Fishing Lakes and Ponds
Delaware Fishing Licenses and Trout Stamps
Cook County - Fishing Lakes - KML
Here the user is prompted to select a dataset name. Let’s select the Public Fishing Rights Parking Areas dataset. After pressing Enter, the command returns some information about the selected dataset.
Autocomplete suggestions : Total 34 results
[?] Select the dataset name: Public Fishing Rights Parking Areas
Iowa Fishing Report
Recommended Fishing Rivers, Streams, Lakes and Ponds Map
Fishing Atlas
Public Fishing Rights Parking Areas Map
[ARCHIVED] Fishing License Sellers
Cook County - Fishing Lakes
> Public Fishing Rights Parking Areas
Recommended Fishing Lakes and Ponds Map
Recommended Fishing Lakes and Ponds
Delaware Fishing Licenses and Trout Stamps
Cook County - Fishing Lakes - KML
General Fishing and Salmon Licence Sales
Hunting and Fishing License Sellers
Dataset Information of Public Fishing Rights Parking Areas: Total 1 results
1. Public Fishing Rights Parking Areas
ID : 9vef-6whi
Type : {'dataset': 'tabular'}
Description : The New York State Department of Environmental Con...
Domain : data.ny.gov
Link : https://data.ny.gov/Recreation/Public-Fishing-Rights-Parking-Areas/9vef-6whi
Downloading the Socrata Datasets
The retriever download socrata-<socrata id> command downloads the Socrata dataset that matches the provided socrata id.
Example
In the example in the Listing the Socrata Datasets section, we selected the Public Fishing Rights Parking Areas dataset. Since the dataset is of type tabular, we can download it. The information returned in that example contains the socrata id, which we use to download the dataset.
$ retriever download socrata-9vef-6whi
=> Installing socrata-9vef-6whi
Downloading 9vef-6whi.csv: 10.0B [00:03, 2.90B/s]
Done!
The downloaded raw data files are stored in the raw_data directory inside the ~/.retriever directory.
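As a rough sketch, a downloaded file could be located from Python by searching the raw_data directory for the CSV name shown in the download log (9vef-6whi.csv). The helper name find_raw_files is hypothetical, and the search is recursive because the exact subdirectory layout under raw_data is an assumption not stated in this tutorial.

```python
from pathlib import Path


def find_raw_files(socrata_id, raw_data_dir=None):
    """Return paths of '<socrata_id>.csv' files found under raw_data_dir.

    Hypothetical helper: defaults to ~/.retriever/raw_data, the location
    named in this tutorial. The layout below that directory is assumed
    unknown, so the search is recursive.
    """
    if raw_data_dir is None:
        raw_data_dir = Path.home() / ".retriever" / "raw_data"
    raw_data_dir = Path(raw_data_dir)
    if not raw_data_dir.is_dir():
        # Nothing has been downloaded yet (or the directory is elsewhere).
        return []
    return sorted(raw_data_dir.rglob(socrata_id + ".csv"))
```

For instance, find_raw_files('9vef-6whi') would return an empty list until the download command above has been run.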
Installing the Socrata Datasets
The retriever install <engine> socrata-<socrata id> command downloads the raw data, creates a script for it, and then installs the Socrata dataset that matches the provided socrata id into the provided engine.
Example
In the example in the Listing the Socrata Datasets section, we selected the Public Fishing Rights Parking Areas dataset. Since the dataset is of type tabular, we can install it. The information returned in that section contains the socrata id, which we use to install the dataset.
$ retriever install postgres socrata-9vef-6whi
=> Installing socrata-9vef-6whi
Downloading 9vef-6whi.csv: 10.0B [00:03, 2.69B/s]
Processing... 9vef-6whi.csv
Successfully wrote scripts to /home/user/.retriever/socrata-scripts/9vef_6whi.csv.json
Updating script name to socrata-9vef-6whi.json
Updating the contents of script socrata-9vef-6whi
Successfully updated socrata_9vef_6whi.json
Creating database socrata_9vef_6whi...
Bulk insert on .. socrata_9vef_6whi.socrata_9vef_6whi
Done!
The script created for the Socrata dataset is stored in the socrata-scripts directory inside the ~/.retriever directory.
Python Interface in Data Retriever
Searching Socrata Datasets
The function socrata_autocomplete_search takes a list of strings as input and returns a list of strings containing the autocompleted dataset names.
>>> import retriever as rt
>>> names = rt.socrata_autocomplete_search(['clinic', '2015', '2016'])
>>> for name in names:
... print(name)
...
2016 & 2015 Clinic Quality Comparisons for Clinics with Five or More Service Providers
2015 - 2016 Clinical Quality Comparison (>=5 Providers) by Geography
2016 & 2015 Clinic Quality Comparisons for Clinics with Fewer than Five Service Providers
Socrata Dataset Info by Dataset Name
The input argument for the function socrata_dataset_info should be a string (a valid dataset name returned by socrata_autocomplete_search).
It returns a list of dicts because several datasets on Socrata can share the same name (e.g. Building Permits).
>>> import retriever as rt
>>> resource = rt.socrata_dataset_info('2016 & 2015 Clinic Quality Comparisons for Clinics with Five or More Service Providers')
>>> from pprint import pprint
>>> pprint(resource)
[{'description': 'This data set includes comparative information for clinics '
'with five or more physicians for medical claims in 2015 - '
'2016. \r\n'
'\r\n'
'This data set was calculated by the Utah Department of '
'Health, Office of Healthcare Statistics (OHCS) using Utah’s '
'All Payer Claims Database (APCD).',
'domain': 'opendata.utah.gov',
'id': '35s3-nmpm',
'link': 'https://opendata.utah.gov/Health/2016-2015-Clinic-Quality-Comparisons-for-Clinics-w/35s3-nmpm',
'name': '2016 & 2015 Clinic Quality Comparisons for Clinics with Five or '
'More Service Providers',
'type': {'dataset': 'tabular'}}]
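As a minimal sketch, the list returned by socrata_dataset_info could be narrowed to the entries that are tabular and turned into the socrata-<socrata id> names used for downloading and installing. The helper name tabular_dataset_names is hypothetical and relies only on the 'id' and 'type' keys shown in the output above. The second sample entry is invented to illustrate a non-matching type, since the real shape of unsupported entries is not shown in this tutorial.

```python
def tabular_dataset_names(resources):
    """Return 'socrata-<id>' names for the tabular entries in resources.

    Hypothetical helper: expects dicts shaped like the socrata_dataset_info
    output above, i.e. with 'id' and 'type': {'dataset': 'tabular'}.
    """
    names = []
    for resource in resources:
        if resource.get("type", {}).get("dataset") == "tabular":
            names.append("socrata-" + resource["id"])
    return names


sample = [
    {"id": "35s3-nmpm", "type": {"dataset": "tabular"}},
    {"id": "abcd-1234", "type": {"map": "tabular"}},  # invented entry
]
print(tabular_dataset_names(sample))  # ['socrata-35s3-nmpm']
```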
Finding Socrata Dataset by Socrata ID
The input argument of the function find_socrata_dataset_by_id should be the four-by-four Socrata dataset identifier (e.g. 35s3-nmpm).
The function returns a dict which contains metadata about the dataset.
>>> import retriever as rt
>>> from pprint import pprint
>>> resource = rt.find_socrata_dataset_by_id('35s3-nmpm')
>>> pprint(resource)
{'datatype': 'tabular',
'description': 'This data set includes comparative information for clinics '
'with five or more physicians for medical claims in 2015 - '
'2016. \r\n'
'\r\n'
'This data set was calculated by the Utah Department of '
'Health, Office of Healthcare Statistics (OHCS) using Utah’s '
'All Payer Claims Database (APCD).',
'domain': 'opendata.utah.gov',
'homepage': 'https://opendata.utah.gov/Health/2016-2015-Clinic-Quality-Comparisons-for-Clinics-w/35s3-nmpm',
'id': '35s3-nmpm',
'keywords': ['socrata'],
'name': '2016 & 2015 Clinic Quality Comparisons for Clinics with Five or More '
'Service Providers'}
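The lookup can be combined with a download so that only tabular datasets are fetched. The following is a sketch under assumptions, not the library's own workflow: download_if_tabular is a hypothetical helper, it relies only on the 'datatype' key shown in the output above, and it assumes the lookup returns a dict even for an unknown ID. The retriever module is passed in as rt so the function can be tried with a stand-in object.

```python
def download_if_tabular(socrata_id, rt):
    """Download socrata-<socrata_id> only if its metadata says 'tabular'.

    Hypothetical helper. `rt` is the imported retriever module
    (import retriever as rt). Returns the dataset name that was
    downloaded, or None when the dataset is not tabular.
    """
    resource = rt.find_socrata_dataset_by_id(socrata_id)
    # Assumption: the result is a dict; a missing 'datatype' means "skip".
    if resource.get("datatype") != "tabular":
        return None
    name = "socrata-" + socrata_id
    rt.download(name)
    return name
```

With retriever installed, this would be called as download_if_tabular('35s3-nmpm', rt) after import retriever as rt.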
Downloading a Socrata Dataset
import retriever as rt
rt.download('socrata-35s3-nmpm')
Installing a Socrata Dataset
import retriever as rt
rt.install_postgres('socrata-35s3-nmpm')
Note
To download or install a Socrata dataset, the dataset name must follow the syntax socrata-<socrata id>, where the socrata id is the four-by-four Socrata dataset identifier (e.g. 35s3-nmpm).
- Example:
  - Correct: socrata-35s3-nmpm
  - Incorrect: socrata35s3-nmpm, socrata35s3nmpm
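The naming rule in this note can be checked before calling download or install. This is a minimal sketch: is_valid_socrata_name is a hypothetical helper, not part of retriever, and it assumes a four-by-four identifier consists of two groups of four lowercase letters or digits separated by a hyphen.

```python
import re

# Assumed pattern: 'socrata-' followed by a four-by-four identifier
# made of lowercase letters and digits, e.g. socrata-35s3-nmpm.
SOCRATA_NAME = re.compile(r"socrata-[a-z0-9]{4}-[a-z0-9]{4}")


def is_valid_socrata_name(name):
    """Return True if name has the form socrata-<four-by-four id>."""
    return SOCRATA_NAME.fullmatch(name) is not None


print(is_valid_socrata_name("socrata-35s3-nmpm"))  # True
print(is_valid_socrata_name("socrata35s3-nmpm"))   # False
print(is_valid_socrata_name("socrata35s3nmpm"))    # False
```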