{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from pandas.plotting import register_matplotlib_converters\n",
"\n",
"register_matplotlib_converters()\n",
"\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Data Access\n",
"\n",
"### servers, servers everywhere and not a bit to flip\n",
"\n",
"![](https://imgs.xkcd.com/comics/digital_data.png)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## whoami\n",
"\n",
"`ocefpaf` (Filipe Fernandes)\n",
"\n",
"- Physical Oceanographer\n",
"- Data Plumber\n",
"- Code Janitor\n",
"- CI babysitter\n",
"- Amazon-Dash-Button for conda-forge\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## My day job: IOOS\n",
"\n",
"![](https://raw.githubusercontent.com/ocefpaf/2018-SciPy-talk/gh-pages/images/IOOS-RAs.jpg)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Big or small we need data!\n",
"\n",
"- Data come from many sources: a variety of servers, APIs, and web services. To\n",
"  list a few: OPeNDAP, ERDDAP, THREDDS, FTP, HTTP(S), S3, LAS, etc.\n",
"\n",
"![](https://imgs.xkcd.com/comics/data_pipeline.png)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Feedback\n",
"\n",
"While you suffer through my Data Access tutorial, please keep the following questions in mind so we can improve it. Should this tutorial focus on:\n",
"\n",
"- Leveraging metadata for finding data and exploring data?\n",
"- Software packages to access, slice, and dice data?\n",
"- Data sources?\n",
"- None of the above, we don't need this tutorial!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Web Services/Type of servers\n",
"\n",
"| Data Type | Web Service | Response |\n",
"| -------------------------------------- | ----------- | ----------- |\n",
"| In-situ data (buoys, stations, etc.) | OGC SOS | XML/CSV |\n",
"| Gridded data (models, satellite) | OPeNDAP | Binary |\n",
"| Raster Images | OGC WMS | GeoTIFF/PNG |\n",
"| ERDDAP | RESTful API | Many formats |\n",
"\n",
"- Your imagination is the limit!\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## What are we going to see in this tutorial?\n",
"\n",
"Browse and access data from:\n",
"\n",
"1. ERDDAP\n",
"2. OPeNDAP\n",
"3. ~~SOS~~\n",
"4. WMS\n",
"5. CSW and CKAN\\*\n",
"\n",
"\n",
"\\* There are many CSW examples in the [IOOS Code Lab](https://ioos.github.io/ioos_code_lab/content/intro.html) Jupyter Book."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 1) ERDDAP\n",
"\n",
"### Learning objectives:\n",
"\n",
"- Explore an ERDDAP server with the Python interface (erddapy);\n",
"- Find data for a time/region of interest;\n",
"- Download the data in a familiar format and create some plots.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## What is ERDDAP?\n",
"\n",
"- Flexible outputs: .html table, ESRI .asc and .csv, .csvp, Google Earth .kml,\n",
" OPeNDAP binary, .mat, .nc, ODV .txt, .tsv, .json, and .xhtml\n",
"- RESTful API to access the data\n",
"- Standardized dates and times (ISO 8601, UTC) in the results\n",
"- Server-side searching and slicing\n"
]
},
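{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"ERDDAP's RESTful API means that a whole request fits in one URL of the form `{server}/tabledap/{dataset_id}.{response}?{variables}&{constraints}`. `erddapy` builds these URLs for us. The sketch below assembles one by hand with the standard library, only to show what is sent to the server. The helper `build_tabledap_url` and the dataset ID `my_dataset_id` are hypothetical, and `erddapy`'s exact percent-encoding may differ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from urllib.parse import quote\n",
"\n",
"\n",
"def build_tabledap_url(server, dataset_id, variables, constraints, response=\"csv\"):\n",
"    \"\"\"Hypothetical helper: assemble an ERDDAP tabledap request URL by hand.\"\"\"\n",
"    # The variables to return form a comma-separated list.\n",
"    query = \",\".join(variables)\n",
"    # Each constraint becomes `&<variable><operator><value>`, e.g. `&time>=...`.\n",
"    for key, value in constraints.items():\n",
"        query += f\"&{key}{value}\"\n",
"    # Percent-encode `>`, `<`, `:` etc., keeping the separators ERDDAP parses.\n",
"    return f\"{server}/tabledap/{dataset_id}.{response}?\" + quote(query, safe=\",&=\")\n",
"\n",
"\n",
"manual_url = build_tabledap_url(\n",
"    \"http://erddap.dataexplorer.oceanobservatories.org/erddap\",\n",
"    \"my_dataset_id\",  # hypothetical dataset ID\n",
"    [\"time\", \"latitude\", \"longitude\"],\n",
"    {\"time>=\": \"2016-07-10T00:00:00Z\", \"latitude<=\": 41},\n",
")\n",
"print(manual_url)"
]
},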
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"from erddapy import ERDDAP\n",
"\n",
"server = \"http://erddap.dataexplorer.oceanobservatories.org/erddap\"\n",
"\n",
"e = ERDDAP(server=server, protocol=\"tabledap\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### What services are available on the server?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv(\n",
" e.get_search_url(\n",
" response=\"csv\",\n",
" search_for=\"all\",\n",
" )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"print(\n",
" f'We have {len(set(df[\"tabledap\"].dropna()))} '\n",
" f'tabledap, {len(set(df[\"griddap\"].dropna()))} '\n",
" f'griddap, and {len(set(df[\"wms\"].dropna()))} wms.'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Let's query all the datasets that have the _standard_name_ of _sea_water_practical_salinity_.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"url = e.get_categorize_url(\n",
" categorize_by=\"standard_name\",\n",
" value=\"sea_water_practical_salinity\",\n",
" response=\"csv\",\n",
")\n",
"\n",
"df = pd.read_csv(url)\n",
"dataset_ids = df.loc[~df[\"tabledap\"].isnull(), \"Dataset ID\"].tolist()\n",
"\n",
"dataset_ids_list = \"\\n\".join(dataset_ids)\n",
"print(f\"Found {len(dataset_ids)} datasets\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Let us narrow our search to deployments within a lon/lat/time extent.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from ipyleaflet import FullScreenControl, Map, Rectangle\n",
"\n",
"min_lon, max_lon = -72, -69\n",
"min_lat, max_lat = 38, 41\n",
"\n",
"rectangle = Rectangle(bounds=((min_lat, min_lon), (max_lat, max_lon)))\n",
"\n",
"m = Map(\n",
" center=((min_lat + max_lat) / 2, (min_lon + max_lon) / 2),\n",
" zoom=6,\n",
")\n",
"\n",
"m.add_layer(rectangle)\n",
"m.add_control(FullScreenControl())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"m"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"kw = {\n",
" \"min_time\": \"2016-07-10T00:00:00Z\",\n",
" \"max_time\": \"2017-02-10T00:00:00Z\",\n",
" \"min_lon\": min_lon,\n",
" \"max_lon\": max_lon,\n",
" \"min_lat\": min_lat,\n",
" \"max_lat\": max_lat,\n",
" \"standard_name\": \"sea_water_practical_salinity\",\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"search_url = e.get_search_url(response=\"csv\", **kw)\n",
"search = pd.read_csv(search_url)\n",
"dataset_ids = search[\"Dataset ID\"].values\n",
"\n",
"dataset_ids_list = \"\\n\".join(dataset_ids)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"print(f\"Found {len(dataset_ids)} Datasets:\\n{dataset_ids_list}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"sal = \"sea_water_practical_salinity_profiler_depth_enabled\"\n",
"temp = \"sea_water_temperature_profiler_depth_enabled\"\n",
"\n",
"e.dataset_id = dataset_ids[0]\n",
"\n",
"e.variables = [\n",
" \"z\",\n",
" \"latitude\",\n",
" \"longitude\",\n",
" sal,\n",
" temp,\n",
" \"time\",\n",
"]\n",
"\n",
"url = e.get_download_url()\n",
"print(url)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = e.to_pandas(index_col=\"time (UTC)\", parse_dates=True).dropna()\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Exercise: experiment with the `e.to_xarray()` method. When and why would you use\n",
"one or the other?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
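"# Keep only the rows at the minimum `z` value (a single depth level).\n",
"subset = df.loc[df[\"z (m)\"] == df[\"z (m)\"].min()]\n",
"\n",
"fig, ax = plt.subplots(figsize=(13, 3.75))\n",
"# `.loc[\"2016\", ...]` uses partial-string indexing on the time index.\n",
"subset.loc[\"2016\", f\"{sal} (1e-3)\"].dropna().plot(ax=ax)"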
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import gsw\n",
"import numpy as np\n",
"\n",
"\n",
"def plot_ts():\n",
"    \"\"\"Quick-look T-S diagram of `df` over sigma0 contours.\"\"\"\n",
"    fig, ax = plt.subplots(figsize=(5, 5))\n",
"\n",
"    s = np.linspace(0, 42, 100)\n",
"    t = np.linspace(-2, 40, 100)\n",
"\n",
"    s, t = np.meshgrid(s, t)\n",
"    # Approximation: gsw.sigma0 expects Absolute Salinity and Conservative\n",
"    # Temperature, while the data are practical salinity and in-situ temperature.\n",
"    sigma = gsw.sigma0(s, t)\n",
"\n",
"    cnt = np.arange(-7, 40, 5)\n",
"    cs = ax.contour(s, t, sigma, colors=\"gray\", levels=cnt)\n",
"    ax.clabel(cs, fontsize=9, inline=True, fmt=\"%2i\")\n",
"\n",
"    ax.set_xlabel(\"Practical salinity\")\n",
"    ax.set_ylabel(r\"Temperature [$^{\\circ}$C]\")\n",
"    ax.scatter(df[f\"{sal} (1e-3)\"], df[f\"{temp} (degree_Celsius)\"], s=10, alpha=0.25)\n",
"\n",
"    ax.grid(True)\n",
"    ax.axis([20, 40, 4, 26])\n",
"    return fig, ax"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"fig, ax = plot_ts()"
]
},
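{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"`gsw.sigma0` is defined in terms of TEOS-10 Absolute Salinity and Conservative Temperature, while the OOI variables are practical salinity and in-situ temperature, so the T-S diagram above is a quick-look approximation. A more careful version would convert first with `gsw.p_from_z`, `gsw.SA_from_SP` and `gsw.CT_from_t`. The sketch below does that for a single made-up point (not data from the server); the helper `to_sa_ct` is hypothetical and assumes `z` is height in metres, negative below the sea surface."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import gsw\n",
"import numpy as np\n",
"\n",
"\n",
"def to_sa_ct(sp, t, z, lon, lat):\n",
"    \"\"\"Hypothetical helper: practical salinity and in-situ temperature to SA and CT.\n",
"\n",
"    Assumes `z` is height in metres, negative below the sea surface.\n",
"    \"\"\"\n",
"    p = gsw.p_from_z(z, lat)  # sea pressure [dbar]\n",
"    sa = gsw.SA_from_SP(sp, p, lon, lat)  # Absolute Salinity [g/kg]\n",
"    ct = gsw.CT_from_t(sa, t, p)  # Conservative Temperature [degC]\n",
"    return sa, ct\n",
"\n",
"\n",
"# One made-up point inside the search box (not data from the server).\n",
"sa, ct = to_sa_ct(np.array([35.0]), np.array([15.0]), np.array([-10.0]), -70.5, 39.5)\n",
"print(sa, ct, gsw.sigma0(sa, ct))"
]
},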
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"responses = [\"mat\", \"json\", \"ncCF\", \"ncCFHeader\"]\n",
"\n",
"for response in responses:\n",
" print(f\"{e.get_download_url(response=response)}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Exercise: explore the web interface for the OOI server URL:\n",
"\n",
"http://erddap.dataexplorer.oceanobservatories.org/erddap/index.html\n",
"\n",
"or the IOOS glider dac:\n",
"\n",
"https://gliders.ioos.us/erddap\n",
"\n",
"and find a dataset of interest, download it in a format you are familiar with,\n",
"and plot it (using either the web interface or Python, your choice).\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2) OPeNDAP\n",
"\n",
"### Learning objectives:\n",
"\n",
"- Open model data from a THREDDS server via OPeNDAP with `xarray`;\n",
"- Discuss the differences with an `erddapy` request;\n",
"- Plot it using the `xarray` interface.\n"
]
},
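{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Like ERDDAP, OPeNDAP subsets on the server: appending a DAP2 constraint expression such as `?temp[0:1:0][39:1:39][0:10:100][0:10:100]` to a dataset URL requests only that hyperslab, using zero-based `[start:stride:stop]` indices with an inclusive `stop`. When you slice an `xarray` dataset opened from an OPeNDAP URL, the underlying netCDF library typically sends requests like this for you. The sketch below only shows the syntax: the helper `dap2_hyperslab` is hypothetical, and the variable name `temp`, its dimension order (time, s_rho, eta_rho, xi_rho) and the level index 39 are assumptions about the model output, not checked against the server."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def dap2_hyperslab(variable, slices):\n",
"    \"\"\"Hypothetical helper: DAP2 constraint for one variable.\n",
"\n",
"    `slices` holds one (start, stride, stop) tuple per dimension, with\n",
"    zero-based indices and an inclusive `stop`.\n",
"    \"\"\"\n",
"    dims = \"\".join(f\"[{start}:{stride}:{stop}]\" for start, stride, stop in slices)\n",
"    return f\"{variable}{dims}\"\n",
"\n",
"\n",
"base_url = (\n",
"    \"http://tds.marine.rutgers.edu/thredds/dodsC/roms/doppio/2017_da/avg/Averages_Best\"\n",
")\n",
"# First time step, one vertical level, every 10th grid point (assumed layout).\n",
"constraint = dap2_hyperslab(\"temp\", [(0, 1, 0), (39, 1, 39), (0, 10, 100), (0, 10, 100)])\n",
"print(f\"{base_url}.ascii?{constraint}\")"
]
},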
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import cf_xarray\n",
"import xarray as xr\n",
"\n",
"url = (\n",
" \"http://tds.marine.rutgers.edu/thredds/dodsC/roms/doppio/2017_da/avg/Averages_Best\"\n",
")\n",
"ds = xr.open_dataset(url)\n",
"ds.cf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
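"variable = \"sea_water_potential_temperature\"\n",
"time = \"2022-08-10\"\n",
"surface = -1  # last s_rho level, the one closest to the surface in ROMS\n",
"\n",
"# `.cf[...]` looks the variable up by its CF standard_name.\n",
"selection = ds.cf[variable].sel(time=time).isel(s_rho=surface)"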
]
},
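{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"`.sel` selects by coordinate label (here a date string) while `.isel` selects by integer position, so `s_rho=-1` picks the last vertical level. In ROMS output `s_rho` runs from -1 near the bottom to 0 at the surface, which makes the last level the one closest to the surface. A toy `DataArray` with made-up values and ROMS-like dimension names shows the difference:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import xarray as xr\n",
"\n",
"# Toy array with made-up values: s_rho runs from bottom (-1) to surface (0).\n",
"toy = xr.DataArray(\n",
"    np.arange(6).reshape(2, 3),\n",
"    dims=(\"time\", \"s_rho\"),\n",
"    coords={\n",
"        \"time\": pd.to_datetime([\"2022-08-10\", \"2022-08-11\"]),\n",
"        \"s_rho\": [-0.83, -0.5, -0.17],\n",
"    },\n",
")\n",
"\n",
"by_label = toy.sel(time=\"2022-08-11\")  # label-based: the second day\n",
"top_level = by_label.isel(s_rho=-1)  # position-based: the last (top) level\n",
"print(top_level.values, float(top_level[\"s_rho\"]))"
]
},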
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import cartopy.crs as ccrs\n",
"import matplotlib.pyplot as plt\n",
"\n",
"fig, ax = plt.subplots(\n",
" figsize=(6, 6),\n",
" subplot_kw={\"projection\": ccrs.PlateCarree()},\n",
")\n",
"\n",
"selection.plot(\n",
" ax=ax,\n",
" x=\"lon_rho\",\n",
" y=\"lat_rho\",\n",
")\n",
"\n",
"ax.coastlines()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 3) ~~SOS~~\n",
"\n",
"### Learning objectives:\n",
"\n",
"- Use searvey to obtain CO-OPS data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import shapely\n",
"from searvey import coops\n",
"\n",
"secoora = shapely.geometry.box(-87.4, 24.25, -74.7, 36.7)\n",
"df = coops.coops_stations_within_region(secoora)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.loc[df[\"name\"] == \"Duck\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"\n",
"from searvey.coops import COOPS_Station\n",
"\n",
"station = COOPS_Station(\"Duck\")\n",
"\n",
"ds = station.product(\n",
" \"water_level\",\n",
" start_date=datetime.today() - timedelta(15),\n",
" end_date=datetime.today(),\n",
")\n",
"\n",
"ds[\"v\"].plot()"
]
},
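{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"`searvey` wraps NOAA's CO-OPS web services, so the request above boils down to an HTTP query. As a sketch of what such a query could look like, the cell below builds a URL for the public CO-OPS data API with the standard library. The endpoint, the parameter names and the station ID `8651370` for Duck, NC are our assumptions about that API, not something read from `searvey`; the dates are fixed only to keep the example reproducible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"from urllib.parse import urlencode\n",
"\n",
"# Assumed endpoint of the public CO-OPS data API.\n",
"COOPS_API = \"https://api.tidesandcurrents.noaa.gov/api/prod/datagetter\"\n",
"\n",
"end = datetime(2023, 1, 16)  # fixed dates keep the example reproducible\n",
"start = end - timedelta(days=15)\n",
"\n",
"coops_params = {\n",
"    \"product\": \"water_level\",\n",
"    \"station\": \"8651370\",  # assumed CO-OPS ID of the Duck, NC station\n",
"    \"begin_date\": start.strftime(\"%Y%m%d\"),\n",
"    \"end_date\": end.strftime(\"%Y%m%d\"),\n",
"    \"datum\": \"MLLW\",\n",
"    \"units\": \"metric\",\n",
"    \"time_zone\": \"gmt\",\n",
"    \"format\": \"json\",\n",
"}\n",
"coops_url = f\"{COOPS_API}?{urlencode(coops_params)}\"\n",
"print(coops_url)"
]
},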
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 4) WMS\n",
"\n",
"### Learning objectives:\n",
"\n",
"- Add a WMS layer to an interactive map. (\"Hurricane viz widget.\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"from ipyleaflet import FullScreenControl, Map, WMSLayer, basemaps\n",
"from ipywidgets import SelectionSlider\n",
"from traitlets import Unicode\n",
"\n",
"time_options = [\n",
" \"13:00\",\n",
" \"13:30\",\n",
" \"14:00\",\n",
" \"14:30\",\n",
" \"15:00\",\n",
" \"15:30\",\n",
" \"16:00\",\n",
" \"16:30\",\n",
" \"17:00\",\n",
" \"17:30\",\n",
" \"18:00\",\n",
" \"18:30\",\n",
"]\n",
"\n",
"slider = SelectionSlider(description=\"Time:\", options=time_options)\n",
"\n",
"\n",
"def update_wms(change):\n",
"    # Use the same ISO 8601 format as the initial `time` of the layer.\n",
"    time_wms.time = f\"2020-07-25T{change['new']}:00Z\"\n",
"\n",
"\n",
"slider.observe(update_wms, \"value\")\n",
"\n",
"\n",
"class TimeWMSLayer(WMSLayer):\n",
" time = Unicode(\"\").tag(sync=True, o=True)"
]
},
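{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"The twelve half-hourly options above could also be generated instead of typed by hand. A small standard-library sketch, assuming the same 30-minute step from 13:00 to 18:30:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"\n",
"first = datetime(2020, 7, 25, 13, 0)\n",
"step = timedelta(minutes=30)\n",
"\n",
"# 12 steps of 30 minutes: 13:00, 13:30, ..., 18:30 (same labels as `time_options`).\n",
"generated_options = [(first + i * step).strftime(\"%H:%M\") for i in range(12)]\n",
"print(generated_options)"
]
},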
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"time_wms = TimeWMSLayer(\n",
" url=\"https://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r-t.cgi?\",\n",
" layers=\"nexrad-n0r-wmst\",\n",
" time=\"2020-07-25T13:00:00Z\",\n",
" format=\"image/png\",\n",
" transparent=True,\n",
" attribution=\"Weather data © 2012 IEM Nexrad\",\n",
")\n",
"m = Map(basemap=basemaps.CartoDB.Positron, center=(30, -88), zoom=5)\n",
"m.add_layer(time_wms)\n",
"m.add_control(FullScreenControl())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"m"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"slider"
]
},
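{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Behind the map, `ipyleaflet` (through Leaflet) fetches image tiles with WMS `GetMap` requests, and the extra `time` trait is sent along as the WMS time parameter. The cell below sketches a single `GetMap` URL for the same layer with WMS 1.1.1 parameter names. The bounding box, the image size and the assumption that the server accepts `EPSG:4326` are illustrative choices, not what `ipyleaflet` actually sends (its tiles are requested in Web Mercator)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from urllib.parse import urlencode\n",
"\n",
"wms_base = \"https://mesonet.agron.iastate.edu/cgi-bin/wms/nexrad/n0r-t.cgi\"\n",
"wms_params = {\n",
"    \"SERVICE\": \"WMS\",\n",
"    \"VERSION\": \"1.1.1\",\n",
"    \"REQUEST\": \"GetMap\",\n",
"    \"LAYERS\": \"nexrad-n0r-wmst\",\n",
"    \"STYLES\": \"\",\n",
"    \"SRS\": \"EPSG:4326\",  # assumed to be supported by the server\n",
"    \"BBOX\": \"-98,24,-80,34\",  # minx,miny,maxx,maxy (lon/lat), illustrative\n",
"    \"WIDTH\": 900,\n",
"    \"HEIGHT\": 500,\n",
"    \"FORMAT\": \"image/png\",\n",
"    \"TRANSPARENT\": \"true\",\n",
"    \"TIME\": \"2020-07-25T13:00:00Z\",\n",
"}\n",
"getmap_url = f\"{wms_base}?{urlencode(wms_params)}\"\n",
"print(getmap_url)"
]
},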
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 5) Catalog Service for the Web (CSW)\n",
"\n",
"### Is there a canonical source for data?\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"![](https://i.kym-cdn.com/photos/images/newsfeed/001/093/557/142.gif)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Well, kind of... The closest things are data catalogs like the [IOOS CSW catalog](https://data.ioos.us/) or [pangeo-forge](https://pangeo-forge.readthedocs.io/en/latest/)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Catalog Service for the Web (CSW)\n",
"\n",
"- A single source to find endpoints\n",
"- Nice Python interface: `owslib.csw.CatalogueServiceWeb`\n",
"- Advanced filtering: `owslib.fes`\n"
]
},
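{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A minimal sketch of the two pieces listed above: `owslib.fes` filters combined with `fes.And` and passed to `CatalogueServiceWeb.getrecords2`. The search term, the bounding box (the same box used in the ERDDAP search) and the endpoint URL `https://data.ioos.us/csw` are assumptions, and the network call is left commented out because it needs a live CSW server."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from owslib import fes\n",
"from owslib.csw import CatalogueServiceWeb\n",
"\n",
"# Full-text search; `*` and `?` are declared as the wildcard characters.\n",
"any_text = fes.PropertyIsLike(\n",
"    propertyname=\"apiso:AnyText\",\n",
"    literal=\"*sea_water_temperature*\",\n",
"    escapeChar=\"\\\\\",\n",
"    wildCard=\"*\",\n",
"    singleChar=\"?\",\n",
")\n",
"# Spatial filter: [min_lon, min_lat, max_lon, max_lat] in CRS84 (lon/lat order).\n",
"bbox = fes.BBox([-72, 38, -69, 41], crs=\"urn:ogc:def:crs:OGC:1.3:CRS84\")\n",
"filter_list = [fes.And([any_text, bbox])]\n",
"\n",
"\n",
"def search_csw(endpoint, constraints, max_records=10):\n",
"    \"\"\"Query a CSW endpoint and return {identifier: title}; needs network access.\"\"\"\n",
"    csw = CatalogueServiceWeb(endpoint, timeout=60)\n",
"    csw.getrecords2(constraints=constraints, maxrecords=max_records, esn=\"full\")\n",
"    return {key: record.title for key, record in csw.records.items()}\n",
"\n",
"\n",
"# records = search_csw(\"https://data.ioos.us/csw\", filter_list)  # assumed endpoint"
]
},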
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"![](https://raw.githubusercontent.com/ocefpaf/2018-SciPy-talk/gh-pages/images/IOOS.svg)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## For more complex examples on how to find data in the catalog, please check the IOOS Code Lab:\n",
"\n",
"[https://ioos.github.io/ioos_code_lab/content/intro.html](https://ioos.github.io/ioos_code_lab/content/intro.html)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Where to find data?\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Curated list of ERDDAP servers:\n",
"https://github.com/IrishMarineInstitute/awesome-erddap\n",
"\n",
"Environmental Data Service (EDS) model viewer: https://eds.ioos.us\n",
"\n",
"Exploring THREDDS servers: https://unidata.github.io/siphon/latest\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Extras: how does this all work?\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"## Standards!\n",
"\n",
"![](https://imgs.xkcd.com/comics/standards.png)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Bad example\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
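"from datetime import datetime\n",
"\n",
"import cftime\n",
"import matplotlib.pyplot as plt\n",
"import nc_time_axis  # noqa: F401 (lets matplotlib plot cftime dates)\n",
"import numpy as np\n",
"from netCDF4 import Dataset\n",
"\n",
"url = \"http://goosbrasil.org:8080/pirata/B19s34w.nc\"\n",
"nc = Dataset(url)\n",
"\n",
"# Manual bookkeeping: mask the -9999 sentinel and decode the time axis ourselves.\n",
"temp = nc[\"temperature\"][:]\n",
"times = nc[\"time\"]\n",
"temp[temp <= -9999] = np.nan\n",
"t = cftime.num2date(times[:], times.units, calendar=times.calendar)"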
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mask = (t >= datetime(2008, 1, 1)) & (t <= datetime(2008, 12, 31))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"fig, ax = plt.subplots()\n",
"ax.plot(t[mask], temp[:, 0][mask], \".\")"
]
},
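{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The bad example does the bookkeeping by hand: masking the `-9999` sentinel and turning the numeric `time` variable into dates with its `units` and `calendar` attributes. Because those attributes follow the CF conventions, generic tools can do this for us. A stripped-down, hypothetical `decode_cf_time` sketches the time part for `\"<unit> since <date>\"` strings. It assumes a standard calendar and dates after 1582-10-15 (where Python's proleptic Gregorian `datetime` agrees with it), whereas `cftime.num2date` and `xarray` handle many more calendars and unit spellings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"\n",
"# Units understood by this sketch; CF allows more spellings (e.g. \"day\", \"hour\").\n",
"_CF_UNITS = {\"days\", \"hours\", \"minutes\", \"seconds\"}\n",
"\n",
"\n",
"def decode_cf_time(values, units):\n",
"    \"\"\"Hypothetical helper: decode numbers with CF units such as \"days since 2008-01-01\".\n",
"\n",
"    Assumes a standard calendar and dates after 1582-10-15.\n",
"    \"\"\"\n",
"    unit, sep, origin = units.partition(\" since \")\n",
"    unit = unit.strip()\n",
"    if not sep or unit not in _CF_UNITS:\n",
"        raise ValueError(f\"unsupported CF time units: {units!r}\")\n",
"    start = datetime.fromisoformat(origin.strip())\n",
"    # timedelta accepts days/hours/minutes/seconds as keyword arguments.\n",
"    return [start + timedelta(**{unit: float(v)}) for v in values]\n",
"\n",
"\n",
"decoded = decode_cf_time([0, 1.5, 366], \"days since 2008-01-01 00:00:00\")\n",
"print(decoded)"
]
},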
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Good example\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"import xarray as xr\n",
"\n",
"ds = xr.open_dataset(url)\n",
"temp = ds[\"temperature\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"temp.sel(depth_t=1.0, time=\"2008\").plot()"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
},
"livereveal": {
"auto_select": "none",
"footer": " ",
"header": "",
"start_slideshow_at": "selected"
}
},
"nbformat": 4,
"nbformat_minor": 4
}