maptile-qc

About a 2 years ago, a client for the company I work for, needed map tiles for an application we were building for them. One requirement was that we cut the tiles from ArcGIS mapdocs that they created, so that they could control the order of their overlapping aerial images. We just needed to cut the tiles and move them to an S3 bucket so that they’d be available for our application. For this we decided to use ArcMap with Arc2Earth which can be configured to push tiles to S3 as they are cut.

This seemed to work great, however, we noticed well after the tiles were cut that a small number of tiles (less that 0.001%) were missing, while other tiles were solid white when they should have shown aerial imagery. We couldn’t be certain of the cause; data issues, software problems, maybe network connectivity, and there was no clear pattern in the invalid or missing tiles.

So, I starting writing some Python to check for rogue tiles. Because of the very large number of tiles, we defined a set of sample points and used those coordinates to retrieve tiles from S3. If a tile was not missing, we checked whether could be opened by Python’s PIL library, and collected descriptive stats about the distribution of pixel values. The results were written to SQL files, loaded into Postgres/PostGIS and viewed in QGIS. This allowed use to identify particular regions where there were tile issues and then re-cut tiles just in those areas. The process ended up saving quite a bit of time and provided a lot more confidence in our tile cache.

Last weekend I decided to write something similar – but this time using NodeJS rather than Python. I’ve worked on a couple of Express projects, but I wanted to get some experience writing Node CLI tools. So, I wrote a very simple version of maptile-qc and pushed it up to github. It’s a very early version, but I hope to have something useful completed soon.

A Quick and Simple PostGIS/Flask App on OpenShift

For the past couple years much of the GIS and programming I’ve done has been applied to biospheric modeling using raster data. I enjoy my work, but sometimes wish I could spend a little more time working with PostGIS and developing web applications. Yesterday I mentioned this to my friend Steve CP (OpenShift tech evangelist) and he kindly offered to stop by and help get me started with an app on OpenShift. The process was a piece of cake, but there were quite a few steps to remember. Here are my notes:

1. Open an OpenShift account and create a Python-2.6 app. https://openshift.redhat.com/app/account/new

2. Create a local git repository for your app. You’ll need to install git locally if it’s not installed already. After you create an app on the OpenShift site, OpenShift will give you the following command line to create a local repository for your app.

cyrus@gazelle:~$ git clone ssh://your_application_ssh_key@npm-cyrus.rhcloud.com/~/git/npm.git/

This will create your application directory at the root of your home directory, so you may want to move it to a more logical location like so:

mv npm Projects/OpenShift/npm

3. Add the postgresql-8.4 cartridge.

cyrus@gazelle:~$ rhc app cartridge add -a npm -c postgresql-8.4

Once your cartidge is added you will see your connection infomation (user, password, etc). You can save this information in a local text file for convenience, but you don’t need to. We will retreive it later.

4. Log into the OpenShift instance for your app and setup postgis.

cyrus@gazelle:~$ ssh your_application_ssh_key@npm-cyrus.rhcloud.com

Once you’ve successfully logged in run the following 3 psql commands:

[npm-cyrus.rhcloud.com ~]\> psql npm -c "create language plpgsql;"
[npm-cyrus.rhcloud.com ~]\> psql -d npm -f /usr/share/pgsql/contrib/postgis-64.sql
[npm-cyrus.rhcloud.com ~]\> psql -d npm -f /usr/share/pgsql/contrib/spatial_ref_sys.sql

PostGIS is now ready so you can hit Ctrl + D to close the connection to your OpenShift instance.

5. Setup the flask and psycopg installations. First navigate to your local app directory and get the openshift flask example (more details here: https://openshift.redhat.com/community/blogs/rest-web-services-with-python-mongodb-and-spatial-data-in-the-cloud-part-2)

cyrus@gazelle:~$ cd Projects/OpenShift/npm/
cyrus@gazelle:~$ git remote add upstream -m master git://github.com/openshift/flask-example.git
cyrus@gazelle:~$ git pull -s recursive -X theirs upstream master
cyrus@gazelle:~$ git push

You’ll now find a setup.py file in your app directory. Pushing this up to your app installs Flask, but we need to edit it so that it will install psycopg as well. Open the setup.py file and in the “install_requires” list add ‘psycopg2>=2.4.4’ like so:

from setuptools import setup

setup(name='YourAppName',
      version='1.0',
      description='OpenShift App',
      author='Your Name',
      author_email='example@example.com',
      url='http://www.python.org/sigs/distutils-sig/',
      install_requires=['Flask>=0.7.2', 'psycopg2>=2.4.4'],
      )

Now save your file, commit and push:

cyrus@gazelle:~$ git commit -am "adding psycopg"
cyrus@gazelle:~$ git push

6. Set up a test database. You may be content to do this using psql commands in your OpenShift instance, but I prefer to manage my database locally using pgadmin. To do this ssh back into your OpenShift app and run the following command to get your database connection information:

cyrus@gazelle:~$ ssh your_application_ssh_key@npm-cyrus.rhcloud.com
[npm-cyrus.rhcloud.com ~]\> env | grep DB

Ctrl+D to close your connection and then open a port forward to your app like so:

cyrus@gazelle:~$ rhc-port-forward -a npm

Now, open pgadmin and create a database connection:

I have named my database ‘npm’ and created a table called ‘test’ with columns ‘name’ and ‘value’ and a serial column for the primary key. Then I added a geometry column in the SQL Editor and added a couple of points with the following:

SELECT AddGeometryColumn('test', 'geom', 4326, 'POINT', 2);
BEGIN;
INSERT INTO test (geom, name, value)
VALUES (GeomFromText('POINT(-122.18064449999997 37.3456834)', 4326),'Acterra', 0);
INSERT INTO test (geom, name, value)
VALUES (GeomFromText('POINT(-122.16763550000002 37.7298953)', 4326),'Evergreen', 1);
COMMIT;

If you would like all of the SQL for creating and populating this table, you can find it at the bottom of this post.

7. Prepare our flask app. Open “myflaskapp.py” in the wsgi folder in your local project directory and edit it like so:

from flask import Flask
import psycopg2
import sys
import os

app = Flask(__name__)
app.config['PROPAGATE_EXCEPTIONS'] = True # Prevents flask from swallowing error messages

@app.route("/")
def hello():

    ip = os.environ['OPENSHIFT_DB_HOST'] #Get an OpenShift environmental variable
    pw = os.environ['OPENSHIFT_DB_PASSWORD']
    conn_string = "host=" + ip + " dbname='npm' user='admin' password=" + pw +""
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    cursor.execute("SELECT id, name, value, AsText(geom) FROM test")
    records = cursor.fetchall() # retrieve the records from the database
    return str(records)

if __name__ == "__main__":
    app.run()

If your table schema is different than mine, be sure those differences are reflected in the select statement passed in the cursor.execute() method.

Navigate back to your local project directory, commit your changes, and push.

cd Projects/Openshift/npm
cyrus@gazelle:~/Projects/OpenShift/npm$ git commit -am "Modified my flask app"
cyrus@gazelle:~/Projects/OpenShift/npm$ git push

Now, your app should be available at your project url. In this case:
http://npm-cyrus.rhcloud.com

..and the result should look something like this:

————————————————————————————–
sql code for creating a postgis table:

-- Table: test

-- DROP TABLE test;

CREATE TABLE test
(
"name" text,
"value" integer DEFAULT 0,
id serial NOT NULL,
geom geometry,
CONSTRAINT idpk PRIMARY KEY (id),
CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) = 'POINT'::text OR geom IS NULL),
CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326)
)
WITH (
OIDS=FALSE
);
ALTER TABLE test OWNER TO "admin";

SELECT AddGeometryColumn('test', 'geom', 4326, 'POINT', 2);

BEGIN;
INSERT INTO test (geom, name, value)
VALUES (GeomFromText('POINT(-122.18064449999997 37.3456834)', 4326),'Acterra', 0);
INSERT INTO test (geom, name, value)
VALUES (GeomFromText('POINT(-122.16763550000002 37.7298953)', 4326),'Evergreen', 1);
COMMIT;

Creating Windrose Plots in R

Recently I had to plot some wind data and found a couple of packages to produce windrose plots. My first choice was Windrose in the circular package. Windrose was easy to use, and I really liked the appearance of the plots.  The only drawback was some difficulty getting the axis labels formatted the way I wanted. So, I decided to try Rosavent.  Rosavent was perfect because the axis labels were exactly what I needed, and it produced a nice legend on the plot.  

Using rosavent is just a little tricky if you’re starting with a table of wind speed and direction because it takes a data frame with the frequency of wind speed for each direction class (Windrose does this work for you). This means you need to create class ranges for both wind direction and wind speed and calculate the number of times each wind speed class occurs in each direction class.

If you have a data frame with columns for wind speed and direction, there are just a few easy steps needed to get your data in the right format for Rosavent. To start we load Climatol, Reshape, and our data.

library(climatol)
library(reshape)
winds <- read.table("/input_dir/winds.csv", sep=",",  header=TRUE)

Next we need to convert our data into a frequency table with columns for our wind direction classes and rows for our wind speed classes. In the first seq argument below I have divided 360 degrees into 16 22.5 degree classes.  In the second I have divided wind speed into 10 classes each representing 1 m/sec.

freq_speed_table <- t(table(cut(winds$wind_direction, seq(0, to=360,
by=22.5)), cut(winds$wind_speed, seq(0, to=10, by=1))))

The data is now organized the way we need it, but rosavent takes a data frame – not a table, so we convert our data into a data frame.

freq_speed_df = data.frame(freq_speed_table, row.names = NULL,
check.rows = FALSE, check.names = TRUE, stringsAsFactors =
default.stringsAsFactors())

To get our dataframe back into the structure of our table, we use cast from the reshape package:

freq_speed_df_cast = cast(freq_speed_df, Var1 ~ Var2)

Finally, we can plot our data with rosavent:

png('/output_dir/example_windrose_plot.png')
rosavent(freq_speed_df_cast, 3, 5, ang=-2*pi/16, flab = 2, main="Example Plot", key = TRUE, margen = c(0, 0, 2, 0), uni = "Wind Speed m/s")
dev.off()

..and this produces a nice windrose plot:

gdal reclassify

A few weeks ago I saw somebody was looking on gis.stackexchange for a command line tool to reclassify the ESA’s Globcover land cover dataset (http://gis.stackexchange.com/a/27382/4701).  I often need to reclassify raster data, but in most cases I need to do it as a step in a larger python script, or I use the QGIS raster calculator. Inspired by the gis.stackexchange post, I decided a standalone tool might be handy for writing bash scripts.

Last weekend I posted the first version on github: https://github.com/chiatt/gdal_reclassify. It still needs more work – the only supported output format is .tif.  It’s also a wee bit slow.  Reclassifying a Landsat band is okay, taking only about 5-10 seconds on my laptop, but a really large raster like Globcover (300m pixel size for the globe = 129600 x 55800) takes just under 45 minutes (ugh!). The good news is that it can handle a file a large as Globcover.  It also gives you some control over the output datatype based on your output class values.  If your output classes are 8-bit unsigned, your output file will be 8-bit unsigned.  Likewise, if your output classes are 32-bit float, your output file will be 32-bit float – regardless of the datatype of your input raster.  Another useful feature is that the user can set a default value for pixels that do not meet any reclass conditions, and the default value can be set as the nodata value.

 

Resuscitating MapRabbit

A few years ago I installed WordPress, but never got around to posting anything. That was a very busy time for me, and after a couple of years of neglecting to keep WordPress updated, the site got hacked. The hacker had replaced my blog with an interesting image of a flaming skull.  It was the most interesting thing that had ever been posted to the site, so I decided that was a sign I didn’t have enough time to bother with blogging and removed WordPress.

Recently, however, a friend of mine suggested I write about one of my projects, and he was surprised to learn that I didn’t maintain a blog.  Having completed my graduate degree last May, I have a bit more free time, so I’ve decided to give MapRabbit another try.  There are many, many GIS blogs to read with more interesting and useful information than anything I expect to post here, but at the very least MapRabbit will serve as a place to keep a few notes about what I’m working on.