More Google Maps Semantic Location History Fun with BigQuery

In the last post, I covered the basics of how to transform and load your Google Maps data, including Google’s parsed history data, into BigQuery.

In this post, I’ll share a few interesting queries I’ve written to look at semantic history. These should work just fine for you too if you follow the steps in the previous post.

Visited U.S. States

This query shows when you first visited each state. Google Maps needs to have placed you at a specific location for the state to appear in this list, so if you just drive through a state it may not register. You could get that from your raw location history, and I’ll probably write that up later, but this approach ensures that you at least “went” somewhere in the state.

WITH statedate AS 
(SELECT sl.StateAbbrev state, 
        sh.timestampstart tstart, 
        sh.timestampend tend
FROM location.semantic_history sh
INNER JOIN location.statelookup sl
ON sh.address LIKE(CONCAT('%, ', sl.StateAbbrev, ' %')))
SELECT state, 
       COUNT(*) locations, 
       MIN(tstart) firstVisit, 
       MAX(tend) maxVisit
FROM statedate
GROUP BY state
ORDER BY firstVisit

This query requires a supplementary table, which I named statelookup. I just grabbed the script from https://gist.github.com/esfand/9443427 and modified it for BigQuery (delete the identity column and change the VARCHAR/CHAR columns to STRING). Many of the public datasets include a table of states, so you could join to one of those instead.
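If you would rather not adapt the gist, a stripped-down lookup table is enough for the join above. This is only a sketch: StateAbbrev is the one column the query actually uses, StateName is an assumed extra column, and the three rows are placeholders for the full list of states.

```sql
-- Hypothetical minimal lookup table. Only StateAbbrev is needed by the
-- state query; StateName is an assumed convenience column.
CREATE TABLE IF NOT EXISTS location.statelookup (
  StateName   STRING,
  StateAbbrev STRING
);

-- Placeholder rows; load all states the same way.
INSERT INTO location.statelookup (StateName, StateAbbrev)
VALUES ('California', 'CA'),
       ('Nevada', 'NV'),
       ('Oregon', 'OR');
```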

This will return a list of every state you’ve been to within your location history, ordered by the first time you went somewhere in that state. You can also tack on an ANY_VALUE(name) to pick an arbitrary place you went to in that state. (It won’t actually be random: BigQuery only promises that the chosen row is nondeterministic, and places you visit repeatedly contribute more rows, so you’ll probably get a familiar place.)
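Here is roughly what the query looks like with the sample place added. It assumes the same name column used in the later queries; samplePlace is just an alias I made up for the new output column.

```sql
WITH statedate AS (
  SELECT sl.StateAbbrev AS state,
         sh.name,                      -- carried through for ANY_VALUE below
         sh.timestampstart AS tstart,
         sh.timestampend AS tend
  FROM location.semantic_history sh
  INNER JOIN location.statelookup sl
    ON sh.address LIKE CONCAT('%, ', sl.StateAbbrev, ' %')
)
SELECT state,
       COUNT(*) AS locations,
       MIN(tstart) AS firstVisit,
       MAX(tend) AS maxVisit,
       ANY_VALUE(name) AS samplePlace  -- arbitrary place, not a true random pick
FROM statedate
GROUP BY state
ORDER BY firstVisit
```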

Fill-Up Frequency

This query will show (almost) every time you went to a gas station, although it can be extended to any other Google Places category.

SELECT name, duration, date, pointstart 
FROM location.semantic_history
WHERE 'gas_station' IN UNNEST(locationTypes)
ORDER BY date

I noticed that it missed a bunch of rural places in which I have definitely purchased gas, so you might consider an OR filter with common names of gas stations.

OR REGEXP_CONTAINS(LOWER(name), r'(phillips|valero|chevron|shell|sheetz|texaco)')

Add whatever gas station names are common in your area. Some may also be missing because you didn’t remain at the place long enough for Google to register it; I’ve seen that issue with other locations too. I think some of these show up as waypoint locations, but these queries don’t account for that.
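Put together, the type filter and the brand-name fallback might look like the sketch below. It assumes the brand text lives in the name column, and it adds word boundaries (\b) so that a short brand like "shell" does not match inside a longer word; both choices are mine, so adjust them to your data.

```sql
SELECT name, duration, date, pointstart
FROM location.semantic_history
WHERE 'gas_station' IN UNNEST(locationTypes)
   -- Fallback for places Google did not tag as gas_station.
   -- The brand list is only an example.
   OR REGEXP_CONTAINS(LOWER(name),
                      r'\b(phillips|valero|chevron|shell|sheetz|texaco)\b')
ORDER BY date
```

The regex will still pull in the occasional false positive (a restaurant with "Shell" in its name, say), so skim the results before trusting any counts.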

Any Other Kind of Location

You can take the query above and replace 'gas_station' with any other place type. Here are some examples:

  • airport
  • gym
  • grocery_or_supermarket
  • electronics_store
  • restaurant
  • movie_theater

The full list is on the Place Types page of the Google Places API documentation. You can actually use most of the types that page says can’t be used in the type filter, like point_of_interest or food, because we obtained these locations from place ID mapping, not from a search.

Google is pretty inclusive for these tags, so Target, Staples, and even a piano store I went to come up as electronics_store.
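To see which types actually show up in your own history before picking one, you can flatten the array and count. A quick sketch, assuming locationTypes is an ARRAY of STRING as in the queries above (locationType is just the alias I gave each unnested element):

```sql
SELECT locationType,
       COUNT(*) AS visits
FROM location.semantic_history,
     UNNEST(locationTypes) AS locationType  -- one row per (visit, type) pair
GROUP BY locationType
ORDER BY visits DESC
```

Because each visit usually carries several types, the counts add up to more than your total number of visits.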

Favorite Restaurants

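As a variation on this, you can see which restaurants you dine out at the most, and on average how often you visit them. The frequency column is the number of days in the date range divided by the number of visits, so a value of 30 means roughly one visit a month. Using placeid as the GROUP BY here ensures that different locations of the same chain are separated out.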

DECLARE startdate DATE DEFAULT '2018-01-01';
DECLARE enddate DATE DEFAULT '2019-12-31';

SELECT ANY_VALUE(name) name,
       COUNT(*) visits,
       ROUND(DATE_DIFF(enddate, startdate, DAY) / COUNT(*),0) frequency
FROM location.semantic_history
WHERE 'restaurant' IN UNNEST(locationTypes)
AND date BETWEEN startdate AND enddate
GROUP BY placeid
ORDER BY 2 DESC

Lastly, replacing the visits column with a sum of duration will show you your favorite places ordered by how many hours you spent there. There probably won’t be a huge difference unless you frequently disappear into your favorite bars for hours on end.

ROUND(SUM(duration)/3600, 2) totalTime,
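For reference, here is one way the full time-spent version could look. It assumes duration is stored in seconds, which is what the division by 3600 implies, and it keeps the visit count as a third column for comparison.

```sql
DECLARE startdate DATE DEFAULT '2018-01-01';
DECLARE enddate DATE DEFAULT '2019-12-31';

SELECT ANY_VALUE(name) AS name,
       ROUND(SUM(duration) / 3600, 2) AS totalTime,  -- hours, if duration is in seconds
       COUNT(*) AS visits
FROM location.semantic_history
WHERE 'restaurant' IN UNNEST(locationTypes)
  AND date BETWEEN startdate AND enddate
GROUP BY placeid
ORDER BY totalTime DESC;
```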

Wrap Up

Other than the state abbreviation table, these queries don’t use anything besides the semantic data upload we already had. We didn’t even use any GIS queries to plot maps or intersections, which would give us all kinds of power to join other kinds of data.

For instance, you could join to weather data to see your experience with outdoor temperatures over time, by your personal location. If you missed a heat wave at home, your personal weather history would show that. You could even plot how close you were to every reported hurricane. Lots of fun data mining yet to come!