Start Analyzing Your Own CGM Data in 5 Easy Steps

A Beginner-Friendly Tutorial

In our previous blog post, we presented some visualizations of CGM data graciously provided by Henrik Berggren, the founder of Steady, who has lived with Type 1 diabetes for 20 years.

But what’s even more interesting, of course, is analyzing your own data!

The goal of this article is to provide a follow-along guide for anyone using a CGM to start tinkering with their own data, visualize it, and produce plots like these — plots that can help you extract new insights about your lifestyle.

This is a rather long and detailed post. To make it easy, all of the steps are described in detail to provide a thorough understanding of what we’re doing. The main segments (feel free to jump around to the ones that interest you) are:

  1. What’s CGM and why should I care?
  2. Prerequisites
  3. Setup
  4. Quality Assurance of the Data
  5. Analytics and Visualizations
  6. Conclusions and Related Content


What’s CGM and why should I care?

Continuous Glucose Monitor (CGM) sensors are wearable devices that measure blood glucose concentration at frequent intervals throughout the day, usually every 1–5 minutes. These devices have revolutionized the data available to diabetes patients and caregivers compared to finger pricking, the status quo of glucose measuring today. If you want a deeper background on the impact of CGMs, check out this post: The wearable that changed my life.

This new stream of glucose data enables a bunch of new applications like real-time predictive alerts for high/low blood sugar and closed-loop systems where insulin is delivered based on the current glucose levels. It’s also a gold mine for tracking change over time, behavioral patterns and, in short, data science!


Prerequisites

Skills

Some programming experience will be handy but even without it, you will be able to follow along. Most of the process here is a matter of copy-and-paste. Should any issues come up, don’t hesitate to write in the comments or send me an email.

Software

So what do you need? Amazingly enough, just a browser and a Google account. We’ll be making use of Google Colaboratory, which means we don’t need to install any software on our computer. How convenient!

Data

Of course, we also need some data to play with. I’ll be using a fake, generated data set so that we can share it freely. Note that the data is a .csv in the format exported by Dexcom, which is what this implementation is designed for, but data from other sensor types can fit this guide with minor adjustments. With that out of the way, let’s get started!


Setup — Quick version

  1. View the code used in this post here, click “Open in Colab”, then “File” and “Save to Drive”.
  2. Export your Dexcom data following these 2 steps and upload the .csv file to the same folder as your code in Google Drive.
  3. Open the Colab file and change the filename (second cell) to the name of your CGM data file.
  4. Click “File”, then “Run all”.
  5. See all the plots and visualizations in this post appear before your eyes!

That’s it! If you’re curious about what all this means and how to read all the plots, keep reading.

Setup — Detailed version

Make a new folder in your Google Drive and upload your CGM data into it. Then create a Google Colaboratory file by hitting “New → More → Colaboratory”.

If you haven’t used it before, you need to press “Connect more apps” and search for it. Now open the Colaboratory file.

For the first lines in your Colab file, enter the following
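The original cell didn’t survive the export, but this step is the standard Colab Drive mount, which looks roughly like this (it only runs inside Colab; the mount point matches the relative `drive/My Drive/...` path used below):

```python
# Mount Google Drive so the notebook can read files from it.
# Only works inside Google Colab.
from google.colab import drive
drive.mount('/content/drive')
```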

then click the “Play button” to the left of the cell in order to execute. This prompts you to authorize the Colab file to access your Google Drive files.

With our new access privileges, we can now read the CGM data we placed in our folder. We’ll do this by using the Pandas library like this

import pandas as pd

# Read the exported CGM file from the Drive folder we created
df = pd.read_csv('drive/My Drive/my-cgm-exploration/Example_CGM_data.csv')
df.head(20)

The head command previews the first 20 rows of our newly imported data frame, which should look similar to this one (cut down to 5 for readability)

We notice that the first 10 rows lack time stamps and hold information like name, device info, etc. We’re not interested in those so let’s trim them out of our data frame by
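The trimming cell itself is missing here; one way to do it is to keep only rows that actually carry a timestamp. A minimal sketch on a tiny synthetic stand-in for the real frame (the column names assume Dexcom’s export format):

```python
import pandas as pd

# Stand-in for the imported frame: two metadata rows without
# timestamps, followed by real samples (the real export has ~10)
df = pd.DataFrame({
    'Timestamp (YYYY-MM-DDThh:mm:ss)': [None, None,
                                        '2019-01-01T00:00:00',
                                        '2019-01-01T00:05:00'],
    'Glucose Value (mg/dL)': [None, None, 110, 115],
})

# Keep only rows that actually have a timestamp, then reset the index
df = df[df['Timestamp (YYYY-MM-DDThh:mm:ss)'].notna()].reset_index(drop=True)
print(len(df))  # 2
```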

Looking at our preview, the data now looks more consistent, great! We’re not going to use some of these columns, though, so let’s pick out the ones we’re interested in: blood sugar and the time it was measured. While we’re at it, let’s rename those long, clunky column names to something easier to type out
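The selection cell is also missing; a sketch of the idea, again on stand-in data, where `time` and `glucose` are the short names I’m assuming for the rest of the post:

```python
import pandas as pd

# Stand-in rows using the Dexcom export's (assumed) column names
df = pd.DataFrame({
    'Timestamp (YYYY-MM-DDThh:mm:ss)': ['2019-01-01T00:00:00',
                                        '2019-01-01T00:05:00'],
    'Glucose Value (mg/dL)': [110, 115],
    'Event Type': ['EGV', 'EGV'],  # an example column we will drop
})

# Keep only the timestamp and glucose columns, with shorter names
df = df[['Timestamp (YYYY-MM-DDThh:mm:ss)', 'Glucose Value (mg/dL)']]
df = df.rename(columns={'Timestamp (YYYY-MM-DDThh:mm:ss)': 'time',
                        'Glucose Value (mg/dL)': 'glucose'})
print(list(df.columns))  # ['time', 'glucose']
```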

Alright, this seems more manageable. One last thing before we actually start inspecting the data: with time series like this, we usually want to index the samples by, you guessed it, time. Pandas enables this with a few lines of code. Make sure to switch the time zone ‘US/Pacific’ to the one that applies to you, list of time zones.
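Those few lines might look like this (a sketch, assuming the `time`/`glucose` names from the previous step):

```python
import pandas as pd

# Assumed frame from the previous step: 'time' strings and 'glucose'
df = pd.DataFrame({'time': ['2019-01-01T00:00:00', '2019-01-01T00:05:00'],
                   'glucose': [110, 115]})

# Parse the timestamps, use them as the index, and attach a time zone
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
df = df.tz_localize('US/Pacific')  # swap in your own time zone
print(df.index.tz)
```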

Okay, we have now successfully imported all the data, selected the columns we’re interested in, and done some basic formatting to make our lives easier.

We can now continue to quality assure the data then, at last, derive some insights from it!


Quality Assurance of the Data

If we’re going to produce some trustworthy insights from our data we need to make sure the data itself can be trusted. Measurement errors, missing values or similar anomalies can skew our results and cause trouble down the line so we better make sure we’ve dealt with them beforehand.

To keep things simple we’re not going to wrangle the data in any way here, just observe any anomalies and quirks with the data that might be good to have in mind later on.

Let’s check for a common issue: missing values. First, we’ll check at what frequency glucose sample values are to be expected
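One way to get that description is to summarize the time between consecutive samples (a sketch on synthetic data with the assumed indexed `glucose` frame):

```python
import pandas as pd

# Assumed: df has a DatetimeIndex and a 'glucose' column
idx = pd.date_range('2019-01-01', periods=5, freq='5min', tz='US/Pacific')
df = pd.DataFrame({'glucose': [110, 112, 115, 111, 108]}, index=idx)

# Time between consecutive samples; describe() summarizes the intervals
gaps = df.index.to_series().diff()
print(gaps.describe())
```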

From this description, it’s clear that we expect a measurement every 5 minutes. However, some anomalies seem to occur: at least one interval is only three seconds, and a period of missing data of a little more than 3 days is also present.

Repeated measurements are an issue that tends to occur when syncing the device. Let’s take a closer look at those repeated measurements
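A hedged sketch of how to surface them: flag any sample that arrives suspiciously soon after the previous one (the one-minute threshold is my choice for illustration):

```python
import pandas as pd

# Stand-in frame: the second and third samples are only seconds
# apart, mimicking a duplicate created during a device sync
idx = pd.to_datetime(['2019-01-01 00:00:00', '2019-01-01 00:05:00',
                      '2019-01-01 00:05:03', '2019-01-01 00:10:00'])
df = pd.DataFrame({'glucose': [110, 112, 126, 111]}, index=idx)

# Flag samples that arrive less than a minute after the previous one
gaps = df.index.to_series().diff()
repeats = df[gaps < pd.Timedelta(minutes=1)]
print(repeats)
```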

We can tell that some measurements only seconds apart have glucose values that differ by almost 15 mg/dl, which is a lot more than is to be expected. About 10 data points in the 90-day period have this issue.

Now, let’s review the samples that are far apart, where we seem to be missing a lot of data.
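The largest sample-to-sample intervals reveal those missing stretches; a sketch on stand-in data:

```python
import pandas as pd

# Stand-in frame with a gap of more than three days in the middle
idx = pd.to_datetime(['2019-01-01 00:00', '2019-01-01 00:05',
                      '2019-01-04 12:00', '2019-01-04 12:05'])
df = pd.DataFrame({'glucose': [110, 112, 130, 128]}, index=idx)

# The biggest intervals between consecutive samples
gaps = df.index.to_series().diff()
print(gaps.nlargest(3))
```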

Hmm, in two places we’re missing more than three days of consecutive data. We’ll need to keep that in mind going forward. Okay, we now have a better understanding of the frequency of our data. Before we start visualizing it, we should also get a grasp of how our glucose values are distributed.
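Summary statistics give that grasp; a histogram (e.g. `df['glucose'].hist()`) is the visual counterpart. A sketch:

```python
import pandas as pd

# Stand-in frame spanning a plausible range of glucose values
idx = pd.date_range('2019-01-01', periods=6, freq='5min')
df = pd.DataFrame({'glucose': [55, 80, 110, 145, 190, 240]}, index=idx)

# Count, mean, spread, min/max and quartiles of the glucose values
print(df['glucose'].describe())
```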

Looking okay here: the min and max values are well within what’s to be expected. To conclude this section, we now have a sense of what the data looks like and have discovered some anomalies to keep in mind, like our missing values.


Analytics and Visualizations

Okay, now the fun part: let’s start visualizing the data to reveal some interesting insights. This section is laid out by asking questions, then trying to let the data and plots give us the answers.

What’s the trend of my average blood glucose?

We’ll resample daily to get a mean glucose every day, then have a look at whether we can see a trend in the data or not.
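A sketch of that step on synthetic data: resample to daily means, then fit a line through them (the original post appears to use a seaborn regression plot, which draws the same fit plus a confidence band; here I use `numpy.polyfit` for the fit itself):

```python
import numpy as np
import pandas as pd

# Stand-in data: four days of 5-minute samples around 115 mg/dl
idx = pd.date_range('2019-01-01', periods=4 * 288, freq='5min')
rng = np.random.default_rng(0)
df = pd.DataFrame({'glucose': rng.normal(115, 30, len(idx))}, index=idx)

# One mean glucose value per day
daily = df['glucose'].resample('D').mean()

# Fit a line through the daily means to look for a trend;
# seaborn.regplot(...) would draw this fit with its confidence band
slope, intercept = np.polyfit(np.arange(len(daily)), daily.to_numpy(), 1)
print(len(daily), round(slope, 2))
```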

Here we’ve fitted the daily glucose averages with linear regression to investigate if we can see any significant trend in the data. Clearly, we can’t distinguish any evident trend here, especially considering the wide confidence interval (the shaded area around the regression line).

We can tell, though, that on average the mean blood glucose over a full day is about 115 mg/dl.

What’s the trend for time spent in range?

You don’t want your blood sugar to be either too high or too low for extended periods of time, so time in range is a common metric for diabetes management. We’ll apply the standard thresholds for blood sugar in range and then see what that trend looks like
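A sketch of the computation, assuming the commonly used 70–180 mg/dl range (check that these match the thresholds you and your care team use):

```python
import pandas as pd

# Stand-in day of samples, some outside the target range
idx = pd.date_range('2019-01-01', periods=8, freq='5min')
df = pd.DataFrame({'glucose': [60, 100, 120, 150, 110, 105, 130, 250]},
                  index=idx)

# Commonly used "in range" thresholds: 70-180 mg/dl (assumed here)
in_range = df['glucose'].between(70, 180)

# Share of samples in range per day, as a percentage
daily_tir = in_range.resample('D').mean() * 100
print(daily_tir)
```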

Okay, looking good, more time spent in range! But wait, what about those days where close to 0% was spent in range..?

Yes, it’s those days where we’re missing data. Let’s ignore them and try again,

So we actually notice a slight negative trend once we’ve removed the bad data. It’s not very significant, but nonetheless an indicator of how time in range has progressed over the last 90 days.

How’s my average day looking?

Time in range is not a perfect measure; we want to account for variance too. Let’s have a look at an average day and see what the common trends are.
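One way to build an "average day" is to group every sample by its time of day across all days, then look at the mean and spread per slot. A sketch:

```python
import pandas as pd

# Two stand-in days of 5-minute samples (values are synthetic)
idx = pd.date_range('2019-01-01', periods=2 * 288, freq='5min')
df = pd.DataFrame({'glucose': range(2 * 288)}, index=idx)

# Mean and spread of glucose for each time of day, across all days;
# plotting the mean with a band of +/- one std gives the figure
by_time = df['glucose'].groupby(df.index.time).agg(['mean', 'std'])
print(by_time.head())
```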

Here we can see the variability and trends of blood sugar throughout the day. There’s a general trend of night-time highs — where we can see higher than average glucose levels after bedtime.

Does the day of the week influence my blood sugar?

I always look for an excuse to use violin plots; they look so nice! In this case, they actually make a lot of sense: let’s group our data by weekday and check whether the distributions differ depending on the day of the week
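A sketch of the grouping step; the `weekday` helper column is my own addition, and `seaborn.violinplot(x='weekday', y='glucose', data=df)` would turn these per-day distributions into the violins:

```python
import pandas as pd

# One stand-in week of 5-minute samples
idx = pd.date_range('2019-01-01', periods=7 * 288, freq='5min')
df = pd.DataFrame({'glucose': [110] * len(idx)}, index=idx)

# Hypothetical helper column with the weekday name of each sample
df['weekday'] = df.index.day_name()

# Per-weekday distribution summaries (the violins show the same data)
print(df.groupby('weekday')['glucose'].describe())
```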

Immediately we notice that Thursdays, Fridays, and Sundays display a larger variance than other days.

We can take it one step further and add in how the progression looks for each day. We’ll separate the first and second half of the data and review it again. We can then observe that Fridays are getting a lot better, but on Sundays there’s been more time spent high recently compared to before.
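A hedged sketch of that split (`half` and `weekday` are hypothetical helper columns; `seaborn.violinplot(..., hue='half', split=True)` would render the comparison):

```python
import pandas as pd

# Stand-in data: the second half of the period runs higher
idx = pd.date_range('2019-01-01', periods=4 * 288, freq='5min')
df = pd.DataFrame({'glucose': [100] * 576 + [140] * 576}, index=idx)

# Label each sample as belonging to the first or second half
midpoint = df.index[len(df) // 2]
df['half'] = ['first' if t < midpoint else 'second' for t in df.index]
df['weekday'] = df.index.day_name()

# Compare the halves (per weekday in the full version)
print(df.groupby('half')['glucose'].mean())
```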

Are there any evident daily or periodic trends?

As we saw in the first plot, recurring patterns can be seen during the day. Let’s dig into this idea and observe the blood glucose samples in a heatmap.
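The grid behind such a heatmap can be built as days × hours, with mean glucose in each cell; `seaborn.heatmap(grid)` (or matplotlib’s `imshow`) then renders it. A sketch:

```python
import pandas as pd

# Two stand-in days of 5-minute samples
idx = pd.date_range('2019-01-01', periods=2 * 288, freq='5min')
df = pd.DataFrame({'glucose': range(2 * 288)}, index=idx)

# Rows = days, columns = hour of day, cells = mean glucose
grid = df['glucose'].groupby(
    [df.index.date, df.index.hour]).mean().unstack()
print(grid.shape)  # (2, 24): 2 days x 24 hours
```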

A heatmap gives us a bird’s-eye view of the data and enables easy detection of patterns. For example, scanning across the nighttime portion of the plot, we can see infrequent streaks of red (highs) broken up by chunks of dark blue (lows). Perhaps tighter glycemic control during nighttime/mornings might be worth experimenting with in this case.


Conclusions and Related Content

In conclusion, in this post we’ve imported our data, reviewed its quality, and made some basic visualizations. The beauty is that you can now keep adding content to the Notebook and try new, exciting things to analyze and use your data in whatever way is best for you!

Why do this instead of sticking with the data views the sensor providers already offer? I’m of the opinion that CGM data holds a lot more than it’s utilized for today: dietary patterns, exercise impact, maybe sleep cycles? The point is that we’re not yet aware of all the potential in this data, and the way to find out where the limit lies is to experiment with it!

If bio-hacking, CGMs, and data science interest you, this might too:

Learn more about Steady

Thanks to Henrik Berggren.
