How to use gspread to get and update Google Sheets

Google Sheets is a daily application used by data analysts, especially in companies that use the Google Workspace ecosystem. I use Google Sheets to make reports, dashboards, and update data for Google Sheets happens every day but is automated with Python. Below are some of my experiences using gspread to work with Google Sheets.

Preparation

To use gspread or Google Sheets API, you need to create a Service Account and get its JSON file. This makes the account authorization part easy in Python.

Once you have a Service Account, go to Google Sheets sharing settings and add the Service Account email with editing permissions. The email is the client_email field in the JSON file you downloaded.

Add the Service Account email to the folder or Google Sheets file with editing permissions
Add the Service Account email to the folder or Google Sheets file with editing permissions

Make sure you have Python installed on your machine. Install gspread with the following command:

pip install gspread

Use gspread to get data from Google Sheets

Get all the data in the worksheet

If the data in the worksheet is in standard tabular format (the first row is a header, other rows are values), use get_all_records() to get the data from Google Sheets.

import pandas as pd
import gspread

client = gspread.service_account(filename='service_account.json')
sheet = client.open_by_key(key='YourSheetsKey')
worksheet = sheet.get_worksheet_by_id(id=0) # Replace 0 with your worksheet ID

data = worksheet.get_all_records()
df = pd.DataFrame(data)

If the data in the worksheet is not structured as a table, you can get the data from Google Sheets in the form of all rows (list of lists) using get_all_values(). However, you need to extract and clean your data yourself.

data = worksheet.get_all_values()

Although gspread supports other functions to select a sheet or worksheet by name. But in my experience, using key and id is still the safest because it is always fixed, but the names can be changed, or having many sheets with the same name will cause errors.

Get data from a range in a worksheet

When using get_values() to get data from a range in a worksheet, you will receive the value as a list of lists.

data = worksheet.get_values(range_name='A1:B2')

Get all formulas instead of all values

You may not know. In addition to getting data in value form, gspread also supports getting data in formula form. This feature is helpful for special situations, such as updating the entire formula by replacing a string.

data = worksheet.get_all_records(value_render_option='FORMULA')

You can use value_render_option='FORMULA' parameter for other data fetching functions.

Use gspread to update data into Google Sheets

In most of the functions to update data in Google Sheets, I often add a parameter value_input_option='USER_ENTERED' because it helps the worksheet receive the correct format as if you filled in the data manually.

Replace all old data

Below is the code to update data into Google Sheets using gspread that I often use.

  1. Connect worksheet.
  2. Delete all values ​​from row 2 and above.
  3. Update all data from the data frame into the worksheet, including the header.
import pandas as pd
import gspread

df = pd.DataFrame({'name': ['iPhone', 'iPad', 'MacBook'], 'profit': [7200, 8600, 9500]})

client = gspread.service_account(filename='service_account.json')
sheet = client.open_by_key(key='YourSheetsKey')

worksheet = sheet.get_worksheet_by_id(id=0) # Replace 0 with your worksheet ID

# Delete old data but excluded header
worksheet.delete_rows(start_index=2, end_index=worksheet.row_count) 

# Update new data
worksheet.update(values=[df.columns.tolist()] + df.values.tolist(), value_input_option='USER_ENTERED')

You can also declare an additional range parameter in update() to select the data update area.

Append data below the worksheet

The familiar term is “append”. When appending data, we no longer need the header from the data frame.

worksheet.append_rows(values=df.values.tolist(), value_input_option='USER_ENTERED')

Update a specific cell

To update a cell, you can use update_acell() or update_cell(), they are just different in how you fill in the parameters for the cell you need to update.

worksheet.update_acell(label='A1', value='Hello World!')

# The same as
worksheet.update_cell(row=1, col=1, value='Hello World!')

Update multiple ranges at once

batch_update() is ideal when you need to update multiple cells or ranges simultaneously, instead of repeating countless other update functions.

worksheet.batch_update([
    {
        'range': 'A1',
        'values': [['Hello World!']]
    },
    {
        'range': 'B2:C3',
        'values': [['Name', 'Birthday'], ['Hieu Tran', 1996]]
    },
    {
        'range': 'D1:D3',
        'values': [['Guide'], ['From'], ['Lucid Gen']]
    }
])

Advanced: To update multiple ranges at once while still ensuring automation, you need to combine with other libraries like pandas and openpyxl. Suppose you want to update only the changed cells in a data frame. The best approach is:

  1. Create row_number using the index of the data frame.
  2. Filter out rows that have changes.
  3. Create a list of columns to update.
  4. Use a for loop to create batches automatically. In it, use get_column_letter() and row_number to create range.
  5. Finally, use batch_update() to update all changed cells.
from openpyxl.utils import get_column_letter

# Assume df is data from worksheet
df['row_number'] = df.index + 2 # Due to header and Python index start from 0

# You had changed some values in df. Filter rows you want to update to worksheet
df_filter = df[df['status']=='changed'] # Replace with your condition

# Columns you want to update to worksheet
update_cols = ['profit', 'revenue']

batches = []
for col in update_cols:
    col_letter = get_column_letter(df_filter.columns.tolist().index(col) + 1) # Due to Python index start from 0
    for row_number, value in zip(df_filter.row_number, df_filter[col]):
        batch = {'range': f'{col_letter}{row_number}', 'values': [[value]]}
        batches.append(batch)

worksheet.batch_update(data=batches, value_input_option='USER_ENTERED')

Conclusion

The article has compiled cases where data analysts often use gspread to work with Google Sheets. The gspread library still has many functions that we rarely use. If you are interested, please refer to their documentation. Leave a comment below to interact with me!

Hieu Tran Ngoc Minh

Hieu (born in 1996) holds a Bachelor's degree in Business Administration from Saigon Technology University. Currently a Data Analyst at Ninja Van, Hieu has extensive experience in Data Analysis and Digital Marketing. This blog is where Hieu shares practical experiences from work and life.

Leave a Comment

Feel free to leave your comment, and we will review and respond as soon as possible. Please use a real email to ensure your comment is approved and to receive notifications when we reply. You can also add an avatar to your email.