Easily Import Kaggle Datasets in Google Colab with Python

Photo by Filiberto Santillán on Unsplash

When I first tried to import Kaggle datasets into Google Colab, I scoured Medium, Kaggle, and other online forums for how to do so, and each answer was slightly different from the next. I also found it tedious to remember every step needed to import a dataset successfully.

Therefore, in this post I will share my solution to these problems: a function I wrote to handle all the dirty work for you. All you have to do is provide the Kaggle dataset URL and your API credentials. If you would like to understand the script, please read the entire post. Otherwise, the script in its entirety can be found at the end.

First and foremost, we will handle the imports. In order to install kaggle into Google Colab’s environment through a script, we will use the subprocess module.

import subprocess

subprocess.check_call(['pip', 'install', '-q', 'kaggle'])

import requests
import os
import json
import zipfile
from pathlib import Path

The function will have two parameters. The first is a string containing the URL of the Kaggle dataset. The second is a dictionary containing your Kaggle username and API key.

DATASET = "https://www.kaggle.com/heeraldedhia/groceries-dataset"
API_TOKENS = {"username": "...", "key": "..."}

def download_kaggle_dataset(dataset_url, api_token):
    api_user = api_token['username']
    api_key = api_token['key']
    ...

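We then check that the URL is reachable and that both API credentials were provided. Note that a 200 status code only shows that the dataset page exists; the credentials themselves are not verified until the Kaggle CLI uses them.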

    if requests.get(dataset_url).status_code == 200 and api_user and api_key:
        ...
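The credentials half of that check can be tried on its own, without any network request. Below is a minimal sketch; the helper name `has_credentials` is hypothetical and not part of the original script.

```python
def has_credentials(api_token):
    """Return True if api_token holds non-empty 'username' and 'key' strings.

    Hypothetical helper: it only checks that the values are present,
    not that Kaggle will actually accept them.
    """
    for field in ("username", "key"):
        value = api_token.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return True


print(has_credentials({"username": "someone", "key": "abc123"}))  # True
print(has_credentials({"username": "someone", "key": ""}))        # False
```

Running the cheap local check before the HTTP request means a missing key is reported immediately instead of after a round trip to Kaggle.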

If the condition is met, the function creates a kaggle.json file and writes the API credentials to it.

        # Setup Kaggle for Google Colab Environment
        print('Setting up Kaggle for Google Colab...')
        kaggle_dir = Path.home() / '.kaggle'  # /root/.kaggle on Colab
        kaggle_dir.mkdir(exist_ok=True)  # do not fail if the folder already exists
        with open(kaggle_dir / 'kaggle.json', 'w') as file:
            json.dump(api_token, file)
        # A literal "~" is not expanded outside a shell, so use os.chmod rather than chmod
        os.chmod(kaggle_dir / 'kaggle.json', 0o600)
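To try this step in isolation, here is a small sketch that writes the credentials file and restricts it to the owner, since the Kaggle CLI warns when kaggle.json is readable by other users. The helper name `write_kaggle_json` and its `config_dir` parameter are hypothetical, and the demo writes to a throwaway directory rather than the real ~/.kaggle.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def write_kaggle_json(api_token, config_dir):
    """Write api_token to config_dir/kaggle.json with owner-only permissions."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "kaggle.json"
    with open(config_path, "w") as file:
        json.dump(api_token, file)
    # Same effect as `chmod 600`, without relying on a shell to expand "~".
    os.chmod(config_path, 0o600)
    return config_path


# Demo against a temporary directory (an assumption made for safe experimentation).
with tempfile.TemporaryDirectory() as tmp:
    path = write_kaggle_json({"username": "someone", "key": "abc123"},
                             Path(tmp) / ".kaggle")
    print(path.name, oct(stat.S_IMODE(path.stat().st_mode)))
```

On Colab you would pass `Path.home() / ".kaggle"` as `config_dir`, which resolves to /root/.kaggle.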

Finally, the function downloads and unzips the contents of the Kaggle dataset.

        # Download and unzip Kaggle Dataset
        print('Downloading Kaggle Dataset...')
        dataset = '/'.join(dataset_url.split('/')[3:])
        subprocess.run(["kaggle", "datasets", "download", "-d", dataset, "-p", "/content"])
        dataset = dataset.split('/')[-1]
        # Open the archive where it was downloaded, regardless of the working directory
        with zipfile.ZipFile("/content/{}.zip".format(dataset)) as zip_ref:
            zip_ref.extractall("/content")
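The `split('/')[3:]` step assumes the URL has exactly the form https://www.kaggle.com/owner/dataset. As a hedged alternative, the sketch below extracts the same `owner/dataset` slug with `urllib.parse` and also tolerates a trailing slash or a leading `datasets` path segment, which newer Kaggle URLs include. The helper name `dataset_slug` is hypothetical.

```python
from urllib.parse import urlparse


def dataset_slug(dataset_url):
    """Return 'owner/dataset-name' from a Kaggle dataset URL."""
    parts = [p for p in urlparse(dataset_url).path.split("/") if p]
    # Assumption: URLs of the form /datasets/<owner>/<dataset> should map
    # to the same slug as /<owner>/<dataset>.
    if parts and parts[0] == "datasets":
        parts = parts[1:]
    if len(parts) < 2:
        raise ValueError("expected a URL like https://www.kaggle.com/<owner>/<dataset>")
    return "/".join(parts[:2])


print(dataset_slug("https://www.kaggle.com/heeraldedhia/groceries-dataset"))
# heeraldedhia/groceries-dataset
```

The resulting slug is what gets passed to `kaggle datasets download -d`, and its last component is the name of the downloaded zip file.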

Put together, the entire script is as follows:
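A sketch of the assembled function, pieced together from the snippets in this post, is shown below. It makes a few assumptions that are not in the original: invalid input raises a `ValueError` (the post does not show what happens when the check fails), the credentials are checked before the network request, and the `pip install` and the `requests` import happen inside the function, so merely defining it has no side effects.

```python
import json
import os
import subprocess
import sys
import zipfile
from pathlib import Path


def download_kaggle_dataset(dataset_url, api_token):
    """Download a Kaggle dataset into /content (Colab's working folder) and unzip it."""
    api_user = api_token.get("username")
    api_key = api_token.get("key")
    # Assumption: fail loudly when a credential is missing.
    if not (api_user and api_key):
        raise ValueError("api_token needs a non-empty 'username' and 'key'")

    # Install the Kaggle CLI into the running interpreter's environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "kaggle"])
    import requests  # imported here on purpose; preinstalled on Colab

    # A 200 response only shows that the dataset page is reachable.
    if requests.get(dataset_url).status_code != 200:
        raise ValueError("could not reach {}".format(dataset_url))

    # Setup Kaggle for Google Colab Environment
    print("Setting up Kaggle for Google Colab...")
    kaggle_dir = Path.home() / ".kaggle"  # /root/.kaggle on Colab
    kaggle_dir.mkdir(exist_ok=True)
    config_path = kaggle_dir / "kaggle.json"
    with open(config_path, "w") as file:
        json.dump({"username": api_user, "key": api_key}, file)
    os.chmod(config_path, 0o600)

    # Download and unzip Kaggle Dataset
    print("Downloading Kaggle Dataset...")
    dataset = "/".join(dataset_url.split("/")[3:])
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", dataset, "-p", "/content"],
        check=True,  # raise if the CLI reports a failure
    )
    zip_path = "/content/{}.zip".format(dataset.split("/")[-1])
    with zipfile.ZipFile(zip_path) as zip_ref:
        zip_ref.extractall("/content")
```

In a Colab cell, you would then call `download_kaggle_dataset(DATASET, API_TOKENS)` with your own username and key filled in.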

I hope this post has helped you better understand how to import Kaggle datasets into Google Colab. If you encounter any issues with this code or the wording in the post, or have any other inquiries, please don’t hesitate to let me know. Thank you!
