SPARQL from Python


SPARQLWrapper is a simple Python wrapper around a SPARQL service for remote query execution. Not only does it enable us to write more complex queries to extract information from RDF than those exposed through a library like rdflib, it can also convert query results into other formats like JSON and CSV!

First, what is SPARQL?

SPARQL (“SPARQL Protocol And RDF Query Language”) is a W3C standard for querying RDF and can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be results sets or RDF graphs.

SPARQL allows us to express queries as three-part statements:

PREFIX ... # identifies & nicknames the namespace URIs used in the query
SELECT ... # lists the variables to be returned (each starts with a ?)
WHERE  ... # contains restrictions on the variables, expressed as triples
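To make the three parts concrete, here is a minimal sketch that assembles a query string in Python. The helper `build_query` is hypothetical (it is not part of SPARQLWrapper), and the `rdfs` namespace and the label pattern are assumptions chosen only for illustration.

```python
def build_query(prefixes, variables, patterns):
    """Assemble a SELECT query from its three parts (hypothetical helper)."""
    # PREFIX: one line per namespace nickname
    prefix_lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    # SELECT: variable names are given without the leading "?"
    select_line = "SELECT " + " ".join(f"?{v}" for v in variables)
    # WHERE: each triple pattern is terminated with " ."
    where_block = "WHERE {\n" + "\n".join(f"  {p} ." for p in patterns) + "\n}"
    return "\n".join(prefix_lines + [select_line, where_block])


query = build_query(
    prefixes={"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
    variables=["thing", "label"],
    patterns=["?thing rdfs:label ?label"],
)
print(query)
```

In practice most people simply write the query as a triple-quoted string; the helper is only meant to show which text belongs to which part.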


The Python library SPARQLWrapper (which can be installed via pip) enables us to use the SPARQL query language to interact with remote or local SPARQL endpoints, such as DBpedia:

from SPARQLWrapper import SPARQLWrapper, JSON

# Specify the DBpedia endpoint
sparql = SPARQLWrapper("")

# Query for the description of "Capsaicin", filtered by language
sparql.setQuery("""
    PREFIX rdfs: <>
    SELECT ?comment
    WHERE { <> rdfs:comment ?comment
    FILTER (LANG(?comment)='en')
    }
""")

# Ask the endpoint for JSON so that convert() returns a Python dictionary
sparql.setReturnFormat(JSON)
result = sparql.query().convert()

# The return data contains "bindings" (a list of dictionaries)
for hit in result["results"]["bindings"]:
    # We want the "value" attribute of the "comment" field
    print(hit["comment"]["value"])
Capsaicin (/kæpˈseɪ.ᵻsɪn/ (INN); 8-methyl-N-vanillyl-6-nonenamide) is an active component of chili peppers, which are plants belonging to the genus Capsicum. It is an irritant for mammals, including humans, and produces a sensation of burning in any tissue with which it comes into contact. Capsaicin and several related compounds are called capsaicinoids and are produced as secondary metabolites by chili peppers, probably as deterrents against certain mammals and fungi. Pure capsaicin is a volatile, hydrophobic, colorless, odorless, crystalline to waxy compound.
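The nesting of the JSON result is easier to see on a small example. The sketch below runs without a network connection: `sample` is a hand-written dictionary in the same `results` / `bindings` layout, its text is made up, and `values_for` is a hypothetical helper rather than a SPARQLWrapper function.

```python
# Hand-written sample in the SPARQL JSON results layout; the text is made up.
sample = {
    "head": {"vars": ["comment"]},
    "results": {
        "bindings": [
            {
                "comment": {
                    "type": "literal",
                    "xml:lang": "en",
                    "value": "An example description.",
                }
            }
        ]
    },
}


def values_for(result, variable):
    """Return the plain values bound to `variable`, skipping unbound rows."""
    return [
        row[variable]["value"]
        for row in result["results"]["bindings"]
        if variable in row
    ]


comments = values_for(sample, "comment")
print(comments)
```

Each entry in `bindings` maps a variable name to a small dictionary, so the text we care about always sits one level down, under `value`.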

Querying Wikidata

We can also use the Wikidata Query Service (WDQS) endpoint to query Wikidata.

Let’s say we want to continue our research into spicy things by searching for information about hot sauces in Wikidata. The first step is to find the unique identifier that Wikidata uses to reference “hot sauce”, which we can do by searching on Wikidata. It turns out to be “Q522171”. Because this identifier names an item (an “entity”), we reference it with the “wd” prefix: wd:Q522171.

To get back all of the kinds of hot sauce cataloged in Wikidata, we query for items whose “subclass of” property points to hot sauce. “Subclass of” is encoded as “P279” in Wikidata, and because we want the direct property we reference it with the “wdt” prefix: wdt:P279.

NOTE: For simple WDQS triples, items should be prefixed with wd:, and properties with wdt:. We don’t need to explicitly alias any prefixes in this case because WDQS already knows many shortcut abbreviations commonly used externally (e.g. rdf, skos, owl, schema, etc.) as well as ones internal to Wikidata, such as:

PREFIX wd: <>
PREFIX wds: <>
PREFIX wdv: <>
PREFIX wdt: <>
PREFIX wikibase: <>
PREFIX p: <>
PREFIX ps: <>
PREFIX pq: <>
PREFIX rdfs: <>
PREFIX bd: <>

More on prefixes here.
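A prefix is only shorthand: the prefixed name is replaced by the namespace URI followed by the local part. The sketch below illustrates that expansion. The helper `expand` is hypothetical, and the two namespace URIs are filled in here as assumptions about the `wd:` and `wdt:` prefixes; check them against the Wikidata documentation before relying on them.

```python
# Two of the WDQS namespaces; both URIs are assumptions made for this sketch.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'wd:Q522171' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


hot_sauce = expand("wd:Q522171")
subclass_of = expand("wdt:P279")
print(hot_sauce)
print(subclass_of)
```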

sparql = SPARQLWrapper("")

# Below we SELECT both the hot sauce items & their labels
# in the WHERE clause we specify that we want labels as well as items
sparql.setQuery("""
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P279 wd:Q522171.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
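The same query shape works for any class, so it can be handy to generate it from an item identifier. This is a minimal sketch: `subclass_query` is a hypothetical helper, and the check that an identifier looks like "Q" followed by digits is an assumption added to keep arbitrary text out of the query string.

```python
import re

# Assumed shape of a Wikidata item identifier: "Q" followed by digits.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def subclass_query(qid, language="en"):
    """Build a query for the direct subclasses (P279) of the given item."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return (
        "SELECT ?item ?itemLabel\n"
        "WHERE {\n"
        f"  ?item wdt:P279 wd:{qid}.\n"
        "  SERVICE wikibase:label { "
        f'bd:serviceParam wikibase:language "[AUTO_LANGUAGE],{language}". }}\n'
        "}"
    )


query = subclass_query("Q522171")
print(query)
```

The returned string can be passed to `sparql.setQuery(...)` in place of the hand-written query.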

Let’s use pandas to review the results as a dataframe:

import pandas as pd

results_df = pd.json_normalize(results['results']['bindings'])
results_df[['item.value', 'itemLabel.value']]
item.value itemLabel.value
0 salsa
1 Tabasco sauce
2 Adobo
3 Blair's 16 Million Reserve
4 harissa
5 Chili oil
6 sriracha sauce
7 mojo
8 Shito
9 Valentina
10 Doubanjiang
11 sauce samouraï
12 Q3474250
13 Nam phrik
14 Cholula Hot Sauce
15 Nam chim
16 Q16628511
17 Q16642516
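Rows 12, 16 and 17 show a bare identifier where a label would normally be, which appears to be what the label service returns when an item has no label in the requested language. The sketch below flags such rows and writes the result as CSV using only the standard library. The `bindings` list is hand-written from two of the rows above, the item URIs are reconstructed under the assumption that items live in the `http://www.wikidata.org/entity/` namespace, and `to_rows` is a hypothetical helper.

```python
import csv
import io

# Hand-written sample mirroring two rows of the table above.
# The item URIs assume the http://www.wikidata.org/entity/ namespace.
bindings = [
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q3474250"},
        "itemLabel": {"type": "literal", "value": "Q3474250"},
    },
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q16628511"},
        "itemLabel": {"type": "literal", "value": "Q16628511"},
    },
]


def to_rows(bindings):
    """Flatten bindings into dicts and flag rows whose label is just the ID."""
    rows = []
    for binding in bindings:
        # The identifier is the last path segment of the item URI
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        label = binding["itemLabel"]["value"]
        rows.append({"qid": qid, "label": label, "has_label": label != qid})
    return rows


rows = to_rows(bindings)

# Write the flattened rows as CSV into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["qid", "label", "has_label"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```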


More on SPARQL & SPARQL Endpoints
