Part of the Semantic Representation series:

  1. Constructing a Knowledge Graph on Sri Lankan Politics (this post!)
  2. Ingesting and Visualizing RDF Data with Neo4j
  3. Further querying on the Knowledge Graph of Sri Lankan Politics

Introduction

In this article, I aim to demonstrate how we can query data from a public knowledge base and create a knowledge graph for a particular domain. The main goal is to extract data related to Sri Lankan politics from Wikidata. Before we dive into the technical details, let’s first get an understanding of some key concepts: knowledge graphs, Wikidata, RDF, and SPARQL.

Knowledge Graphs

A knowledge graph is a network of real-world entities and their interrelations, organized in a graph structure. This allows for the integration of diverse data sources, providing a unified framework to connect various pieces of information. Knowledge graphs are powerful tools for data analysis, enabling complex queries and insightful visualizations.

(Image: an example knowledge graph. Fuzheado, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons)

Wikidata

Wikidata is a collaboratively edited knowledge base hosted by the Wikimedia Foundation. It acts as a central repository of structured data for Wikipedia, Wikivoyage, Wikisource, and other Wikimedia projects. Wikidata provides a rich source of interconnected data on a wide array of subjects, including historical events, scientific concepts, and political figures.

RDF (Resource Description Framework)

The Resource Description Framework (RDF) is a standard model for data interchange on the web. RDF allows data to be linked across different sources, making it possible to merge information from various origins. The basic structure of RDF data consists of triples: subject, predicate, and object. This structure is ideal for representing knowledge graphs.

SPARQL (SPARQL Protocol and RDF Query Language)

SPARQL is the query language for RDF. It allows us to extract information from RDF graphs by specifying patterns to match against the data. SPARQL queries can be used to retrieve and manipulate data stored in RDF format, making it a powerful tool for querying knowledge graphs.

Querying Wikidata Using SPARQL

In this section, we will define SPARQL queries to extract data related to Sri Lankan politics from Wikidata. We will cover querying for politicians, political parties and coalitions, and political offices or positions. These queries will be constructed as Python multiline strings, and the results will be formatted as RDF triples.

Query Prefixes

First, we define the necessary query prefixes. These include standard namespaces and a custom namespace (slpg) for our schema.

```python
query_prefixes = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX slpg: <http://www.slpg.lk/>
"""
```

Query for Political Figures

The following query extracts information about Sri Lankan politicians. It constructs RDF triples for each politician with attributes such as name, date of birth, political party, positions held, education, religion, spouse, place of birth, siblings, and children.

```python
query_persons = query_prefixes + """
CONSTRUCT {
    ?person
        slpg:label ?personLabel;
        slpg:dateOfBirth ?dob;
        slpg:memberOf ?party;
        slpg:holds ?position;
        slpg:educatedAt ?educated;
        slpg:religion ?religion;
        slpg:spouseOf ?spouse;
        slpg:placeOfBirth ?placeOfBirth;
        slpg:siblingOf ?sibling;
        slpg:parentOf ?children;
        a slpg:PERSON.
}
WHERE {
  ?person wdt:P31 wd:Q5;        # instance of: human
          wdt:P27 wd:Q854;      # country of citizenship: Sri Lanka
          wdt:P106 wd:Q82955;   # occupation: politician
          wdt:P569 ?dob.        # date of birth
  OPTIONAL { ?person wdt:P102 ?party. }         # member of political party
  OPTIONAL { ?person wdt:P39 ?position. }       # position held
  OPTIONAL { ?person wdt:P69 ?educated. }       # educated at
  OPTIONAL { ?person wdt:P140 ?religion. }      # religion
  OPTIONAL { ?person wdt:P26 ?spouse. }         # spouse
  OPTIONAL { ?person wdt:P19 ?placeOfBirth. }   # place of birth
  OPTIONAL { ?person wdt:P3373 ?sibling. }      # sibling
  OPTIONAL { ?person wdt:P40 ?children. }       # child
  ?person rdfs:label ?personLabel . FILTER (lang(?personLabel) = "en")
}
"""
```

This query can be tested on the Wikidata SPARQL Query Service.

Query for Parties and Coalitions

The following query extracts information about political parties and coalitions. It constructs RDF triples for each party with attributes such as name, ideology, chairperson, founding date, founder, and political alignment.

```python
query_parties = query_prefixes + """
CONSTRUCT {
  ?party slpg:label ?partyLabel;
         slpg:hasIdeology ?ideology;
         slpg:hasChairperson ?leader;
         slpg:foundOn ?foundingDate;
         slpg:foundBy ?founder;
         slpg:hasPoliticalAlignment ?politicalAlignment;
         a slpg:PARTY_OR_COALITION.
}
WHERE {
  ?person wdt:P31 wd:Q5;        # instance of: human
          wdt:P27 wd:Q854;      # country of citizenship: Sri Lanka
          wdt:P106 wd:Q82955;   # occupation: politician
          wdt:P102 ?party.      # member of political party
  OPTIONAL { ?party wdt:P1142 ?ideology. }            # ideology
  OPTIONAL { ?party wdt:P488 ?leader. }               # chairperson
  OPTIONAL { ?party wdt:P112 ?founder. }              # founded by
  OPTIONAL { ?party wdt:P1387 ?politicalAlignment. }  # political alignment
  OPTIONAL { ?party wdt:P571 ?foundingDate. }         # inception (founding date)
  ?party rdfs:label ?partyLabel . FILTER (langMatches(lang(?partyLabel), "en"))
}
"""
```

Additional Queries

Similarly, you can create queries for other relevant data such as political offices, educational institutions, religious beliefs, family members, political ideologies, and locations. These queries follow the same structure as above, adjusting the CONSTRUCT and WHERE clauses to match the specific data you are interested in.
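As one illustration, here is a hedged sketch of what a query for political offices might look like. The slpg: attribute names and the POSITION class are assumptions of our own schema (not anything defined by Wikidata), and the prefixes are repeated so the snippet stands alone:

```python
# Hypothetical sketch of a political-offices query; the slpg: names
# are our own schema choices. Prefixes repeated for self-containment.
query_prefixes = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX slpg: <http://www.slpg.lk/>
"""

query_offices = query_prefixes + """
CONSTRUCT {
  ?position slpg:label ?positionLabel;
            a slpg:POSITION.
}
WHERE {
  ?person wdt:P31 wd:Q5;        # instance of: human
          wdt:P27 wd:Q854;      # country of citizenship: Sri Lanka
          wdt:P106 wd:Q82955;   # occupation: politician
          wdt:P39 ?position.    # position held
  ?position rdfs:label ?positionLabel .
  FILTER (lang(?positionLabel) = "en")
}
"""
```

The remaining queries (education, religion, family, ideology, location) follow the same shape, swapping in the relevant Wikidata property IDs.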

Performing the Queries

Next, we use the SPARQLWrapper library to execute these queries against the Wikidata SPARQL endpoint, convert the results into RDF graphs, and then merge these graphs. The final graph is serialized into Turtle format and saved.

```python
from SPARQLWrapper import SPARQLWrapper
from rdflib import Graph  # rdflib is the standard library for handling RDF data in Python

endpoint_url = "https://query.wikidata.org/sparql"
sparql = SPARQLWrapper(endpoint_url)

queries = [query_persons, query_parties, query_offices, query_education,
           query_religion, query_family, query_ideology, query_location]

graphs = []
for query in queries:
    sparql.setQuery(query)
    # For CONSTRUCT queries, queryAndConvert() returns an rdflib Graph
    graph = sparql.queryAndConvert()
    graphs.append(graph)

# Merge all the per-query graphs into a single graph
merged_graph = Graph()
for g in graphs:
    merged_graph += g

merged_graph.serialize(destination='merged_graph.ttl', format='turtle')
```

Conclusion

In this post, we looked at how we can query a public knowledge base to generate RDF data. In the next posts, we will ingest this data for analytics and visualization.