Part of the Semantic Representation series:
- Constructing a Knowledge Graph on Sri Lankan Politics (this post)
- Ingesting and Visualizing RDF Data with Neo4j
- Further querying on the Knowledge Graph of Sri Lankan Politics
Introduction
In this article, I aim to demonstrate how we can query data from a public knowledge base and create a knowledge graph for a particular domain. The main goal is to extract data related to Sri Lankan politics from Wikidata. Before we dive into the technical details, let’s first get an understanding of some key concepts: knowledge graphs, Wikidata, RDF, and SPARQL.
Knowledge Graphs
A knowledge graph is a network of real-world entities and their interrelations, organized in a graph structure. This allows for the integration of diverse data sources, providing a unified framework to connect various pieces of information. Knowledge graphs are powerful tools for data analysis, enabling complex queries and insightful visualizations.
Wikidata
Wikidata is a collaboratively edited knowledge base hosted by the Wikimedia Foundation. It acts as a central repository of structured data for Wikipedia, Wikivoyage, Wikisource, and other Wikimedia projects. Wikidata provides a rich source of interconnected data on a wide array of subjects, including historical events, scientific concepts, and political figures.
RDF (Resource Description Framework)
The Resource Description Framework (RDF) is a standard model for data interchange on the web. RDF allows data to be linked across different sources, making it possible to merge information from various origins. The basic structure of RDF data consists of triples: subject, predicate, and object. This structure is ideal for representing knowledge graphs.
SPARQL (SPARQL Protocol and RDF Query Language)
SPARQL is the query language for RDF. It allows us to extract information from RDF graphs by specifying patterns to match against the data. SPARQL queries can be used to retrieve and manipulate data stored in RDF format, making it a powerful tool for querying knowledge graphs.
Querying Wikidata Using SPARQL
In this section, we will define SPARQL queries to extract data related to Sri Lankan politics from Wikidata. We will cover querying for politicians, political parties and coalitions, and political offices or positions. These queries will be constructed as Python multiline strings, and the results will be formatted as RDF triples.
Query Prefixes
First, we define the necessary query prefixes. These include standard namespaces and a custom namespace (slpg) for our schema.
```python
query_prefixes = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX slpg: <http://www.slpg.lk/>
"""
```
Query for Political Figures
The following query extracts information about Sri Lankan politicians. It constructs RDF triples for each politician with attributes such as name, date of birth, political party, positions held, education, religion, spouse, place of birth, siblings, and children.
```python
query_persons = query_prefixes + """
CONSTRUCT {
  ?person
    slpg:label ?personLabel ;
    slpg:dateOfBirth ?dob ;
    slpg:memberOf ?party ;
    slpg:holds ?position ;
    slpg:educatedAt ?educated ;
    slpg:religion ?religion ;
    slpg:spouseOf ?spouse ;
    slpg:placeOfBirth ?placeOfBirth ;
    slpg:siblingOf ?sibling ;
    slpg:parentOf ?children ;
    a slpg:PERSON .
}
WHERE {
  ?person wdt:P31 wd:Q5 ;       # instance of: human
          wdt:P27 wd:Q854 ;     # country of citizenship: Sri Lanka
          wdt:P106 wd:Q82955 ;  # occupation: politician
          wdt:P569 ?dob .       # date of birth
  OPTIONAL { ?person wdt:P102 ?party . }        # member of political party
  OPTIONAL { ?person wdt:P39 ?position . }      # position held
  OPTIONAL { ?person wdt:P69 ?educated . }      # educated at
  OPTIONAL { ?person wdt:P140 ?religion . }     # religion
  OPTIONAL { ?person wdt:P26 ?spouse . }        # spouse
  OPTIONAL { ?person wdt:P19 ?placeOfBirth . }  # place of birth
  OPTIONAL { ?person wdt:P3373 ?sibling . }     # sibling
  OPTIONAL { ?person wdt:P40 ?children . }      # child
  ?person rdfs:label ?personLabel . FILTER (lang(?personLabel) = "en")
}
"""
```
This query can be tested on the Wikidata SPARQL Query Service.
Query for Parties and Coalitions
The following query extracts information about political parties and coalitions. It constructs RDF triples for each party with attributes such as name, ideology, chairperson, founding date, founder, and political alignment.
```python
query_parties = query_prefixes + """
CONSTRUCT {
  ?party
    slpg:label ?partyLabel ;
    slpg:hasIdeology ?ideology ;
    slpg:hasChairperson ?leader ;
    slpg:foundOn ?foundingDate ;
    slpg:foundBy ?founder ;
    slpg:hasPoliticalAlignment ?politicalAlignment ;
    a slpg:PARTY_OR_COALITION .
}
WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P27 wd:Q854 ;
          wdt:P106 wd:Q82955 ;
          wdt:P102 ?party .
  OPTIONAL { ?party wdt:P1142 ?ideology . }            # ideology
  OPTIONAL { ?party wdt:P488 ?leader . }               # chairperson
  OPTIONAL { ?party wdt:P112 ?founder . }              # founder
  OPTIONAL { ?party wdt:P1387 ?politicalAlignment . }  # political alignment
  OPTIONAL { ?party wdt:P571 ?foundingDate . }         # inception (founding date)
  ?party rdfs:label ?partyLabel . FILTER (langMatches(lang(?partyLabel), "en"))
}
"""
```
Additional Queries
Similarly, you can create queries for other relevant data such as political offices, educational institutions, religious beliefs, family members, political ideologies, and locations. These queries follow the same structure as above, adjusting the CONSTRUCT and WHERE clauses to match the specific data you are interested in.
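For instance, a query for political offices might look like the sketch below. The slpg:POSITION class and the exact predicate names are assumptions mirroring the earlier queries, and query_prefixes is repeated here so the snippet is self-contained:

```python
# Prefixes as defined earlier in this post
query_prefixes = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX slpg: <http://www.slpg.lk/>
"""

# Sketch of a query for offices held by Sri Lankan politicians (P39: position held)
query_offices = query_prefixes + """
CONSTRUCT {
  ?position slpg:label ?positionLabel ;
            a slpg:POSITION .
}
WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P27 wd:Q854 ;
          wdt:P106 wd:Q82955 ;
          wdt:P39 ?position .
  ?position rdfs:label ?positionLabel . FILTER (lang(?positionLabel) = "en")
}
"""
```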
Performing the queries
Next, we use the SPARQLWrapper library to execute these queries against the Wikidata SPARQL endpoint, convert the results into RDF graphs, and then merge these graphs. The final graph is serialized into Turtle format and saved.
```python
from SPARQLWrapper import SPARQLWrapper
from rdflib import Graph  # rdflib is a library for handling RDF data in Python

endpoint_url = "https://query.wikidata.org/sparql"
sparql = SPARQLWrapper(endpoint_url)

queries = [query_persons, query_parties, query_offices, query_education,
           query_religion, query_family, query_ideology, query_location]

graphs = []
for query in queries:
    sparql.setQuery(query)
    # For CONSTRUCT queries, queryAndConvert() returns an rdflib Graph
    graph = sparql.queryAndConvert()
    graphs.append(graph)

# Merge all graphs into one and serialize it as Turtle
merged_graph = Graph()
for g in graphs:
    merged_graph += g
merged_graph.serialize(destination='merged_graph.ttl', format='turtle')
```
Conclusion
In this post, we looked at how to query a public knowledge base to generate RDF data. In the next post, we aim to ingest this data for analytics and visualization.