Update
After this post was published, some people asked for other suffixes to be analyzed. I’ve now made an interactive webpage to analyze place names in any way with SQL:
Intro
I noticed something unique when I walked around Kakkanad: some of the local place names end with the word “മുഗൾ”.
I hadn’t heard of any other place in Kerala with this suffix, so it caught my interest. Curiosity brewed instantly: is this really specific to Kakkanad? If so, why?
I never got the time to explore this idea more until last week at a Wikimedia event in my alma mater.
Goal
To list all the places in Kerala with a particular suffix and show them as dots on an interactive map. This makes it easier to see how common the suffix is and where it occurs.
Gathering data
Where to get the list of all place names in Kerala & their coordinates?
It is only when you try to do data analysis that you realize how much data is missing and how many errors there are in the data that is available.
OpenStreetMap
- Get the list of all place names
- Get the list of all bus stops
Bus stop names are usually the place names themselves. I noticed in OSM that some places have no place-name node on the map but do have bus stops, which is why I collected both. There will be duplicates because of this, but that’s fine since this is a simple human analysis.
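Both OSM exports can be reduced to name and coordinate records in a few lines. This is only a minimal sketch, assuming the data was exported in Overpass's JSON format (an `elements` array of nodes carrying `lat`, `lon`, and `tags`). The helper name `osm_records` and the sample payload are made up for illustration and are not the script used for this post.

```ruby
require 'json'

# Hypothetical helper: turn an Overpass-style JSON export into
# { name:, lat:, lon: } hashes, skipping nodes without a usable name.
def osm_records(json_text)
  elements = JSON.parse(json_text).fetch('elements', [])
  elements.filter_map do |el|
    name = el.dig('tags', 'name')
    next if name.nil? || name.strip.empty?
    next unless el['lat'] && el['lon']

    # Coordinates are kept as strings to match the string columns in the DB.
    { name: name.strip, lat: el['lat'].to_s, lon: el['lon'].to_s }
  end
end

# Made-up sample in the shape Overpass returns for nodes.
sample_json = <<~JSON
  {"elements": [
    {"type": "node", "id": 1, "lat": 10.01, "lon": 76.34,
     "tags": {"place": "suburb", "name": "Example Mugal"}},
    {"type": "node", "id": 2, "lat": 10.02, "lon": 76.35,
     "tags": {"highway": "bus_stop"}}
  ]}
JSON

puts osm_records(sample_json).inspect
```

The second sample node has no `name` tag, so only the first one survives. Unnamed bus stops are dropped this way because they cannot contribute to a suffix analysis.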
Wikidata
Wikidata query service is used to run these SPARQL queries.
- Get the list of human settlements in Kerala:
SELECT DISTINCT ?item ?len ?lml ?coord
WHERE
{
  ?item wdt:P31 wd:Q486972 .
  ?item wdt:P131/wdt:P131* wd:Q1186.
  ?item wdt:P625 ?coord.
  OPTIONAL { ?item rdfs:label ?len. FILTER(LANG(?len)="en") }
  OPTIONAL { ?item rdfs:label ?lml. FILTER(LANG(?lml)="ml") }
}
LIMIT 100
The good folks at OpenDataKerala have made ward data openly accessible on Wikidata and OpenStreetMap. Some of these wards have coordinates assigned.
- Get the list of local body wards that have coordinates:
SELECT DISTINCT ?item ?len ?lml ?coord
WHERE
{
  ?item wdt:P31 wd:Q1195098 .
  ?item wdt:P131* wd:Q1186.
  ?item wdt:P625 ?coord.
  OPTIONAL { ?item rdfs:label ?len. FILTER(LANG(?len)="en") }
  OPTIONAL { ?item rdfs:label ?lml. FILTER(LANG(?lml)="ml") }
}
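In the query results, the `?coord` value comes back as a WKT literal of the form `Point(longitude latitude)`, so it has to be split before it can be stored as separate lat and lon values. The following is a rough sketch of that step, assuming the results were downloaded in the standard SPARQL JSON layout (each row maps a variable name to a hash with a `value` key). `parse_wkt_point` and `wikidata_record` are hypothetical helper names, and preferring the English label over the Malayalam one is an assumption made here, not something the original script is known to do.

```ruby
# Hypothetical helper: extract latitude and longitude from a WKT point
# literal such as "Point(76.3419 10.0159)". WKT puts longitude first.
def parse_wkt_point(wkt)
  pattern = /\APoint\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\z/i
  match = wkt.to_s.strip.match(pattern)
  return nil unless match

  lon, lat = match.captures
  { lat: lat, lon: lon }
end

# Hypothetical helper: flatten one result row into a record, using the
# English label when present and the Malayalam label otherwise.
def wikidata_record(row)
  point = parse_wkt_point(row.dig('coord', 'value'))
  name = row.dig('len', 'value') || row.dig('lml', 'value')
  return nil if point.nil? || name.nil?

  { name: name, lat: point[:lat], lon: point[:lon] }
end

# Made-up row in the shape of one entry of results["bindings"].
row = {
  'coord' => { 'value' => 'Point(76.3419 10.0159)' },
  'len'   => { 'value' => 'Example Place' }
}
puts wikidata_record(row).inspect
```

Rows without a parsable coordinate or without any label return `nil`, so they can be dropped with `compact` before inserting.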
Combining data
We have four sources of data in JSON. They need to be combined so that it’s easy to filter out data.
I figured putting them all into a SQLite DB would be the best way. For this I wrote a Ruby script. Rails’ ActiveRecord makes it pretty easy to manage the DB.
ActiveRecord without Ruby on Rails
One of the main features of Ruby on Rails is the ActiveRecord ORM. This can be used without Ruby on Rails as well.
require 'active_record'
require 'sqlite3'

ActiveRecord::Base.establish_connection(
  adapter: 'sqlite3',
  database: 'db.sqlite3'
)

# This is a model that corresponds to a table
class Place < ActiveRecord::Base
end

# Table name in SQLite will be plural: "places"
@first_run = !Place.table_exists?

# Create the table (migration-like setup)
if @first_run
  ActiveRecord::Schema.define do
    create_table :places do |t|
      t.string :name
      t.string :lat
      t.string :lon
      t.timestamps
    end
    add_index :places, :name, unique: true
  end
end
The above script will set up the DB. Sometimes you’ll want to reset and start over; to do that, simply delete the db.sqlite3 file (a perk of SQLite being simple).
Insert data
The next step is to insert the data. For this I created a separate function for each source.
if @first_run
  osm_bus_stops
  osm_place_nodes
  wikidata_places
  wikidata_wards
  Place.insert_all(@records)
end
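Because the table has a unique index on the name, records that share a name across sources cannot all be stored. One way to make it explicit which record survives is to deduplicate the collected list before inserting it. This is just a sketch under that assumption: `dedupe_by_name` is a hypothetical helper that is not part of the original script, and the sample rows are made up.

```ruby
# Hypothetical helper: keep the first record seen for each place name,
# treating names that differ only by case or surrounding whitespace as equal.
def dedupe_by_name(records)
  records.uniq { |record| record[:name].to_s.strip.downcase }
end

# Made-up sample rows standing in for the combined sources.
records = [
  { name: 'Example Place',  lat: '10.0100', lon: '76.3400' }, # e.g. a place node
  { name: 'example place ', lat: '10.0102', lon: '76.3403' }, # e.g. a bus stop
  { name: 'Another Place',  lat: '9.5000',  lon: '76.5000' }
]

unique = dedupe_by_name(records)
puts unique.map { |r| r[:name] }.inspect
# => ["Example Place", "Another Place"]
```

Keeping the first occurrence means the order in which the sources are loaded decides which coordinates win, so the most trusted source would go first.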
Now that we have the DB set up, we can do the analysis.
The interactive Ruby console makes it easy to debug and query with ActiveRecord. To get it, I trigger the runtime developer console (pry) at the end of the file.
require 'pry'
...
...
binding.pry
Using the console to fetch all the places that have the word mugal in their name:
Place.where("name LIKE '%mugal%'").pluck(:name, :lat, :lon)
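Note that the pattern `'%mugal%'` matches the word anywhere in the name. To match it strictly as a suffix, the trailing wildcard can be dropped, for example `Place.where("name LIKE ?", "%mugal")`. The plain-Ruby sketch below shows the same end-anchored, case-insensitive match on an in-memory list. `with_suffix` is a hypothetical helper and the names are samples for illustration only.

```ruby
# Hypothetical helper: select records whose name ends with the given suffix,
# ignoring case. The ActiveRecord equivalent would be
#   Place.where("name LIKE ?", "%#{suffix}")
# with no trailing % so the match is anchored at the end of the name.
def with_suffix(records, suffix)
  wanted = suffix.downcase
  records.select { |record| record[:name].to_s.downcase.end_with?(wanted) }
end

# Sample names for illustration only.
sample = [
  { name: 'Padamugal' },
  { name: 'Mugalpuram' }, # contains "mugal" but does not end with it
  { name: 'Ramankary' }
]

puts with_suffix(sample, 'mugal').map { |r| r[:name] }.inspect
# => ["Padamugal"]
```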
Analysis
The most effective way to show the result is with a map of marked points. I figured Leaflet would be the easiest option because I can control it programmatically and I have seen it used everywhere on the web.
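To get the query results onto the map, one option is to export them as GeoJSON points, which Leaflet's `L.geoJSON` can load directly. This is a minimal sketch, assuming rows in the `[name, lat, lon]` shape that `pluck(:name, :lat, :lon)` returns. `rows_to_geojson` is a hypothetical helper and the sample row is made up.

```ruby
require 'json'

# Hypothetical helper: convert [name, lat, lon] rows into a GeoJSON
# FeatureCollection of points.
def rows_to_geojson(rows)
  features = rows.map do |name, lat, lon|
    {
      type: 'Feature',
      properties: { name: name },
      # GeoJSON positions are [longitude, latitude], in that order.
      geometry: { type: 'Point', coordinates: [Float(lon), Float(lat)] }
    }
  end
  { type: 'FeatureCollection', features: features }
end

# Made-up row; lat and lon are strings because the DB columns are strings.
rows = [['Example Mugal', '10.01', '76.34']]
puts JSON.pretty_generate(rows_to_geojson(rows))
```

`Float()` raises on malformed input instead of silently returning 0.0 the way `to_f` does, so a bad coordinate fails loudly rather than showing up as a point in the ocean.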
Showing just Kerala region
Leaflet showed the full map of the Earth, but I wanted to show just Kerala. This was difficult to achieve because I needed the exact boundaries of the Kerala region. My first attempt was to use the geometry data from Wikipedia.
But I wanted to distinguish the districts better. I explored many approaches and finally settled on the one that worked best: write a query in Overpass, download the GeoJSON, and load it in Leaflet.
The geojson can be obtained by running this query on https://overpass-turbo.eu/
[out:xml][timeout:500];
{{geocodeArea:Kerala}}->.searchArea;
(
  nwr["boundary"="administrative"]["admin_level"="5"](area.searchArea);
);
// print results
out meta;
>;
out meta qt;
Note that the query fetches administrative boundaries with admin_level 5, which are the districts of Kerala.
Coloring districts uniquely
Since the GeoJSON is made up of district boundaries, the fill color can be set based on the name of each region. This is what that looks like:
L.geoJSON(json, {
  style: function(feature) {
    return {
      fillColor: colors[feature.properties.name], // colors["Thrissur"]
      fillOpacity: 0.5,
      color: "#000",
      weight: 0.2,
    }
  }
})
Result
I analyzed the suffixes “kari”, “mugal”, “ssery”, and “kulam”, plotted them on the map, then used Firefox’s screenshot tool to grab the bounding box. This is what the Leaflet webpage looks like:
Firefox’s screenshot tool is pretty nice for grabbing just the rectangle along the border:
Conclusion
mugal
Place names ending with the “mugal” suffix are indeed a specialty of Kakkanad. My reasoning so far is that it comes from the nature of the place: since these are hilly areas, using the name mugal/മുകൾ (top) makes sense.
kari
There is a reason Kuttanad has a lot of place names ending with kari/കരി. From Wikipedia:
Kuttanad was once believed to be a wild forest with dense tree growth which was destroyed subsequently by a wild fire. Chuttanad (place of the burnt forest), was eventually called Kuttanad. Until the recent past burned black wooden logs were mined from paddy fields called as “Karinilam” (Black paddy fields). This fact substantiates the theory of Chuttanad evolving to Kuttanad. Ramankary, Puthukkary, Amichakary, Oorukkary, Mithrakary, Mampuzhakary, Kainakary, Chathurthiakary, Thakazhy, Edathua, Chambakkulam, Mankombu and Chennamkary are some familiar place names in Kuttanad
Interestingly, the Kannur mountain ranges also have a lot of such places! I don’t have a good answer for this, but I think it must be because the people who settled in these mountain regions came from south Kerala. When people migrate, they tend to name new places after the places they came from or are familiar with.
kulam
Kulam means pond, and Kerala has a lot of them. So it makes sense that the suffix is present everywhere in Kerala.
ssery
Interestingly, it’s not present in Kasaragod or Thiruvananthapuram but is present everywhere else.
You can see more imagery here.
Credits
Thanks to the OpenStreetMap and Wikidata contributors. Data analysis is fancy and all, but no analysis is possible without data. So first and foremost, thanks to all the people who contribute!
Thanks to Jinoy, Manoj K & Ranjith Siji for answering my queries at the event. This made things faster to build.