Clustering Mexico's neighborhoods and economic activities

Mexico’s poverty and rural areas

Mexico is a country in North America that extends over 1.9 M square kilometers. It is composed of 32 states and as of 2020 it had a population of 126 M (1). It is characterized by its vast biodiversity and natural resources, sitting among the 15 largest economies in the world (2).


Nevertheless, poverty and wealth inequality hold back the development of the nation and the wellbeing of its people. To put this into perspective, according to the nation’s Social Development Policy agency, close to 44% of the population lived in poverty as of 2020 (3).


This agency defines poverty with a multidimensional approach built on 2 concepts, each comprising multiple measures:

  1. Lack of basic social needs
    • Education lag
    • Lack of access to healthcare services
      • Neither private nor public
    • Lack of access to social security services
      • No insurance/plan to overcome eventualities (e.g., accidents, retirement or pregnancy)
    • Poor housing conditions and access to housing
      • Low quality or lack of building materials
      • Overcrowding
    • Lack of access to basic housing needs
      • Low quality access to water
      • Low quality sewage
      • No electricity
      • Cooking with coal or firewood
    • Lack of access to good quality nutrition
  2. Economic welfare
    • Extreme vulnerability from low income
    • Vulnerability from low income

In an effort to study poverty and its incidence in urban and rural areas, this agency used k-means to classify the setting of each municipality into 3 clusters (urban, interface & rural) and then analyzed how poverty was distributed among them. The clustering used the following 12 variables:

  1. Total population
  2. Population density
  3. Percentage of population in localities with fewer than 2,500 inhabitants
  4. Connectivity index
  5. Economic activity index
  6. Employment
  7. Economic activity rate
  8. Education of the population older than 15 years
  9. Percentage of houses without proper sewage
  10. Percentage of houses without internet
  11. Percentage of houses with dirt floor
  12. Percentage of houses with firewood cooking
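As a rough illustration of that kind of exercise, the sketch below runs a plain k-means on a toy table. It is not the agency's implementation: the six municipalities, their values, the use of only 3 of the 12 variables, the standardization step and the farthest-point initialization are all assumptions made for this example.

```python
def standardize(rows):
    """Rescale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        # next seed: the row farthest from every seed chosen so far
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical municipalities: [total population, density, % in small localities]
municipalities = [
    [900000, 5000, 2], [850000, 4800, 3],   # urban-like
    [60000, 400, 35], [55000, 350, 40],     # interface-like
    [3000, 20, 95], [2500, 15, 98],         # rural-like
]
labels = kmeans(standardize(municipalities), k=3)
print(labels)
```

Standardizing first matters because k-means relies on Euclidean distance, so a variable measured in hundreds of thousands (population) would otherwise dominate one measured in percentages.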

It was found that 82% of the population in rural areas lived in poverty (8.5 M people), compared with 41% of the population in non-rural areas (43.8 M people). This means that a person living in a rural area is twice as likely to be poor. This is challenging because rural areas represent 40% of municipalities (30% of localities) and house 9.4% of the population (4).
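A short calculation helps pin down what "twice" means here. The two rates come from the figures above; the function names are only illustrative. The ratio of the two poverty rates (the risk ratio) is 2, while the odds ratio in the strict statistical sense is considerably larger.

```python
def risk_ratio(p_a, p_b):
    """Ratio of two probabilities (how many times more likely)."""
    return p_a / p_b

def odds_ratio(p_a, p_b):
    """Ratio of two odds, where odds = p / (1 - p)."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

rural_rate, non_rural_rate = 0.82, 0.41  # poverty rates quoted in the text

rr = risk_ratio(rural_rate, non_rural_rate)
odds = odds_ratio(rural_rate, non_rural_rate)
print(f"risk ratio: {rr:.2f}, odds ratio: {odds:.2f}")
```

The "2x" figure used throughout this post is therefore a ratio of poverty rates, not an odds ratio.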


In agreement with that exercise (4), rurality and poverty are both multidimensional concepts that are complex to define and worth tackling with data science tools suited to multivariate & entangled concepts.


A drawback of this exercise is that characteristics used to identify rural areas are also used to identify poverty. Therefore, rural areas will always be characterized heavily by poverty with this approach. This can be seen in their results, where poor housing conditions is a measure used to identify both rurality and poverty. It flows into their conclusion that housing presents the greatest gap between rural and urban areas, because all places with poor housing conditions were clustered together as both rural and poor.

For example, the lack of electricity in houses characterizes both a rural area & a poor neighborhood. Disentangling this relation raises two questions: why can’t rural areas have electricity, and further, why couldn’t rural areas have all their needs satisfied?


I will try to disentangle poverty and rurality in this post by merging different datasets and running the analysis at a higher resolution, moving from municipalities to neighborhoods. Rural areas must be characterized by variables independent from poverty in order to identify how the two relate and how this could be addressed.


Understanding Mexico at neighborhood level


Census data

Even though Mexico’s 2020 census database provides data down to the block level, this analysis will be carried out at the neighborhood level. Compared to the previous analysis, which worked with municipalities, this represents an increase in granularity.


General administrative structure of Mexico:

  • State
    • City
      • Municipality
        • Locality
          • Neighborhood
            • Block
              • Household

Sociodemographic data at neighborhood level from the 2020 census is then paired with the 2019 economic survey, whose data is available only at municipality level. Therefore, this analysis will assume that all neighborhoods within a municipality share the same economic conditions.
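A minimal sketch of that pairing follows. The record layout, field names and values are hypothetical and do not reflect the actual census or survey schemas; the point is only that every neighborhood inherits the figures of its municipality.

```python
# Hypothetical neighborhood records from the census (illustrative fields only)
neighborhoods = [
    {"neighborhood_id": "A-001", "municipality_id": "A", "population": 1200},
    {"neighborhood_id": "A-002", "municipality_id": "A", "population": 800},
    {"neighborhood_id": "B-001", "municipality_id": "B", "population": 300},
    {"neighborhood_id": "C-001", "municipality_id": "C", "population": 150},
]

# Hypothetical municipality-level figures from the economic survey
municipal_economy = {
    "A": {"primary_employment_share": 0.05, "tertiary_employment_share": 0.70},
    "B": {"primary_employment_share": 0.55, "tertiary_employment_share": 0.20},
}

def attach_municipal_data(neighborhoods, municipal_economy):
    """Copy each municipality's economic figures onto its neighborhoods."""
    merged = []
    for row in neighborhoods:
        econ = municipal_economy.get(row["municipality_id"])
        if econ is None:
            continue  # no survey data for this municipality: drop the row
        merged.append({**row, **econ})
    return merged

merged = attach_municipal_data(neighborhoods, municipal_economy)
for row in merged:
    print(row)
```

Neighborhoods A-001 and A-002 end up with identical economic values, which is exactly the simplifying assumption stated above.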


Approach

The task of identifying the distribution of poverty in the country and characterizing urban and rural areas will be carried out in several steps. Motivated by the analysis described above, variables from the reference set will be used in this analysis, but carefully split into two sets.

Variables that denote poverty will not be used in the rurality analysis to avoid data leakage.


The first step is to identify poor & developed neighborhoods. To achieve this, 3 indices are designed to collapse socioeconomic measurements into a single KPI: housing, social & lag.

  • The housing index represents the quality of housing conditions as it is composed of houses without electricity, sewage, etc.
  • The social index represents the quality of covered social needs as it is composed of the average schooling grade, access to health care, etc.
  • The lag index will be a proxy for poverty as it is composed of measurements that capture the lack of housing and social needs being covered.

Then, each neighborhood is clustered by the housing and social indices, while the lag index is kept for analysis.
The outcome of this stage is a set of neighborhood clusters that characterize their environment, which spans development and poverty.


The second step is to identify rural neighborhoods.
The approach here is to use the economic activities performed in each municipality as a proxy for rural and urban areas.
Based on the Social Accounting Matrix (SAM) approach by Blancas and Aliphat, the economic sectors defined in that paper will work as proxies for rural and urban activities.
The primary sector will be the proxy for rural areas and the tertiary sector for urban areas.
Finally, employment participation in the urban and rural sectors is paired with the population size of the neighborhood and two connectivity indices to cluster each neighborhood into one of two sectors, urban or rural.


These two sets of clusters, environment and sector, become the basis to analyze how variables of interest such as wages and the lag index behave in the intersection of the clusters.


Analysis


Sociodemographic data

The analysis begins with the characterization of the socioeconomic setting of Mexico’s neighborhoods.
To achieve this, 3 indices will be designed:

  • Housing: Coverage in the neighborhood of housing basic services like flooring, internet, sewage, drinkable water & electricity.
  • Social: Coverage in the neighborhood of social needs like education, measured with the average schooling grade, and population with access to healthcare. It also considers the activity rate.
    • Activity rate: Ratio of non-economically active population over economically active population in the neighborhood
  • Lag: Captures in a single feature the lack of coverage of the variables considered in the housing and social indices.
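The exact formulas behind the three indices are not spelled out here, so the following is only one plausible construction. It assumes that each index is an unweighted mean of coverage shares between 0 and 1, that schooling is rescaled by an assumed maximum of 15 years, and that the activity rate is converted into the share of economically active population. All field names are hypothetical.

```python
MAX_SCHOOLING_YEARS = 15.0  # assumed cap used only to rescale schooling to 0-1

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def build_indices(n):
    """Return (housing, social, lag) for one neighborhood record."""
    housing_parts = [
        n["share_solid_floor"], n["share_internet"], n["share_sewage"],
        n["share_piped_water"], n["share_electricity"],
    ]
    social_parts = [
        min(n["avg_schooling_years"] / MAX_SCHOOLING_YEARS, 1.0),
        n["share_healthcare"],
        # activity_rate = inactive / active, so 1 / (1 + rate) = active share
        1.0 / (1.0 + n["activity_rate"]),
    ]
    housing = mean(housing_parts)
    social = mean(social_parts)
    # lag: average shortfall across every component of both indices
    lag = mean(1.0 - p for p in housing_parts + social_parts)
    return housing, social, lag

example = {
    "share_solid_floor": 0.9, "share_internet": 0.5, "share_sewage": 0.8,
    "share_piped_water": 0.7, "share_electricity": 1.0,
    "avg_schooling_years": 7.5, "share_healthcare": 0.7, "activity_rate": 1.0,
}
housing, social, lag = build_indices(example)
print(round(housing, 3), round(social, 3), round(lag, 3))
```

Equal weights keep the sketch simple. A weighted scheme or a dimensionality reduction step would be equally valid ways to collapse the measurements into a single KPI.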

Clustering neighborhoods


Development environment

Housing and social indices are then employed in the clustering exercise for the neighborhoods in Mexico.

In this step, a Gaussian clustering algorithm with 4 clusters was applied. The result is the following set of clusters:

  1. Marginalized: Locations that display an extreme degree of poverty through poor housing conditions & the lack of basic needs.
  2. Lagging: Locations that display a high degree of poverty through poor housing conditions and a lesser lack of basic needs.
  3. Underdeveloped: Locations where basic needs are mostly covered, and housing conditions are good.
  4. Developed: Locations where basic needs are fulfilled, and housing conditions are good.
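To make the clustering step concrete, here is a small sketch that assumes the "Gaussian algorithm" refers to a Gaussian mixture model fitted by expectation-maximization. It is a simplified stand-in, not the code behind the results above: covariances are diagonal, the initialization is a deterministic farthest-point seeding, and the 12 (housing, social) pairs are invented toy values.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def log_pdf(p, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(p, mean, var))

def gaussian_mixture(points, k, n_iter=50, min_var=1e-4):
    """EM for a diagonal-covariance Gaussian mixture; returns hard labels and means."""
    n, d = len(points), len(points[0])
    seeds = [points[0]]
    while len(seeds) < k:  # farthest-point seeding (assumed, deterministic)
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    # start from hard assignments to the nearest seed
    resp = [[1.0 if c == min(range(k), key=lambda j: sq_dist(p, seeds[j])) else 0.0
             for c in range(k)] for p in points]
    means, variances, weights = [None] * k, [None] * k, [None] * k
    for _ in range(n_iter):
        for c in range(k):  # M-step: update weight, mean and variance
            nk = sum(r[c] for r in resp) + 1e-12
            weights[c] = nk / n
            means[c] = [sum(r[c] * p[j] for r, p in zip(resp, points)) / nk
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (p[j] - means[c][j]) ** 2
                                    for r, p in zip(resp, points)) / nk, min_var)
                            for j in range(d)]
        resp = []
        for p in points:  # E-step: posterior probability of each component
            logs = [math.log(weights[c]) + log_pdf(p, means[c], variances[c])
                    for c in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, means

# Toy (housing index, social index) pairs, three per hypothetical environment
points = [
    (0.15, 0.20), (0.20, 0.15), (0.18, 0.22),   # marginalized-like
    (0.35, 0.45), (0.40, 0.40), (0.38, 0.48),   # lagging-like
    (0.70, 0.55), (0.72, 0.60), (0.68, 0.58),   # underdeveloped-like
    (0.90, 0.85), (0.92, 0.90), (0.88, 0.88),   # developed-like
]
labels, means = gaussian_mixture(points, k=4)
print(labels)
```

Unlike k-means, a mixture model gives each neighborhood a probability of belonging to every cluster, which is useful for neighborhoods that sit between two environments. In practice a library implementation such as scikit-learn's `GaussianMixture` would be the more likely choice for a dataset of this size.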

The interactive map below shows an example of the result for the states of Nuevo Leon and Coahuila. It is limited to just 2 states because the file for the whole country is 200 MB.


Developed sectors are filled with blue, underdeveloped are in green, & poor sectors are in red.

It is important to notice how developed sectors can be neighbored by sectors characterized by poverty.


Rural & urban sectors

Now the task is to identify whether a neighborhood is in an urban or rural sector. The economic activities carried out in each municipality, along with the connectivity indices and population size, will be the proxy used to define the sector of a neighborhood in this analysis.

  1. Rural: Proxy built with the population employed in primary sectors.
  2. Urban: Proxy built with the population employed in tertiary sectors.
  3. Urban Connectivity index: Proxy built with the Gross Product from Urban Transit Systems.
  4. Rural Connectivity index: Proxy built with the Gross Product from Interurban and Rural Bus Transportation.
  5. Population in neighborhood: Population living in the neighborhood.

A Gaussian clustering algorithm with 2 clusters is then fitted on these 5 features to identify the following clusters:


  • Rural:
    • Geographic sectors similar to each other in having:
      • a higher level of employment in rural activities & rural connectivity index.
      • a lower degree of population.
  • Urban:
    • Geographic sectors similar to each other in having:
      • a higher level of employment in urban activities & urban connectivity index.
      • a higher degree of population.

Urban sectors are filled with grey & rural sectors are filled with green.


Results

Analyzing development & rurality


The tables below summarize the distribution of neighborhoods and population among their respective environment and sector.


NEIGHBORHOODS (%)
ENVIRONMENT \ SECTOR    RURAL    URBAN    TOTAL
DEVELOPED                 2.8     18.4     21.2
UNDERDEVELOPED            7.1     15.0     22.1
LAGGING                  13.0     13.8     26.8
MARGINALIZED             17.0     12.9     29.9
TOTAL                    39.9     60.1    100

POPULATION (%)
ENVIRONMENT \ SECTOR    RURAL    URBAN    TOTAL
DEVELOPED                 3.3     29.9     33.2
UNDERDEVELOPED            7.3     22.8     30.1
LAGGING                   9.0     14.6     23.6
MARGINALIZED              5.7      7.3     13.0
TOTAL                    25.3     74.6    100

Defining the sector using the economic activity carried out in each municipality allows the analysis to disentangle poverty measurements from rurality. The result is that approximately a quarter of Mexico’s population resides in rural areas.


Using the environment clusters as a proxy for poverty, lagging & marginalized neighborhoods represent 56.7% of the neighborhoods in the country and host 36.6% of the population.

  • 40% of the population living in poverty resides in rural areas.
  • 58% of the rural population lives in poor conditions.
  • 29% of the urban population lives in poor conditions.
  • The rural population is twice as likely to live in poverty as the urban population.
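These four figures follow directly from the POPULATION (%) table. The snippet below reproduces them, treating the lagging and marginalized environments as "poor". The variable names are illustrative.

```python
# Population shares (%) by environment and sector, copied from the table
population_share = {
    "developed":      {"rural": 3.3, "urban": 29.9},
    "underdeveloped": {"rural": 7.3, "urban": 22.8},
    "lagging":        {"rural": 9.0, "urban": 14.6},
    "marginalized":   {"rural": 5.7, "urban": 7.3},
}
POOR_ENVIRONMENTS = ("lagging", "marginalized")

def sector_total(sector):
    """Share of the national population living in the given sector."""
    return sum(row[sector] for row in population_share.values())

def poor_total(sector):
    """Share of the national population that is poor and lives in the sector."""
    return sum(population_share[env][sector] for env in POOR_ENVIRONMENTS)

poor_all = poor_total("rural") + poor_total("urban")           # 36.6% of population
rural_share_of_poor = poor_total("rural") / poor_all            # about 40%
rural_poverty_rate = poor_total("rural") / sector_total("rural")  # about 58%
urban_poverty_rate = poor_total("urban") / sector_total("urban")  # about 29%
poverty_ratio = rural_poverty_rate / urban_poverty_rate         # about 2x

print(f"{rural_share_of_poor:.1%} {rural_poverty_rate:.1%} "
      f"{urban_poverty_rate:.1%} {poverty_ratio:.2f}x")
```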

                                 RURAL    URBAN    DELTA
POPULATION (M people)             31.2     94.1    +195%
POOR POPULATION (%)               58.1     29.4    -28.7pp
GDP (B MXN/year)                 8,360   13,850    +66%
WAGE (MXN/month)                 4,850    6,035    +25%*
GDP per capita (k MXN/capita)      490      133    -73%*
AVG. SCHOOL GRADE (>15 years)      3.4      5.0    +1.6 years*
LAG Index (%) - poverty proxy    64.5%    48.9%    -15.6pp*
HOUSING Index (%)                37.8%    54.8%    +17.1pp*
SOCIAL Index (%)                 38.3%    51.2%    +12.9pp*
*: significantly different under T-test for the means (p-value <0.01)
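The significance marker refers to a t-test on the means. The variant is not specified, so the sketch below assumes Welch's unequal-variance form and uses a normal approximation for the p-value, which is reasonable only for large samples such as thousands of neighborhoods. The wage samples are invented for illustration.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t_test(a, b):
    """Welch's t statistic with a large-sample (normal) two-sided p-value."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t_stat = (mean(a) - mean(b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

# Hypothetical monthly wages (MXN) for a handful of neighborhoods per sector
rural_wages = [4700, 4900, 4800, 5000, 4850, 4750, 4950, 4850]
urban_wages = [6000, 6100, 5900, 6050, 6035, 5950, 6150, 6095]

t_stat, p_value = welch_t_test(rural_wages, urban_wages)
print(f"t = {t_stat:.2f}, significant at 1%: {p_value < 0.01}")
```

For small samples the p-value should come from Student's t distribution instead, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.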

Overall, rural areas seem to fall behind on key measures of welfare. The average wage in rural areas is 4,850 MXN/month, whereas in urban areas it is 6,035 MXN/month (+25%). Moreover, there is room for improvement in the HOUSING and SOCIAL indices, which underperformed significantly in rural areas. Nevertheless, annual GDP per capita in rural areas stands at 490 k MXN, whereas in urban areas it is 133 k MXN (73% lower).


Comparing this approach to the one referenced at the beginning of the post, the relative likelihood of poverty in rural areas is similar in this exercise (2x vs 2x reference), but the percentage of poor population differs by 7pp (37% vs 44% reference). Moreover, the exercises differ greatly in the percentage of population living in rural areas (25.3% vs 9.4% reference).
More significantly, using employment to identify rural sectors without mixing it with housing quality shows that rural areas do face poverty challenges, but poverty is not as accentuated there as the reference suggests (58% vs 81% reference).


Results indicate that poverty hits around a third of Mexico’s population. Rural areas are twice as likely to face poverty as urban centers, with half of their population lacking quality housing and social security.

Rural areas do face challenges with poverty and host a relevant portion of the population.

Essential services become even more essential in rural areas. With limited access to education, low wages and poor living conditions, development is significantly hindered.

Nevertheless, rural activities display a higher GDP per capita, which poses an opportunity to drive growth.
Therefore, the promotion of decent work and the improvement of rural economic activities towards greater value should provide an engine for growth and development. This motivates the next blog post.
With the descriptive analysis at hand and a neighborhood dataset complemented with clusters, the intent is to extend this exercise with a prescriptive analysis that promotes economic growth and tackles poverty in less favored neighborhoods.