It continues to be a controversial voting scenario for Peruvians, on the second round of voting between Pedro Castillo (far left) and Keiko Fujimori (far right). There have been allegations of fraud, though no tangible evidence has been provided yet.
ONPE, the National Office of Electoral Processes of Peru, has made the voting dataset public, for both 1st and 2nd rounds. That shows the level of transparency the entity is providing, to try mitigate disinformation.
I have taken the data and created a simple dashboard using PowerBI, that helps visualize results for second and first round of votes. It can help us with the comparison of the exact voting tables (MESA_DE_VOTACION) between both rounds, and also the differences between both candidates on the second round.
The data and metadata is located on the ONPE site, and will be used to create a data model with driving attribute MESA_DE_VOTACION.
Loading and creating the data model
We will be using the 2 CSV files provided by ONPE and load them (1st and 2nd rounds).
PowerBI will detect relationships and create the reference for us. This is not always automatic, and thus it is recommended to review.
Create additional attributes to drive International hierarchies
The data has been created to be very simple to use as one dataset. However, there is a combination of national and international data. With the steps below we will attempt to create attributes that can help us dissect the data.
These steps will be created for each dataset (2 of them):
Continente
Will create based on values of Departamento. If values match the name of each continent, then use that name. If not, stick to AMERICA, as Peru resides there.
Continente =
IF(Resultados_1ra_vuelta_Version_PCM[DEPARTAMENTO]
IN {"AMERICA","ASIA", "OCEANIA","EUROPA", "AFRICA"},
Resultados_1ra_vuelta_Version_PCM[DEPARTAMENTO],
"AMERICA")
Pais
Will create based on values of Departamento and Provincia. Provincia contains the Country information if values are outside Peru as per below:
Pais =
IF(Resultados_1ra_vuelta_Version_PCM[Continente]="AMERICA"
&& Resultados_1ra_vuelta_Version_PCM[Continente]
<> Resultados_1ra_vuelta_Version_PCM[DEPARTAMENTO],
"PERU"
, Resultados_1ra_vuelta_Version_PCM[PROVINCIA])
Departamento & Provicia (as new attributes)
Will create based on values of Pais. In both cases, we will check if País is Perú. For non matches, we will default values for Departamento Peru and Provincia Peru as “Exterior”.
Departamento Peru =
IF(Resultados_1ra_vuelta_Version_PCM[Pais] = "PERU",
Resultados_1ra_vuelta_Version_PCM[DEPARTAMENTO
], "Exterior")
Provincia Peru =
IF(Resultados_1ra_vuelta_Version_PCM[Pais] = "PERU",
Resultados_1ra_vuelta_Version_PCM[PROVINCIA]
, "Exterior")
The creation of the proper hierarchy will allow us to create multiple slicers, starting from continent and let us drill down as needed like below:
Creating the Dashboard
This area is pretty much free-form and it really depends on what has to be accomplished. I took the decision to create 3 sheets.
- Presidenciales: Information of votes for Keiko Fujimori and Pedro Castillo on 1st and 2nd rounds, with options to filter and drill down.
- Nulos, Blancos: Information of Votes that have no candidate, or that were invalidated due to error.
- Menos de 5 Votos: Attempts to provide information of all locations that recorded less than 5 votes to Keiko Fujimori or Pedro Castillo.
Creating a Slicer
More information can be obtained from the official documentation. In a nutshell you have to select an attribute from the dataset and create a slicer as per below
Creating a Matrix
More information can be obtained from the official documentation. Based on a known hierarchy, the attributes have to be selected in the proper order. Rows will contain the drill down attributes, and columns will contain the metrics.
Creating a Measure
More information can be obtained from the official documentation. The main difference between attributes and measures, is that an attribute is a straight computation of existing data within the dataset, in a per row basis. A measure is a calculated value that will change based on the rows chosen or filtered within the model. So for example, if you would like to obtain a percent of votes of a particular location, and Keiko Fujimori obtained 6,000 vs 4,000 of Pedro Castillo, you would like to visualize 60% and 40%, regardless of the total vote count for each candidate. If you drill up or down, the expectation is for those values to continue changing.
I went ahead and created a % of votes for each candidate.
Ratio_K =
CALCULATE(
SUM(Resultados_2da_vuelta_Version_PCM[VOTOS_P2]))
/ (CALCULATE(SUM(Resultados_2da_vuelta_Version_PCM[VOTOS_P2]))
+ CALCULATE(SUM(Resultados_2da_vuelta_Version_PCM[VOTOS_P1])))
CALCULATE is necessary for the scope of summing, as the options on the dashboard can and will change. So every time the options change, the values have to be recalculated.
The learnings
We have managed to create a simple dashboard with 2 datasets, driven by a data model that has 1 relationship. We have also created new attributes based on existing values. Those values allowed us to create a hierarchy for drill down capabilities. We have also created a measure that help us calculate percents, based on filters selected.
The PowerBI dashboard has a few more objects. You can go ahead and download to interact with it, and extend as needed.