It has been a controversial voting scenario for Peruvians, on the second round of voting Between Pedro Castillo (far left) and Keiko Fujimori (far right). There have been allegations of fraud, though no tangible evidence has been provided yet.
ONPE, the National Office of Electoral Processes of Peru, has made the voting dataset public, for both 1st and 2nd rounds. That shows the level of transparency the entity is providing, to try mitigate disinformation.
I took the data and loaded into Azure Synapse Serverless, to get a few graphs. One in particular that has skeptics pointing to “curation” for the irregular curve of voting tables with low votes for candidate Keiko Fujimori.
Loading the Dataset into Azure Data Lake Store
This process is rather trivial, and requires an active access to your Azure resources to load data into the Data Lake Store. Steps can be performed by either using Azure Storage Explorer, or leveraging your Azure Synapse Analytics Workspace.

You can then opt to preview the data by right clicking on the file:

Selecting Data from the Data Lake Store

We just need to ensure that the proper delimiters and options are in place:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://account.dfs.core.windows.net/container/ONPE/Resultados_1ra_vuelta_Version_PCM.csv',
FORMAT = 'CSV', -- File Type
PARSER_VERSION='2.0', -- Faster Parsing with CSV files
FIRSTROW = 1, -- First Row starts here
FIELDTERMINATOR = ';', -- Field Terminator
FIELDQUOTE ='"', -- Field Quote
HEADER_ROW = TRUE -- It has a header
) AS [result]
;
And the results like below:

Consolidate Data Creating a View
Make sure that you have create a new database under Synapse Serverless.
CREATE DATABASE Onpe;
ALTER DATABASE Onpe COLLATE Latin1_General_100_BIN2_UTF8;
USE Onpe;
CREATE VIEW ONPE AS
SELECT
COALESCE(r1.DEPARTAMENTO, r2.DEPARTAMENTO) AS DEPARTAMENTO,
COALESCE(r1.PROVINCIA, r2.PROVINCIA) AS PROVINCIA,
COALESCE(r1.DISTRITO, r2.DISTRITO) AS DISTRITO,
COALESCE(r1.MESA_DE_VOTACION, r2.MESA_DE_VOTACION) AS MESA_DE_VOTACION,
r1.DESCRIP_ESTADO_ACTA AS DESCRIP_ESTADO_ACTA_1ra,
r2.DESCRIP_ESTADO_ACTA AS DESCRIP_ESTADO_ACTA_2da,
r1.N_CVAS AS N_CVAS_1ra,
r2.N_CVAS AS N_CVAS_2da,
COALESCE(r1.VOTOS_P1, 0) AS VOTOS_P1,
COALESCE(r1.VOTOS_P2, 0) AS VOTOS_P2,
COALESCE(r1.VOTOS_P3, 0) AS VOTOS_P3,
COALESCE(r1.VOTOS_P4, 0) AS VOTOS_P4,
COALESCE(r1.VOTOS_P5, 0) AS VOTOS_P5,
COALESCE(r1.VOTOS_P6, 0) AS VOTOS_P6,
COALESCE(r1.VOTOS_P7, 0) AS VOTOS_P7,
COALESCE(r1.VOTOS_P8, 0) AS VOTOS_P8,
COALESCE(r1.VOTOS_P9, 0) AS VOTOS_P9,
COALESCE(r1.VOTOS_P10, 0) AS VOTOS_P10,
COALESCE(r1.VOTOS_P12, 0) AS VOTOS_P12,
COALESCE(r1.VOTOS_P13, 0) AS VOTOS_P13,
COALESCE(r1.VOTOS_P14, 0) AS VOTOS_P14,
COALESCE(r1.VOTOS_P15, 0) AS VOTOS_P15,
COALESCE(r1.VOTOS_P17, 0) AS VOTOS_P17,
COALESCE(r1.VOTOS_P18, 0) AS VOTOS_P18,
COALESCE(r1.VOTOS_P11, 0) AS FP_1ra,
COALESCE(r1.VOTOS_P16, 0) AS PL_1ra,
COALESCE(r1.VOTOS_VB, 0) AS VOTOS_VB_1ra,
COALESCE(r1.VOTOS_VN, 0) AS VOTOS_VN_1ra,
COALESCE(r2.VOTOS_P2, 0) AS FP_2da,
COALESCE(r2.VOTOS_P1, 0) AS PL_2da,
COALESCE(r2.VOTOS_VB, 0) AS VOTOS_VB_2da,
COALESCE(r2.VOTOS_VN, 0) AS VOTOS_VN_2da
FROM
OPENROWSET(
BULK 'https://account.dfs.core.windows.net/container/ONPE/Resultados_2da_vuelta_Version_PCM.csv',
FORMAT = 'CSV',
PARSER_VERSION='2.0',
FIRSTROW = 1,
FIELDTERMINATOR = ';',
FIELDQUOTE ='"',
HEADER_ROW = TRUE
) AS [r2]
FULL OUTER JOIN
OPENROWSET(
BULK 'https://account.dfs.core.windows.net/container/ONPE/Resultados_1ra_vuelta_Version_PCM.csv',
FORMAT = 'CSV',
PARSER_VERSION='2.0',
FIRSTROW = 1,
FIELDTERMINATOR = ';',
FIELDQUOTE ='"',
HEADER_ROW = TRUE
) AS [r1] ON r2.MESA_DE_VOTACION = r1.MESA_DE_VOTACION
;
Select a few rows from the view:
SELECT TOP 100 * FROM ONPE;

Recreate the curve of votes for each candidate
This is the graph that folks are pointing as anomaly for candidate Keiko Fujimori. They expect a bell curve for both candidates, and are instead getting an irregular shape for Fujimori.
-- Curva de Cantidad de mesas por numero de votos por ambos candidatos
WITH CTE_FP AS (
SELECT
FP_2da AS NUMERO_DE_VOTOS,
COUNT(MESA_DE_VOTACION) AS CANTIDAD_DEMESAS
FROM ONPE
WHERE DESCRIP_ESTADO_ACTA_2da = 'CONTABILIZADA'
GROUP BY FP_2da
), CTE_PL AS (
SELECT
PL_2da AS NUMERO_DE_VOTOS,
COUNT(MESA_DE_VOTACION) AS CANTIDAD_DEMESAS
FROM ONPE
WHERE DESCRIP_ESTADO_ACTA_2da = 'CONTABILIZADA'
GROUP BY PL_2da
)
SELECT
COALESCE(FP.NUMERO_DE_VOTOS, PL.NUMERO_DE_VOTOS) AS NUMERO_DE_VOTOS,
COALESCE(FP.CANTIDAD_DEMESAS, 0) AS CANTIDAD_DEMESAS_FP,
COALESCE(PL.CANTIDAD_DEMESAS, 0) AS CANTIDAD_DEMESAS_PL
FROM CTE_FP AS FP
FULL OUTER JOIN CTE_PL PL ON (FP.NUMERO_DE_VOTOS = PL.NUMERO_DE_VOTOS)
ORDER BY COALESCE(FP.NUMERO_DE_VOTOS, PL.NUMERO_DE_VOTOS);
If you switch to chart, then select “Area” under Chart Type, and exclude “NUMERO_DE_VOTOS” under Legend, you end up with the graph below:

The learnings
2 main things we have accomplished. The first one, on very easy steps, how to load and query data with Azure Synapse Serverless. The Second one, we can plot graphs based on query result sets that interact with data in the Data Lake Store.
Do I see an anomaly? Yes. Should it indicate fraud? Not necessarily. Will let the investigations continue.