Creating External Table AS (CETAS) on Azure Synapse Analytics

This is a continuation from a prior post that allows users leverage Peruvian voting data with Azure Synapse Serverless.

This post will show how to create an external table using Azure Synapse Analytics. The creation of an external table will place data inside a folder in the Data Lake Store, that has to be specified. The format can be plain text or parquet. Additional information can be found in the Microsoft documentation link.

There are 3 steps to be accomplished:

  1. Create an external data source
  2. Create an external file format
  3. Create an external table (a schema can also be created as a prior step).

Creating the external references

-- Create an External Data Source
CREATE EXTERNAL DATA SOURCE TestData
WITH ( LOCATION = 'https://account.dfs.core.windows.net/container/');

-- Create an External File Format
CREATE EXTERNAL FILE FORMAT txt  
WITH (  
    FORMAT_TYPE = DELIMITEDTEXT, 
    FORMAT_OPTIONS ( 
            FIELD_TERMINATOR = '|',
            STRING_DELIMITER = '"',
             )
    ); 

Creating the schema and external table

-- Create Schema (optional)
CREATE SCHEMA ext;

-- Create External Table as (CETAS)

CREATE EXTERNAL TABLE ext.onpecombinado
WITH (
    LOCATION = 'onpecombinado/',
    DATA_SOURCE = TestData,  
    FILE_FORMAT = txt
)  
AS
SELECT * FROM ONPE_ASSUMPTION;

In the steps above, we are using the object ONPE_ASSUMPTION that points to a view created on our prior post.

Review the data

SELECT TOP 100 * FROM ext.onpecombinado;

Visualize Objects on the Data Lake Store

Behind the scenes, Azure Synapse is creating a series of files within a folder (as specified). You can visualize right from the Synapse Workspace:

You can right click on any file and preview:

The learnings

A few steps are required to land the resultset into the Data Lake Store, and to be leveraged by any query, downloaded or used as a source for Analytics/dashboard products like PowerBI. The key elements are data source, file format and the external table.