Create table as select (CTAS)

Emulation context

In Azure Synapse Analytics (dedicated SQL pools), the CREATE TABLE AS SELECT (CTAS) statement is commonly used to create and populate new tables in a single operation. CTAS is optimized for data loading, transformation workflows, and is widely applied for:

  • Materializing query results into permanent tables

  • Repartitioning or transforming existing tables

  • Duplicating subsets of data for downstream processing

  • Replacing multi-step CREATE TABLE + INSERT logic with a single statement

For example:

CREATE TABLE SalesSummary
WITH (
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
    SalesPersonID,
    SUM(TotalDue) AS TotalSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID;

Synapse supports additional WITH (...) clauses for:

  • Distribution strategies (ROUND_ROBIN, HASH(column), REPLICATE)

  • Indexing hints such as CLUSTERED COLUMNSTORE INDEX

In Microsoft Fabric Warehouse, CTAS is supported in a more streamlined form. While the CREATE TABLE AS SELECT syntax itself is valid, distribution options and index hints from Synapse are not supported. Fabric infers column names, data types, and nullability directly from the SELECT output, and does not allow explicit column definitions in CTAS. In Fabric, CTAS is a logical operation without distribution metadata or physical indexing.


Emulation strategy

SQL Tran emulates Synapse CTAS in Fabric by preserving valid SQL and stripping unsupported metadata. The emulation process includes:

  • Preserving the CREATE TABLE AS SELECT structure.

  • Commenting out all WITH (...) clauses, including:

    • DISTRIBUTION = HASH(...)

    • DISTRIBUTION = ROUND_ROBIN

    • DISTRIBUTION = REPLICATE

    • CLUSTERED COLUMNSTORE INDEX

  • Retaining full SELECT logic, including:

    • Expressions such as CAST(...), UPPER(...), CASE, and RANK()

    • Joins, filters (WHERE), grouping, and window functions

    • Set operations (UNION ALL), subqueries, and nested aggregates

  • Supporting CTAS inside stored procedures without rewriting it as separate CREATE TABLE and INSERT operations.

  • Inserting /* SQLTRAN FIX */ comments next to transformed clauses for traceability.

This strategy ensures that the translated CTAS is valid and executable in Fabric, while clearly marking any removed or unsupported Synapse-specific metadata for user review.


Code example

Synapse Analytics:

CREATE TABLE CustomerSummary
WITH (
	DISTRIBUTION = HASH(CustomerID),
	CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
	CustomerID,
	COUNT(*) AS OrderCount,
	SUM(TotalDue) AS TotalAmount
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;

Fabric Warehouse (generated by SQL Tran):

CREATE TABLE CustomerSummary
/* SQLTRAN FIX: WITH (
	DISTRIBUTION = HASH(CustomerID),
	CLUSTERED COLUMNSTORE INDEX
) */
AS
SELECT
	CustomerID,
	COUNT(*) AS OrderCount,
	SUM(TotalDue) AS TotalAmount
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;
CTAS emulation in SQL Tran

Important notes

  • Distribution metadata removed: WITH (DISTRIBUTION = ...) clauses are not supported in Fabric and are commented out by SQL Tran.

  • Index options not supported: Indexing hints like CLUSTERED COLUMNSTORE INDEX are removed and annotated with /* SQLTRAN FIX */.

  • Column definitions inferred: Fabric does not allow explicit column declarations in CTAS. All column names, data types, and nullability are inferred from the SELECT clause. Any change to expressions or column order in the SELECT may alter the resulting schema.

  • Query logic fully preserved: SQL Tran retains all query logic including casting, expressions, joins, filters, grouping, window functions, and set operations.

  • Works inside procedures: CTAS within stored procedures is supported and preserved without decomposition.

  • Manual review recommended: If performance or behavior in Synapse relied on distribution or indexing strategies, users should reimplement these using Fabric-compatible constructs (e.g., partitioned tables, post-CTAS indexing, or ingestion tuning).

Last updated