https://pytroubles.com/en/posts/id2147-define-array-struct-columns-in-aws-glue-cdk-aws-glue-alpha-using-type-descriptors

Define array<struct> Columns in AWS Glue (CDK aws_glue_alpha) Using Type Descriptors

Array of Struct in AWS Glue (CDK aws_glue_alpha): Model array<struct> Columns via inputString and isPrimitive

Define array<struct> Columns in AWS Glue (CDK aws_glue_alpha) Using Type Descriptors

Define array<struct> columns in AWS Glue using CDK aws_glue_alpha: build the struct once, pass inputString and isPrimitive to array(), and keep schemas precise.

2025-11-28T09:00:11+03:00

2025-11-28T09:00:12+03:00

Defining a column as array of struct in an AWS Glue table sounds trivial until you touch the current aws_glue_alpha construct. The API lets you describe types, but the array-of-struct shape isn’t exposed as a direct struct-in-array call in the way many expect. If you tried to nest struct(...) straight into array(...), you likely hit a wall.What we want to expressThe construct clearly allows the array shell, but only when you feed it the low-level type description. It looks like this:glue.Schema.array( input_string='struct', is_primitive=False, ) What many of us naturally reach for instead is a nested form that mirrors the logical shape of the data:glue.Schema.array( glue.Schema.struct( columns=[...] ) ) That intention is correct, but the construct does not accept a struct instance directly in this spot.Why this happensThe key is what the type builders actually return. Each schema helper yields a Type object that carries two fields: inputString and isPrimitive. The array(...) factory consumes those properties to describe the element type. In other words, array(...) does not take a nested struct builder call; it expects the underlying type descriptor it would otherwise derive from that struct.Working approachFirst, build the struct once and hold onto its descriptor. Then pass the two properties to array(...). That’s all the array constructor needs to represent array<struct<...>> correctly.# build the struct shape once compound_shape = glue.Schema.struct( columns=[...] ) # feed its descriptor to array(...) array_of_compound = glue.Schema.array( input_string=compound_shape.input_string, is_primitive=compound_shape.is_primitive, ) This retains the exact semantics: the array element type is the struct you defined, and you don’t duplicate the schema or fall back to a stringly-typed hack.But I’ve seen code that looks like array(struct(...)) — why does it work?When code like that appears to work, what matters under the hood is precisely the pair inputString and isPrimitive emitted by the inner struct builder. The array type factory operates on those two fields. Making them explicit removes ambiguity and matches how the construct represents types internally.Why this detail mattersSchema definitions are long-lived and foundational. Getting the array-of-struct shape right at construction time avoids drift between intended structure and what the catalog actually stores. It also keeps your CDK code resilient to changes in how helpers validate arguments, because you pass exactly what the array factory expects.TakeawaysModel the struct once, then hand its type descriptor to the array factory. If you find yourself trying to nest builders directly, stop and expose the two fields the array(...) call needs. This keeps your Glue table schema precise, explicit, and future-proof within the current aws_glue_alpha behavior.

AWS Glue, CDK, aws_glue_alpha, array of struct, array<struct>, Glue table schema, schema builder, inputString, isPrimitive, type descriptor, struct columns, AWS Glue CDK

2025

2025, Nov 28 09:00

Array of Struct in AWS Glue (CDK aws_glue_alpha): Model array<struct> Columns via inputString and isPrimitive

Define array<struct> columns in AWS Glue using CDK aws_glue_alpha: build the struct once, pass inputString and isPrimitive to array(), and keep schemas precise.

What we want to express

The construct clearly allows the array shell, but only when you feed it the low-level type description. It looks like this:

glue.Schema.array(
    input_string='struct',
    is_primitive=False,
)

What many of us naturally reach for instead is a nested form that mirrors the logical shape of the data:

glue.Schema.array(
    glue.Schema.struct(
        columns=[...]
    )
)

That intention is correct, but the construct does not accept a struct instance directly in this spot.

Why this happens

The key is what the type builders actually return. Each schema helper yields a Type object that carries two fields: inputString and isPrimitive. The array(...) factory consumes those properties to describe the element type. In other words, array(...) does not take a nested struct builder call; it expects the underlying type descriptor it would otherwise derive from that struct.

Working approach

First, build the struct once and hold onto its descriptor. Then pass the two properties to array(...). That’s all the array constructor needs to represent array<struct<...>> correctly.

# build the struct shape once
compound_shape = glue.Schema.struct(
    columns=[...]
)
# feed its descriptor to array(...)
array_of_compound = glue.Schema.array(
    input_string=compound_shape.input_string,
    is_primitive=compound_shape.is_primitive,
)

This retains the exact semantics: the array element type is the struct you defined, and you don’t duplicate the schema or fall back to a stringly-typed hack.

But I’ve seen code that looks like array(struct(...)) — why does it work?

When code like that appears to work, what matters under the hood is precisely the pair inputString and isPrimitive emitted by the inner struct builder. The array type factory operates on those two fields. Making them explicit removes ambiguity and matches how the construct represents types internally.

Why this detail matters

Schema definitions are long-lived and foundational. Getting the array-of-struct shape right at construction time avoids drift between intended structure and what the catalog actually stores. It also keeps your CDK code resilient to changes in how helpers validate arguments, because you pass exactly what the array factory expects.

Takeaways

Model the struct once, then hand its type descriptor to the array factory. If you find yourself trying to nest builders directly, stop and expose the two fields the array(...) call needs. This keeps your Glue table schema precise, explicit, and future-proof within the current aws_glue_alpha behavior.

amazon-web-services aws-cdk aws-glue python