2025, Nov 09 07:00

How to Resolve PyTorch 'list' object has no attribute 'to' When Loading Batches from S3 with s3torchconnector

Fix PyTorch DataLoader crashes on SageMaker: s3torchconnector S3MapDataset.from_prefix may yield list-like batches. Normalize and move Tensors to device.

Training directly from S3 is convenient until the first batch crashes with a cryptic attribute error. If a pipeline expects a single Tensor per batch but the data source yields a list-like structure, a simple device move like samples.to(device) will fail. That’s exactly what can happen when using s3torchconnector.S3MapDataset.from_prefix in a SageMaker environment.

Repro: when a batch isn’t a Tensor

The following setup initializes a dataset from an S3 prefix and feeds it to a standard DataLoader. The model and loop are typical, but the critical detail is that the incoming batch isn't always a Tensor; with the S3-backed dataset it arrives as a list, which triggers 'list' object has no attribute 'to'.

from PIL import Image
import torch
import torchvision
from torchvision import transforms
import s3torchconnector
# Transform: each dataset item arrives as an s3torchconnector S3Reader,
# a file-like object that also exposes the object's .key. Return the
# key together with a float32 image tensor.
def fetch_img(obj_ref):
    pic = Image.open(obj_ref)
    resizer = transforms.Resize(size=(224, 224))
    pic = resizer(pic)
    pic = transforms.functional.pil_to_tensor(pic)
    return (obj_ref.key, torchvision.transforms.functional.convert_image_dtype(pic, dtype=torch.float32))
# S3-backed dataset
train_ds = s3torchconnector.S3MapDataset.from_prefix(
    cfg.IMAGES_URI,
    region=cfg.REGION,
    transform=fetch_img,
)
# DataLoader (train_sampler and cfg are defined elsewhere in the training script)
train_loader = torch.utils.data.DataLoader(
    train_ds,
    sampler=train_sampler,
    batch_size=cfg.batch_size,
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_mem,
    drop_last=True,
)
# Model
model = models_mae.__dict__[cfg.model](norm_pix_loss=cfg.norm_pix_loss)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
# Training loop (fragment)
for ep in range(cfg.start_epoch, cfg.epochs):
    if cfg.distributed:
        train_loader.sampler.set_epoch(ep)
    for batch in train_loader:
        samples = batch
        samples = samples.to(device, non_blocking=True)  # raises: 'list' object has no attribute 'to'

What’s going on

The data pipeline is mixing two expectations. The training step assumes a Tensor so it can call .to(device, non_blocking=True). But the transform above returns a (key, tensor) tuple for each object, and the DataLoader's default collate function turns a batch of such tuples into a list: the object keys end up at index 0 and the stacked image tensor at index 1. As soon as .to is invoked on that list, the attribute error appears. The same loop can work with a local dataset whose samples are already bare Tensors, which is why the issue only surfaces after switching to S3.
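The structure is easy to verify without touching S3. A minimal sketch, using toy keys and random tensors, shows how PyTorch's default_collate reshapes a batch of (key, tensor) pairs:

import torch
from torch.utils.data import default_collate

# Two (key, tensor) samples, shaped like the output of fetch_img
samples = [
    ("images/0001.jpg", torch.rand(3, 224, 224)),
    ("images/0002.jpg", torch.rand(3, 224, 224)),
]

batch = default_collate(samples)
print(type(batch))     # <class 'list'>
print(batch[0])        # ['images/0001.jpg', 'images/0002.jpg']
print(batch[1].shape)  # torch.Size([2, 3, 224, 224])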

Fix: normalize the batch right before moving it to device

The practical resolution is to coerce the incoming batch to the expected Tensor. Because the transform returns a (key, tensor) pair, the collated list carries the keys at index 0 and the stacked image tensor at index 1; when the batch is already a Tensor, it can be moved directly. This conditional keeps both local and S3 training paths working.

for ep in range(cfg.start_epoch, cfg.epochs):
    if cfg.distributed:
        train_loader.sampler.set_epoch(ep)
    for batch in train_loader:
        inputs = batch
        if isinstance(inputs, list):
            # S3 path: batch is [keys, images]; take the image tensor
            inputs = inputs[1].to(device, non_blocking=True)
        else:
            # Local path: batch is already a Tensor
            inputs = inputs.to(device, non_blocking=True)
        # proceed with forward/backward using `inputs`
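If the object keys aren't needed during training, an alternative is to normalize at the DataLoader itself rather than inside the loop. The sketch below assumes exactly that; images_only_collate is an illustrative helper, not part of s3torchconnector:

import torch

def images_only_collate(batch):
    # Each sample is a (key, tensor) pair from fetch_img; stack just
    # the tensors into a single [B, C, H, W] batch Tensor.
    return torch.stack([tensor for _key, tensor in batch])

train_loader = torch.utils.data.DataLoader(
    train_ds,
    sampler=train_sampler,
    batch_size=cfg.batch_size,
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_mem,
    drop_last=True,
    collate_fn=images_only_collate,
)

With this in place, every batch is a plain Tensor and the original samples.to(device, non_blocking=True) line works unchanged.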

Why this matters

Consistency of the batch interface is essential when swapping data sources. A training step generally codifies assumptions about input shape and type. If one source yields a Tensor and another produces a list-like wrapper, the same model code will behave differently. Normalizing the batch at the handoff point ensures the loop can run against a large S3-hosted dataset as reliably as it does against locally stored files in SageMaker.
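One way to codify the handoff is a small helper that accepts either shape. A minimal sketch, assuming the [keys, images] batch layout described above:

import torch

def as_device_tensor(batch, device):
    # Accept either a bare Tensor (local dataset) or the
    # [keys, images] list produced by the S3-backed pipeline.
    if isinstance(batch, (list, tuple)):
        batch = batch[1]
    return batch.to(device, non_blocking=True)

The loop body then reduces to inputs = as_device_tensor(batch, device), keeping the source-specific details in one place.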

Takeaways

If a training run fails with 'list' object has no attribute 'to' after switching to s3torchconnector, inspect what the batch actually contains and select the Tensor before calling .to(...). A compact isinstance check that picks the image tensor out of the collated list and moves it to the device restores compatibility with both local datasets and S3-backed loading, with no further changes to the rest of the training loop.

The article is based on a StackOverflow question and self-answer by AlternativeWaltz.