2026, Jan 10 21:00

Cloning Oracle Tables with pandas and SQLAlchemy Without Type Drift: CTAS vs to_sql

Learn why pandas to_sql inflates Oracle types (NUMBER to NUMBER(19), VARCHAR2 to CLOB, fixed-scale to FLOAT) and how CTAS preserves schema. Safely clone tables

Cloning an Oracle table with pandas often looks straightforward until column types come back wrong. If you read SCOTT.EMP into a DataFrame and write it back with to_sql, you may see NUMBER(4) inflated to NUMBER(19), VARCHAR2 columns turning into CLOB, and fixed-scale NUMBER(7,2) converted to FLOAT(126). Here is how this happens in practice and how to avoid it cleanly.

Reproducing the issue

The goal is to create a new table TEST with the same structure as EMP using pandas and SQLAlchemy. The source metadata is the classic Oracle EMP definition:

DESC EMP:
Name     Null?    Type         
-------- -------- ------------ 
EMPNO    NOT NULL NUMBER(4)    
ENAME             VARCHAR2(10) 
JOB               VARCHAR2(9)  
MGR               NUMBER(4)    
HIREDATE          DATE         
SAL               NUMBER(7,2)  
COMM              NUMBER(7,2)  
DEPTNO            NUMBER(2)

A typical pandas workflow might look like this:

import pandas as pd
import oracledb
from sqlalchemy import create_engine
import sqlalchemy as sqla

# create engine
db_engine = create_engine("oracle+oracledb://scott:tiger@LAPTOP-QMH68LT9:1521?service_name=ORCLPDB")

# query Oracle and put the result into a pandas DataFrame
frame = pd.read_sql('SELECT * FROM EMP', db_engine)

# copy the data
frame_clone = frame.copy()

# attempt to create TEST with explicit float mappings for some columns
frame_clone.to_sql(
    'test',
    con=db_engine,
    if_exists='replace',
    index=False,
    dtype={'sal': sqla.types.Float, 'mgr': sqla.types.Float, 'comm': sqla.types.Float}
)

After writing, the resulting Oracle table might look like this:

DESC TEST        
Name     Null? Type       
-------- ----- ---------- 
EMPNO          NUMBER(19) 
ENAME          CLOB       
JOB            CLOB       
MGR            FLOAT(126) 
HIREDATE       DATE       
SAL            FLOAT(126) 
COMM           FLOAT(126) 
DEPTNO         NUMBER(19)

Notable differences are exactly the ones you were trying to avoid: precision widening for integers, VARCHAR2 converted to CLOB, and fixed-scale numbers converted to binary floats.

Why the types drift

The behavior stems from crossing the boundary between Oracle types and Python/pandas types via SQLAlchemy. In this path, pandas infers dtypes from the DataFrame and SQLAlchemy maps those to database column types. When you explicitly set sal, mgr and comm as Float, you deliberately steer the result toward FLOAT(126). Meanwhile, inference for integer-like columns and strings can lead to broader defaults such as NUMBER(19) and CLOB.

At a high level the why is because Python floats don't work like Oracle number.

That mismatch is enough to cause the precision/scale loss you observed. Attempting to tune every column by hand becomes fragile and error-prone.

The robust fix: do it in SQL

If the task is to clone a table with identical datatypes, the simplest and most reliable method is to let Oracle handle it natively. Creating the copy with SQL avoids copying data to Python and back, and it eliminates the datatype translation issues entirely.

CREATE TABLE new_emp AS
SELECT *
FROM emp;

This CTAS approach is designed for exactly this scenario and preserves the column definitions as expected for a straightforward clone.

Running CTAS from application code

If you still want to trigger the operation from Python, execute the same SQL from your app without sending the data through pandas.

Through SQLAlchemy:

from sqlalchemy.orm import Session

with Session(db_engine) as sess:
    raw_conn = sess.connection()
    db_cur = raw_conn.connection.cursor()
    sql_text = """create table new_emp as select * from emp"""
    db_cur.execute(sql_text)

Or directly with python-oracledb:

import getpass
import oracledb

user_name = 'cj'
conn_str = 'localhost/orclpdb1'
secret = getpass.getpass(f'Enter password for {user_name}@{conn_str}: ')

with oracledb.connect(user=user_name, password=secret, dsn=conn_str) as conn:
    with conn.cursor() as cur:
        sql_text = """create table new_emp as select * from emp"""
        cur.execute(sql_text)

Why this matters

Cloning with SQL avoids unnecessary data movement and keeps all type semantics inside the database, where they belong. You sidestep the float vs NUMBER mismatch, prevent VARCHAR2 from turning into CLOB, and keep numeric precision and scale intact. This also reduces complexity in your Python layer and helps keep operational workflows predictable.

Conclusion

When you need a like-for-like copy of an Oracle table, rely on a CTAS statement. It is efficient, avoids pandas and SQLAlchemy type inference pitfalls, and preserves datatypes as expected. Use pandas only when you truly need DataFrame operations; for schema-accurate clones, keep the operation in SQL and call it from your application code if needed.