DuckDB Spark API is incompatible with the PySpark API's spark.createDataFrame(list of dict) #183

Description

@asddfl

What happens?

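The DuckDB Spark API's createDataFrame does not behave like PySpark's spark.createDataFrame when given a list of dicts. PySpark uses the dict keys as column names and the dict values as row data. The DuckDB Spark API instead returns a single varchar column named col0 whose only row contains the key ("c0"), and the value ("1969-12-21") is lost.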

To Reproduce

from duckdb.experimental.spark.sql import SparkSession as DuckdbSparkSession
from pyspark.sql import SparkSession

sql_text = "SELECT * FROM t0"
data = [
    {"c0": "1969-12-21"}
]
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data)
df.createOrReplaceTempView("t0")

print("PySpark SQL result:")
pyspark_result = spark.sql(sql_text)
pyspark_result.show()

duckdb_spark = DuckdbSparkSession.builder.getOrCreate()
df = duckdb_spark.createDataFrame(data)
df.createOrReplaceTempView("t0")

print("Duckdb Spark SQL result: ")
duckdb_spark_result = duckdb_spark.sql(sql_text)
duckdb_spark_result.show()

PySpark SQL result:
+----------+
|        c0|
+----------+
|1969-12-21|
+----------+

Duckdb Spark SQL result: 
┌─────────┐
│  col0   │
│ varchar │
├─────────┤
│ c0      │
└─────────┘
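
A possible workaround, sketched here and not verified against DuckDB 1.4.2, is to turn the list of dicts into a list of column names plus a list of row tuples before calling createDataFrame. The helper name `dicts_to_rows` is hypothetical, and the sketch assumes the DuckDB Spark API accepts `createDataFrame(rows, columns)` with a list of tuples and a list of column names, as PySpark does. Column order here is first-seen key order, and missing keys become None; PySpark's own ordering for dict input may differ.

```python
from typing import Any


def dicts_to_rows(records: list[dict[str, Any]]) -> tuple[list[str], list[tuple]]:
    """Split a list of dicts into (column names, row tuples).

    Hypothetical helper, not part of DuckDB or PySpark.
    Columns are collected in first-seen order across all records;
    a key missing from a record yields None in that row.
    """
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [tuple(record.get(col) for col in columns) for record in records]
    return columns, rows


data = [{"c0": "1969-12-21"}]
columns, rows = dicts_to_rows(data)

# Assumed usage with the session from the reproduction script above:
# df = duckdb_spark.createDataFrame(rows, columns)
# df.createOrReplaceTempView("t0")
```

This only sidesteps the dict handling on the caller's side; it does not change how the DuckDB Spark API interprets a list of dicts.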

OS:

x86_64 Ubuntu 24.04 Linux-6.14.0-35-generic-x86_64-with-glibc2.39

DuckDB Version:

1.4.2

DuckDB Client:

Python

Hardware:

No response

Full Name:

asddfl

Affiliation:

xxx

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • Yes, I have

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant data sets for reproducing the issue?

Yes
