Re: Double insertion from scala spark job
From | Dave Cramer |
---|---|
Subject | Re: Double insertion from scala spark job |
Date | |
Msg-id | CADK3HH+ycBStbXk2KjrFWt+-17gS0WE0t_KmKFxNhcFoGUfAwA@mail.gmail.com |
In response to | Double insertion from scala spark job (Antoine DUBOIS <antoine.dubois@cc.in2p3.fr>) |
Responses | Re: Double insertion from scala spark job |
List | pgsql-jdbc |
On Tue, 9 Feb 2021 at 06:48, Antoine DUBOIS <antoine.dubois@cc.in2p3.fr> wrote:
Hello,

I'm working with Spark and PostgreSQL to compute statistics. I have run into a strange behaviour in my job: when writing output to PostgreSQL, I sometimes get a double insertion into my table, which violates a constraint.

Detail: Key (xxx, xxx, xxx, xxx, xxx, xxxx, xxxx)=(2021-02-05 00:00:00, data, moredate, evenmoredata, somuchmoredata, dataagain, somuchofit) already exists. Call getNextException to see other errors in the batch.

My data are reported as duplicates, yet if I write the same data into MySQL or into a parquet file with the same input and treatment, I don't observe this behaviour.

Dev spec:
Scala 2.12
Spark Version 3.0.1
JDK 8
jdbc "org.postgresql" % "postgresql" % "42.2.18"
PostgreSQL 12.5

My code is pretty simple: it applies a SQL query to a parquet file and writes the result like this:

outputDF.write.format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://<HOST>:<PORT>/<SCHEMA>?user=<USERNAME>&password=<PASSWORD>")
  .option("dbtable", "mytable")
  .mode("append")
  .save()

What leads me to think it's a PostgreSQL JDBC bug more than anything else is that this same command, writing to MySQL or to a parquet file, produces no duplicates in this particular edge case, which I hit with only some of my input files.

If any of you has an idea of what could cause such behaviour (a special character in the input, something misconfigured, maybe an option I don't know about that could help solve this issue), please let me know.

I have come to a point where I'm not sure of anything any longer. I hope someone will have some thoughts about it.
You are the first person to report such a problem.
Without additional information, such as your code, there is little we can do.
Dave Cramer
www.postgres.rocks
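
The error quoted above ends with "Call getNextException to see other errors in the batch", and the reply asks for more information. A minimal sketch of how that chain might be collected in Scala follows. It uses only `java.sql.SQLException` and its standard `getNextException` method; the object name `BatchErrors` and its two helpers are hypothetical names introduced here for illustration. Whether Spark keeps the original `SQLException` in the cause chain of the exception it throws on the driver is an assumption that should be checked against the actual job.

```scala
import java.sql.SQLException
import scala.annotation.tailrec

// Hypothetical helper object, not part of Spark or the PostgreSQL JDBC driver.
object BatchErrors {

  /** Walks the cause chain of any Throwable and returns the first SQLException found, if any. */
  @tailrec
  def findSqlException(t: Throwable): Option[SQLException] = t match {
    case null            => None
    case s: SQLException => Some(s)
    case other           => findSqlException(other.getCause)
  }

  /** Collects the message of an SQLException and of every exception chained via getNextException. */
  def chainedMessages(e: SQLException): List[String] = {
    @tailrec
    def loop(cur: SQLException, acc: List[String]): List[String] =
      if (cur == null) acc.reverse
      else loop(cur.getNextException, cur.getMessage :: acc)
    loop(e, Nil)
  }
}
```

One way to use it would be to wrap the `save()` call in a `try`/`catch`, pass the caught exception to `BatchErrors.findSqlException`, and print each entry of `BatchErrors.chainedMessages`, so that every error in the failed batch can be posted to the list rather than only the first one.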