
The Problem
It didn’t match. Damn it. Why not? The issue was having to move a workload from SQL Server to Spark/Databricks. One of the columns on the table used a SHA-256 hash to create a key column. SQL Server output this:
SELECT HASHBYTES('SHA2_256', CAST('248' AS NVARCHAR(8))) AS [Key]
Result: 0x301DB0B8C94E22CC954E83F47964983011330F98A2A48BBC108FCC30F8445EBF
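Worth noting: HASHBYTES hashes the raw bytes of whatever you pass it, and you can see exactly which bytes that NVARCHAR cast produces with a quick cast to VARBINARY (the alias is just for illustration):
SELECT CAST(CAST('248' AS NVARCHAR(8)) AS VARBINARY(8)) AS KeyBytes
Result: 0x320034003800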
Spark/Databricks output this:
SELECT SHA2(ENCODE('248','UTF-8'), 256) AS Key
Result: 766cb53c753baedac5dc782593e04694b3bae3aed057ac2ff98cc1aef6413137
Ah right, I can see the issue: I should use UTF-16.
SELECT SHA2(ENCODE('248','UTF-16'), 256) AS Key
Result: cfa15dd8fe9e7262a7769904aff8fa34e3e494ad62eaa1c9eaeff0e729c1eb2e
Right, OK, that is not right either. It was looking like I would have to regenerate the whole column, and then the foreign keys etc. throughout the data warehouse. Not an easy thing to do, as it was massive.
After some messing around with encodings I found the answer… by asking a colleague! Thanks Sean!
To match SQL Server’s encoding you have to use a variant of UTF-16: UTF-16LE (little-endian).
SELECT SHA2(ENCODE('248','UTF-16LE'), 256) AS Key
Result: 301db0b8c94e22cc954e83f47964983011330f98a2a48bbc108fcc30f8445ebf
(Spark returns lowercase hex with no 0x prefix, but the bytes match SQL Server’s output.)
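SQL Server and Spark format the result differently (VARBINARY with a 0x prefix versus a lowercase hex string), so if you need the keys to agree as strings you have to normalise one side. A quick sketch using built-in functions on both platforms, with the aliases just for illustration:
SQL Server:
SELECT CONVERT(VARCHAR(64), HASHBYTES('SHA2_256', CAST('248' AS NVARCHAR(8))), 2) AS [Key]
Spark:
SELECT UPPER(SHA2(ENCODE('248', 'UTF-16LE'), 256)) AS Key
Result (both): 301DB0B8C94E22CC954E83F47964983011330F98A2A48BBC108FCC30F8445EBF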
Right, so now it matches, but why? UTF-16LE is normally used in Windows environments, so I suspect that this has been carried over to SQL Server: NVARCHAR columns are stored as UTF-16LE under the hood, and HASHBYTES hashes those stored bytes. Spark’s underlying cluster doesn’t seem to follow that convention, maybe something to do with Java or it running on Linux. I haven’t found a good answer as to why, but if you know, let me know!
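One way to see what was actually going on is to hex-dump the three encodings in Spark (HEX and ENCODE are both built-in functions; the aliases are just for illustration):
SELECT HEX(ENCODE('248', 'UTF-8')) AS UTF8Bytes
Result: 323438 (one byte per character)
SELECT HEX(ENCODE('248', 'UTF-16')) AS UTF16Bytes
Result: FEFF003200340038 (a byte-order mark, then big-endian characters)
SELECT HEX(ENCODE('248', 'UTF-16LE')) AS UTF16LEBytes
Result: 320034003800 (little-endian, no byte-order mark)
Only the UTF-16LE bytes are identical to the bytes SQL Server stores for NVARCHAR, which is why only that variant produces a matching hash. Java’s UTF-16 charset (which Spark’s ENCODE uses under the hood) writes a big-endian byte-order mark first, so even the plain UTF-16 attempt was hashing different bytes.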