Home » SQL & PL/SQL » SQL & PL/SQL » calculate a function on whole table (oracle 11g)  () 1 Vote
calculate a function on whole table [message #673189] Sat, 10 November 2018 14:11 Go to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
Hello,

I have a sql related question just to know if this is possible or not.

I have a table with many number of columns and different datatypes namely varchar, date, number etc.

is it possible to apply hash on whole table

what i mean is this for example if i have a file i can use md5sum in unix to get the whole file hash value in the same way in oracle i can get hash of each row by using dbms.obfuscation package and passing the columns inside

but what if i want the hash for a set of rows is it possible.

Thanks for time.
Re: calculate a function on whole table [message #673190 is a reply to message #673189] Sat, 10 November 2018 14:23 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
ramkumar10 wrote on Sat, 10 November 2018 12:11
Hello,

I have a sql related question just to know if this is possible or not.

I have a table with many number of columns and different datatypes namely varchar, date, number etc.

is it possible to apply hash on whole table

what i mean is this for example if i have a file i can use md5sum in unix to get the whole file hash value in the same way in oracle i can get hash of each row by using dbms.obfuscation package and passing the columns inside

but what if i want the hash for a set of rows is it possible.

Thanks for time.

Possible? Yes.
It is possible to establish a Virtual Column that contains a hash of that row columns.
Doing so will be VERY resource intensive since EVERY subsequent DML requires new hash to be computed.
Then you could compute new hash containing as many rows as you desire.
Should you? Possibly NOT.

What problem are you really trying to solve?
Re: calculate a function on whole table [message #673193 is a reply to message #673190] Sat, 10 November 2018 15:14 Go to previous messageGo to next message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
Doing so will be VERY resource intensive since EVERY subsequent DML requires new hash to be computed.
Well, if it is a virtual column only queries (or WHERE clauses) using this column will.

Anyway, this may indeed not be the best way to fulfill the requirements.

Re: calculate a function on whole table [message #673194 is a reply to message #673190] Sat, 10 November 2018 15:40 Go to previous messageGo to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
Thanks for the reply well I am extracting data from an oracle table and loading the data into flatfiles using different etl tools. here I am tasked with the burden of proving that there has not been any content change while extracting and loading into flat files so my initial thought was to hash the each row and store it as a virtual column i know this is very intensive so i thought why not hash the whole table as I do with flat files on unix. my little understanding of database tables is effectively backend everything is a file so thought cross to me why not hash the whole table to show that there is no data changed while etl.

Thanks hopefully My explanation is clear.
Re: calculate a function on whole table [message #673197 is a reply to message #673194] Sun, 11 November 2018 01:09 Go to previous messageGo to next message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Just make hashes on each row or globally on all rows won't work as hash functions work on bytes not on external (human) values.
So if you have not the same character set in the file and in the database then there will have changes in bytes; numbers and dates are stored in an internal format not on human one.
So, if you want to compare hashes, you have to convert each column value in the EXACT same format that it was in the file before hashing. Are you sure you are able to do that? Are you sure you know the EXACT format of each value in the file?

[Edit: English]

[Updated on: Sun, 11 November 2018 01:48]

Report message to a moderator

Re: calculate a function on whole table [message #673198 is a reply to message #673194] Sun, 11 November 2018 01:38 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
Quote:
I am tasked with the burden of proving that there has not been any content change while extracting and loading into flat files
I think that you have been asked to do something that is impossible, because there will be content change as a part of this process. Indeed, the whole process is designed to do content change: converting data from one format to another. Many sites rely on a simple row count check which will survive the process (read consistency permitting) would that not be adequate?
Re: calculate a function on whole table [message #673200 is a reply to message #673198] Sun, 11 November 2018 11:26 Go to previous messageGo to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
John Watson wrote on Sun, 11 November 2018 01:38
Quote:
I am tasked with the burden of proving that there has not been any content change while extracting and loading into flat files
I think that you have been asked to do something that is impossible, because there will be content change as a part of this process. Indeed, the whole process is designed to do content change: converting data from one format to another. Many sites rely on a simple row count check which will survive the process (read consistency permitting) would that not be adequate?
Technically yes you are right and that was my initial solution but now I have been asked for content validation as well which in my particular case is very hard as I am using a single job to extract from many tables dynamically.

But I agree with you there must be more discussion with my team on this requirement.
Re: calculate a function on whole table [message #673201 is a reply to message #673197] Sun, 11 November 2018 11:29 Go to previous messageGo to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
Michel Cadot wrote on Sun, 11 November 2018 01:09

Just make hashes on each row or globally on all rows won't work as hash functions work on bytes not on external (human) values.
So if you have not the same character set in the file and in the database then there will have changes in bytes; numbers and dates are stored in an internal format not on human one.
So, if you want to compare hashes, you have to convert each column value in the EXACT same format that it was in the file before hashing. Are you sure you are able to do that? Are you sure you know the EXACT format of each value in the file?

[Edit: English]
I am planning to hash each row and from table and make hashes from each row of the file and try to compare them and see if i can get a match which i highly doubt as you pointed out I should know the exact datatype of each column but it is worth a shot and I will follow up here with the results i get.
Re: calculate a function on whole table [message #673202 is a reply to message #673201] Sun, 11 November 2018 11:58 Go to previous messageGo to next message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Not only datatype but also format.
For example, when you have 10 (number) in a field, was it "10", "10.0", "10.00", "10,00" in the file?

Not only datatype but also character set.
For example, my Windows is using code page 1252 (which is WE8MSWIN1252 in Oracle), if I have a "é" in a file in the Windows character set, its code point (its binary data) is 233 = xE9, but my database uses AL32UTF8 character set, so the code point for the same character is 195,169 (2 bytes) = xC3A9. So when when an insert a "é" in the database, Oracle inserts 2 bytes 195 and 169, then you see you cannot compare the MD5 values between the 2 binary values which are for the same character.

[Updated on: Sun, 11 November 2018 12:00]

Report message to a moderator

Re: calculate a function on whole table [message #673233 is a reply to message #673202] Mon, 12 November 2018 10:56 Go to previous messageGo to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
That's a great point did not notice it until you pointed it out. When a file is generated number 10 inside the number file is 000000000000000010 some thing like this based on the precision defined in the database so hash values of the table and the script will never be equal.

Thanks for the insights.
Re: calculate a function on whole table [message #673257 is a reply to message #673189] Tue, 13 November 2018 13:50 Go to previous messageGo to next message
JPBoileau
Messages: 88
Registered: September 2017
Member
While not the most efficient way, I would possibly consider this:

Extract the current data to a flatfile.
Manipulate the data.
Load the flatfile back into a staging table and verify that the data is consistent with the original table after manipulation.

JP
Re: calculate a function on whole table [message #673259 is a reply to message #673257] Tue, 13 November 2018 18:41 Go to previous messageGo to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
JPBoileau wrote on Tue, 13 November 2018 13:50
While not the most efficient way, I would possibly consider this:

Extract the current data to a flatfile.
Manipulate the data.
Load the flatfile back into a staging table and verify that the data is consistent with the original table after manipulation.

JP
1.I have only read access on database so writing back to a table is out of question
2.I would not like to manipulate the data in flat file as it defeats the point of content validation.
3.I know I am contradicting myself but I think it is neither wise nor possible to compare content between a table and file without manipulation so that answers my question.

Thanks for the suggestion's cheers.
Re: calculate a function on whole table [message #673317 is a reply to message #673259] Wed, 14 November 2018 11:54 Go to previous messageGo to next message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
How about a trigger on the table that fired on insert/update/delete that captures the primary key or unique key along with the type of event and a timestamp of when it happened and a unique sequence. This is what we do to maintain alignment between an oracle database and an external nosql data warehouse (snowflake). This allows us to only push out the changes to the external system without having to do a mass load
Re: calculate a function on whole table [message #673364 is a reply to message #673317] Fri, 16 November 2018 11:33 Go to previous messageGo to next message
alvalongo
Messages: 3
Registered: January 2017
Location: Colombia
Junior Member
When you write a flat file every column has some kind of transformation.
You need to define consistent transformations according column type, business rules and type of flat file,
for example:
--
1-NUMBER:
always use TO_CHAR() including the format parameter (2nd argument) and nls parameter.(3rd argument)
for example:
to_number(id,'fm999999999d99','NLS_NUMERIC_CHARACTERS=''.,''')
--
2-DATE, always use ISO-8601 International Format:
TO_CHAR( <COLUMNA_DATE>, 'yyyy-mm-dd hh24:mi:ss')
--
3-VARCHAR2:
replace single quotes (ascii code 34) with two quotes and use quote before and after string
if spaces before and after are significant use quote before and after string.
--
What is the character type of your database?
Information is always in English or use others languages, for example western Europe (Spanish, Italia, etc).


Then build a package in PL/SQL with functions for every kind of transformation.

For hash vallue use the ORA_HASH function as in this example, and append as a last column on every line written.

create table call_detail_01
(id             number(6) primary key,
 mobile_number  varchar2(20),
 call_date      date,
 city           varchar2(10),
 money          number(10,2)
)


insert into call_detail_01
values (1,'3054374200',to_date('2018-08-01 10:40:27','yyyy-mm-dd hh24:mi:ss'),'new york',20.5);


commit;

If the flat file is in CSV (comma-separated values) format a transformation is:

select ora_hash(       to_number(a.id,'fm999999999')
                ||','||a.mobile_number
                ||','||to_char(a.call_date,'yyyy-mm-dd hh24:mi:ss')
                ||','||chr(34)||replace(a.city,chr(34),chr(34)||chr(34))||chr(34)
                ||','||to_number(id,'fm999999999d99','NLS_NUMERIC_CHARACTERS=''.,''')
               ) as hash_info
from call_detail_01 a;

HASH_INFO
577687472




Re: calculate a function on whole table [message #673365 is a reply to message #673364] Fri, 16 November 2018 12:16 Go to previous messageGo to next message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

The first problem is that the format of the original data is not known for all rows.

And what this stuff about single quotes? I don't see the point for a CSV file!

Re: calculate a function on whole table [message #673366 is a reply to message #673365] Fri, 16 November 2018 12:36 Go to previous messageGo to next message
alvalongo
Messages: 3
Registered: January 2017
Location: Colombia
Junior Member
ramkumar10 says "I have a table with many number of columns and different datatypes".
The format is for every single column not rows.
In Oracle you can know the data type of every column using ALL_TAB_COLUMNS view.
This articles explain all aspects about CSV and quotes:
https://en.wikipedia.org/wiki/Comma-separated_values
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
Re: calculate a function on whole table [message #673367 is a reply to message #673366] Fri, 16 November 2018 12:48 Go to previous messageGo to next message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
The format is for every single column not rows.
Of course.

Quote:
In Oracle you can know the data type of every column using ALL_TAB_COLUMNS view.
Yes but you don't know the original format in the file you loaded.

Quote:
This articles explain all aspects about CSV and quotes:
I don't know but smart people do not use the same character around the values and inside the values.
If you have quote in the values you don't use quote to delimit the values unless you want troubles.


I summarize.
OP has a file with data (say it is in CSV but nothing is said about original file format), he loads the file inside a table, and now wants to compare with the initial data.
Now, as I said earlier,"when you have 10 (number) in a field, was it "10", "10.0", "10.00", "10,00" in the file?". Nothing in Oracle tells you. (In the end, OP said, "When a file is generated number 10 inside the number file is 000000000000000010 some thing like this".)
So how do you rebuilt the original file if you don't know the format of each value of each column of each row/line?

Re: calculate a function on whole table [message #673368 is a reply to message #673367] Fri, 16 November 2018 13:58 Go to previous messageGo to next message
alvalongo
Messages: 3
Registered: January 2017
Location: Colombia
Junior Member
I understand the task is from data on an Oracle table to flatfile because ramkumar10 says on 11 November 2018 12:26 "I am using a single job to extract from many tables dynamically".
If it is the other way you are right.
About quotes, you can test cells with quotes on Excel or other similar spreadsheat and export to CSV to view the results.
Re: calculate a function on whole table [message #673371 is a reply to message #673317] Sat, 17 November 2018 12:03 Go to previous messageGo to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
Bill B wrote on Wed, 14 November 2018 11:54
How about a trigger on the table that fired on insert/update/delete that captures the primary key or unique key along with the type of event and a timestamp of when it happened and a unique sequence. This is what we do to maintain alignment between an oracle database and an external nosql data warehouse (snowflake). This allows us to only push out the changes to the external system without having to do a mass load

I only have read access to the db so triggers or procedures are not possible in my case.
Re: calculate a function on whole table [message #673372 is a reply to message #673367] Sat, 17 November 2018 12:07 Go to previous messageGo to next message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
Michel Cadot wrote on Fri, 16 November 2018 12:48

Quote:
The format is for every single column not rows.
Of course.

Quote:
In Oracle you can know the data type of every column using ALL_TAB_COLUMNS view.
Yes but you don't know the original format in the file you loaded.

Quote:
This articles explain all aspects about CSV and quotes:
I don't know but smart people do not use the same character around the values and inside the values.
If you have quote in the values you don't use quote to delimit the values unless you want troubles.


I summarize.
OP has a file with data (say it is in CSV but nothing is said about original file format), he loads the file inside a table, and now wants to compare with the initial data.
Now, as I said earlier,"when you have 10 (number) in a field, was it "10", "10.0", "10.00", "10,00" in the file?". Nothing in Oracle tells you. (In the end, OP said, "When a file is generated number 10 inside the number file is 000000000000000010 some thing like this".)
So how do you rebuilt the original file if you don't know the format of each value of each column of each row/line?


I agree with you.

The file is a flat file delimited by pipe.
i came up with a interim solution. I am going to hash each row and store it as a value in a column and when I am re loading the file to an other database most likely it is sql server although not yet decided I would write a package to get hashbytes of all columns except the hash value column and hopefully will yield the same value.

I will have the chance to test this only in jan and will update the thread with my findings.
Re: calculate a function on whole table [message #673373 is a reply to message #673364] Sat, 17 November 2018 12:10 Go to previous message
ramkumar10
Messages: 19
Registered: March 2017
Junior Member
alvalongo wrote on Fri, 16 November 2018 11:33
When you write a flat file every column has some kind of transformation.
You need to define consistent transformations according column type, business rules and type of flat file,
for example:
--
1-NUMBER:
always use TO_CHAR() including the format parameter (2nd argument) and nls parameter.(3rd argument)
for example:
to_number(id,'fm999999999d99','NLS_NUMERIC_CHARACTERS=''.,''')
--
2-DATE, always use ISO-8601 International Format:
TO_CHAR( <COLUMNA_DATE>, 'yyyy-mm-dd hh24:mi:ss')
--
3-VARCHAR2:
replace single quotes (ascii code 34) with two quotes and use quote before and after string
if spaces before and after are significant use quote before and after string.
--
What is the character type of your database?
Information is always in English or use others languages, for example western Europe (Spanish, Italia, etc).


Then build a package in PL/SQL with functions for every kind of transformation.

For hash vallue use the ORA_HASH function as in this example, and append as a last column on every line written.

create table call_detail_01
(id             number(6) primary key,
 mobile_number  varchar2(20),
 call_date      date,
 city           varchar2(10),
 money          number(10,2)
)


insert into call_detail_01
values (1,'3054374200',to_date('2018-08-01 10:40:27','yyyy-mm-dd hh24:mi:ss'),'new york',20.5);


commit;

If the flat file is in CSV (comma-separated values) format a transformation is:

select ora_hash(       to_number(a.id,'fm999999999')
                ||','||a.mobile_number
                ||','||to_char(a.call_date,'yyyy-mm-dd hh24:mi:ss')
                ||','||chr(34)||replace(a.city,chr(34),chr(34)||chr(34))||chr(34)
                ||','||to_number(id,'fm999999999d99','NLS_NUMERIC_CHARACTERS=''.,''')
               ) as hash_info
from call_detail_01 a;

HASH_INFO
577687472




I agree but my case is a little different i am loading some where around 3000 tables and also i don't have permissions to write procedures etc. What you have mentioned is ideally the way it should be handled but just not for my case.

Thanks.
Previous Topic: Calling Java function from pl/sql
Next Topic: Are following statements executed in any case if the first one throws an exception?
Goto Forum:
  


Current Time: Thu Mar 28 07:17:21 CDT 2024