Unicode conversion [message #548779] |
Mon, 26 March 2012 04:21 |
sstellini
Messages: 32 Registered: April 2005 Location: Malta
|
Member |
|
|
Hi there,
how can I convert incorrectly imported data into it's proper unicode format.
example:
FULL_NAME
GöRAN JOHANSSON
GÖRAN JOHANSSON
The first record is incorrectly imported and the second is how it should be looking like, if it has been properly imported.
NLS parameters are:
NLS_CHARACTERSET: WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16
In the example above, full_name is of a NVARCHAR2(100) type but the same problem applies to columns with VARCHAR2 type.
Is there a function or a peace of code I could use to convert value of the first record to be look alike of the second record?
thanks
Sonja
|
|
|
|
Re: Unicode conversion [message #548788 is a reply to message #548785] |
Mon, 26 March 2012 05:07 |
sstellini
Messages: 32 Registered: April 2005 Location: Malta
|
Member |
|
|
they had been imported in 2 ways, both resulted with the same problem:
1. via Toad's Import Table Data (Import from Text file option.. and the file was comma delimited csv file)
2. using an Oracle external table that was using csv file (organisation external type ORACLE_LOADER, comma delimited fields)
|
|
|
|
Re: Unicode conversion [message #548802 is a reply to message #548796] |
Mon, 26 March 2012 06:25 |
sstellini
Messages: 32 Registered: April 2005 Location: Malta
|
Member |
|
|
thanks Michel,
here is the external table def:
CREATE TABLE leads
(
EMAIL VARCHAR2(100 BYTE),
HTML VARCHAR2(20 BYTE),
F_NAME1 VARCHAR2(80 BYTE),
S_NAME1 VARCHAR2(50 BYTE),
URL VARCHAR2(200 BYTE),
OPTIN VARCHAR2(50 BYTE),
USERAGENT VARCHAR2(300 BYTE),
IP VARCHAR2(150 BYTE),
KEYWORD VARCHAR2(300 BYTE)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY MAILER
ACCESS PARAMETERS
( RECORDS DELIMITED BY newline
LOGFILE 'log_file.log'
FIELDS TERMINATED BY ','
missing field values are null
REJECT ROWS WITH ALL NULL FIELDS
( email CHAR(100),
html CHAR(20),
F_NAME1 CHAR(80),
s_NAME1 CHAR(50),
url CHAR(200),
optin CHAR(50),
UserAgent CHAR(300),
ip CHAR(150),
keyword CHAR(300)
)
)
LOCATION (DOTMAILER:'Leads_to_load.csv')
)
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;
Then, I was populating ORIG_LEADS table by insert into.. select from this external LEADS table.
NLS parameters are:
PARAMETER VALUE
NLS_CALENDAR GREGORIAN
NLS_CHARACTERSET WE8MSWIN1252
NLS_COMP BINARY
NLS_CURRENCY $
NLS_DATE_FORMAT DD-MON-RR
NLS_DATE_LANGUAGE AMERICAN
NLS_DUAL_CURRENCY $
NLS_ISO_CURRENCY AMERICA
NLS_LANGUAGE AMERICAN
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_NCHAR_CONV_EXCP FALSE
NLS_NUMERIC_CHARACTERS .,
NLS_RDBMS_VERSION 9.2.0.7.0
NLS_SORT BINARY
NLS_TERRITORY AMERICA
NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
thanks for helping!
|
|
|
|
|
Re: Unicode conversion [message #548807 is a reply to message #548806] |
Mon, 26 March 2012 06:54 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
If the "funny" characters are in the file then they are in the datbase.
If your file is not in WE8MSWIN1252 character set, you have to tell its character set to Oracle:
SQL> CREATE TABLE leads
2 (
3 EMAIL VARCHAR2(100 BYTE),
4 HTML VARCHAR2(20 BYTE),
5 F_NAME1 VARCHAR2(80 BYTE),
6 S_NAME1 VARCHAR2(50 BYTE),
7 URL VARCHAR2(200 BYTE),
8 OPTIN VARCHAR2(50 BYTE),
9 USERAGENT VARCHAR2(300 BYTE),
10 IP VARCHAR2(150 BYTE),
11 KEYWORD VARCHAR2(300 BYTE)
12 )
13 ORGANIZATION EXTERNAL
14 ( TYPE ORACLE_LOADER
15 DEFAULT DIRECTORY FILESDIR
16 ACCESS PARAMETERS
17 ( RECORDS DELIMITED BY newline CHARACTERSET UTF8
18 LOGFILE 'log_file.log'
19 FIELDS TERMINATED BY ','
20 missing field values are null
21 REJECT ROWS WITH ALL NULL FIELDS
22 ( email CHAR(100),
23 html CHAR(20),
24 F_NAME1 CHAR(80),
25 s_NAME1 CHAR(50),
26 url CHAR(200),
27 optin CHAR(50),
28 UserAgent CHAR(300),
29 ip CHAR(150),
30 keyword CHAR(300)
31 )
32 )
33 LOCATION (FILESDIR:'Leads_to_load.csv')
34 )
35 REJECT LIMIT UNLIMITED
36 NOPARALLEL
37 NOMONITORING
38 /
Table created.
SQL> select * from leads;
no rows selected
Regards
Michel
|
|
|
|
Re: Unicode conversion [message #548811 is a reply to message #548809] |
Mon, 26 March 2012 07:08 |
sstellini
Messages: 32 Registered: April 2005 Location: Malta
|
Member |
|
|
thank you very much
I'll use that the next time I'm importing.
However, the data has been loaded in the database and to load them again would take lot of time. Is there a way to convert the 'badly loaded' data to their correct presentation?
If nothing more sophisticated, then a search/replace would do.. I'm trying to find a translation table for these characters like:
bad_value good_value
ö Ö
|
|
|
|
Re: Unicode conversion [message #548814 is a reply to message #548811] |
Mon, 26 March 2012 07:56 |
|
Michel Cadot
Messages: 68665 Registered: March 2007 Location: Nanterre, France, http://...
|
Senior Member Account Moderator |
|
|
You first have to get the UTF8 code points for all the non-ASCII characters, for instance:
SQL> select dump(convert('Ö','UTF8')) from dual;
DUMP(CONVERT('Ö','UT
--------------------
Typ=1 Len=2: 195,150
When you have all these values you can convert back:
SQL> select replace('-->Ã<--', chr(195)||chr(150), 'Ö') from dual;
REPLACE
-------
-->Ö<--
1 row selected.
Unfortunatly, it is (very) possible that some codes of UTF8 character fall in codes that do not exist in your character set and in this case Oracle convert them to a replacement character which is ¿ (chr(191) in WE8MSWIN1252 and so some UTF8 characters may have been converted to the same set of WE8MSWIN1252 characters.
Regards
Michel
|
|
|
|
|
|