Accepting numbers and international characters in the name

  • Problem
  • Updated 11 months ago
  • Acknowledged
We have discovered completed applications with just numbers in the last name, and names containing Chinese characters. Both throw our system off. Is there a reason why we allow numbers in the name? And are there any suggestions as to what to do with Chinese characters?

Fai Fong

  • 6 Posts
  • 0 Reply Likes

Posted 2 years ago


Kasey Crosby, Official Rep

  • 66 Posts
  • 2 Reply Likes
To convert the Chinese characters, you'll need to set stripDiacritics to "true". The stripDiacritics attribute converts certain diacritic (non-English or foreign) characters to their standard equivalents. Please see the example below.



As for numbers in the name, I'll need to check on that and get back to you.

Fai Fong

  • 6 Posts
  • 0 Reply Likes
We do have stripDiacritics set to true, but it's not working. Do we then need encoding="UTF-8"?

<?xml version="1.0" ?>
<formatDefinitions xmlns="http://xmlns.cccnext.org/xfer">
  <formatDefinition outputFormat="delimited" id="smcformat" applicationType="apply" delimiter="|"
                    stripDiacritics="true">
    <fieldList>

The second part: we are also getting names that consist entirely of numbers. Is that expected?

Kasey Crosby, Official Rep

  • 66 Posts
  • 2 Reply Likes
It's unusual, but there are people with numbers in their names. Therefore, we neither check for nor prevent numbers in names.

Kasey Crosby, Official Rep

  • 66 Posts
  • 2 Reply Likes
Yes, you should be using encoding="UTF-8".

Would you mind sending me the CCCID and Confirmation number of that app? Please email it to kcrosby@ccctechcenter.org. I'd like to take a look at it as it is possible it is a fraudulent app.

severa, Champion

  • 101 Posts
  • 10 Reply Likes

I'd be interested to know what "it's not working" means.

FYI, there is a bug in the strip diacritics behavior, specifically with some Chinese characters. It only occurs with fixed-length files, which yours seems to be. What happens in that case is that the length of the field is not what is defined in the format file. It's not a huge issue if you don't encounter those particular characters, though.

Also, correct me if I'm wrong, but strip diacritics should work whether the output format is set to UTF-8 or not. Correct?


Fai Fong

  • 6 Posts
  • 0 Reply Likes
Kasey, I added the encoding and reset a couple of applications containing Chinese characters. The resulting name I see is just "?". Let me know if that is expected. Thanks,
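
One possible explanation for the "?" (an assumption, not something confirmed in this thread): Chinese characters are not accented Latin letters, so there is no base letter left once diacritics are stripped, and a converter that cannot represent a character in the target encoding commonly substitutes "?". The Python sketch below only illustrates that general behavior. It is not the CCCApply export code, and the sample names are made up.

```python
# Illustration only: how a generic text encoder treats characters it cannot
# represent. This is NOT the CCCApply export code.
accented = "José"
chinese = "陳"

# Latin-1 (an "ANSI"-style single-byte encoding) can hold the accented
# letter; plain ASCII cannot.
print(accented.encode("latin-1"))                   # b'Jos\xe9'
print(accented.encode("ascii", errors="replace"))   # b'Jos?'

# Neither single-byte encoding has a slot for the Chinese character,
# so the "replace" error handler substitutes a question mark.
print(chinese.encode("latin-1", errors="replace"))  # b'?'
print(chinese.encode("ascii", errors="replace"))    # b'?'
```

If the downstream system sees "?" for every Chinese character, the character was most likely lost at an encoding step like this one rather than at the diacritic-stripping step.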

Kasey Crosby, Official Rep

  • 66 Posts
  • 2 Reply Likes
It looks like this is a known issue. Validation for Chinese (non-standard, non-English) characters is being researched and has been prioritized for the next annual update. I do not have an ETA, but for now, know that it is being looked into.

JoeHo

  • 4 Posts
  • 0 Reply Likes
Kasey, when viewing the _main or _inst file in Notepad++, we notice that when the encoding is changed (from UTF-8 to ANSI or ASCII), some special characters create a misalignment in the row, with extra spaces. The file looks correct under UTF-8; however, our SIS does not process UTF-8, so it misreads the input. Is there a fix for this that strips out the special characters?
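
The misalignment described above is consistent with how UTF-8 stores accented letters: a single character can occupy more than one byte, so a reader that treats every byte as one character sees extra columns. A minimal Python sketch, assuming the "ANSI" code page is Windows-1252 and using a made-up 10-character field:

```python
# Illustration only: the same name, measured three ways.
name = "José"
utf8_bytes = name.encode("utf-8")

chars = len(name)        # 4 characters
size = len(utf8_bytes)   # 5 bytes: "é" takes two bytes in UTF-8

# Reading those UTF-8 bytes as Windows-1252 turns each byte into its own
# character, so the value grows by one column.
misread = utf8_bytes.decode("cp1252")

# A 10-character fixed-width field, padded by character count.
field = name.ljust(10)
field_misread = field.encode("utf-8").decode("cp1252")

print(chars, size)                      # 4 5
print(misread)                          # JosÃ©
print(len(field), len(field_misread))   # 10 11
```

Every accented letter in a row adds one extra column under this misreading, which would shift all later fields in a fixed-width record.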

John Saric

  • 8 Posts
  • 0 Reply Likes
This is our code:
<formatDefinitions xmlns="http://xmlns.cccnext.org/xfer">
<formatDefinition
id="gcccdMain" outputFormat="fixed">
<fieldList>

To adjust for foreign characters with accents, would we add the following?

<formatDefinitions xmlns="http://xmlns.cccnext.org/xfer">
<formatDefinition
 id="gcccdMain" outputFormat="fixed" stripDiacritics = "True">
<fieldList>

Would this solve the problem?






JoeHo

  • 4 Posts
  • 0 Reply Likes
John, I believe the issue we are having is that our SIS does not process UTF-8 properly. We are thinking about adding a step to convert the UTF-8 file to ASCII after it is downloaded from CCCApply, but unfortunately that won't handle anything outside the ASCII character set.

John Saric

  • 8 Posts
  • 0 Reply Likes
So, would this include Spanish characters with accent marks, or foreign characters with two dots above an 'a'?

JoeHo

  • 4 Posts
  • 0 Reply Likes
It will be limited to ASCII and extended ASCII, which includes the characters you are referring to (see http://www.asciitable.com/). The difficult part will be programming the actual conversion.
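
A rough idea of what such a conversion could look like, as a Python sketch rather than a tested production routine. The helper name `to_ascii` and the "?" placeholder are hypothetical choices for illustration. It relies on Unicode NFKD normalization, which splits an accented letter into its base letter plus a separate combining mark that can then be dropped.

```python
import unicodedata


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Hypothetical helper: fold accented Latin letters to their base letter
    and replace anything else outside ASCII with `placeholder`."""
    # NFKD splits "é" into "e" followed by a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, tilde, or diaeresis mark
        out.append(ch if ord(ch) < 128 else placeholder)
    return "".join(out)


print(to_ascii("José Muñoz"))   # Jose Munoz
print(to_ascii("Müller"))       # Muller
print(to_ascii("陳"))           # ?
```

Letters that have no decomposition, such as "ø" or "ß", fall through to the placeholder, so a real conversion would probably need an explicit mapping table for those. NFKD can also expand some characters into more than one letter (for example, the "ﬁ" ligature becomes "fi"), which matters if the output has to keep fixed field widths.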


John Saric

  • 8 Posts
  • 0 Reply Likes
So, would this solve the problem with the accent, the tilde, and the two dots above the letter?


<formatDefinitions xmlns="http://xmlns.cccnext.org/xfer">
<formatDefinition
 id="gcccdMain" outputFormat="fixed" stripDiacritics = "True">
<fieldList>