Vreleksá The Alurhsa Word for Constructed: Creativity in both scripts and languages
Tolkien_Freak

Joined: 26 Jul 2007 Posts: 1231 Location: in front of my computer. always.
|
Posted: Thu Sep 10, 2009 1:52 am Post subject: Problem with signature change |
|
|
I wish to change my signature to this: (my probably awful translation of a quote from Galileo)
「私たちに理性と知性を与えた神は、その神と同じのは、その使いを捨てるつもりがあると信じざるを得なさそうではない。」
-ガリレオ・ガリレイ
Somehow the forum considers this above the 255 character limit, even though I count ~60.
Ittai nani suru no yo? |
|
|
 |
eldin raigmore Admin

Joined: 03 May 2007 Posts: 1621 Location: SouthEast Michigan
|
Posted: Thu Sep 10, 2009 11:20 pm Post subject: |
|
|
The 255 "character" limit is actually a 255 "byte" or "octet" limit.
If you look at Unicode you'll see that any character set with more than 256 characters in it takes two bytes per character. (Unless there are more than 65536 characters in the character set, in which case, I think, Unicode just whimpers and rolls over and sucks its thumb.) That's going to be true of any system based on Chinese logograms, such as kanji.
Also, anytime you switch from one system to another, you use two bytes (or one? or three?) as a "shift" character.
So, for all I know, then, every time you shift from hiragana to kanji, or from kanji to hiragana, should count as two bytes as well.
Write the entire thing in hiragana or katakana, leaving out the kanji. You should be able to get 125 characters in.
Or, write it all in kanji, (say Mandarin instead of Japanese) and you should be able to get 60 characters in. (Or see how many you can get in.) _________________ "We're the healthiest horse in the glue factory" - Erskine Bowles, Co-Chairman of the deficit reduction commission |
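For anyone who wants to check the character-versus-byte gap themselves, here is a minimal Python sketch. It assumes the forum stores signatures as UTF-8, which nothing in this thread confirms; the helper name `utf8_budget` and the sample strings are made up for illustration. Under that assumption, hiragana, katakana and common kanji each take three bytes, so swapping kanji for kana would not shrink the byte count.

```python
def utf8_budget(text: str, limit: int = 255) -> dict:
    """Compare the character count with the UTF-8 byte count.

    `limit` is a byte limit (255 here, matching the signature limit
    discussed in the thread). UTF-8 storage is an assumption.
    """
    encoded = text.encode("utf-8")
    return {
        "characters": len(text),       # Unicode code points
        "bytes": len(encoded),         # bytes after UTF-8 encoding
        "fits": len(encoded) <= limit,
    }


if __name__ == "__main__":
    # One ASCII letter, one hiragana, one katakana, one kanji.
    for sample in ["a", "か", "カ", "神"]:
        print(sample, utf8_budget(sample))
```

With three bytes per kana or kanji, a 255-byte budget holds at most 85 such characters. If the forum encodes text some other way, the per-character cost will differ, so treat these numbers as illustrative only.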
|
|
 |
kyonides
Joined: 28 Aug 2008 Posts: 301
|
Posted: Fri Sep 11, 2009 1:05 am Post subject: |
|
|
Well, yes, kanji, hiragana, katakana and so on are considered multibyte characters (no "single byte per symbol"-character is available). _________________ Seos nivo adgene Kizne tikelke
The Internet might be either your best friend or your worst enemy. It just depends on whether or not she has a bad hair day. |
|
|
 |
Tolkien_Freak

Joined: 26 Jul 2007 Posts: 1231 Location: in front of my computer. always.
|
Posted: Fri Sep 11, 2009 1:33 am Post subject: |
|
|
Ah, didn't know that. I'll try it a different way then. |
|
|
 |
Tolkien_Freak

Joined: 26 Jul 2007 Posts: 1231 Location: in front of my computer. always.
|
Posted: Fri Sep 11, 2009 1:36 am Post subject: |
|
|
Well, all kana didn't work either, and I don't know enough Chinese to put it in that, so oh well.
Anyone want it for a translation challenge? The English text is thus:
'I do not feel obliged to believe that the same God who has endowed us with sense, reason and intellect has intended us to forgo their use.' |
|
|
 |
Aeetlrcreejl

Joined: 08 Jun 2007 Posts: 839 Location: Over yonder
|
Posted: Fri Sep 11, 2009 11:20 pm Post subject: |
|
|
Mí èdrù mí senít díwà déonà abaket, budet, int cŏ tés èxat. _________________ Iwocwá ĵọṭãsák.
/iwotSwa_H d`Z`Ot`~asa_Hk/
[iocwa_H d`Z`Ot`_h~a_Hk] |
|
|
 |
StrangeMagic Admin

Joined: 18 Apr 2007 Posts: 640
|
Posted: Sat Sep 12, 2009 3:35 pm Post subject: |
|
|
Tolkien_Freak, I have changed the character limit on signatures; hopefully it should work now. =D |
|
|
 |
Tolkien_Freak

Joined: 26 Jul 2007 Posts: 1231 Location: in front of my computer. always.
|
Posted: Sat Sep 12, 2009 4:19 pm Post subject: |
|
|
Woo, thank you! Works now. |
|
|
 |
Baldash
Joined: 19 May 2009 Posts: 86 Location: Sweden
|
Posted: Thu Sep 17, 2009 12:00 pm Post subject: |
|
|
eldin raigmore wrote: | If you look at Unicode you'll see that any character set with more than 256 characters in it takes two bytes per character. (Unless there are more than 65536 characters in the character set, in which case, I think, Unicode just whimpers and rolls over and sucks its thumb.) That's going to be true of any system based on Chinese logograms, such as kanji.
Also, anytime you switch from one system to another, you use two bytes (or one? or three?) as a "shift" character.
So, for all I know, then, every time you shift from hiragana to kanji, or from kanji to hiragana, should count as two bytes as well.
Write the entire thing in hiragana or katakana, leaving out the kanji. You should be able to get 125 characters in.
Or, write it all in kanji, (say Mandarin instead of Japanese) and you should be able to get 60 characters in. (Or see how many you can get in.) |
That's not how UTF-8 works. UTF-8 uses a variable number of bytes per character, and it has only 128 single-byte characters, the same ones as in ASCII (at least the printable ones). The eighth (or first) bit of the first byte isn't used for an additional 128 characters (as in ISO-8859-1), but to indicate that the byte belongs to a multi-byte character. The number of bytes is indicated in unary by the leading 1 bits of the first byte. A two-byte character has the shape 110xxxxx 10xxxxxx, a three-byte character is 1110xxxx 10xxxxxx 10xxxxxx, and a four-byte character is 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The system could be expanded, but I think four bytes is the limit of the standard. That gives 2^21 = 2097152 theoretically possible characters (because I think 0aaaaaaa, 1100000a 10aaaaaa, 11100000 1000000a 10aaaaaa, and 11110000 10000000 1000000a 10aaaaaa are synonymous). There are no multiple character sets; it's just a single one. There are no "shift" characters that jump between character sets. So you could shift from hiragana to kanji as often as you want without it affecting the size. But any non-ASCII character will still be at least two bytes long.
I said "UTF-8", since I don't know whether what you said is true for some other encoding, but I haven't heard about it. I'm not sure, but I think UTF-16 works the same way as UTF-8, except that it works with 16 bit blocks instead of 8 bit blocks. |
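The bit layouts described above can be checked with a short Python sketch. The function names `utf8_bits` and `length_from_lead_byte` are hypothetical helpers written for this illustration, not part of any library. The last column printed is the UTF-16 size: UTF-16 works in 16-bit units, so kana and kanji in the Basic Multilingual Plane take one unit (two bytes), while characters outside it take two units (a surrogate pair). Note also that the current UTF-8 definition stops at four bytes (code points up to U+10FFFF) and treats the padded "overlong" forms as invalid rather than as alternative spellings.

```python
def utf8_bits(ch: str) -> str:
    """Return the UTF-8 encoding of `ch` as space-separated 8-bit strings."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


def length_from_lead_byte(lead: int) -> int:
    """Infer the sequence length from the high bits of the first byte."""
    if lead >> 7 == 0b0:        # 0xxxxxxx: single byte (ASCII)
        return 1
    if lead >> 5 == 0b110:      # 110xxxxx: two bytes
        return 2
    if lead >> 4 == 0b1110:     # 1110xxxx: three bytes
        return 3
    if lead >> 3 == 0b11110:    # 11110xxx: four bytes
        return 4
    raise ValueError("continuation byte or invalid lead byte")


if __name__ == "__main__":
    # ASCII, Latin-1 letter, hiragana, kanji, and U+1F600 (outside the BMP).
    for ch in ["A", "\u00e9", "か", "神", "\U0001F600"]:
        data = ch.encode("utf-8")
        utf16_bytes = len(ch.encode("utf-16-le"))  # no byte-order mark
        print(ch, utf8_bits(ch), length_from_lead_byte(data[0]), utf16_bytes)
```

Switching between hiragana and kanji adds nothing here: each character is encoded on its own, and the lead byte alone says how many bytes follow.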
|
|
 |
eldin raigmore Admin

Joined: 03 May 2007 Posts: 1621 Location: SouthEast Michigan
|
Posted: Fri Sep 18, 2009 8:39 pm Post subject: |
|
|
Baldash wrote: | That's not how UTF-8 works. UTF-8 uses a variable number of bytes per character, and it has only 128 single-byte characters, the same ones as in ASCII (at least the printable ones). The eighth (or first) bit of the first byte isn't used for an additional 128 characters (as in ISO-8859-1), but to indicate that the byte belongs to a multi-byte character. The number of bytes is indicated in unary by the leading 1 bits of the first byte. A two-byte character has the shape 110xxxxx 10xxxxxx, a three-byte character is 1110xxxx 10xxxxxx 10xxxxxx, and a four-byte character is 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The system could be expanded, but I think four bytes is the limit of the standard. That gives 2^21 = 2097152 theoretically possible characters (because I think 0aaaaaaa, 1100000a 10aaaaaa, 11100000 1000000a 10aaaaaa, and 11110000 10000000 1000000a 10aaaaaa are synonymous). There are no multiple character sets; it's just a single one. There are no "shift" characters that jump between character sets. So you could shift from hiragana to kanji as often as you want without it affecting the size. But any non-ASCII character will still be at least two bytes long.
I said "UTF-8", since I don't know whether what you said is true for some other encoding, but I haven't heard about it. I'm not sure, but I think UTF-16 works the same way as UTF-8, except that it works with 16 bit blocks instead of 8 bit blocks. | Thanks. _________________ "We're the healthiest horse in the glue factory" - Erskine Bowles, Co-Chairman of the deficit reduction commission |
|
|
 |