A short tutorial which explains what ASCII and Unicode are, how they work, and what the difference is between them, for students studying GCSE Computer Science.
One of the most beautiful arts is making complicated things look so simple. And only legends can do it.
We watch this video in my computer class at school 😂 it’s very well put together, good job.
Perfect. Simple, easy and straightforward. 10/10. Great explanation!
Thank you so much, I'm very glad you think so.
It took me lots of video browsing to get here, but this was the video I was looking for all this while. This is the best.
Fantastic, thank you for the clarity. Have read blog posts and seen videos on this topic and never understood it quite so well.
Thank you Alexander, I'm very glad you feel this video is so useful.
Well explained and I love the “Try It Yourself”
Simple, clear instructions- very helpful. Thank you!
I'm so glad you found it helpful, thank you.
Loved this, especially the "Try it out" part - this made exam prep for Intro to IT much easier!
by far the best on this topic!!!
This video explains it beautifully and very easy to understand. Thanks for the great content
Glad it was helpful!
More than half an hour I was searching answer for this question but within 6min you did it....thanks a lot..
I'm so glad I was able to help you.
Now the content on this topic is crystal clear for me. Thanks, Tech Train.
Okay, I have gone through maybe 13 other videos, including a video by my instructor, and none of them were as simple and easy to follow as your explanation. Thank you for making this. I'm finally understanding it!
Well, he has left out quite a lot to make it look simple. Like the important encoding of Unicode (ISO-10646) in UTF-8.
Chinese: I'm gonna end ASCII's whole career.
😆
Unicode: what are these..?
Japanese: Emojis
Unicode: it.. it's a face
Japanese: yeah, and?
Unicode: now has Emojis
LMAO
Good to see this comment! ΟωΟ
Unicode: hold my 🍺
This refreshed the beginner lessons of my computer science class for me. Great, thanks!
I'm so glad you found it helpful Abed Behrooz
(4:30) You should preferably save in UTF-8 instead, which uses 1-4 bytes per character depending on how far into Unicode the character appears. - Furthermore, you can't use code points beyond the 1,114,111th character (U+10FFFF), due to how the standard is constrained by UTF-16's surrogate pairs.
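A quick Python sketch of that variable width: characters further into Unicode take more bytes in UTF-8.

```python
# UTF-8 uses 1-4 bytes per character, growing with the code point.
for ch in ["A", "é", "€", "😀"]:  # U+0041, U+00E9, U+20AC, U+1F600
    print(ch, hex(ord(ch)), len(ch.encode("utf-8")))
# 'A' takes 1 byte, 'é' takes 2, '€' takes 3 and the emoji takes 4.
```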
Clean and clear, super well presented. Thank you for contributing great quality information on this platform. It's a breath of fresh air.
Thank you so much for your kind comment, I am so glad you felt the video was so useful. Hope to see you here again!
the best explanation ....easily understood the topic..hats off
Thank you!
Explanation is good.. and helpful as well
Great explanation. very helpful!
You're welcome!
Thank you for great explanation!
Best ever explanation...Thanks dude
Your method of teaching is so simple and amazing... it's really easy to understand. 💗❤️🙂
thanks u are an amazing professor!!
The explanation is wrong. When saved in UNICODE format, Notepad adds a two-byte magic number to mark the file format as being UTF-16, and the ALT-1 character is saved as two bytes. Notepad does not save in a 32-bit UNICODE format. You can verify this by putting the ALT-1 character in the file twice, and see that the file size is then 6 bytes. Also, UTF-16 encodes up to 20 bits, not 32 as stated.
Nice, I got 6 bytes!
I got 16 bytes
@@gbzedaii3765 I don't know how you managed to get 16 bytes
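The 6-byte result is easy to reproduce in Python (a sketch; the temp-file name is arbitrary): a 2-byte UTF-16 byte order mark plus two 2-byte characters.

```python
import os
import tempfile

# Write the ☺ character (what Alt+1 produces) twice as UTF-16,
# which is what older Notepad's "Unicode" save option does.
path = os.path.join(tempfile.mkdtemp(), "smiley.txt")
with open(path, "w", encoding="utf-16") as f:  # Python prepends the BOM
    f.write("\u263A\u263A")
print(os.path.getsize(path))  # 2 (BOM) + 2 × 2 (characters) = 6 bytes
```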
this is the best video on this topic!! must watch!!
Thank you so much! I'm glad it was useful.
The greatest teachers are the ones that can simplify the most complicated of things. Bravo to you!! tysm for the vid :)
Thank you! I'm so glad you found it helpful. 😊
This actually helped me understand why a bunch of symbols have random numbers after it.
This is what I'm looking for this morning
Thanks so much. This is a great video
Easy to understand, will subscribe for that
Thank you so much to the heroes that create these videos. It's my first time delving into a telecom-based project, and this video helped me so much as a non-tech person.
As smooth as it gets, thanks!
Just awesome 👏👏👏👏.
It was short video but full of content. Very well explained. Thank You :) !
I love the "try it yourself" part!. Thanks a lot sir!
I'm glad you found it useful
Keyboard keys have a different number associated with them that gets translated into "ASCII" later, upper or lower case depending on whether Shift was held down. The PS/2 scancode for A is 0x1E. You skipped over intermediate word lengths: for most of its history, a character has been 8 bits, and later 16 bits. Having so many bits comes at a cost.
The Notepad shortcut might be something found only in the newest Windows. A Unicode text file is usually prefixed with a technical character called a "byte order mark" to indicate the UTF format, so saving one symbol will actually save two.
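The byte order mark mentioned here can be inspected directly (Python sketch): one saved character becomes BOM plus character.

```python
import codecs

# The UTF-16 little-endian byte order mark is the two bytes FF FE.
print(codecs.BOM_UTF16_LE)  # b'\xff\xfe'

# Saving the single character "A" as BOM + UTF-16 LE gives 4 bytes, not 2.
data = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
print(data, len(data))  # b'\xff\xfeA\x00' 4
```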
A very good explanation.
Extremely helpful, thanks a lot!
You're welcome!
Thanks for this. Great explanation.
Very comprehensive, thank you
I'm so glad you found it useful. Thank you for your support.
Dude your youtube channel is amazing! Especially this vid! Helped a lot with computer science 🤞👍
Thank you very much! I'm so glad you found it helpful. 👍
You guys explain this better than what my prof did over 2 hours. lolz
7 bits store 128 characters, from 0-127 => 0000000-1111111. Correct me if I am wrong, please.
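That's right, and it's quick to check in Python:

```python
# 7 bits give 2**7 = 128 distinct values, numbered 0 through 127.
print(2 ** 7)              # 128
print(format(0, "07b"))    # 0000000, the lowest 7-bit pattern
print(format(127, "07b"))  # 1111111, the highest 7-bit pattern
```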
@@mrsher4517 a lot of info dumped here lolz not fast enough ooooopp
For 4-byte UTF-8, there are 21 free payload bits, so the highest encodable value would be 2097151 (decimal) or 1FFFFF (hex), though Unicode itself caps code points at 10FFFF.
Don't forget that with UTF-8 only one byte is used for the ASCII characters.
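Checking the numbers in the two comments above (Python sketch):

```python
# 21 payload bits in a 4-byte UTF-8 sequence give this ceiling:
print(2 ** 21 - 1)  # 2097151, i.e. 0x1FFFFF
# Unicode itself caps code points lower, at U+10FFFF:
print(0x10FFFF)     # 1114111
# And plain ASCII characters still take only one byte in UTF-8:
print(len("A".encode("utf-8")))  # 1
```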
thank you for the explanation
thanks a lot for the awesome explanation
You're very welcome!
Damn such a beautiful way to explain things
Thank you, I'm so glad you found it helpful.
Thanks a lot. top-notch 🙂
what a great explanation!
Thank you so much, I'm glad you found it useful.
Thank you for explaining, we are thankful to you
It's my pleasure
Great method 👍🏻👍🏻👍🏻
The reason Notepad uses four bytes in that example is not UTF-32, but UTF-16 with an additional BOM at the beginning of the file.
yeah, I think I've seen something like this on the Wikipedia page
ASCII is an extension of the 5-bit Baudot Code used in such things as the original electro-mechanical Teletype Keyboard/Printer long-distance communication devices that replaced the old hand-keyed dit-dah Morse Code when more capable and much faster methods of sending on/off (digital) codes were developed at the end of the 19th Century. Much of ASCII was adopted to allow more characters to be sent and to allow more thorough control of the receiving device (End-of-Message value and so forth) for more intricate messages (for example, the Escape Code as a flag to allow non-ASCII values to be sent for use by the receiving machine as other than letters or numbers or standard symbols or ASCII Printer Commands). Sending pictures is another major extension of ASCII where the original printable characters are now just a small part of the image printed out. UNICODE is one part of this expansion but such things as "JPG" and "MP4" and other special-purpose coding schemes are now also used extensively for inter-computer messaging. Automatic "handshaking" and error determination are now absolutely needed for computer messaging that is going much too fast for human monitoring of the connections -- this can get extremely complex when automatic backup systems with quick-swapping paths are used.
Wow, what a lot of extra information! Very interesting, thank you very much for sharing that. My videos tend to be targeted at the current UK GCSE Computer Science curriculum, and so only tend to provide that level of information, but it's always good when subscribers share extra information and explanations, so thank you!
@@TheTechTrain You are welcome. I started work for the US Navy as an Electronic Engineer for Systems of the TERRIER Anti-Aircraft Guided Missile System in late 1972, just as digital computers were being added to replace the old electro-mechanical computers used to aim the missiles and initialize them just prior to firing and control the large tracking radars (huge "folded telescope" designs that used pulse tracking and a separate continuous very-high-frequency illumination beam for the missile's radar "eye" to home on). A couple of years later the profession of Computer Engineer finally was added to the US Government employment system and our group all changed over to it, so I, in a manner of speaking, was not on the "ground floor" of the computer revolution but, as far as the US Federal Government was concerned, I was "in the basement" at this extremely critical time of change when computers "got smaller" as was given as an inside joke in the Remo Williams movie. There is a YouTube video series out concerning a group in a museum rebuilding one of those old Teletype machines that used the Baudot Code and showing how it controlled all of the tiny moving parts in the machines. VERY interesting!
You've certainly seen a fair few changes in that time then! As someone with such an extensive background in the subject I feel humbled at you reviewing my little video! Are you still involved with the subject these days?
@@TheTechTrain I retired in 2014 after 41 years of US Federal Government employment. First for TERRIER until it was decommissioned in 1992 when steam warships (other than the nukes) were suddenly without any warning deleted from the Navy, where I was lucky and could immediately change over to TARTAR, which was on many gas-turbine-powered ships and lasted a few years longer before AEGIS replaced almost every other major US Navy missile system. TARTAR had some computer engineering/programming jobs open and I now learned a whole new kind of software/hardware design and support scheme -- boy, was that a step down, from the 18-bit UNIVAC C-152 computers that TERRIER used to the chained-together 16-bit computers from several manufacturers that TARTAR used, since 18-bit (though now obsolete due to 32- and 64-bit machines becoming standard) gave programmers WAY, WAY more capability than 16-bit did. When TARTAR in the US Navy "bit the dust" (I think that a foreign navy still uses it) a few years later, I moved to the FFG-7 frigates that used a kind of "poor-man's TARTAR" (still limited to the early SM-1 "homing-all-the-way" missiles when TARTAR had changed over to the much more capable command-guided, much-longer-range SM-2 missiles). I did some programming and program testing and spec writing, with my largest work effort being on the several-year-long project to upgrade the Australian FFG-7 ships to SM-2 and an Evolved Seasparrow vertical-launch close-in defense system -- that was a HUGE job like shoe-horning a size 12 foot into a size 8 shoe, but we did our part in developing the programming portion of the system and it WORKED!! By then I was doing Software Quality Assurance and Control (SQA), where I made sure all documents were properly reviewed and OKed and I held the final meetings where we decided whether we had finished each major project step and could go to the next one, which was a major change for me.
I had to learn all about the SQA process, originally developed for NASA (though we never got to their own Level 5 SQA System as that would have needed many more people and lots more money), and my boss had me flow-chart, by hand, the entire process with all possible branches to make me REALLY know it ASAP -- he stuck my large flow-charts up on the wall just outside his office and directed that everybody study their part in it (only I and our project librarian/documentation/computer storage medium "czar" had to learn the whole thing; just my luck!). To get some idea as to how far behind the US Navy is getting vis-a-vis software, where originally we were on the "bleeding edge" when I started work, I was the ONLY SQA person in our entire group, handling the several concurrent projects we had by juggling the timing of meetings and so forth. In the online video game DIABLO III, to name just one, they have over ONE-HUNDRED (100!!!) people dedicated to just SQA, and that is only a small part of their entire international staff. I felt like Indiana Jones being dragged by his whip behind that Nazi truck, only in my case that truck was getting farther and farther away as the whip stretched...
isn't 32-bit capable of storing potentially 2^32 = 4,294,967,296 characters (not only 2,147,483,647, as shown in the video)?
The short answer is 'no', because the first bit being equal to 1 is used as the flag to indicate a Unicode character. Since the first bit has to be a '1', you are halving the number of combinations available.
It's actually 2^31
@@xiaoling943 It's actually 2^31 - 1. xD
Unsigned vs signed integer
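The signed vs unsigned distinction in the replies above, checked in Python:

```python
# 32 bits give 2**32 bit patterns in total.
print(2 ** 32)      # 4294967296 values as an unsigned integer
# If one bit is reserved as a sign flag, only 31 bits remain for the value:
print(2 ** 31 - 1)  # 2147483647, the largest signed 32-bit value
```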
Thanks... anything on EBCDIC?
so useful even now, thank you
Glad to hear!
What an amazing video, bravo, I definitely will Sub
I'm so glad it helped! Thank you for the sub! 👍
nice explanation... I want more about ANSI
Nice!!! clear, concise and simple
Great video 👌, clear explanation, very good examples.
Very helpful thanks!!!
Glad it helped!
Woow thank you for the information
So what are the possible questions that can come up for extended ASCII?
Great absolutely great! love the video best video I have viewed for this topic!
Thank you so much, I'm very glad you found it so useful! (Feel free to share and help spread the word! 😉👍)
Thank you man, really helpful
Great video. I have understood ASCII and Unicode clearly. This video deserves a thumbs up.
You're a saving grace, bruv. God bless your heart. Merry Christmas and good night.
very nice explanation
ThankYou so much sir
Thank you very much! This is the best explanation I've ever seen in my life.
Thank you so much, I'm very glad you liked it
How can we learn all of them: BCD, ASCII, EBCDIC?
Does our system use a single system at a time, or does it toggle between ASCII and Unicode automatically as needed? What if the file contains simple alphabet characters as well as emojis?
If a text file contains only ASCII characters then it will be saved by default as an ASCII file, unless you choose otherwise. If the file contains any Unicode characters and you try to save it as an ASCII/ANSI file, you will be warned about the potential loss of the Unicode characters. Generally the system will try to keep file sizes low, so it will only save using the larger Unicode character size if any Unicode characters are included.
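A Python sketch of that behaviour: UTF-8 handles the mixed case, while a pure ASCII save rejects the emoji.

```python
# A file mixing plain letters and an emoji saves fine as UTF-8:
text = "hello 😀"
print(len(text.encode("utf-8")))  # 10 bytes: 6 ASCII bytes + 4 for the emoji

# Trying to encode the same text as ASCII fails, mirroring Notepad's warning:
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("cannot save as ASCII without losing the emoji")
```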
I think you must make a distinction between the character space (e.g. Unicode codepoints) and the function to map from the character space to the encoded sequences of bits. You would then notice that there are constraints on this function and not all the 32 bits can be freely used, making the 2Billion number quite false. I may be wrong though, just learned about these stuffs 5 mins ago.
I tend to focus on the needs of the GCSE Computer Science course I teach. You are correct though - the 2 billion is a potential rather than an actual figure.
Very well done!
Thank you very much!
I tried this in Windows 10 v22H2 and found that the Alt+1 combination file size was 3 bytes instead of 4 bytes, as mentioned in the video. Any specific reason for this that you can recall?
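One plausible reason (an assumption, since the video used an older Notepad): newer Windows 10 Notepad defaults to saving UTF-8 without a BOM, and the Alt+1 smiley ☺ (U+263A) takes exactly 3 bytes in UTF-8.

```python
# ☺ is U+263A; UTF-8 encodes it in 3 bytes, matching the 3-byte file
# size seen when Notepad saves it as UTF-8 with no byte order mark.
print(len("\u263A".encode("utf-8")))  # 3
```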
Great job! Could you explain how you reached the 2 billion number for total possibilities at a 32-bit size? Is it 2^31? And if so, what is the last bit for?
The topmost bit of a binary number is usually used as the _sign_ of the number,
i.e. if the sign bit is 0, the number is positive.
If the sign bit is 1, it's negative.
Most signed binary numbers (8 / 16 / 32 / 64 bit) reserve one bit for the sign, so there are actually only 2^(n-1) magnitudes available.
Hope this helps!
Edit:
I'll provide an example:
1101 - Normally this would be 13, but if we use the leftmost bit as the sign bit,
1_101 - This would be -5 (reading it as sign-magnitude; two's complement, which real CPUs use, would give -3).
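The example above can be checked with a small Python sketch (sign-magnitude reading, as in the comment):

```python
# Interpret a 4-bit pattern with the leftmost bit as a sign flag
# (sign-magnitude, as in the comment above; not two's complement).
bits = "1101"
sign = -1 if bits[0] == "1" else 1
magnitude = int(bits[1:], 2)  # remaining bits: 101 = 5
print(sign * magnitude)       # -5
print(int(bits, 2))           # 13 when read as a plain unsigned number
```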
This was an awesome explanation. Thank you.
Extremely helpful. Thanks a million😊
best explanation ever
Best explanation I’ve seen yet!
Subbed, likes, etc.
Amazing video!
Thank you!
Very well and simply explained, thanks a lot. But I have a question: why can 32-bit represent only half the number of values? I mean, why 2 billion when it can represent up to 4.3 billion?
1 bit is used to represent positive or negative
Awesome bro
THANK YOU !
Very nice basic introduction
Glad you liked it
brilliant explanation,thank you!
I'm so glad, thank you.
Does it mean we need to remember ASCII codes? Is there any way to remember ASCII codes?
4:45
wait a second... 2^31 - 1 = 2147483647...
why do we call it 32 bits instead of 31 bits?
(3:20) Well, you see, an "emoji" is a glyph in Unicode that is defined to be a picture, usually with its own colours, unlike other text. These characters are specifically defined to represent those pictures. - An "emoticon" is a series of characters built up from existing symbols that were not intended to be part of a picture. For example, the basic smiley ":)" consists of a colon and a parenthesis, two symbols intended for other things.
No, emoji are single Unicode characters, not a series of many. That becomes three or four bytes if you encode the Unicode (also known as ISO-10646) with the UTF-8 encoding.
Then characters like Å, Ä and Ö turn up as two characters if you look at UTF-8 encoded files as if they were ASCII or Latin-1 (that is, ISO-8859-1). A common misconfiguration of web servers.
@@AndersJackson Not all emoji are one Unicode codepoint. For example, 👍🏻 is made up of 2 codepoints: 👍 and the pale skin tone modifier.
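The reply above is easy to verify (Python): the thumbs-up with a skin tone is two code points, eight UTF-8 bytes.

```python
# 👍🏻 is 👍 (U+1F44D) followed by the light skin tone modifier (U+1F3FB).
thumbs = "\U0001F44D\U0001F3FB"
print(len(thumbs))                    # 2 code points
print([hex(ord(c)) for c in thumbs])  # ['0x1f44d', '0x1f3fb']
print(len(thumbs.encode("utf-8")))    # 8 bytes: 4 per code point in UTF-8
```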
Great video,learned a lot!
great explanation. I was able to understand it easily
Thank you for making this. It was very helpful.
A great work.
Thank you very much!
You can actually type in the characters using the ASCII numbers, just press Alt + the number code.
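The same number-to-character idea works in code (Python sketch):

```python
# chr() turns a character number into its character, like an Alt code;
# ord() goes the other way.
print(chr(65))      # 'A', character number 65
print(ord("A"))     # 65
print(chr(0x263A))  # '☺', the smiley shown in the video for Alt+1
```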
Thank you so much. No one explained it this way. ❤️❤️
WOW, by far the best explanation
Glad it was helpful!
@@TheTechTrain yeahhhhhhhhhhhhhhhhhhhhhhh
So helpful!!!
Glad you think so!
Got it ... Thank you...
You're very welcome!
Amazing :)
Why can't characters be saved as 1-byte ASCII and then 4-byte Unicode for the other characters? That would reduce the memory size needed.
Unicode does come in many flavours, but you don't necessarily want 4 bytes for everything, as that would waste storage space.
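In fact, that mixed scheme is roughly what UTF-8 does (Python sketch): ASCII keeps 1 byte and other characters use 2-4.

```python
# UTF-8 stores ASCII in 1 byte and everything else in 2-4 bytes:
mixed = "A😀"
print(len(mixed.encode("utf-8")))   # 5 bytes: 1 for 'A' + 4 for the emoji
# Compare with UTF-32, which pads everything to 4 bytes (plus a 4-byte BOM):
print(len(mixed.encode("utf-32")))  # 12 bytes
```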
@@TheTechTrain I want to make Unicode for my language and want to revive the script of my language
This video has really helped me understand from a total beginners point of view, thank you. :)