What a charset!

A presentation at Tech Talk @ Tagesspiegel in September 2021 by Gunnar Bittersmann

Slide 1

What a character set!

Slide 2

What a charset!

Slide 3

Bundesarchiv, Bild 183-58117-0010 / CC-BY-SA 3.0

Slide 4

ISO-8859-1

Slide 5

ISO-8859-2

Slide 6

ISO-8859-6

Slide 7

ISO-8859-8
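The four ISO-8859 parts above assign different characters to the same byte values. A minimal sketch using Python's built-in codecs makes that visible by decoding one byte with each part; the byte 0xE3 and the helper name `decode_everywhere` are arbitrary choices for illustration.

```python
import unicodedata

# One byte, four ISO-8859 parts: the byte value stays the same,
# the character it stands for does not.
RAW = b"\xe3"
PARTS = ["iso-8859-1", "iso-8859-2", "iso-8859-6", "iso-8859-8"]


def decode_everywhere(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes with each listed encoding (hypothetical helper)."""
    return {enc: raw.decode(enc) for enc in encodings}


for enc, text in decode_everywhere(RAW, PARTS).items():
    print(f"{enc}: {text}  U+{ord(text):04X} {unicodedata.name(text)}")
```

The single byte comes out as ã (U+00E3), ă (U+0103), ك (U+0643) and ד (U+05D3): without knowing the part, the byte alone means nothing.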

Slide 8

Photo by Nicholas Lazarine on Unsplash

Slide 9

He always used to refer to this guitar, never “Fender guitar” or “Gibson guitar,” it was always the “goddamn guitar.” —Bruce Springsteen talking about his father

Slide 10

When I was growing up there were two things that were unpopular in my house: one was me, and the other one was my guitar. —Bruce Springsteen

Slide 11

‫שלום‬

ISO-8859-8 (visual order): ED E5 EC F9
ISO-8859-8-I (logical order): F9 EC E5 ED
ש = F9, ל = EC, ו = E5, ם = ED
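Both variants use the same byte for each letter; only the storage order differs. Python has a single `iso-8859-8` codec that maps characters to bytes, so this sketch produces the logical order with it and simulates the visual order by reversing the bytes. That reversal is an assumption that only holds for a single run of right-to-left letters.

```python
# Logical vs. visual byte order for a Hebrew word in ISO-8859-8.
WORD = "\u05e9\u05dc\u05d5\u05dd"  # שלום, held in logical (reading) order


def logical_bytes(text: str) -> bytes:
    """Reading order, the way ISO-8859-8-I (logical) stores text."""
    return text.encode("iso-8859-8")


def visual_bytes(text: str) -> bytes:
    """Left-to-right display order, the way visual ISO-8859-8 stores text.

    Simplification: plain reversal only works for a single run of
    right-to-left letters; mixed-direction text needs the Unicode
    bidirectional algorithm.
    """
    return logical_bytes(text)[::-1]


print(logical_bytes(WORD).hex(" ").upper())  # F9 EC E5 ED
print(visual_bytes(WORD).hex(" ").upper())   # ED E5 EC F9
```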

Slide 12

character set ≠ character encoding

Slide 13

Character: a | ä | “
Unicode: U+0061 | U+00E4 | U+201C
HTML escapes: – | &#xE4; / &auml; | &#x201C; / &ldquo;

<p>Anton&#xED;n Dvo&#x159;&#xE1;k</p>
<p>Antonín Dvořák</p>
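The escaped version can be generated mechanically: every non-ASCII character becomes `&#x` plus its code point in hex. A minimal sketch, where `to_hex_refs` is a hypothetical helper and Python's `html.unescape` checks the round trip:

```python
import html


def to_hex_refs(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal numeric character reference.

    Only non-ASCII characters are touched; markup-significant ASCII such as
    & or < is left alone and would need separate escaping.
    """
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)


name = "Antonín Dvořák"
escaped = to_hex_refs(name)
print(f"<p>{escaped}</p>")  # <p>Anton&#xED;n Dvo&#x159;&#xE1;k</p>

# Decoding the references yields the original text again.
assert html.unescape(escaped) == name
```

With a UTF-8 document, the escapes are unnecessary: the second `<p>` says the same thing and stays readable.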

Slide 14

Character: a | ä | “
Unicode: U+0061 | U+00E4 | U+201C
HTML escapes: – | &#xE4; / &auml; | &#x201C; / &ldquo;

<p>Antonín Dvořák – the world's most frequently performed Czech composer</p>
<p>Antonín Dvořák &ndash; the world's most frequently performed Czech composer</p>
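A named reference like `&ndash;` is only another spelling of one code point in the markup; after parsing, both versions are identical text. A short check with Python's standard `html` module, using a shortened sample sentence:

```python
import html
from html.entities import name2codepoint

literal = "Antonín Dvořák \u2013 Czech composer"   # en dash typed directly
named_ref = "Antonín Dvořák &ndash; Czech composer"  # en dash as a named reference

# Both spellings in the markup decode to the same text ...
assert html.unescape(named_ref) == literal
# ... because &ndash; is just a name for U+2013 EN DASH.
print(f"U+{name2codepoint['ndash']:04X}")  # U+2013
```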

Slide 15

Character: a | ä | “ | BOM
Unicode: U+0061 | U+00E4 | U+201C | U+FEFF
HTML escapes: – | &#xE4; / &auml; | &#x201C; / &ldquo; | –
UTF-16 BE: 00 61 | 00 E4 | 20 1C | FE FF
UTF-16 LE: 61 00 | E4 00 | 1C 20 | FF FE
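UTF-16 BE and LE contain the same 16-bit units with the two bytes of each unit swapped, and the BOM (U+FEFF) at the start tells a reader which order was used. A sketch with Python's codecs; `utf16_order_from_bom` is a hypothetical helper that ignores UTF-32 BOMs (FF FE 00 00 would be misread as UTF-16 LE).

```python
import codecs

TEXT = "a\u00e4\u201c"  # a ä “

be = TEXT.encode("utf-16-be")
le = TEXT.encode("utf-16-le")
print(be.hex(" ").upper())  # 00 61 00 E4 20 1C
print(le.hex(" ").upper())  # 61 00 E4 00 1C 20


def utf16_order_from_bom(data: bytes) -> str:
    """Pick the UTF-16 byte order from a leading BOM (U+FEFF)."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    raise ValueError("no UTF-16 byte order mark")


payload = codecs.BOM_UTF16_LE + le  # FF FE 61 00 E4 00 1C 20
order = utf16_order_from_bom(payload)
print(order, payload[2:].decode(order))  # utf-16-le aä“
```

Python's generic `utf-16` codec performs this BOM check itself, so `payload.decode("utf-16")` returns the same text with the BOM removed.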

Slide 16

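Character: a | ä | “ | BOM | 😝
Unicode: U+0061 | U+00E4 | U+201C | U+FEFF | U+1F61D
HTML escapes: – | &#xE4; / &auml; | &#x201C; / &ldquo; | – | &#x1F61D;
UTF-16 BE: 00 61 | 00 E4 | 20 1C | FE FF | D8 3D DE 1D
UTF-16 LE: 61 00 | E4 00 | 1C 20 | FF FE | 3D D8 1D DE
JavaScript console: '😝'.length → 2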
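Code points above U+FFFF do not fit into one 16-bit unit, so UTF-16 splits them into a surrogate pair. That is why `'😝'.length` is 2: JavaScript counts UTF-16 code units, not characters. A sketch of the split (the helper name `utf16_code_units` is hypothetical):

```python
def utf16_code_units(cp: int) -> list[int]:
    """Split a code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]                   # one 16-bit unit, e.g. U+00E4
    v = cp - 0x10000                  # 20 bits left to distribute
    high = 0xD800 | (v >> 10)         # high surrogate carries the top 10 bits
    low = 0xDC00 | (v & 0x3FF)        # low surrogate carries the bottom 10 bits
    return [high, low]


units = utf16_code_units(0x1F61D)     # 😝
print(" ".join(f"{u:04X}" for u in units))  # D83D DE1D
print(len(units))                     # 2, the same count as JavaScript's '😝'.length
```

In UTF-16 LE only the bytes within each unit are swapped; the high surrogate still comes first, giving 3D D8 1D DE.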

Slide 17

Character: a | ä | “ | BOM | 😝
Unicode: U+0061 | U+00E4 | U+201C | U+FEFF | U+1F61D
HTML escapes: – | &#xE4; / &auml; | &#x201C; / &ldquo; | – | &#x1F61D;
UTF-16 BE: 00 61 | 00 E4 | 20 1C | FE FF | D8 3D DE 1D
UTF-16 LE: 61 00 | E4 00 | 1C 20 | FF FE | 3D D8 1D DE
UTF-32 BE: 00 00 00 61 | 00 00 00 E4 | 00 00 20 1C | 00 00 FE FF | 00 01 F6 1D
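UTF-32 needs no surrogates: every code point is simply written as a fixed-width 4-byte integer. A sketch that builds UTF-32 BE by hand (`utf32_be` is a hypothetical helper) and compares it with Python's own encoder:

```python
def utf32_be(text: str) -> bytes:
    """UTF-32 BE: each code point is written as one 4-byte big-endian integer.

    Assumes well-formed text (no lone surrogates), which Python's own
    encoder would reject.
    """
    return b"".join(ord(ch).to_bytes(4, "big") for ch in text)


sample = "a\u00e4\u201c\ufeff\U0001F61D"  # a ä “ BOM 😝
print(utf32_be(sample).hex(" ").upper())
# 00 00 00 61 00 00 00 E4 00 00 20 1C 00 00 FE FF 00 01 F6 1D
assert utf32_be(sample) == sample.encode("utf-32-be")
```

The price of the fixed width is size: four bytes even for plain ASCII.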

Slide 18

Character: a | ä | “ | BOM | 😝
Unicode: U+0061 | U+00E4 | U+201C | U+FEFF | U+1F61D
HTML escapes: – | &#xE4; / &auml; | &#x201C; / &ldquo; | – | &#x1F61D;
UTF-16 BE: 00 61 | 00 E4 | 20 1C | FE FF | D8 3D DE 1D
UTF-16 LE: 61 00 | E4 00 | 1C 20 | FF FE | 3D D8 1D DE
UTF-32 BE: 00 00 00 61 | 00 00 00 E4 | 00 00 20 1C | 00 00 FE FF | 00 01 F6 1D
UTF-8: 61 | C3 A4 | E2 80 9C | EF BB BF | F0 9F 98 9D
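UTF-8 is variable-width: ASCII stays one byte, and larger code points spread their bits over 2, 3 or 4 bytes with fixed prefix bits. A hand-written encoder for a single code point (the name `utf8_encode` is hypothetical) reproduces the UTF-8 row:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (1 to 4 bytes)."""
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not encodable in UTF-8")
    if cp < 0x80:                                   # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                                # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                # 11110xxx + three 10xxxxxx bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "a\u00e4\u201c\ufeff\U0001F61D":  # a ä “ BOM 😝
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" ").upper())
```

It prints 61, C3 A4, E2 80 9C, EF BB BF and F0 9F 98 9D, the same bytes `str.encode("utf-8")` produces.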

Slide 19

Character set (Unicode):
Character: a | ä | “ | BOM | 😝
Unicode: U+0061 | U+00E4 | U+201C | U+FEFF | U+1F61D

Character encodings:
UTF-16 BE: 00 61 | 00 E4 | 20 1C | FE FF | D8 3D DE 1D
UTF-16 LE: 61 00 | E4 00 | 1C 20 | FF FE | 3D D8 1D DE
UTF-32 BE: 00 00 00 61 | 00 00 00 E4 | 00 00 20 1C | 00 00 FE FF | 00 01 F6 1D
UTF-8: 61 | C3 A4 | E2 80 9C | EF BB BF | F0 9F 98 9D

Slide 20

Declaring the character encoding

HTML: <meta charset="UTF-8"/>
XML: <?xml version="1.0" encoding="UTF-8"?>
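A parser reads these declarations before it can decode the rest of the document. The sketch below is a deliberately simplified illustration, not the real HTML algorithm: it checks for a BOM first, then looks for an XML declaration or a `<meta charset>` in the first 1024 bytes. The function `sniff_charset`, its regular expressions and the default are assumptions; the actual WHATWG sniffing rules also cover `http-equiv` meta tags, HTTP headers and label aliases.

```python
import codecs
import re

# Simplified patterns (assumptions): only <meta charset=...> and an XML declaration.
META_CHARSET = re.compile(rb"""<meta\s+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE)
XML_DECL = re.compile(rb"""^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def sniff_charset(data: bytes, default: str = "utf-8") -> str:
    """Guess a document's encoding: BOM first, then an in-document declaration."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    head = data[:1024]                      # declarations must appear early
    for pattern in (XML_DECL, META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return default


print(sniff_charset(b'<!doctype html><meta charset="UTF-8"/>'))           # utf-8
print(sniff_charset(b'<?xml version="1.0" encoding="ISO-8859-1"?><r/>'))  # iso-8859-1
```

The BOM check comes first because a BOM, when present, overrides an in-document declaration.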

Slide 21

C3 A4 → (character encoding) → U+00E4 LATIN SMALL LETTER A WITH DIAERESIS → (font) → ä
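The first step of this chain, bytes to code point, is where the declared encoding matters; the second step, code point to glyph, is up to the font and is outside the scope of this sketch. Using Python's standard `unicodedata` module:

```python
import unicodedata

raw = bytes([0xC3, 0xA4])

# Character encoding: bytes -> code point.
ch = raw.decode("utf-8")
print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")  # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

# With the wrong encoding declared, the same two bytes become two other characters.
print(raw.decode("iso-8859-1"))  # Ã¤
```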

Slide 22

Slide 23

Slide 24

OPENTYPE FEATURES

Slide 25

charset ≠ character set

Slide 26

charset = character encoding

Slide 27

The end.