What Brian Cant Never Taught You About Metadata

A presentation at Geek in the Park in August 2008 in Royal Leamington Spa, UK by Drew McLellan

Slide 1

What Brian Cant Never Taught You About Metadata a presentation by drew mclellan

Slide 2

What Brian Cant Never Taught You About Metadata a presentation by drew mclellan Everything You Know About Metadata is Wrong

Slide 3

What Brian Cant Never Taught You About Metadata a presentation by drew mclellan How I Learned to Stop Worrying and Love The Data

Slide 4

geek in the park: metadata, html, robots, 1970s/80s children’s television programming, tofu, truth, honesty, and some made-up rules stated as absolutes.

Slide 5

enough about you, let’s talk about me for a minute. drew mclellan: allinthehead.com, edgeofmyseat.com, webstandards.org, microformats.org

Slide 6

enough about me, let’s talk about Cant for a while. brian cant

Slide 7

Brian Cant taught us lots of things.

Slide 8

Brian Cant taught us lots of things.

[Pie chart: everything I know, split between Brian Cant and Other sources; values shown: 20% and 80%]

Slide 9

[Pie chart: of that 80%, split between Other important stuff and Metadata; values shown: 1% and 99%]

Slide 10

THERE’S MORE THAN BRIAN WAS LETTING ON.

Slide 11

Brian taught us to share. sharing is important

Slide 12

The web is all about sharing: just ask Humpty and Jemima.

Slide 13

We use the web to share like Brian taught us: knowledge, information, data. The web is primarily a tool for sharing.

Slide 14

What are we sharing? all types of data: common and obscure

Slide 15

Common data: names, addresses, dates & times, things for sale, reviews

Slide 16

Obscure data: your auntie’s hat collection; every place Paul McCartney has sneezed since 1962; how many days a web server has been up

Slide 17

All data is potentially useful to someone else. this is why we publish it

Slide 18

Brian taught us to tell the truth

Slide 19

Data is only useful if it’s correctly described

Slide 20

Data is only useful if it’s correctly described but we’ll come onto that in a bit.

Slide 21

So, metadata then

Slide 22

So, metadata then: Metadata is data about other data. It enables you to unlock data.

Slide 23

Bus timetable

Slide 24

Audio

Slide 25

Photographs

Slide 26

Metadata is everywhere. Often hidden away, but doesn’t deserve to be.

Slide 27

The more exposed metadata is, the more useful it is and the more useful the original data becomes.

Slide 28

Rule #1: Beware dark data.

Slide 29

Rule #1: Beware dark data. Hidden data gets forgotten and goes out of date.

Slide 30

it’s not complicated: metadata is simpler than it sounds

Slide 31

sunny

Slide 32

yesterday’s weather: sunny

Slide 33

sl6 8aj

Slide 34

postcode: sl6 8aj

Slide 35

1980-02-21

Slide 36

date of birth: 1980-02-21

Slide 37

Information is data put into context. Data is grand, but on its own, without context, it cannot inform.

Slide 38

Metadata puts data into context: it turns it into information. information is even better than data

Slide 39

information is 3 times better than data

[Chart: Betterness]

Slide 40

Metadata isn’t new to the web, no more than stalking is new to Facebook

Slide 41

XML is a good example

<building>
  <colour>orange</colour>
  <type>house</type>
  <doors>1</doors>
  <windows>0</windows>
</building>

Slide 42

XML is a good example: define your own schema, describe the data you have
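A minimal sketch of how a program might read the building example, assuming Python's standard `xml.etree.ElementTree` module. The function name `describe_building` is made up for illustration; the element names come from the slide. The point is that the tag names are the metadata: they tell the reader what "orange" and "1" mean.

```python
import xml.etree.ElementTree as ET

# The building document from the slide, unchanged apart from whitespace.
BUILDING_XML = """
<building>
  <colour>orange</colour>
  <type>house</type>
  <doors>1</doors>
  <windows>0</windows>
</building>
"""


def describe_building(xml_text):
    """Return a dict mapping each child element name to its text."""
    root = ET.fromstring(xml_text)
    # Each tag name is the metadata; each text node is the data it describes.
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    print(describe_building(BUILDING_XML))
```

Without the element names, the same values ("orange", "house", "1", "0") would be data with no context.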

Slide 43

Semantics and metadata aren’t identical concepts: they are different ideas, but on the web there’s a lot of overlap

Slide 44

HTML has a basic set of tags. Some enable us to communicate meaning. Some put data into context. Often both these things.

Slide 45

HTML has a basic set of tags. Some enable us to communicate meaning:

<p> <h1> <h2> <h3> <h4> <h5> <h6> (useful but not great metadata)

Slide 46

HTML has a basic set of tags. Some put data into context. Often both these things:

<title> <address>

Slide 47

HTML enables us to add metadata: the HTML class attribute

<span class="name">Drew</span>

this is a very useful technique

Slide 48

HTML enables us to add metadata: this makes HTML extremely flexible, a good thing indeed

Slide 49

There’s a really obvious example of metadata use in HTML. surely you’ve already thought of it

Slide 50

HTML META, a.k.a. meta tags: been around since HTML 2

Slide 51

HTML META: keywords, description, author, copyright, date, Dublin Core. the spec lists no legal values

Slide 52

HTML META

<meta name="keywords" content="vacation, Greece, sunshine" />
<meta name="description" content="My holiday in Greece" />
<meta name="author" content="Drew McLellan" />
<meta name="copyright" content="Drew McLellan 2008" />
<meta name="date" content="2008-06-12T12:03:56+0100" />
<meta name="DC.identifier" content="http://www.ietf.org/rfc/rfc1866.txt" />
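A rough sketch of how a consumer (a search engine or any other robot) might read these META elements, assuming Python's standard `html.parser` module. The names `MetaCollector` and `read_meta` are made up for illustration, and the sample uses two of the elements from the slide.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            self.meta[name] = attributes.get("content", "")


def read_meta(html_text):
    """Return a dict of META name -> content found in html_text."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


SAMPLE = """
<meta name="description" content="My holiday in Greece" />
<meta name="author" content="Drew McLellan" />
"""

if __name__ == "__main__":
    print(read_meta(SAMPLE))
```

Nothing in this code can check whether the description is truthful, which is why the data is only as useful as its author is honest.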

Slide 53

The use of META elements hasn’t been plain sailing

Slide 54

The use of META elements hasn’t been plain sailing: many web designers don’t know how to use them properly, leading to inconsistent use and dark data

Slide 55

Many misunderstand the purpose. I’m looking at you, web marketeers, and you, so-called SEO experts

Slide 56

Many misunderstand the purpose: META tags aren’t for search engines; META tags are used by search engines

Slide 57

Many misunderstand the purpose: META tags aren’t for search engines; META tags are used by search engines; META tags are for describing the data

Slide 58

Many misunderstand the purpose: to provide a means to discover that the data set exists and how it might be obtained or accessed; and to document the content, quality, and features of a data set, indicating its fitness for use

Slide 59

Rule #2: The more you lie, the less you can be trusted and the less valuable your info becomes.

Slide 60

Rule #2: This is something Brian Cant taught us.

Slide 61

Rule #3: The fewer distinct consumers, the less valuable the metadata becomes over time.

Slide 62

Rule #3: Only search engines really used META keywords and descriptions. Authors began writing targeted at search engines: “how do I get well ranked?” vs. “how do I describe this data?”

Slide 63

Rule #3: Search engines can no longer trust keywords and descriptions. Abuse has spoiled it for everyone. Brian Cant never said anything about that.

Slide 64

What have we learned so far?

Slide 65

What have we learned so far? Sharing is good - the web is for sharing. Metadata isn’t new IRL or on the web. HTML gives us ways to express metadata. It only works if we tell the truth.

Slide 66

Part 2: We need thems robots on our side

Slide 67

robots are either with us or against us

Slide 68

we don’t want them against us so we’d better co-operate

Slide 69

robots can save us time and effort

Slide 70

yay.

Slide 71

tofu robot says: data is everywhere

Slide 72

There are lots of idioms for data: Opening times, Event details, Addresses

Slide 73

Idioms are good: they’re not always formal; you don’t need to be formal to be understood

Slide 74

Informal is good, but consistency is important. let’s look at why...

Slide 75

Humans are quick to adapt: we can easily re-evaluate and adjust; we can climb stairs without a trip to the workshop

Slide 76

Robots prefer patterns: they rely on known patterns; patterns can be formal or informal; must be consistent and repeatable

Slide 77

Humans like patterns too: we like routine, we like repeating patterns. robots like patterns because they are repeatable; we like patterns because we don’t want to think. thinking is hard, uncomfortable and inconvenient.

Slide 78

[Pie chart: thinking. Segments: Hard, Uncomfortable, Inconvenient, Prone to error; values shown: 45%, 29%, 4%, 21%]

Slide 79

So as it turns out, what’s good for thems robots is good for us too

Slide 80

our metadata challenge: it’s not complicated. metadata is good - so we want to use it. need to embrace reusable patterns; avoid dark data; avoid data specific to any one consumer; make it easy to be truthful; embrace existing idioms; reuse existing technology

Slide 81

remember this?

<span class="name">Drew</span>

Slide 82

remember this?

<span class="name">Drew</span>

avoid dark data; avoid data specific to any one consumer; make it easy to be truthful; embrace existing idioms; reuse existing technology

Slide 83

remember this?

<span class="name">Drew</span>

avoid dark data; avoid data specific to any one consumer; make it easy to be truthful; embrace existing idioms; reuse existing technology; need to embrace reusable patterns

Slide 84

microformats: just a bunch of patterns

Slide 85

names and addresses: hCard - based on vCard. given-name, family-name, email, url, tel, title, org, street-address, locality

Slide 86

names and addresses: hCard - based on vCard

<p class="vcard">

The announcement followed calls by <span class="org">Apple</span> <span class="role">Chief Executive</span> <span class="fn">Steve Jobs</span> earlier this year...

</p>

Slide 87

names and addresses: hCard - based on vCard

<p class="vcard">

The announcement followed calls by <span class="org">Apple</span> <span class="role">Chief Executive</span> <span class="fn">Steve Jobs</span> earlier this year...

</p>
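To show why thems robots like this pattern, here is a rough sketch of pulling the hCard properties back out of that paragraph, assuming Python's standard `html.parser` module. `HCardSketch` and `HCARD_PROPERTIES` are made-up names, the property set is limited to the three class names used on the slide, and the code assumes well-formed markup with no void elements such as `<br>`. It is not a full hCard parser.

```python
from html.parser import HTMLParser

# Only the properties used in the slide's example, not the full hCard set.
HCARD_PROPERTIES = {"fn", "org", "role"}


class HCardSketch(HTMLParser):
    """Collect text for known hCard class names inside a class="vcard" element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # one set of class names per open element
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.open_elements.append(classes)

    def handle_endtag(self, tag):
        if self.open_elements:
            self.open_elements.pop()

    def handle_data(self, data):
        # Text only counts when it is nested inside a vcard container.
        if not any("vcard" in classes for classes in self.open_elements):
            return
        for classes in self.open_elements:
            for prop in classes & HCARD_PROPERTIES:
                self.card[prop] = self.card.get(prop, "") + data


MARKUP = (
    '<p class="vcard">The announcement followed calls by '
    '<span class="org">Apple</span> '
    '<span class="role">Chief Executive</span> '
    '<span class="fn">Steve Jobs</span> earlier this year...</p>'
)

if __name__ == "__main__":
    parser = HCardSketch()
    parser.feed(MARKUP)
    parser.close()
    print(parser.card)
```

The sentence still reads normally to a human; the class names are the only addition, and they are what the parser keys on.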

Slide 88

events and dates: hCalendar - based on iCalendar. dtstart, dtend, location, url, description, summary
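A small sketch of what publishing an event with these property names could look like, written as a Python helper that emits the markup. The function name `render_vevent` and the event details are invented for illustration. The `vevent` container class and the `abbr` element carrying a machine-readable date in its `title` attribute follow the way hCalendar was commonly written at the time.

```python
from html import escape


def render_vevent(summary, start_iso, start_label, location):
    """Return an hCalendar-style HTML fragment for a single event."""
    return (
        '<div class="vevent">'
        f'<span class="summary">{escape(summary)}</span>, '
        # Humans read the label; robots read the ISO date in the title.
        f'<abbr class="dtstart" title="{escape(start_iso)}">{escape(start_label)}</abbr>, '
        f'<span class="location">{escape(location)}</span>'
        "</div>"
    )


if __name__ == "__main__":
    # Made-up event details, for illustration only.
    print(render_vevent("Picnic & talks", "2008-08-01", "1 August", "The park"))
```

The visible text stays informal ("1 August") while the consistent, repeatable form of the date lives alongside it.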

Slide 89

reviews: hReview. item, reviewer, rating, description, summary, photo

Slide 90

relationships: XFN. contact, acquaintance, met, co-worker, friend, colleague, neighbor, child, parent, sweetheart, crush, me
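XFN values go in the `rel` attribute of ordinary links, so a robot can read relationships straight off a blogroll. A minimal sketch, assuming Python's standard `html.parser` module; the class name `XFNLinks`, the sample markup and the example.com URLs are hypothetical, and `XFN_VALUES` holds only the values listed on the slide.

```python
from html.parser import HTMLParser

# The XFN values listed on the slide.
XFN_VALUES = {
    "contact", "acquaintance", "met", "co-worker", "friend", "colleague",
    "neighbor", "child", "parent", "sweetheart", "crush", "me",
}


class XFNLinks(HTMLParser):
    """Collect (href, XFN values) pairs from <a rel="..."> links."""

    def __init__(self):
        super().__init__()
        self.relationships = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel_values = set((attributes.get("rel") or "").split())
        xfn = sorted(rel_values & XFN_VALUES)
        if xfn:  # skip links whose rel carries no XFN value
            self.relationships.append((attributes.get("href"), xfn))


SAMPLE = (
    '<a href="http://example.com/alice" rel="friend met">Alice</a> '
    '<a href="http://example.com/other" rel="nofollow">Other</a>'
)

if __name__ == "__main__":
    parser = XFNLinks()
    parser.feed(SAMPLE)
    parser.close()
    print(parser.relationships)
```

Links without an XFN value are skipped, so existing `rel` uses carry on working untouched.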

Slide 91

many more: licenses, tags, date-based feeds, directories, products, payments, geolocation, more

Slide 92

remember this?

<span class="name">Drew</span>

avoid dark data; avoid data specific to any one consumer; make it easy to be truthful; embrace existing idioms; reuse existing technology; need to embrace reusable patterns

Slide 93

remember this?

<span class="name">Drew</span>

avoid dark data; avoid data specific to any one consumer; make it easy to be truthful; embrace existing idioms; reuse existing technology; need to embrace reusable patterns

Slide 94

Brian Cant never knew this but I bet he’d be thrilled.

Slide 95

microformats are good: a humane method for using metadata on the web; easy for us to implement; readable by our robotic friends

Slide 96

Slide 97

hCard

Slide 98

Slide 99

hCalendar

Slide 100

Slide 101

Slide 102

For robot masters: http://microformats.org/wiki/parsers http://tools.microformatic.com/

Slide 103

For humans: http://microformats.org/

Slide 104

For humans: http://oreilly.com/

Slide 105

For humans: http://microformatique.com/book/

Slide 106

So that’s What Brian Cant Never Taught You About Metadata. Thank you. http://allinthehead.com/presentations

Slide 107

Creative Commons photos used: http://flickr.com/photos/gperez/4393118/ http://flickr.com/photos/warmnfuzzy/466382466/ http://flickr.com/photos/stevegarfield/194648339/ http://flickr.com/photos/donsolo/2385041554/