Keynote: The Search for Better Glue

A presentation at LonghornPHP in October 2021 in Austin, TX, USA by Taylor Barnett

Slide 1

Slide 2

Thank you for having me. This idea is something I’ve thought a lot about over the years. In one of my first software development jobs, I was integrating different APIs into our app, and I just thought that was something juniors did. It felt like it was almost seen as grunt work. Better glue means we make better choices about what we integrate with (through vendor engineering), how we write the glue code, and how we save time by abstracting parts of the glue away.

Slide 3

@planetscaledata @taylor_atx

Hypothesis: I honestly think glue work doesn’t have the best reputation. “Ugh, that thing we use for ___ is broken again.” Maybe it was recurring incidents or bugs that seemed to pop up a lot. Maybe you don’t even see personal growth while doing glue work.

Slide 4

But first…

Slide 5

You know, we often joke that we are piecing software together with glue and popsicle sticks, but some glues are pretty damn effective. My definition…

Slide 6

Tanya Reilly created the term “glue work” to capture “the less glamorous – and often less-promotable – work that needs to happen to make a team successful.” Glue work is valuable: without it, projects fall apart, tasks get dropped, teams miscommunicate, and it is just harder to get things done.

Slide 7

Glue work can be a lot of different things. Examples: Human: all the things Tanya’s talk gets into. It’s the work that makes a team run and be successful.

Slide 8

Design: How do we build systems that contain many different incompatible parts that we glue together?

Slide 9

Code: The code we write to integrate with other systems, APIs, and integrations. That’s what I want to talk about today.

Slide 10

Improving the experience of glue work has been narrow and slow. It has focused on application development and the technical side of things, while the platform or operational experience and the human aspect have lagged behind. A small team cannot improve the experience for everyone; it takes a bigger cultural change in how we develop and operate, but also in how we organize our teams, assign responsibilities, and incentivize certain types of work. At the same time, tooling is interlaced with some of our human problems and can change how people work.

Slide 11

Slide 12

People aren’t always rewarded for doing it well. The problem with all of these is that glue work is not always seen as the most important thing, because it is seen as just piecing things together. But it often takes a lot of communication, knowledge, dealing with tradeoffs, and much more.

Slide 13

There is no good developer and operator experience today that really ties all of these problems together; it’s too fragmented. Even using different AWS services together has a lot of pain points, and that is all under one vendor. So we are left on our own to solve it in our codebases.

Slide 14

In the words of one of the greatest 80s bands…

Slide 15

This is really a story of the rise of cloud computing as we know it today, APIs, and AWS. In the early 2000s, AWS was founded out of trying to solve a recurring need: a faster technology department. Merchant.com was meant to help third-party merchants build online shopping sites on top of Amazon’s e-commerce engine. Amazon was doing internal development for affiliates and partners, kept bumping into the same problem, and realized it needed internally scalable, reliable infrastructure services. It heard similar things from external partners.

Slide 16

After 10 years, Amazon had built up a good amount of infrastructure competence. To develop solutions for external partners, it would need an effective way to communicate with them via dependable APIs, which also meant decoupling the platform. For example, the data was tightly coupled with the presentation layer. “If you believe developers will build applications from scratch using web services as primitive building blocks, then the operating system becomes the Internet,” says Jassy, describing an approach to development that had not yet been considered. The first services released were S3 for cloud storage, EC2 for compute, and Simple Queue Service for message queuing.

Slide 17

As in any industry that has grown and evolved, a supply chain forms around it. In industries that create physical goods, like cars, automotive manufacturers such as Ford and Toyota don’t produce every part themselves. They buy metals from metal companies and tires from a tire company, which buys rubber from another company. There are hundreds and hundreds of suppliers in the chain, and this helps the industry be more efficient and productive. Software now has its own supply chain. It allows the industry to segment and lets companies focus on their core competencies. Infrastructure companies are just one of our suppliers. This software supply chain is heavily made up of APIs: just like car parts, we take chunks of code and piece them together to make finished applications.

Slide 18

Security spaces

Slide 19

Dare I use a fancy business term, but… core competencies: “a combination of multiple resources and skills that distinguish a firm in the marketplace,” or “collective learning across the corporation.” A core competency is not just something the company is good at. There are three criteria:
- Provides potential access to a wide variety of markets.
- Should make a significant contribution to the perceived customer benefits of the end product.
- Difficult for competitors to copy.
For example, precision mechanics, fine optics, and micro-electronics might make great cameras. Amazon had various infrastructure competencies that allowed it to release products in a wide variety of markets. To labor the tree metaphor: the trunk and major limbs are core products, the branches are business units, and the leaves and fruit are end products. Nourishing and stabilizing everything is the root system: core competencies.

Slide 20

Before we go further, I want to share a bit about why I’ve gotten pretty deep into this topic area. As a software engineer, APIs powered many of the features I built. When I got more into developer experience, I saw the struggles people had when using APIs… so I went to work at an API design tooling company… and the OpenAPI specification brought me to an API-integration-focused company. Now I am at PlanetScale, which is a MySQL-compatible cloud database. I believe databases could use a few more abstractions to make them easier to work with, so developers can focus on building their business and not on working in infrastructure. Databases should work seamlessly with our developer workflows and shouldn’t feel so painful.

Slide 21

These are the new types of problems that I am excited to work on solving. But what old problems were solved? Flexibility and scaling in operations, less maintenance overhead in some areas, improvements in security, reductions in cost, and being “easier to start and scale.” Most of us aren’t maintaining our own mail server anymore. For the new problems, though, we don’t have the tools to go about solving them; we just know it is a mess.

Slide 22

Where are we headed? What’s the future? As Charity Majors puts it, “unless you’re an infrastructure company, infrastructure is not your mission.” Sure, you can build up that internal expertise, and it may seem useful. But every second you spend on infrastructure, or on other code you could have used a service for instead, is a second you could have spent focusing on your core business goals. Even worse, it builds on itself and slowly distracts you from the problems your business is trying to solve. And this idea is not changing: we aren’t suddenly going to start seeing a business case for building these cloud services yourself, unless that is your specific business’s focus. So where do we go from here?

Slide 23

Slide 24

Today, managing complexity is about managing the relationships between the different vendors, their APIs, and the components we use. This does feel different than before. It’s long been true that software development is as much about assembling existing code (an open source library here, a Stack Overflow entry there…), but this has become doubly true with external services and tools. We have to do a fair amount of work to keep the complexity of our systems manageable by managing the providers of our software. It takes a good amount of technical breadth to do this work, and it’s high-impact work.

Slide 25

Charity Majors: “Effectively outsourcing components of your infrastructure and weaving them together into a seamless whole involves a great deal of architectural skill and domain expertise.” It’s both rare and incredibly undervalued, especially considering how pervasive the need for it is. How do you pick a vendor, and how do you push adoption internally?

Slide 26

Vendor engineering is as much about which vendors to pick as it is about what to build and what not to build.

Slide 27

And really, what problem does the company solve? What is the strategic value? Is it core to the mission? How does it contribute to the mission? What are you not doing? There’s a competitive advantage when you can focus. Once saw 60%

Slide 28

• Don’t write docs
• Don’t do periodic security updates
• Don’t do regular maintenance
• Don’t have time to fix any bugs

Getting something working is often easy; maintaining it is harder.

Slide 29

It depends on your situation. Do you collectively have a lot of WordPress competency within your company? What features do you need? What features might you want in the future that a vendor might be building while you focus on your core business?

Slide 30

This one is a bit harder. What are you actually getting from doing something super custom? How does it benefit the business? Custom -> service -> back to custom -> then something else

Slide 31

Traditional database: in-house managed, doing your own deployments onto a cloud services provider, handling your own scaling. Scalability: expanding the capabilities of your applications; some options help more with this than others. A 24/7 team making sure the infrastructure of your database doesn’t go down. Cost: the operational cost of running it on your own, including the DBAs, is far greater most of the time. Security: experts. Fewer failures actually make it more operationally manageable.

Slide 32

Ask good questions to gauge compatibility and fit. What friction can you deal with, and what is a dealbreaker? Charity Majors: “Learn to evaluate vendors and their products effectively. Ask piercing, probing questions to gauge compatibility and fit. Determine which areas of friction you can live with and which are dealbreakers.” Of course there are security and uptime questions, but…

Slide 33

Vendor future roadmaps. How do they take input? How flexible are they to work with? It’s a two-way relationship. You won’t always have the big bucks to get their attention, but you can tell from things like how they handle support questions. Parallel progress: they will be building while you are not.

Slide 34

Calculate and quantify the cost to develop and operate. Get to an end-to-end prototype as quickly as possible to find hidden costs. Remove as much labor as possible. “Learn to calculate and quantify the cost of your and your team’s time and labor. Be ruthless about shedding as much labor as possible in order to focus on your core business.” If it is unreliable, that is a cost to consider. Upfront costs might skew one way, but what about the long term?
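The cost comparison on this slide can be made concrete with some simple arithmetic. The sketch below is a hypothetical model, not the speaker’s: it assumes all labor is priced at one hourly rate and that downtime has a flat per-hour business cost, so unreliability shows up as a line item. The `Option` class, `total_cost`, and every number in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One build-or-buy option. All fields are hypothetical planning inputs."""
    name: str
    upfront_hours: float        # engineering time to build or integrate
    yearly_hours: float         # ongoing maintenance and operations time per year
    yearly_fees: float          # vendor subscription or self-hosted infrastructure spend
    yearly_outage_hours: float  # expected downtime per year (unreliability is a cost)


def total_cost(opt: Option, years: int, hourly_rate: float,
               outage_cost_per_hour: float) -> float:
    """Total cost of ownership over `years`: labor + fees + expected outage cost."""
    labor = (opt.upfront_hours + opt.yearly_hours * years) * hourly_rate
    fees = opt.yearly_fees * years
    outages = opt.yearly_outage_hours * years * outage_cost_per_hour
    return labor + fees + outages


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the comparison.
    build = Option("build in-house", upfront_hours=800, yearly_hours=400,
                   yearly_fees=12_000, yearly_outage_hours=10)
    buy = Option("use a vendor", upfront_hours=120, yearly_hours=40,
                 yearly_fees=30_000, yearly_outage_hours=2)
    for opt in (build, buy):
        cost = total_cost(opt, years=3, hourly_rate=100, outage_cost_per_hour=1_000)
        print(f"{opt.name}: ${cost:,.0f} over 3 years")
```

With these invented inputs, building comes to $266,000 and buying to $120,000 over three years, even though the vendor’s yearly fees are higher: the labor and outage terms are what tip the comparison.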

Slide 35

Consider the true cost of ownership and advocate internally. Manage up to executive and finance teams. “Learn to manage the true cost of ownership, and to advocate and educate internally for the right solution, particularly by managing up to execs and finance folks.” Management is not going to see the incidents that never happened because you picked the more reliable vendor over another.

Slide 36

Documenting external API challenges, to standardize use and help others understand tradeoffs. I think about the number of times I’ve written up a sort of friction log of using another API, and how useful it is when someone six months later, also working on that API, has context around the issues, the challenges, and why some decisions were made. It’s a concrete thing: it can be shared in performance reviews, and it levels up the team.

Slide 37

Is this work explicitly in a career ladder? This glue work makes everything else possible, but since it isn’t seen as being as “interesting” or “cool” as building things from scratch, it doesn’t get the same credit. Managers can elevate this glue work to help people see its bigger impact. Piecing together services takes strategy and design. Often, when it’s done well, it looks like someone did nothing: the thing just works and has few issues.

Slide 38

But the reality is, even if you do all of this well, things will break. There will be incidents and outages, especially when we use a lot of just-in-time models, like payments, authentication, and communication APIs.

Slide 39

Slide 40

APIs will change; change is a constant. I think back to the S3 outage in 2017, the one where even the AWS status page was down because it used S3. It really felt like a supply chain issue: managing that incident was like managing a supply chain. How quickly can you react to a disruption in a supply chain? I remember the moment at Keen where we were like, yeah, some things are broken, but not everything is broken.

Slide 41

Third-party dependencies: when they break, we could break. That’s where glue work gets a lot more complicated and interesting: how do we build systems that handle these failures? Yeah, that API might be cheaper, but if it brings our business to a grinding halt, it isn’t cheaper.

Slide 42

We need to be able to cope with failures and with performance and availability problems.

Slide 43

Things will go down; that’s inevitable, because change is a constant. The question is how we handle it and how we grow from it.

Slide 44

Slide 45

When you center things around the user’s experience and the core business goals, you often look at glue work a bit differently. It’s a different set of problems to solve.

Slide 46

It’s not about JUST implementing an API; a lot goes into it when it is done well:
• Connections
• Rate limiting
• Internet connectivity
• Retry mechanisms
• Error codes and messages
• SDK quality
• Authentication
• Progressive data presentation
• Versioning
• Caching
• Validation
• Service outages
• Testing
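Rate limiting is one item on that list that is easy to sketch. Below is a minimal client-side token bucket, a common technique (not one the talk prescribes) for keeping your own calls under a vendor’s limit while still allowing short bursts. The `TokenBucket` class is hypothetical, and the rate and capacity you pass in are assumptions; check the actual limits of the API you’re calling.

```python
import time


class TokenBucket:
    """Client-side token bucket: bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.clock = clock          # injectable so tests can control time
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block until a token is available."""
        while not self.try_acquire():
            # Sleep roughly until the next whole token should exist.
            time.sleep((1 - self.tokens) / self.rate)
```

In use, you would call `bucket.acquire()` before each request to the vendor. Sharing one bucket across threads would also need a lock, which this sketch omits.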

Slide 47

• Don’t disable everything!
• Frontend circuit breakers
• Non-critical services should fail silently; disable others intelligently
• Don’t lead users down a broken path; disable what you can
• Set expectations
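A circuit breaker is one way to “disable what you can” instead of disabling everything. The talk mentions frontend circuit breakers without showing an implementation, so this is a generic, simplified version of the pattern (closed, open, and half-open states) written in Python for illustration. The `CircuitBreaker` class and its default threshold and cooldown are assumptions; the `fallback` argument is how a non-critical feature can fail silently.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stops calling a failing dependency for `cooldown` seconds after
    `threshold` consecutive failures, then lets a trial call through."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cooldown has passed, the breaker is "half-open": allow a trial call.
        return self.clock() - self.opened_at < self.cooldown

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # don't touch the dependency while the circuit is open
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the circuit again
        return result
```

For example, `breaker.call(lambda: fetch_recommendations(user_id), fallback=lambda: [])` (with `fetch_recommendations` a hypothetical vendor call) would quietly hide a recommendations panel while the vendor is down, instead of breaking the whole page.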

Slide 48

• 6+ different common authentication types
• How are they all being handled safely?
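One way to keep several authentication types consistent and safe is to build credentials in a single place instead of scattering them across call sites. This is a hypothetical helper covering only three common schemes, not all six-plus; the scheme names, the environment variable names, and the `X-API-Key` header name are assumptions (header names vary by vendor). Secrets are read from the environment rather than hard-coded.

```python
import base64
import os
from typing import Dict, Mapping


def auth_headers(scheme: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build request headers for a few common auth schemes.

    Credentials come from (hypothetical) environment variables so they never
    live in source code; a missing variable raises KeyError rather than
    silently sending an unauthenticated request.
    """
    if scheme == "bearer":
        return {"Authorization": f"Bearer {env['VENDOR_TOKEN']}"}
    if scheme == "basic":
        # HTTP Basic: base64 of "user:password".
        raw = f"{env['VENDOR_USER']}:{env['VENDOR_PASSWORD']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    if scheme == "api_key":
        # X-API-Key is a common convention, not a standard; check your vendor's docs.
        return {"X-API-Key": env["VENDOR_API_KEY"]}
    raise ValueError(f"unsupported auth scheme: {scheme!r}")
```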

Slide 49

• What if someone has a slower data connection?
• How do things load or fail?
• Progressive data presentation

Slide 50

• API retry issues
• Requires some restraint and strategy
• When do you tell the user?
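Retries need restraint: retrying immediately and forever can pile more load onto a vendor that is already struggling. A common strategy, shown here as a hedged sketch rather than the talk’s own code, is capped exponential backoff with full jitter and a limit on attempts, after which the error surfaces so the app can tell the user. The `TransientError` class, the `retry` helper, and the default delays are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 429s, 503s)."""


def retry(fn: Callable[[], T], attempts: int = 4, base: float = 0.5,
          cap: float = 8.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and full jitter.

    Other exceptions propagate immediately. After `attempts` tries, the last
    TransientError is re-raised so the caller can decide what to tell the user.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            # Full jitter: wait a random time up to min(cap, base * 2**attempt) seconds.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise RuntimeError("unreachable")
```

Only errors the caller marks as transient are retried; a validation error from the API fails fast, since retrying it would never succeed.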

Slide 51

• Decoupling services from vendors
• Important that your glue code isn’t so tightly glued
• Calling an API directly, without a thin wrapper, makes it hard to switch vendors

Decoupling allows us to revisit the decision later.
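Here is one way to put a thin wrapper between your app and a vendor: the app depends on a small interface you own, and each vendor gets an adapter behind it. The `EmailSender` protocol, the `FakeVendorAdapter` class, and `send_welcome` are all hypothetical; a real adapter would call the vendor’s SDK inside `send`.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


class EmailSender(Protocol):
    """The interface the rest of the app codes against (owned by you, not the vendor)."""

    def send(self, to: str, subject: str, body: str) -> str:
        """Send a message and return an ID for it."""
        ...


@dataclass
class FakeVendorAdapter:
    """Stand-in for a vendor adapter; a real one would call the vendor SDK here."""
    prefix: str
    sent: List[Tuple[str, str, str]] = field(default_factory=list)

    def send(self, to: str, subject: str, body: str) -> str:
        self.sent.append((to, subject, body))
        return f"{self.prefix}-{len(self.sent)}"


def send_welcome(sender: EmailSender, user_email: str) -> str:
    # Business logic knows nothing about which vendor sits behind `sender`.
    return sender.send(user_email, "Welcome!", "Thanks for signing up.")
```

Switching vendors then means writing one new adapter and changing where it is constructed, rather than hunting down every direct API call in the codebase.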

Slide 52

We need to make the glue work we are doing more reusable so we can spend more time on our core business focus. Yes, sometimes it is very specific; other times it can be more interchangeable. We don’t want to spend time gluing things together all of the time, though; it’s borrowed time.

Slide 53

When using abstractions like APIs, we didn’t get rid of the complexity. That’s why vendor engineering is more important today.

Slide 54

• APIs
• SDKs/libraries
• SaaS infrastructure
• PHP Composer

Abstractions fill the gap in developer or operator experience. APIs take something more complex and make it reusable. Libraries wrap something more complex to make it reusable.

Slide 55

• APIs
• SDKs/libraries
• SaaS infrastructure
• PHP Composer
• CI/CD
• Platform as a Service
• Testing
• Monitoring
• Deployment
• Infrastructure as Code

CI/CD often pulls together a bunch of different scripts and tools to build something. Composer is a dependency manager: you declare the libraries you depend on, and it abstracts away what needs to be installed, then installs and updates them.

Slide 56

• Debugger
• Performance profiler
• Observability tools

A debugger does not find the bug for you; that would be an abstraction. It shows you the stack trace and call graph. The same goes for profilers: they don’t make your application faster, they show you what is slow. And the same goes for observability tools. Not all things are meant to be abstracted, but glue code is often a good candidate.

Slide 57

• Low cost
• Allowed businesses to have an online presence
• Linux has historically been seen as “glue” for modern web development
• Different degrees of abstraction today

Q: Who here uses some sort of LAMP stack? It offered cheap processing and storage for PHP-based services. Today, Linux and Apache are hidden by platforms as a service. Where are your PHP applications hosted today? AWS, Linode, Laravel Forge, Envoyer, Heroku, somewhere else? You have different levels of abstraction, with AWS/Linode on one end and platforms as a service like the others on the other end. Some of these platforms are getting good enough that the buy vs. build decision is getting harder.

Slide 58

The better the experience, the more we can focus on our core business goals, even more than the move to competent software alone has allowed us to do. It builds on itself and gains us back time.

Slide 59

The dream is getting to focus on interesting business challenges: focusing on the value being provided and less on the annoying development that has to be done. This allows us to solve much more interesting business challenges through engineering than ever before, and I’m really excited for that future. It will not only allow us to build more; more people will be able to build, and the people building will be much more representative of who lives on this planet.

Slide 60

Better glue means we make better choices about what we need to integrate with, how we do it, and how we save time with it.

Slide 61

Questions for discussion: What is a buy vs. build decision that you look back on and would have made differently? Or one that you did revisit and change? What do you wish you could abstract away and make easier to manage? Booth avatars