00:00:20.480
um so today we're gonna talk about microservices because before that something about me
00:00:25.680
i am a consultant from thoughtworks singapore a full stack engineer and i also
00:00:31.599
co-founded ideaboardz which is a retrospective tool this is my twitter
00:00:39.200
handle and github handle enough about me what's in it for you
00:00:46.239
so we're going to talk about what are microservices uh why microservices how microservices
00:00:53.039
and when do we use those microservices how many of you have heard about
00:00:59.359
microservices oh how many of you have actually used
00:01:04.879
microservices or written code for them cool so let's define microservices
00:01:13.680
there is no formal definition per se but before jumping on to that let's talk about uh what are services
00:01:21.280
um it's just an implementation of a contract right you have some contract and the service actually
00:01:28.240
implements that particular contract and tells you that if you call me with these parameters i'm gonna do
00:01:34.159
something for you what happens if you attach a micro to
00:01:39.759
the service
00:01:45.200
um so micro as the name suggests it should be small we're gonna talk about what is small it
00:01:51.840
should be independent it should be self-contained it should perform on its own it should be
00:01:58.799
composable each service should work together with all the other microservices
00:02:05.600
and it does one thing and it does that one thing well that's the key
00:02:13.520
but what is the right size when we talk about microservices there is a lot of conversation around what should be the
00:02:19.920
size of the service some people rate size based on
00:02:25.440
lines of code like if it's beyond 200 lines of code it's not a microservice some people
00:02:32.640
think about it in terms of team that if one person could develop that over a period of time then that is
00:02:37.760
a micro service or if it's a group of people that could work independently on a service
00:02:43.519
that is a micro service um well for us we've been working with a client and for
00:02:49.280
us what really worked out is defining those services in terms of domain
00:02:55.280
so each service does just one thing and it does that one thing right so that one
00:03:01.599
thing could take 100 lines of code or could take 1000 lines of code
00:03:10.879
tying it to the unix philosophy you write programs which are small
00:03:16.560
programs we've all used sed and all these small little programs
00:03:22.159
and then use pipes to concatenate them and make them cohesive make them work
00:03:28.560
together in case of microservices http is the new pipe because
00:03:35.120
microservices are usually exposed over http and each service does its own job and passes
00:03:42.319
it on to a different service
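A minimal Ruby sketch of the "http is the new pipe" idea described above. The three lambdas are hypothetical stand-ins for services; their names and payload fields are assumptions for illustration and do not come from the talk. In a real system each stage would be a separate process reached over HTTP rather than an in-process call.

```ruby
# Each lambda stands in for one small service that does one job and hands
# its output to the next one, the way Unix programs are chained with pipes.
# Names and payload fields are illustrative assumptions.
take_order   = ->(payload) { payload.merge(status: "accepted") }
take_payment = ->(payload) { payload.merge(paid: true) }
send_message = ->(payload) { payload.merge(notified: true) }

# The "pipe": feed the output of each stage into the next stage.
def run_pipeline(stages, input)
  stages.reduce(input) { |payload, stage| stage.call(payload) }
end

result = run_pipeline([take_order, take_payment, send_message], { order_id: 1 })
puts result.inspect
```

Because every stage takes a payload and returns a payload, stages can be reordered or replaced without touching the others, which is the composability property the talk is after.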
00:03:48.239
it's also kind of going back to the object oriented philosophy there was a
00:03:54.239
talk on solid principles so taking single
00:03:59.439
responsibility each service does just one thing
00:04:04.480
it has low coupling it's not chatty with too many different services
00:04:10.239
at the same time it is cohesive and it's small and does that one thing
00:04:16.720
well and as jeff bay says
00:04:23.759
a monolithic application which is 100k lines of code is nothing but 100 1k
00:04:31.040
line applications waiting to happen well he has used 1k lines of code as
00:04:36.960
a parameter for a microservice but you get the idea right i mean rather than having one big
00:04:43.120
monolithic application break it down into smaller ones so that it's much more manageable
00:04:50.880
so why do we use microservices what's the key objective
00:04:58.800
rather than answering that question i would like to share my story on my journey of using microservice or how we
00:05:04.560
use microservice in one of the client projects um so i'll tell you a story uh
00:05:12.400
this happened a couple of years back when we started an engagement with a client
00:05:17.680
i'll give you a little history on the client this is a 90 year old business it's a social
00:05:24.400
gaming company and it was quite funny to hear on the first day
00:05:30.720
that their customers were dying not dying out but literally dying because
00:05:36.800
of age so yeah lots of legacy code over this
00:05:42.080
period of 90 years they've accumulated like a lot of code acquired a few companies the tech stack
00:05:48.720
was a mix of vbscript vb6 forms and oracle and all
00:05:55.919
crazy stuff because of this it was not
00:06:03.919
flexible the cost of introducing any change was like manifold even adding a simple feature on a
00:06:10.319
customer was like changing three or four different systems and then uh
00:06:16.000
getting that through so it was really really painful all these uh apps were
00:06:24.240
fully functional in their own silos because of the various acquisitions and mergers but each of the apps had like
00:06:31.520
concentrated complexity yes there's a lump of code sitting there
00:06:37.039
and nobody knows how it works
00:06:43.120
when we started talking more about it that's when we came up with this idea that
00:06:49.680
oh this sounds like we need to build something that is small that is independent that is composable and that does one
00:06:56.720
thing so that if there are four different applications that want to do payment
00:07:01.759
it's just one service which should be doing payment right if they want to store customer information there should
00:07:07.039
be just one service holding that customer information so
00:07:12.720
that's where our journey of building microservices started what we've achieved
00:07:19.759
so far is we have 10 micro services
00:07:24.800
doing a bunch of things 25 vms in production 60 plus vms across other environments
00:07:32.160
like qa tests and performance environments
00:07:37.199
and we could achieve one click deployment across all these environments
00:07:44.720
and you can guess if it's one click deployment who does the deployment so it's product owners qas
00:07:51.039
anybody it's an interesting story that somebody at the client side wanted a dog to do
00:07:56.960
the deployment to press the button so for us microservices
00:08:05.680
each of these 10 different microservices was like self-contained they had like their own db their own
00:08:11.759
contracts they were running in their own process
00:08:18.000
talking to each other through http
00:08:23.280
but how did we start we didn't start with 10 micro services on day one right so try to solve small and valuable
00:08:32.320
problems started with small piece of functionality like customer database and
00:08:38.240
try to migrate that first start with plain old services so we didn't start we
00:08:46.399
didn't start with microservices on day one that hey this let's go with just one resource
00:08:52.399
per service or let's just go with one responsibility per service we
00:08:57.839
started with plain old service and started realizing that some things could be moved out
00:09:03.920
so when services started doing too much over a period of time like you do object refactoring we
00:09:09.839
refactored services so if a service's responsibilities had grown we extracted them into smaller ones
00:09:18.720
um so as i was telling you uh about the domain which is uh social
00:09:24.560
gaming typically uh over the web it has three or four components so that's where we
00:09:29.600
started the plain old services there is catalog which is the game catalog
00:09:35.920
customers the orders and payments
00:09:41.519
we slowly realized that customer service is trying to talk to legacy too much so we extracted that out as a
00:09:48.720
different micro service to talk to legacy database and those kind of things
00:09:54.160
and eventually throw that away the order service started growing too much so
00:10:00.640
we extracted orders out into two the main order service now is
00:10:07.519
responsible for just taking orders and there is a separate service for processing the orders
00:10:13.360
and there is a separate service for resulting because it's a gaming company so
00:10:18.560
when you play anything at the end of it you get results so there was this result service
00:10:28.320
from a high level you would still think that well it's okay that just sounds like a plain old
00:10:34.079
service right i mean i had a lot of debates like why is this a microservice it's just a service
00:10:41.680
which is well designed that's it right i mean i had a hard time with people asking those
00:10:47.200
questions that why what is micro in these services
00:10:52.959
well for us it was mainly the single responsibility fact that a customer service just does
00:10:59.839
everything related to customer data and its boundaries are restricted to
00:11:05.600
just customer and it never overstepped its boundaries
00:11:11.839
we also said each service has one resource so customer service has
00:11:18.320
just one resource well sometimes it had two resources
00:11:23.600
so especially with payments when we started dealing with payments it started having two or three resources because we had
00:11:30.000
payment method credit card and direct debit which are like three resources in one service but pulling them out into
00:11:38.240
their own microservices would mean breaking that
00:11:43.680
law of composition and the services the payment service the credit card
00:11:49.040
service and the direct debit service would become too chatty and
00:11:54.320
it would defeat the purpose of having a service
00:11:59.360
so they communicate over a restful contract http and json
00:12:04.480
so over the period of time when we were on that journey we had some
00:12:11.760
thumb rules around what you should really do to make these services
00:12:17.200
micro and keep the complexity to a minimum the first is one top level resource as we just
00:12:24.399
talked about well in some cases two the second is focus on contracts that is very
00:12:30.800
important for each service we made
00:12:35.839
it clear that contracts should be driven mainly by the domain and not the client
00:12:42.959
so if the clients the consumer of the service needs certain additional information or
00:12:49.200
summary information or those kind of things then what i have seen is many
00:12:55.360
places we end up creating those service endpoints and those become unmanageable so we
00:13:01.760
focused a lot on contracts each service
00:13:08.000
had its own context and it's not allowed to access
00:13:13.040
data which is beyond its context anyway each of the services has its own database so
00:13:18.639
it was anyway not possible but we made a point that even if
00:13:26.000
some data is accessible if it is beyond your domain call the
00:13:32.800
service rather than directly using the data source uh avoid too much of coupling between
00:13:40.000
the services again since they are independent we try to avoid a lot of coupling
00:13:45.519
between the services and since now we have 10 different
00:13:51.519
applications each of them logging on their own each of them running in their own process it was
00:13:57.839
really important to have a sophisticated logging and monitoring framework
00:14:03.519
so that if anything breaks we get to know immediately if any of the request fails through logs
00:14:10.560
we can immediately catch those errors or exceptions
00:14:16.000
so having said that following those thumb rules we ended up creating
00:14:22.160
a few cross-cutting services which were needed across
00:14:28.560
all the other microservices so going back to the services that we had we ended up
00:14:35.839
having a communication service because there was a lot of communication to the clients around
00:14:41.760
hey your payment is due next month welcome message or maybe result message
00:14:48.160
that hey you won this particular game so each of these services had their own communication
00:14:55.440
so we abstracted out the communication the cross-cutting concerns around all the services
00:15:00.560
created a communication service and all the service needs to do is just ping
00:15:06.880
that communication service and say that hey send the communication that customer has won
00:15:12.880
and it's the communication service's responsibility to figure out whether to send an email communication
00:15:18.399
or sms communication or whatever
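The communication service described above can be sketched in Ruby as follows. This is only an illustration under stated assumptions: the class name, the `notify` method, the message templates, and the customer fields (`:prefers_sms`, `:mobile`, `:email`) are all hypothetical, and an in-memory outbox stands in for real email and SMS gateways.

```ruby
# Hypothetical sketch: class, method, and field names are assumptions.
class CommunicationService
  Delivery = Struct.new(:channel, :address, :body)

  TEMPLATES = {
    welcome:     "welcome to the game",
    payment_due: "your payment is due next month",
    game_won:    "you won this game"
  }.freeze

  attr_reader :outbox

  def initialize
    @outbox = [] # stands in for real email and SMS gateways
  end

  # Callers only report what happened; channel selection lives here.
  def notify(customer, event)
    body = TEMPLATES.fetch(event)
    delivery =
      if customer[:prefers_sms] && customer[:mobile]
        Delivery.new(:sms, customer[:mobile], body)
      else
        Delivery.new(:email, customer[:email], body)
      end
    @outbox << delivery
    delivery
  end
end

comms = CommunicationService.new
sms   = comms.notify({ mobile: "+6500000000", prefers_sms: true }, :game_won)
email = comms.notify({ email: "player@example.com" }, :payment_due)
puts "#{sms.channel} / #{email.channel}"
```

The calling services never mention a channel, so adding a new one would only change this one service.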
00:15:24.959
there were a lot of scheduled jobs across these uh services like you need to send a
00:15:30.160
weekly email to customers payment emails and stuff so that scheduling was part of all these
00:15:37.759
services we abstracted it out into a separate service so that you could just ping the service and
00:15:43.759
schedule anything on any service and we talked about error reporting
00:15:49.920
already uh so it was extracted out as a separate service so it would log a request to one
00:15:55.440
common place and you could do a search and those kind of things on this error reporting tool
00:16:04.639
so having this farm of services what you
00:16:11.440
usually hear about is service explosion so you have these services it's
00:16:18.560
difficult for a developer to check them out and deploy them and
00:16:24.240
things like that so how do you stay productive in spite of having these many services
00:16:31.440
well we use ruby on rails we used rails-api to build the service endpoints
00:16:40.240
a lot of our focus was on devops to make
00:16:45.519
things as simple as possible in terms of deployment in terms of setting up a dev
00:16:50.560
box and stuff um we used feature toggles
00:16:56.000
instead of feature branches because doing feature branches with a service oriented architecture is like
00:17:02.160
really really difficult and having ci and cd pipeline itself is like really
00:17:07.520
difficult um we also created a lot of client gems
00:17:13.039
for these microservices the client gems basically provide
00:17:18.799
an easy way to talk to the service so it would feel like you're calling a
00:17:24.000
service in memory because it gives you a nice object you just call a service using that object and you get
00:17:30.960
an object back and our mantra was automate automate
00:17:37.679
whatever it is whatever is the repetitive process just automate everything
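A rough Ruby sketch of what a client gem of the kind mentioned above might look like. Everything here is assumed for illustration: the `CustomerClient` class, the `/customers/:id` path, and the field names are hypothetical, and the transport is injected so the sketch runs without a network. A real client gem would issue an HTTP request where `FakeTransport` returns canned JSON.

```ruby
require "json"

# Hypothetical client for a customer service. The caller gets a plain object
# back and never deals with URLs or JSON directly.
class CustomerClient
  Customer = Struct.new(:id, :name, :email)

  def initialize(transport)
    @transport = transport # anything responding to #get(path) with a JSON string
  end

  def find(id)
    attrs = JSON.parse(@transport.get("/customers/#{id}"))
    Customer.new(attrs["id"], attrs["name"], attrs["email"])
  end
end

# In-memory stand-in for the remote service, used here instead of HTTP.
class FakeTransport
  def get(path)
    id = path.split("/").last.to_i
    JSON.generate("id" => id, "name" => "Asha", "email" => "asha@example.com")
  end
end

customer = CustomerClient.new(FakeTransport.new).find(42)
puts customer.name
```

Injecting the transport also makes the client easy to exercise in unit tests of the consuming service.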
00:17:45.360
so how do i make a small change and still stay sane i mean if we make a small
00:17:52.960
change how do we make sure that everything works properly if that change happens the answer is
00:18:00.000
simple test it and if you're thinking of something like this
00:18:09.520
then this is the answer it's funny only when it is a joke i mean
00:18:15.600
if you're building an enterprise software or any software it's important to have tests um so
00:18:23.440
started off with unit tests in each of the services whether the object within the service is
00:18:29.200
doing the right thing but then that's too obvious right i mean every one of us writes unit tests
00:18:35.840
and we love rspec then the contract test which asks is my
00:18:43.440
service doing what it should which is basically an out of container
00:18:48.480
test so we ping an endpoint of the in memory service and see whether we're getting the right
00:18:54.840
response and we test the
00:19:00.000
contracts these are basically black box tests which test the contract
00:19:09.120
send something and get some response back don't worry about the implementation
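The black box contract check described above can be sketched in plain Ruby. The talk used rspec for this; the version below deliberately avoids any test framework so it runs on its own, and the endpoint, the field names, and the `contract_violations` helper are hypothetical names introduced for illustration.

```ruby
require "json"

# Hypothetical in-memory endpoint standing in for the service under test.
# It returns [status, body] the way a tiny HTTP handler might.
def customers_endpoint(id)
  return [404, JSON.generate("error" => "not found")] unless id == 1
  [200, JSON.generate("id" => 1, "name" => "Asha", "email" => "asha@example.com")]
end

# The contract: the fields (and their types) that callers rely on.
CUSTOMER_CONTRACT = { "id" => Integer, "name" => String, "email" => String }.freeze

# Returns a list of contract violations; an empty list means the contract holds.
# Only the response is inspected, never the implementation behind it.
def contract_violations(status, body, contract)
  return ["expected status 200, got #{status}"] unless status == 200
  data = JSON.parse(body)
  contract.reject { |field, type| data[field].is_a?(type) }
          .map { |field, type| "#{field} should be #{type}" }
end

violations = contract_violations(*customers_endpoint(1), CUSTOMER_CONTRACT)
puts violations.empty? ? "contract holds" : violations.join(", ")
```

A check like this fails when a field is renamed or changes type, which is the kind of breakage a contract test is meant to stop before a package is promoted.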
00:19:15.679
the next is integration test the acceptance test is for the boundaries within the
00:19:22.000
service itself in integration tests we test whether this particular service is behaving
00:19:28.160
nicely with other services so if this service is calling some other service or we are testing the user flow then we write
00:19:34.640
an integration test it tests the distributed effect if anything fails then
00:19:40.080
is the error actually getting reported in the error reporting service or if the
00:19:46.720
payment is getting deducted then is the customer getting communication or not
00:19:55.520
so test async actions a lot of actions were async like when you're sending a communication it's all async
00:20:02.480
so we were using resque for it and there's a nice resque plugin that lets
00:20:08.080
you test async actions um so you build these micro services
00:20:15.120
uh how do we actually ship it as james lewis says we are essentially
00:20:21.679
shifting the complexity of building the software to the infrastructure
00:20:27.360
so instead of now having one application to deploy now we have to deploy like 100
00:20:34.000
applications of 1k line each so the code becomes simple easy to
00:20:41.200
understand but infrastructure is slightly complicated
00:20:46.400
so for provisioning we use puppet solo at some point of time we would like to
00:20:51.919
use docker as well so provisioning begins at home so
00:20:58.000
even the dev boxes are provisioned so that if there is any change in
00:21:03.440
any of the versions of the software the same scripts are used across all the
00:21:09.120
environments so the puppet scripts go through ci like
00:21:14.960
application code we're gonna see that in the ci
00:21:20.559
pipeline slide and immutable servers as brian was talking about in the morning so
00:21:27.440
it doesn't make sense for server to be mutable if you're using provisioning script
00:21:33.760
this is how our continuous integration pipeline looked
00:21:39.840
each of the
00:21:44.880
boxes at the top is unit tests so there's ui there is service there is
00:21:51.120
puppet code which flows through integration uat performance
00:21:56.720
and eventually to production all this is one click deployments across environments
00:22:06.880
so with every check-in we run unit test run integration test that we've written
00:22:12.559
run acceptance test and build a package this is really important because that same version of
00:22:19.200
the package would be deployed across all the other environments
00:22:24.480
and the most important thing is we shipped often like weekly or less than that uh just ship
00:22:30.880
whatever you have and we shipped it like fedex
00:22:38.640
so talking about ci and cd we followed that in the project what it
00:22:44.480
actually gave us is single click deployments we managed to
00:22:49.600
cut down the server deployments to three minutes so each change
00:22:56.240
would actually take like three minutes to deploy to server to production um we had a farm of
00:23:02.240
25 servers and everything just works the deployments and everything
00:23:08.880
just works like a charm yeah and it's so easy that our product owner
00:23:14.880
does it um we made a point that since we are adding
00:23:20.400
lot of micro services and refactoring to microservices
00:23:25.440
the cost of adding any of these services should be as low as possible
00:23:31.039
so we managed to cut that time to less than a day right from creating
00:23:37.919
a project to taking that empty project to production was less than a day
00:23:44.799
so having uh talked about microservice when do we use the microservice it's not
00:23:51.840
a silver bullet right i mean it comes with a cost
00:23:57.840
so these are some of the trade-offs the benefits and the cost associated with them
00:24:02.880
so the benefit is you get small reusable and maintainable
00:24:08.080
code which is throwaway you can just rewrite it and stuff but at the same time you'll have like a
00:24:14.559
complex infrastructure because you need to deploy those independent services the individual codebases
00:24:21.200
each service would grow independently you can divide the team based on
00:24:28.240
services and each service would keep growing on its own with the teams
00:24:35.440
but on the other side the learning curve is quite huge in microservices
00:24:41.520
because now you have to deal with multiple applications and developers would find it hard to know what's happening
00:24:49.120
in the other service they scale independently as they are in
00:24:56.080
their own process they have their own database you can plan deployments such that if there is any
00:25:02.880
service which is not heavily loaded uh you can make a rational use of
00:25:08.960
the infrastructure rather than having a monolithic app and running
00:25:15.760
fat servers at the same time there is network overhead in terms of going
00:25:22.720
over http going over the wire and calling those services they have independent dbs so if there is
00:25:29.360
high load on database the database could be scaled independently but at the same time they
00:25:34.960
end up having fragmented data and the reporting and all becomes slightly
00:25:40.840
difficult well uh that's all i had uh
00:25:47.919
questions your last comment is the perfect segue into my question which was this seems like it would make
00:25:54.080
reporting a nightmare so is it slightly difficult or is it a nightmare
00:26:02.000
well some of the complex reports become like a nightmare as you said
00:26:10.320
in some of the cases what we have tried out is dump this data into a warehouse and
00:26:16.480
start developing reports out of that uh there are some plugins with uh
00:26:22.000
postgresql where it allows you to connect to multiple databases and run sql query across databases
00:26:29.279
that worked out in some cases in some cases it was just fetch the data if it's small enough
00:26:36.000
and do it in memory so it it depends on the usage um if the data is really like really huge
00:26:42.559
then the first case where you dump that in a warehouse would work fine how do you deal with
00:26:49.120
uh versioning the services and what services you're talking to from
00:26:54.960
presumably not something plot involved according to each other
00:27:00.080
um so as you saw the deployment pipeline it uh to start with we said okay let's
00:27:08.640
not do versioning at all in the services let's deploy everything or nothing so rather than
00:27:15.279
picking up what needs to be deployed we made our deployment script intelligent enough to figure out if there is a
00:27:20.720
change it would deploy otherwise it won't and since uh in the deployment
00:27:27.919
in the ci pipeline you have tested that hey this version of the service works nicely with these other versions
00:27:34.159
of the service we would deploy that whole lump of all the services together
00:27:39.440
and roll back if needed all the services together this is just to simplify so that we
00:27:45.360
don't end up having too many versions
00:27:52.080
making the code unmaintainable basically a couple of questions
00:27:59.600
on your client library gems did you settle on something like there are a few
00:28:04.720
uh json schema standards that are being talked about that sort of made it easier
00:28:10.080
to like the client library to be sort of discovering the layout of the api did you use
00:28:15.840
anything like that or is it kind of that um so we used the hashie gem if you have heard of that
00:28:21.360
that worked out really well where we could actually uh create the objects create the models
00:28:27.279
and stuff really nicely using a dsl
00:28:33.039
all right and although if you don't mind that have you found a need or use any tools or building tools like
00:28:39.200
any twitter adults something that it's probably solid specifically for tracing transactions that go through
00:28:45.600
multiple services like if you have some issues that come up especially with the urgent question like do you have a hard time do
00:28:51.679
you want to use something that goes through multiple services with a you mentioned a lot of logging and monitoring
00:28:56.840
infrastructure
00:29:03.360
so we passed in a unique identifier from the main caller so the ui
00:29:09.279
layer which talks to the service would typically pass in a common header which would be
00:29:15.679
a unique identifier and if this service is calling some other service it would pass on the same
00:29:21.760
identifier and the logging actually makes sure that we are using that particular identifier so when you're searching in
00:29:29.039
splunk or any other log aggregator service all you need to do is just check based on that
00:29:34.399
identifier to trace where all the requests went through thank you
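The identifier passing described in this answer can be sketched in Ruby as below. The header name `X-Request-Id`, the service functions, and the in-memory `LOG` array are assumptions for illustration; the talk only says a common header carried a unique identifier and that logs were searched in splunk or a similar aggregator.

```ruby
require "securerandom"

# Hypothetical header name; the talk only mentions "a common header".
REQUEST_ID_HEADER = "X-Request-Id"

LOG = [] # stands in for a log aggregator

# Every log line is prefixed with the identifier taken from the headers.
def log(service, headers, message)
  LOG << "[#{headers[REQUEST_ID_HEADER]}] #{service}: #{message}"
end

def payment_service(headers)
  log("payments", headers, "charging card")
end

def order_service(headers)
  log("orders", headers, "order received")
  # Pass the same identifier on to the next service instead of minting a new one.
  payment_service(REQUEST_ID_HEADER => headers[REQUEST_ID_HEADER])
end

# The UI layer is the main caller, so it creates the identifier.
def ui_request
  request_id = SecureRandom.uuid
  order_service(REQUEST_ID_HEADER => request_id)
  request_id
end

request_id = ui_request
trace = LOG.select { |line| line.include?(request_id) }
puts trace
```

Filtering the log by one identifier yields every hop the request made, in order, across services.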
00:29:44.320
okay i was wondering how you went about testing the contract of the service
00:29:51.440
did you use any particular tools um actually rspec has it
00:29:58.559
rspec has a way to test the contracts without actually spinning up
00:30:05.679
the server like we do controller test right you hit an endpoint and see if it is returning the proper response
00:30:13.520
if you do a render_views on top of it it will actually render the view it will actually render the json and give
00:30:19.919
you back the response so we just use plain rspec so does it actually help you test it once
00:30:29.039
the client of the server changes it breaks the client or
00:30:41.120
so the contract tests were about testing the contracts in isolation so
00:30:46.399
you just test that service's contract and if there is any change and if there is any breakage we don't allow
00:30:52.880
that particular package to be promoted any further there was also an integration test pipeline
00:31:00.880
which tests whether this particular service works
00:31:07.760
together with other services for that again we used rspec to test the contracts and test
00:31:16.159
its distributed effect across multiple services
00:31:23.600
okay so you're talking about did we calculate the cpu utilization
00:31:28.640
and memory overhead right
00:31:43.840
um so as i said i mean these uh since we could
00:31:50.559
break down the domain into multiple apps it was uh really interesting to you know spin
00:31:56.960
up let's say 10 uh instances of customer service
00:32:02.080
because that is heavily loaded that needs login and we don't want customers to lose out on that while
00:32:08.480
the payment service or communication service the communication service is async so just
00:32:15.840
spin up just one instance of communication service so we played around with lot of those
00:32:22.000
combinations to optimize the infrastructure usage does that
00:32:29.200
answer your question okay well thanks
00:32:36.640
thanks anan thank you