500X Memory Reduction In Rails With Postgres Schemas And Apartment

Summarized using AI

500X Memory Reduction In Rails With Postgres Schemas And Apartment

Ilake Chang • October 12, 2017 • Selangor, Malaysia • Talk

In this talk delivered by Ilake Chang at RubyConf MY 2017, the main focus is on a significant memory reduction achievement in a production server using Postgres schemas and the Apartment gem. The video outlines the circumstances surrounding a severe memory issue faced by the speaker's organization, which was related to the use of AWS EC2 for server management and resulted in excessive memory usage. Notably, they achieved a remarkable 500x reduction in memory usage through strategic restructuring of their database schema and improving their architecture. The session covers the following key points:

  • Background and Context: The speaker introduces their company and context for the talk, explaining how their e-commerce platform was affected by memory consumption issues.

  • Initial Findings: Early development testing indicated memory usage dramatically reduced from 400MB to 800KB when simulating 10,000 schemas, leading to interest in a scalable solution.

  • Root Cause Analysis: The speaker explains how they traced memory usage to a large database in production, which significantly differed from the staging environment. Issues were linked to the operational needs of a multitenancy structure using Postgres schemas.

  • Key Tools and Strategies: Various tools such as Medusa and New Relic were used to monitor memory consumption, and strategies focused on optimizing garbage collection and request handling. Techniques to test memory structure duplicated real production requests in staging environments.

  • Development Approaches: The speaker rationalized the choice of the Apartment gem for multi-tenancy and indicated various adjustments in the source code of Rails and PostgreSQL settings were necessary to handle the influx of records without causing memory bloat.

  • Lessons Learned: The speaker emphasizes the necessity of returning to foundational analysis rather than simply relying on existing tools when addressing serious technical issues. The problems of migration in multi-schema structures and the need for potential refactoring for future scalability were also highlighted.

  • Conclusions: Chang stresses the importance of recognizing architectural choices and their implications on scalability and performance, advising against overly complex schemas that could impede future growth. The experiences shared serve as a cautionary tale for effectively managing data architecture in production environments.

Ultimately, the session combines theoretical insights with practical applications to highlight significant memory management techniques in a multi-tenancy environment using Rails and Postgres, encouraging developers to think critically about their database structures.

500X Memory Reduction In Rails With Postgres Schemas And Apartment
Ilake Chang • October 12, 2017 • Selangor, Malaysia • Talk

Speaker: Ilake Chang

Website: http://rubyconf.my

Produced by Engineers.SG

RubyConf MY 2017

00:00:11.120 Lester hello everyone and in the
00:00:14.129 beginning I want to sense to everyone
00:00:16.830 give me this opportunity to share my
00:00:19.259 experience here and I give me the
00:00:21.509 opportunity to come to Malaysia it's my
00:00:24.210 first hand to here and the topic I want
00:00:30.449 to talk today is a real case happened in
00:00:33.809 our production server we meet a very
00:00:36.360 serious memory problem although he could
00:00:39.780 be used money to resolve because we used
00:00:43.500 AWS ec2 and in the beginning we just
00:00:46.440 endowed to the server in that energy
00:00:48.449 because in our server we have more
00:00:51.750 memory but there is their problem
00:00:55.430 because it's unreasonable to use this
00:00:58.559 tremendous memory so we still need to
00:01:01.500 resort it and in the end the result we
00:01:05.280 have 500 memory reduction this is the
00:01:09.150 result and the structure we use is post
00:01:11.970 query multi-schema structure and they
00:01:13.980 will use apartment gen so today I will
00:01:17.670 share the process of finding the new
00:01:20.070 host and how do we solve this problem
00:01:22.850 including how many memory reduction and
00:01:26.010 white the I need to do list and how they
00:01:29.070 I do is and why the I'd learned from
00:01:31.770 this I just introduce our company of
00:01:38.190 ours we are fine a newbie we wish we are
00:01:41.460 fine times for facial distance and that
00:01:43.710 we always speak in full sustainable for
00:01:46.800 the weak community to be programmer
00:01:48.960 community and we also do some has also
00:01:51.270 seen a nun I and entity programming
00:01:53.909 course and the Tucson technical
00:01:55.890 consultant and we hold a lot Ruby
00:01:59.130 community we organize many kinds of
00:02:01.200 events in Taiwan and this the we were
00:02:06.540 hope Ruby and I this year conference
00:02:09.810 Taiwan next year and now is going for
00:02:13.019 paper so if we want you curve submit
00:02:15.870 your proposal now and let's our website
00:02:19.430 was I think
00:02:21.520 yeah and we hope that we have a true
00:02:27.670 case in our office and I said they they
00:02:30.520 also provide the world notice last day
00:02:32.710 they are watching watching us
00:02:35.290 yeah just like east we celebrate case
00:02:38.980 birthday and there's the fake fish this
00:02:42.700 lake has a favorite toy and about me
00:02:48.190 I can't spawn Taiwan and I also a rest
00:02:50.830 contributor and now I am dressed
00:02:52.720 developer at 500 and I also attended
00:02:56.320 consultant for genetic somatically
00:02:59.170 ecommerce platform which will allow
00:03:02.560 anyone want to say anything I and now
00:03:05.590 they are focused on India market so now
00:03:11.170 back to a topic now on I today I will
00:03:14.590 show is how do we resolve this problem
00:03:18.670 in our production
00:03:19.780 what's the improvement whether I do is
00:03:22.390 how do I do this and the yti then the
00:03:28.180 first first improvement I was sure the
00:03:31.030 status data in our development and the
00:03:33.430 potential environments in my local
00:03:36.640 development environment i stimulate
00:03:39.000 10,000 schemas the memory used to be
00:03:42.040 from 400 megabytes to 800 KB is about
00:03:47.580 510 reduction this with our I I do some
00:03:53.260 page on a trailer unless the static data
00:03:58.330 we will spend my tool to detect the
00:04:00.700 memory usage which always out the page
00:04:02.910 you can see is about four hundred and
00:04:08.770 sixty two kinds different we saw with
00:04:13.330 our page and in our group attachment
00:04:17.380 server on our production server now is a
00:04:20.190 given date time is about act histones
00:04:23.350 that 80,000 schemas and the memory from
00:04:26.950 a GB to 300 and be this hour in our
00:04:31.840 reattachment server
00:04:34.060 and it's fortunately we resolved the
00:04:38.600 memory problem finally because when we
00:04:41.630 made this problem is too serious in a
00:04:46.910 time we will need to restructure our
00:04:49.670 pack 610 structure if we can resolve
00:04:52.670 this problem and our development process
00:04:54.980 will be delayed about two or three
00:04:57.350 months is a big news for a style company
00:05:00.740 so since God raised us we faced this
00:05:03.410 problem so and what I need to do this
00:05:07.690 because we mean a very serious problem
00:05:10.190 memory problem in our production server
00:05:12.890 we will be proven on all production
00:05:15.650 server and always in Dutch easy to scare
00:05:18.470 is not good solution and during they
00:05:20.990 turn one Puma Walker comes consume a GB
00:05:25.100 is not my sense is used too much memory
00:05:28.450 so we need to figure out height so in
00:05:33.290 the how the idea is I just talked about
00:05:36.770 insistence background we are ecommerce
00:05:39.590 platform so and then we post on India
00:05:42.230 Marcus now because we sing there are a
00:05:44.510 lot of potential user in India and we
00:05:47.420 use post query metaschema structure and
00:05:49.760 the user part mention it's very
00:05:51.950 convenient to view a multi-tenancy
00:05:54.080 system with this tool and so we use
00:05:59.210 multi-tenancy structure I see much but
00:06:01.970 hidden in solution is very common
00:06:03.800 basically your sister is a Sexton you
00:06:07.850 must ever seen about multi-tenancy
00:06:09.920 because that is provided by every tenant
00:06:13.520 and as in a key point obsessed is to
00:06:17.000 isolate different tenant
00:06:19.040 so as an apartment journey is a good
00:06:21.470 choice because its separate different
00:06:24.080 tenant and different schema theory and
00:06:27.200 let you go develop Martinez structure
00:06:30.320 very quickly and how do I find a memory
00:06:36.050 the key point
00:06:38.920 the tool which we have ever used to my
00:06:43.550 GC new Redick steak pork and we used to
00:06:47.960 Medusa to optimize the Looby garbage
00:06:50.660 decoration and use new reticle to
00:06:52.940 monitor and atomize our cistern and
00:06:55.790 monitor our memory consumption in every
00:06:59.450 request and used a pro to detect memory
00:07:02.540 Constitution is the Pacifico
00:07:04.700 and we also use go in this tool to
00:07:09.170 duplicate medallion request on our
00:07:12.050 staging server to test because we want
00:07:14.810 to use the same requests and a monkey
00:07:17.210 request by after using of this tool all
00:07:22.070 results are good no problem no memory
00:07:25.520 leak they tell us our calls are good but
00:07:29.210 we really make a very serious mental
00:07:31.490 problem in our production server by all
00:07:35.420 this to us say there are no potentially
00:07:37.550 no problem in our sauce Cole
00:07:39.520 so it's so we're everything we do
00:07:43.840 pricing we do this still helpful because
00:07:47.120 we know our call may be no problem so I
00:07:52.220 seen we need to return to benchmark
00:07:54.580 compilation we shall return to a basic
00:07:57.740 analyze the problem first instead of
00:08:00.710 using those two I mentioned before so I
00:08:03.980 start to sing some point because we make
00:08:06.830 a very serious memory problem so I seen
00:08:09.770 is the environment we noticed they only
00:08:14.780 happen on our production server and not
00:08:17.870 happen under staging server so any
00:08:20.570 production and its request we use cope
00:08:24.440 we use gold to duplicate all real
00:08:27.440 production requests on stage it but is
00:08:30.170 there no problem on stage normally don't
00:08:32.750 memory issue and the Saudis turn
00:08:36.500 accumulation you see where accumulate
00:08:40.400 over time
00:08:40.940 no he will now community consumed a lot
00:08:44.270 of memory who insist and initialized so
00:08:47.540 as in the only related is the
00:08:50.510 environment so I start with
00:08:53.080 why's the kidney from point between
00:08:55.750 production and a stager and a sneaky
00:08:58.870 point is the database size because we
00:09:02.019 have big database in production and
00:09:04.810 maybe just 1% of production there
00:09:07.570 besides in a stager so I start to create
00:09:10.510 a big fake database in my local
00:09:13.269 environment and then I found little
00:09:16.120 cause that they have a size because when
00:09:19.360 I create big database if my local
00:09:21.459 environment he will consume a lot memory
00:09:23.880 so the problem can chorus kanae to our
00:09:27.730 big database he will come to a lot of
00:09:30.430 memory so the problem becomes what is
00:09:36.670 internal consume this tremendous memory
00:09:39.160 used to impose way schemas charger so I
00:09:42.850 need to trace the rest source code to
00:09:44.529 bigger high so I start to trace a
00:09:47.860 traitor to figure out why does the
00:09:50.260 operate and I find whenever an inquiry
00:09:52.899 is
00:09:53.380 kilted he would do something Lewis will
00:09:57.820 remember the type object which based on
00:10:00.130 the OID of the restarting column type
00:10:02.140 another restarting comment is from cos
00:10:04.779 square feet we type table as syndra's
00:10:07.779 need to do this is because there is not
00:10:10.120 only connect one title by database he
00:10:13.089 could use my psychopath quest ago server
00:10:15.190 and every type of different ways are
00:10:18.010 different so less while words need to
00:10:20.170 break off this type and this the coal of
00:10:27.070 the 80 record high system even though
00:10:29.829 you have different schemas is the
00:10:32.110 alcohol detection and remember all this
00:10:34.480 type because first need to remember of
00:10:40.420 these things of course us were preload
00:10:43.540 of this in one go
00:10:45.250 instead of check perc Rory
00:10:48.610 because it's expensive trapper Croix he
00:10:51.970 will reduce your performance the other
00:10:56.110 native will be an additional
00:10:58.089 international with upset by this will
00:11:00.699 also reduce the performance res will not
00:11:03.790 do this and he will cause potential
00:11:06.259 and press one Cori problem in a
00:11:08.569 tiebreaker service don't want to do this
00:11:13.359 so what can the page to cept there's a
00:11:17.989 lot of memory and you realize now we
00:11:21.019 know this tremendous memory comes
00:11:22.789 opportunities because us want to
00:11:25.069 remember whole Cullen's type No so let's
00:11:29.149 start to look white this page could
00:11:31.939 reduce a lot of memory reduction this is
00:11:36.529 the part of the source code which I want
00:11:38.869 to modify because in the major low
00:11:41.119 additional types he was like oh ID from
00:11:44.509 fidget high elbow computer table and the
00:11:49.669 LAN this power source Cole
00:11:51.319 they were sure how rest various
00:11:53.059 different type a Sango ID this I hope
00:11:56.689 this song related source code hi make
00:12:00.649 initializer schema statement and they
00:12:03.259 are based Desmond is all related this so
00:12:08.749 we know that the cost is one pretty high
00:12:12.019 table because we need to know the OID
00:12:14.959 from the pitch high table so I start to
00:12:18.679 submit twice the pivot table in post
00:12:21.109 gray post questing sing table is a
00:12:24.559 campus ID tie because he could represent
00:12:27.470 whole table structure so he is there
00:12:31.039 type and a composite a is automatically
00:12:34.160 created for each table in the database
00:12:36.429 to represent the log structure of the
00:12:39.529 table so if we have a lot of table post
00:12:44.899 query where Auto create a lot of concept
00:12:47.119 I on the pivot high table and then there
00:12:49.669 will be a lot of breakers in the pivot
00:12:51.769 high table and the Lewis will save a lot
00:12:54.889 of PG light data in a memory so there's
00:12:58.100 no cost a lot of table a lot of makers
00:13:01.100 in the pivot high table us will use a
00:13:04.220 lot of memory because we use Marta
00:13:07.039 schema structure so we will have a lot
00:13:11.479 of table so rest well use a lot of
00:13:14.539 memory this is a low cost
00:13:17.970 so and I'm finding the current CEO
00:13:21.310 statement is not in too much post-quake
00:13:23.800 type data from PG KY table
00:13:27.010 and they are allowed to redundant type
00:13:29.110 for a tiebreaker because we are
00:13:30.930 multi-schema we are metaschema structure
00:13:34.210 every schema structures tables allison
00:13:36.910 so they are redundant tables in PD type
00:13:39.820 so now we need to do is try to reduce
00:13:43.330 the rakers from the PGI table we have
00:13:46.810 two way to handle is first is modifying
00:13:51.340 the Postgres 18 because I said the
00:13:54.190 Postgres aping sating he called disable
00:13:56.530 otaku a concept I am PGI table when
00:13:59.710 creating a new table so he will reduce a
00:14:02.710 lot makers in a pivot table and the
00:14:05.770 second is modified real-estate he'd wake
00:14:07.780 her source code just don't get too many
00:14:10.120 takers from PG table so we we start to
00:14:14.940 make a page an 80-acre we need to do is
00:14:18.730 filter some table concept I from pivot
00:14:22.420 high table here we have two step
00:14:27.960 modifications the first we use this
00:14:31.270 thing because we are multi-schema
00:14:33.670 structure we have a lot of duplicate
00:14:36.160 tables because all the tables and each
00:14:39.670 schema Alison so we call use this thing
00:14:42.550 to filter redundant tables in a second
00:14:46.450 we edit settings and rails Hilary's risk
00:14:49.690 which could tie with table ties we want
00:14:53.110 to even know and this page could reduce
00:14:56.470 a lot of career acres from pivot high
00:14:58.690 table for example in my local
00:15:03.340 environment I have 10,000 schemas in a
00:15:07.000 database there will be four hundred
00:15:09.900 twenty thousand records from the old 80
00:15:14.290 record
00:15:15.130 sequel statement and the rest will set
00:15:18.040 all these data in the memory and then
00:15:21.820 with this page we only select one
00:15:23.800 hundred six three records was only need
00:15:27.430 to accept this 100 records in a memory
00:15:30.550 so he's a bit different and we also
00:15:34.750 implement the cage mechanics that we
00:15:39.130 need to implement this is because we
00:15:41.410 mere problem and psyche with the page
00:15:44.350 every psyche workers need to ask you
00:15:47.110 this xico statement when neutralized if
00:15:50.050 you have ten psyche workers you need to
00:15:52.750 is cute and hands and this curry is an
00:15:55.720 expensive glory he needs to take a lot
00:15:58.660 long con so we implement a cage
00:16:01.630 mechanism to remember the result luckily
00:16:04.390 no need to askew on each workers every
00:16:06.790 turns and this request through a rails
00:16:11.860 the first is the user distinct pull
00:16:15.160 request is emerged by an Allen revert I
00:16:19.180 think it's because it's not a general
00:16:21.730 solution maybe will help seung-jae
00:16:25.120 effect and assure because it's okay
00:16:27.790 in our battalion at least and this pull
00:16:32.980 request is I address 18 andreas
00:16:35.430 terrorists which table types we want to
00:16:38.649 ignore and what they are then from this
00:16:45.820 process although we solve the problem I
00:16:50.760 start to sing why other people don't
00:16:54.130 make this problem why to rest not handle
00:16:57.430 this problem as in the due course is our
00:17:01.750 system structure maybe it's not a salute
00:17:04.870 design post square mattis schema can use
00:17:07.720 in this way because we have too much
00:17:10.780 schemas so maybe it's the problem but we
00:17:17.530 know this structure is found this boost
00:17:19.720 multi-tenancy which to us he used this
00:17:22.809 structure to an apartment is a very very
00:17:26.020 famous gen he also used this structure I
00:17:28.929 sent this to famous boos and the Germans
00:17:31.360 also used a structure - he should be no
00:17:33.850 problem but in fact he he caused a very
00:17:38.559 big serious problem
00:17:40.890 and now they are still some problems in
00:17:45.450 our structure just like rest initial low
00:17:50.010 we're taking on cotton because we have
00:17:52.830 too much schema to mainstream us and we
00:17:57.450 have also had my great program because
00:18:00.450 they are a lot of schemas if we will
00:18:02.340 migrate sequin usually he will take a
00:18:05.010 long time and we also make the oval AWS
00:18:09.750 IDs fire in this name a problem because
00:18:13.620 we have too many table over 500,000
00:18:17.220 tables over the oil obvious ideas
00:18:19.890 fighting this limit in a case all the
00:18:23.160 monitor - Bowl and ideas and obvious
00:18:26.370 ideas ok are good all the status are
00:18:29.730 good but we can do any migration we just
00:18:34.530 got the aerial non specs in a post
00:18:37.230 Square log but all the other is RDS
00:18:40.770 monitor status are good so we just
00:18:43.500 called edible ticket to help us and
00:18:46.530 landed here us there are too many tables
00:18:49.950 in your database in your IDs server and
00:18:52.950 then over at aureus ideas wire in this
00:18:55.590 limit now we need you need to reduce
00:18:58.950 your table mount real obvious doesn't
00:19:02.490 meet this problem before so this white
00:19:06.210 allows all a novice monitor - Paul okay
00:19:09.330 I couldn't no no any issue and I start
00:19:13.530 whistling because we are still not a
00:19:15.419 very big cistern I would mean a lot of
00:19:18.179 problem even over a Darwin's ideas
00:19:21.360 fighting the enemy so as in the due
00:19:23.460 course is our Citroen structure post
00:19:26.460 square multi-schema
00:19:27.510 children used on this way so we can keep
00:19:33.480 the feathering we want to change our
00:19:35.940 second multi-schema structure to only
00:19:38.850 one or two schema structure refactor is
00:19:42.450 very important so let me give you a
00:19:45.330 example to describe our situation we
00:19:48.630 already we have already finished the
00:19:51.210 primary skeleton but we find the current
00:19:54.600 and allow us to build only four-story
00:19:57.929 house however your final goal is to be
00:20:01.919 worse skyscraper so you plan to keep
00:20:05.250 building based on the original skeleton
00:20:07.740 but this won't work most people was
00:20:11.640 thick to lay original skeleton since it
00:20:14.640 already is six and look great at the
00:20:17.370 moment but this the beginning of your
00:20:20.789 hard work actually you are supposed to
00:20:23.970 restart and we build a skeleton and the
00:20:26.940 best man which little a bill for
00:20:29.370 skyscraper for sure you can you can keep
00:20:33.030 some folk medical and the eg to use in
00:20:35.640 tests by definitely need to rebuild a
00:20:37.950 skeleton it's difficult to handle one
00:20:40.799 skyscraper with Poe story house skeleton
00:20:44.340 because building history they missed the
00:20:47.039 growth inertia you should simply give up
00:20:51.690 all we build a skeleton during the
00:20:53.970 development when is needed so kid with
00:20:57.390 veteran is our situation now if any why
00:21:03.960 interest on this topic you can contact
00:21:06.090 me via this email thank you
00:21:14.530 thank you so much Riley so before we get
00:21:17.480 to our questions I have a question how
00:21:19.160 many tables do you have
00:21:26.090 if you can count them it can be right
00:21:28.430 one it's like one two three users now we
00:21:32.690 have about 200,000 schema so one schema
00:21:42.410 maybe have I'm not sure maybe forty
00:21:45.970 forty or fifty table how do you onboard
00:21:48.770 new developers like here take the next
00:21:53.090 three years learn the schema because we
00:21:55.430 Teddy's prot you would head is we are
00:21:57.950 with Hades for your Fung other company
00:21:59.840 and the other company have already
00:22:01.400 Billy's charger right yeah so that's why
00:22:05.000 I say we need to keep you fated two
00:22:08.420 hundred thousand I think you need a new
00:22:10.760 job alright guys we're gonna have like
00:22:13.610 some questions massive scale any
00:22:17.090 questions up yes Nick
00:22:23.670 Postgres table inheritance have you ever
00:22:26.190 explored that approach to reducing
00:22:29.750 the number changes you need to change
00:22:32.930 during migration
00:22:36.660 and my other question is because I don't
00:22:39.270 know the October code
00:22:41.380 maybe arms you know this so that's that
00:22:44.049 query that series the PG type when you
00:22:46.720 have table inheritance in Postgres do
00:22:50.770 you wind up with one type for each table
00:22:53.200 or is just a parent table
00:22:58.520 that this question is what type for each
00:23:01.770 table one table one huh
00:23:04.650 yeah because he needs to record every
00:23:08.370 table even though they are different
00:23:10.350 schema so one type one type one table so
00:23:14.280 you won't have a lot of paper they will
00:23:16.380 have a lot of exams we have so nurse
00:23:19.110 will come solid out of memory
00:23:20.340 okay watchmaker oh that one question
00:23:36.710 about my question we use paralyzed
00:23:40.380 migration we not do migration sequencing
00:23:47.030 all right any other questions from the
00:23:50.520 crowd
00:23:51.470 yep spin so now you are using RDS for
00:23:55.049 the DB right so let's say I am not using
00:23:58.230 RDS and I have only one so let's say I
00:24:06.299 only have one server right and I use
00:24:09.120 this logic multi-tenancy so do you think
00:24:11.160 it's dangerous because if there's a
00:24:13.530 failure in that server so I don't have
00:24:16.559 access to any tenant anymore right so do
00:24:19.410 you think like it's dangerous to use
00:24:21.780 this multi-tenancy on only on one server
00:24:25.700 yes because we only use one area so just
00:24:30.450 only what but so fast in future we were
00:24:32.940 separated whichever separate maybe
00:24:35.870 mm he's dead Yost so I think because
00:24:39.540 your artistic something have been
00:24:40.920 through the RDS you still have my backup
00:24:42.480 right yeah but if I have only one server
00:24:44.670 and they don't have that backup for a
00:24:46.620 day no we're having a MIDI alley
00:24:50.220 yes all right
00:24:54.750 any more questions chop chop chop
00:25:03.490 if not that we will adjourn for life but
00:25:06.670 for that one more our applause thank you
Explore all talks recorded at RubyConf MY 2017
+16