Summarized using AI

Import Spreadsheets in Ruby on Rails with Flatfile.com

Andy Maleh • July 05, 2023 • Montréal, Canada • Talk

The video titled "Import Spreadsheets in Ruby on Rails with Flatfile.com" features Andy Maleh, a senior software engineer, presenting at the Montreal.rb Meetup in July 2023. The primary focus of the talk is on the automation of importing spreadsheet data into Ruby on Rails applications using the SaaS product Flatfile.com. Flatfile simplifies the process of uploading flat files, such as CSV or XLSX, enabling users to efficiently import large volumes of data into databases while adhering to performance, security, and validation standards.

Key Points Discussed:
- Definition of Flat Files: Andy explains that a flat file is essentially a simple text file where data rows represent different records without explicit relationships between them, contrasting with relational databases.
- Requirement for Bulk Data Input: The need for a mechanism to allow users to upload large data sets in bulk rather than through traditional form inputs is discussed. Users can import entire spreadsheets containing thousands of records, meeting the demands of applications expecting high volumes of data input.
- Data Validation and Mapping: It is vital to validate data for correctness and to sanitize it against potential security threats, such as script injections. Flatfile.com allows users to map spreadsheet columns to corresponding database attributes, accommodating different formats used by various departments or third-party applications.
- User Experience: Andy showcases how Flatfile provides a user-friendly interface for managing data imports with visual feedback and validation, enhancing usability by catching errors before records are processed into the database.
- Backend Architecture for Processing: The talk details the architecture necessary for handling data imports, utilizing Flatfile's REST API in conjunction with background processing tools (like Sidekiq) to efficiently handle large sets of records batch by batch, which is crucial for performance.
- Customizations and Flexibility: The tool also supports various customizations for field mappings and provides features such as data hooks for further validation and transformation during the import process.

Software systems require input in order to compute data and produce output. One of the ways to provide input to a software system is by uploading a flat file that contains information about various entities (e.g. customers or products) in the form of a spreadsheet, following the CSV or XLSX format.

In order for a software system to properly accept a spreadsheet file as input and effectively process it into output, it usually needs to map spreadsheet columns to entity attributes to store the data in a database. Often, there is also a requirement to validate the data before accepting it (e.g. ensure a column value is within a certain range) and to apply transformations to the data before storing it on records in the database. That is in addition to non-functional requirements around performance, security, usability, and reliability, which ensure that a user can upload a spreadsheet quickly, see progress indicators in a user-friendly experience, and have their data processed and transformed in the backend without any data loss.

Enter Flatfile.com! A SaaS (Software as a Service) product that automates the handling of all the concerns and requirements mentioned above, and more, in a Ruby on Rails web application.

This talk provides a brief overview of the Flatfile.com SaaS product features while demonstrating a real-world example of using it in a Ruby on Rails web application.

Montreal.rb Meetup July 2023

00:00:02.700 um okay so the topic today for the Montreal.rb Meetup of July 2023 is going
00:00:08.400 to be Import Spreadsheets in Ruby on Rails with flatfile.com that's going to
00:00:13.620 be the first half of the event the second half of the event is Open Gem Lightning Talks which is basically a play
00:00:21.420 on the words open jam night at bars and the meaning of it is that people are going to be able to improvise
00:00:27.119 any talk they want about a Ruby gem that they've discovered that they find interesting or useful
00:00:33.360 so let's get started with the first part first I'm going to introduce myself my name is Andy Maleh I'm a
00:00:40.760 full stack senior software engineer at Lexop Lexop is a digital debt
00:00:46.739 collection company and I help them by doing Ruby on Rails plus React work
00:00:58.940 so to get started what is a flat file I'm going to give this
00:01:05.880 question to the audience in case somebody knows what a flat file is [audience member:] I have no clue
00:01:13.140 well a lot of people probably don't know what it means because they mostly work with relational
00:01:20.100 databases earlier in my career I had to do some work with a lot of CSV files and they
00:01:27.659 called them flat files at the time that was about 19 years ago and a flat file is
00:01:34.380 basically usually a plain text file that you put data in row after row
00:01:39.420 after row where the data represent different entities so for example I
00:01:46.079 worked on exporting emails from Microsoft Outlook if I remember right and we exported the email bodies as
00:01:53.340 basically rows after rows of comma-separated data so that's one
00:01:58.860 version of a flat file but to go to the official definition of a flat file on Wikipedia
00:02:05.360 a flat-file database is a database stored in a file called a flat file records follow a uniform format and there
00:02:13.860 are no structures for indexing or recognizing relationships between records meaning a flat file is not like
00:02:20.099 a relational SQL database like MySQL or PostgreSQL or Oracle Database that lets
00:02:27.300 you establish relationships instead you simply list top-level entities as
00:02:33.540 records that are not necessarily related to each other explicitly they might be implicitly related but not explicitly
00:02:40.860 the file is simple a flat file can be a plain text file a CSV a TXT a TSV or a
00:02:47.580 binary file a binary file is like an Excel spreadsheet so an Excel spreadsheet is a binary flat file
00:02:55.160 relationships can be inferred from the data implicitly in the database but the
00:03:00.180 database format itself does not make those relationships explicit
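To make that concrete, here is a minimal Ruby sketch of reading such a flat file row by row, using the standard csv library; the people.csv file and its columns are hypothetical:

```ruby
require "csv"

# people.csv is a hypothetical flat file whose first row is a header:
#   first_name,last_name,email,locale
#   Jean,Tremblay,jean@example.com,fr
CSV.foreach("people.csv", headers: true) do |row|
  # Each row stands alone as a record; the format itself encodes
  # no explicit relationships between rows.
  puts "#{row["first_name"]} #{row["last_name"]} <#{row["email"]}>"
end
```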
00:03:07.560 so let's talk about the requirements that led my company Lexop to actually
00:03:14.940 explore the flatfile.com product so we had requirements that were very
00:03:20.220 similar to the ones that I'm going to talk about right now so we have a Ruby on Rails application
00:03:26.340 that must allow users to input data in bulk
00:03:31.440 in the form of a flat file which is uploaded by employees in other departments or third-party employees or
00:03:38.159 even end users like customers so the problem is usually if people are for
00:03:44.640 example buying a product from an e-commerce site after they select the product they're going to enter their
00:03:50.940 billing information they simply enter the billing information into a form however if you're entering data in bulk
00:03:58.799 it means a user wants to import a thousand
00:04:03.920 customers for example into the database all at once they're not going to use the
00:04:09.060 form they're not going to type into a form letter by letter character by character word by word in order to input
00:04:15.900 this information instead they will use a much more efficient way of inputting the data which is importing
00:04:22.560 an Excel spreadsheet for example additionally users must be able to
00:04:30.120 map the flat file columns or the spreadsheet columns into our own database entity columns so in our case
00:04:37.500 we wanted to import people into our database and basically to import those
00:04:43.680 people you need to map you know this field is the first name this field is the last name this field is the
00:04:49.919 email and the reason this is important is because different spreadsheets that are
00:04:55.199 output by other applications or other departments at companies might not have
00:05:00.419 a uniform format so the user will be able to map their format to our format of
00:05:05.520 what a person is flat file records must be validated as
00:05:10.919 well meaning we need to validate that the thousand people in the spreadsheet that are getting imported are all valid
00:05:16.860 entities so we also need validation in bulk we also need to sanitize the data
00:05:22.740 for malicious data people can for example put script injections in the
00:05:28.380 data we want to prevent that we don't want to have that problem especially because if
00:05:34.800 we import a spreadsheet that had bad data and we export it to another spreadsheet to be opened in Excel later
00:05:40.500 on if people put some scripts in the imported data and we export it into a spreadsheet the person who opens that
00:05:47.460 spreadsheet will execute scripts on their machine that could be malicious so that is completely prevented for us
00:05:54.620 by adopting the flatfile.com solution
00:06:01.380 um okay and then the system must apply required transformations and
00:06:06.780 normalizations to the data so for example if we have a client ID and we always like to represent client
00:06:13.680 IDs with capital letters all caps we basically normalize the data
00:06:20.400 into all caps and I'll show you other examples of that once I get into the demo imported
00:06:29.160 flat files could contain thousands or tens of thousands of records so the
00:06:34.620 system must be able to handle large loads of data with fast performance provide progress indicators in a
00:06:40.680 user-friendly experience process data in the background in a reliable fashion
00:06:46.340 and finally the whole import process must be fully secured with end-to-end encryption
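As a tiny Ruby illustration of the normalization requirement just described (the normalize_person helper and its attribute names are hypothetical):

```ruby
# Hypothetical normalization applied to each imported row before it is
# stored as a database record: client IDs go to all caps, emails are
# trimmed and downcased.
def normalize_person(attrs)
  attrs.merge(
    "client_id" => attrs["client_id"].to_s.strip.upcase,
    "email"     => attrs["email"].to_s.strip.downcase
  )
end

normalize_person("client_id" => "ab-12 ", "email" => " Jean@Example.com")
# => {"client_id"=>"AB-12", "email"=>"jean@example.com"}
```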
00:06:55.139 so next let's proceed into the architecture
00:07:00.180 of an application that is taking advantage of the flatfile.com import process so
00:07:07.500 basically the first thing you would want to do so this is a sort of a UML
00:07:12.780 communication diagram and this style of diagram communication diagrams will usually label each arrow with a
00:07:20.639 number so that will give you an idea of the sequence so number one we're going to show a Flatfile
00:07:26.460 dialog to the user in the web browser number two the user is going to import a
00:07:32.160 spreadsheet it could be a CSV or an XLSX file which will send that data to
00:07:38.639 flatfile.com once flatfile.com has imported the data it will give us a flat file import
00:07:46.740 ID or a batch ID they call it a batch ID then the JavaScript code in
00:07:53.340 the front end will actually forward this ID to our server the Rails server which will store the import job with its ID
00:08:00.180 in the database a relational database like PostgreSQL and then
00:08:05.940 finally a background worker job will load that data and we actually use Sidekiq
00:08:14.280 to do that so we have a Sidekiq worker that will basically pick up a job from the database
00:08:19.860 and will finally download all that imported data from Flatfile and then it will convert it into the
00:08:26.699 format that we expect in our database tables and will store records in our database
00:08:32.580 so in a nutshell that's the full architecture of an application that's taking advantage of flatfile.com in
00:08:39.120 order to import a spreadsheet and then store it in the database via a web server like a Ruby on Rails web server
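A hedged Rails-side sketch of the server-side portion of that sequence (storing the batch ID, enqueuing a Sidekiq job, and processing it); the FlatfileImport model, ProcessFlatfileImportWorker, and FlatfileBatchProcessor names are hypothetical, only the overall shape follows the talk:

```ruby
# app/controllers/flatfile_imports_controller.rb
# The front end POSTs the batch ID it received from flatfile.com.
class FlatfileImportsController < ApplicationController
  def create
    import = FlatfileImport.create!(batch_id: params[:batch_id], status: "pending")
    ProcessFlatfileImportWorker.perform_async(import.id) # enqueue background work
    head :created
  end
end

# app/workers/process_flatfile_import_worker.rb
# The Sidekiq worker downloads the imported rows in batches and
# stores them as database records.
class ProcessFlatfileImportWorker
  include Sidekiq::Worker

  def perform(import_id)
    import = FlatfileImport.find(import_id)
    FlatfileBatchProcessor.new(batch_id: import.batch_id, batch_size: 500).process!
    import.update!(status: "done")
  end
end
```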
00:08:45.899 so in the demo that I'm going to cover today I'm going to use
00:08:53.339 the flatfile.com Portal V2 this is not the latest version there is V3 and
00:08:58.680 there are even newer versions than that but what I'll be covering today to give you an example is the V2 version
00:09:06.240 and so on the front end it's the V2 JavaScript SDK on the back end it's the
00:09:12.720 flatfile.com REST API which is consumable from a Ruby on
00:09:17.880 Rails backend web server
00:09:23.279 so first I'm going to give you a very quick run-through of how the Flatfile experience feels to a user so first they
00:09:29.640 get this dialog that lets them add data in bulk it could be anything here I'm adding people
00:09:37.200 but you could be adding products you could be adding
00:09:43.500 emails it could be anything that
00:09:49.560 you exported into a spreadsheet so next you will upload it and it does
00:09:56.760 give you a nice friendly experience that shows you the progress of the upload and then it asks you which
00:10:02.220 sheet of the spreadsheet you want to import so you select one and then finally it gives you the
00:10:08.339 columns that you want to map to the model in the backend database so for example
00:10:14.160 if a user is a French user they call the email column courriel but
00:10:21.420 they want to import it to email or the first name is not going to be called first name it's going to be prénom in French so this will enable
00:10:29.760 people to basically map the columns one by one into the database model attributes
00:10:34.860 not only that they can even select
00:10:41.100 values as well we basically will give them an
00:10:47.940 expectation of what kind of value we expect for a locale so here for example the first name is
00:10:55.140 a free value it could be any text you want but a locale has to be a specific value so the Flatfile experience will
00:11:05.880 enable people to figure out that they can only enter one of those values like French English or F E or 1 2 etc finally you import the data and it will display to the users a preview of the
00:11:11.700 data in the iframe that the Flatfile experience lives in so that way
00:11:18.779 people will see validation errors for example those fields are required but
00:11:24.000 they're missing from the spreadsheet so then people will know right away that they have some records that are
00:11:30.060 invalid so then they will correct it they will either delete them or they will type text into them
00:11:38.640 and some of our customers actually praised that feature a lot they told us
00:11:43.800 that we helped them catch many issues that they used to not catch until they'd imported all the data into the database
00:11:50.399 at which point it's a bit too late now we help them catch all of those issues in advance
00:11:55.920 so that way they don't waste time importing everything first catching the issues later and then going back again
00:12:01.800 that gives them a much shorter feedback cycle and finally they continue they click the
00:12:07.620 continue button and that submits the data thankfully if there are rows that they did not fix Flatfile will give
00:12:13.980 them the option to discard those rows and submit the data minus the invalid data so that's also another feature that
00:12:20.640 customers like before I continue I saw somebody raise their hand [audience member:] I was wondering
00:12:27.620 can they edit the file directly on the website yes so you're asking if they can edit this
00:12:33.779 data in the spreadsheet preview yes this is fully editable this is almost like having Excel
00:12:40.260 in the web page which is awesome
00:12:46.399 [audience question:] say we are uploading a CSV file so it's required that we'll have separated values for each column and
00:12:54.360 the whole data for all the rows so is it required that they be ordered in the same way that we have in the database
00:13:00.540 the relational database yes usually the CSV format is expected to be consistent in that if you have five values separated
00:13:08.639 by commas you need to have five values on every row or if you're leaving a value off you would put two commas next
00:13:15.120 to each other [audience member:] and it's the same order exactly yes it has to be the same order as well
00:13:23.760 no she's asking about CSV files whether they need to have attributes in the same
00:13:29.880 order on every row of the CSV and the answer is yes they have to be
00:13:36.480 uniform okay however they don't have to be in a specific order you could have
00:13:42.660 the last name before the first name oh yeah because it doesn't matter because
00:13:48.360 it recognizes it based on the header and if you want to associate first name to
00:13:55.320 email minus the validation problems you're going to get you can if you want and that was one of the reasons why
00:14:01.320 we went with this it's because like Andy said a lot of our customers have very different ways of exporting data
00:14:07.500 and here you might have mail you might have first name you might have main customer email you might have any
00:14:15.540 number of different column headers which would require a whole lot of manual
00:14:22.019 code base updates for every type to support whereas this just kind of yeah so to clarify the answer they have to be
00:14:29.339 consistent but we don't require a specific order for the columns they could be in any order but as long as
00:14:35.279 the data is consistent within that order then it's importable okay [audience question:] do you know if
00:14:40.680 Flatfile checks for double entries yes double
00:14:45.720 entries is a feature that you can add if you need it you mean uniqueness
00:14:51.300 validation making sure the data is unique yes it supports it so yes it does support uniqueness
00:14:57.660 validation so let's continue now so I showed you the front end
00:15:02.760 of Flatfile here's the back end the back end is the Flatfile REST API
00:15:07.800 which has a lot of different calls I'm not going to go over them by the way to clarify today's presentation is a 30
00:15:13.740 minute presentation it's talking about what you can do with Flatfile but it's not a tutorial of how to do
00:15:19.920 everything in order for you to learn how to do everything I'm only going to give you the headlines and then you have to
00:15:25.440 do your own research and study the APIs and SDKs and so on but this should give
00:15:30.839 you a very good idea I mean I already showed you the full architecture of a flatfile.com application
00:15:37.620 and that gives you a very good idea of how to get started so here what matters to us is only
00:15:44.100 grabbing the batches of data that got uploaded so we grab rows so when you
00:15:50.160 have a spreadsheet it has rows and what we do on the back end in our Sidekiq worker the background worker is we grab the
00:15:57.000 rows in batches so we divide the work into batches of 500 or 1000 at a time or
00:16:03.600 100 you can decide what the batch size is and then we grab the rows batch by
00:16:08.820 batch and we process them and we convert them to our database format so this is what we use for now it's the Flatfile
00:16:15.180 REST API and this is just the Swagger page for it Swagger is basically what got
00:16:22.620 renamed recently to OpenAPI so that's the OpenAPI documentation for it
00:16:28.079 and this is the call that we care about the most the one that grabs the rows so we can give it a batch ID and
00:16:35.699 then we tell it you know first we grab the first hundred so the skip is going to be zero and the take is going to be 100
00:16:42.180 when we process the second batch it's going to start from row 101 so skip 100 and then also
00:16:47.459 take 100 etc etc that's how you process the data batch by batch
00:16:52.920 and that enables us to... the reason why batching is very important is because if you have twenty
00:16:59.040 thousand rows you don't want to grab all 20,000 rows at once over the network the
00:17:04.919 Sidekiq worker will take too long to process and might time out over that work
00:17:10.380 so you will break the data down and grab 100 by 100 or 200 by 200 that way
00:17:15.839 you will not face any issues with timeouts over the network
00:17:27.600 [answering an audience question] no you will still process the data row by row I'm just saying you will not grab all 20,000 rows at once at a
00:17:35.160 time one batch at a time you can do that
00:17:40.260 yes it's a trade-off it's a trade-off so yes if the batch size is
00:17:46.799 too small you will make many network requests so then you will waste network performance it will take longer for you
00:17:53.340 to make all those calls yeah
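As a hedged Ruby sketch of that batching loop using the HTTParty gem (which the talk mentions later for the backend); the host, endpoint path, and response shape below are assumptions based on the skip/take description, so verify them against the Flatfile V2 REST API documentation:

```ruby
require "httparty"

# Hypothetical batch fetcher following the skip/take pattern described in
# the talk; the base URI, path, and response keys are assumptions.
class FlatfileRowFetcher
  include HTTParty
  base_uri "https://api.us.flatfile.io" # assumed host, check the docs

  def initialize(api_key:, batch_id:, batch_size: 100)
    @headers = { "Authorization" => "Bearer #{api_key}" }
    @batch_id = batch_id
    @batch_size = batch_size
  end

  # Yields each batch of rows: skip 0 / take 100, then skip 100 / take 100,
  # and so on, so no single request or worker iteration grows too large.
  def each_batch
    skip = 0
    loop do
      response = self.class.get("/batches/#{@batch_id}/rows",
                                headers: @headers,
                                query: { skip: skip, take: @batch_size })
      rows = response.parsed_response.fetch("rows", [])
      break if rows.empty?

      yield rows
      skip += @batch_size
    end
  end
end
```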
00:17:58.679 okay so let's talk about the implementation so first I'm going to
00:18:05.400 talk about the JavaScript implementation using the basic
00:18:10.799 JavaScript way of doing it not using any special libraries so using the basic
00:18:16.500 JavaScript way there's just a FlatfileImporter class you just have to import it first into
00:18:22.919 your file and then construct it instantiate it with an API key and
00:18:29.100 some options and then finally once you're ready to present the
00:18:35.580 dialog to the user you will invoke requestDataFromUser that's it that's
00:18:41.039 the overall structure of what you have to do in JavaScript however most of the work
00:18:47.160 happens in the options so the options give you the way to provide the user with mappings
00:18:53.400 from different columns into different attributes so basically you need to declare to the user what we are
00:18:59.280 importing we need to import first name last name email maybe client ID and
00:19:04.799 maybe locale like language so you need to tell the user that we're importing these in order for the user to map their
00:19:10.919 columns into those attributes so that's where most of the work happens so I'm going to dig into the options now
00:19:16.799 so the first thing in the options that you have is fields which help us map the spreadsheet
00:19:22.980 columns to model attributes so those model attributes will get stored in a Ruby on Rails application in Active Record models
00:19:30.900 basically and you have to specify the type of data it could be a free string like a
00:19:37.559 first name or last name it could be a select box kind of like the language options like EN or E or F or
00:19:44.880 1 or 2 for language and then finally it could be a checkbox meaning a true or false value
00:19:51.200 the match strategy do you want to match exactly on the name of the column
00:19:56.460 or do you want Flatfile to make a guess of it a fuzzy guess so fuzzy matching
00:20:02.400 could be helpful sometimes and then finally you can add basic validations like is this field required
00:20:09.000 or not is it unique or not that answers your earlier question you can make fields unique does it match a special
00:20:14.220 regex so one place where the regex match is very helpful is if
00:20:20.160 you have a special format for client ID or for email you want to make sure the email is a valid email so you would use
00:20:26.460 a regex for that finally this is very useful field alternates meaning
00:20:33.059 you can pre-predict what columns customers might use and then
00:20:39.480 Flatfile will auto-map them the first time they see the dialog they
00:20:44.580 can change the mapping but Flatfile will make a best guess so for example we tell it email can be
00:20:51.299 email in English or courriel in French and that way if a customer is a French-speaking customer they will get it
00:20:57.600 mapped right away they will not have to manually map it because we guessed it of course if a customer has a very
00:21:02.940 special field name like a business email of the HR department or
00:21:10.440 something custom like that then they will have to map it manually but in
00:21:15.600 90% of the cases we can guess what column they're going to map into our attribute
00:21:22.380 finally we can do the same thing with values so value alternates let us do the language possibilities thing like
00:21:30.419 French could be F or FRN or fr-CA meaning the locale and all of them will map to
00:21:36.059 French so here's an example you can basically configure the options either
00:21:42.419 in a YAML file in the back end in the Ruby on Rails application here we have it in a YAML file or you can
00:21:48.659 configure it as JSON in JavaScript but in the end it's going to get passed to the front end and it's
00:21:54.840 going to get passed in the form of JavaScript to the Flatfile options attribute but this is an example of mappings
00:22:01.380 like we have first name last name email for example the match strategy is exact we
00:22:06.659 have this one is required and this one has a regex validation and finally
00:22:12.360 all the alternates like people can have an email_address column for example we want it to map to email
00:22:19.440 etc and these are examples of locale and client number
00:22:25.260 so this is how you do alternates on the values so I showed you alternates on the
00:22:31.320 fields but also you can do it on the values and that's pretty much it
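As a hedged sketch of how those field options could be declared in the Rails backend as a Ruby structure and serialized to JSON for the front-end importer; the key names (key, label, alternates, validators, options) mirror the Portal V2 field options described in the talk, but verify the exact schema against the V2 docs:

```ruby
require "json"

# Hedged sketch of Portal V2 field options built in Ruby; the exact
# schema should be checked against the Flatfile V2 documentation.
FLATFILE_FIELDS = [
  {
    key: "first_name",
    label: "First Name",
    alternates: ["prenom", "prénom"], # lets French column headers auto-map
    validators: [{ validate: "required" }]
  },
  {
    key: "email",
    label: "Email",
    alternates: ["email_address", "courriel"],
    validators: [
      { validate: "required" },
      { validate: "regex_matches", regex: "^[^@\\s]+@[^@\\s]+$",
        error: "must be a valid email" }
    ]
  },
  {
    key: "locale",
    label: "Locale",
    type: "select",
    options: [
      { value: "en", label: "English", alternates: ["e", "1"] },
      { value: "fr", label: "French",  alternates: ["f", "frn", "fr-ca"] }
    ]
  }
].freeze

# Serialized and handed to the JavaScript importer as part of its options:
puts JSON.pretty_generate(fields: FLATFILE_FIELDS)
```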
00:22:37.200 so next is data hooks data hooks let you do field hooks record hooks or step hooks
00:22:44.340 which let you do your own custom validation so I can do my own custom validation or transformation on fields
00:22:51.659 on one field in a spreadsheet or I can do it on full records
00:22:57.179 or I can do step hooks which is logic that executes upon changing the page in
00:23:02.340 the Flatfile import process so this will let you do anything you want if you want to add more advanced validations and
00:23:08.760 normalizations on the data finally you can customize the text you
00:23:15.179 can switch it all to a different language so you can support multiple languages we support English
00:23:20.220 and French so you can support whatever languages you want and you can customize the text itself
00:23:27.059 as well you don't have to use the translations that Flatfile gives you and also you can customize the colors
00:23:32.460 the fonts the paddings the margins the borders whatever so you can customize the theme
00:23:38.039 finally as far as the back end we have the Flatfile REST API we process the
00:23:43.919 data in background workers we access it using for example HTTParty HTTParty
00:23:49.440 is a good tool for accessing REST APIs you pull the data in batches
00:23:54.900 and you create Active Record models from the mapped row data meaning after Flatfile applies the user's mapping to the
00:24:01.980 spreadsheet it'll give you the mapped data so then you can use it right away to build your Active Record records
00:24:07.260 you can even do your own extra processing on it in the back end if you want on the Active Record models as well but I
00:24:13.260 mean this is an example of us calling out to
00:24:19.260 Flatfile using HTTParty and then we use our own class that will
00:24:25.980 take those batches of rows and process them so it'll take a batch
00:24:31.200 ID a batch size and a start index to process them in batches but like I
00:24:37.200 mentioned I'm not going to show you all the details here but I'm giving you a general idea of what we do
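Putting those pieces together, a hedged sketch of such a processing class, reusing the hypothetical FlatfileRowFetcher from the earlier sketch; the Contact model and the shape of each row's payload are assumptions for illustration:

```ruby
# Hypothetical processor tying the pieces together: it pulls mapped rows
# batch by batch and stores each one as an Active Record record.
class FlatfileBatchProcessor
  def initialize(batch_id:, batch_size: 500)
    @fetcher = FlatfileRowFetcher.new(
      api_key: ENV.fetch("FLATFILE_API_KEY"),
      batch_id: batch_id,
      batch_size: batch_size
    )
  end

  def process!
    @fetcher.each_batch do |rows|
      rows.each do |row|
        data = row["mapped"] # assumed key holding post-mapping values
        Contact.create!(
          first_name: data["first_name"],
          last_name:  data["last_name"],
          email:      data["email"].to_s.strip.downcase,
          client_id:  data["client_id"].to_s.upcase, # normalization from the talk
          locale:     data["locale"]
        )
      end
    end
  end
end
```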
00:24:42.900 so let's get to the final part of the presentation which is the
00:24:48.900 demo actually never mind I have the demo open here there you go so this is an example
00:24:56.400 where I have a list of contacts and I want to add more contacts to it I'm going to add more people
00:25:02.460 so for example right now if I search for John Smith
00:25:07.679 I'm not going to find anybody so now let's import a thousand John Smith
00:25:14.460 examples and their names are going to have numbers in them to make sure each name is unique so
00:25:21.419 they're not going to be the same exact person but I'm showing you the experience so see it already
00:25:27.120 predicted all the mapping automatically it predicted which column will map to email which to first name
00:25:34.980 etc it even gives me the alternate values here so now I can
00:25:41.100 review the data notice how fast the experience is that was a thousand rows they got imported instantly
00:25:46.980 it noticed immediately that those four columns are missing data so I can fill them and let me show you if I fill
00:25:54.000 this one the error goes away because it revalidates so I'm going to fill them manually
00:26:01.740 and here this one is a drop-down the language so I can click here and then select English for example
00:26:08.640 um yeah so let's hit continue now oh by the way I can scroll I'm just going to show you all the data this is like
00:26:14.640 having Excel open in a browser it's amazing I can scroll see we have a thousand and one rows including
00:26:21.900 the header so that's a thousand rows so now if I click continue are you ready
00:26:27.720 to submit yes so it gives me a nice friendly experience while it's uploading and then
00:26:33.360 it finished uploading so now the data is getting processed in the back end so now if I search here for John Smith
00:26:41.460 it has 981 contacts because it processed 981 of them already of course if I
00:26:50.100 refresh it one more time now it has all thousand contacts it just finished processing so that's
00:26:56.159 demonstrating how the back end is doing some work too and there you go this is all the data that we just
00:27:01.320 imported to show you one more example I'm going to import a spreadsheet that has 10,000 rows so that's
00:27:07.980 10 times what I just showed you and I want to show you how fast it is so right now it's importing 10,000 people
00:27:15.779 so there you go it just finished loading them
00:27:21.120 and then mapping them and then boom it's that fast it's amazing
00:27:26.340 that's why we love Flatfile again I can submit them and it will import
00:27:31.860 them but in the meantime yes it will give us a nice progress indicator so
00:27:37.200 that way I know exactly how much time is passing while it's uploading so that way even if you have a lot
00:27:42.779 more data like say 20,000 at least the user is seeing progress which is excellent
00:27:47.820 so yeah that's pretty much it actually let me get back to the
00:27:53.400 slides first before I take any questions so that's pretty much it we went
00:27:59.279 through the definition of flat files the requirements the architecture the SDK and API the implementation on
00:28:06.779 the front end and the back end and finally the demo so that concludes the presentation
00:28:12.000 um any questions [audience member:] yeah I have a question [Andy:] yes go ahead [audience member:] so when you were filling a
00:28:18.960 void field does it maintain any calculations that are based on that
00:28:25.260 field over the spreadsheet um I believe the answer is yes it does maintain calculations from the
00:28:31.860 spreadsheet from what I remember yes it actually will execute those calculations
00:28:38.820 we actually wanted to prevent certain scripts from executing so what we
00:28:45.000 did is we added a Flatfile data hook to sanitize the data and remove any
00:28:50.880 spreadsheet scripts so that way we make sure nobody will execute any malicious scripts so that's another nice feature of Flatfile
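The talk doesn't show that hook's code; as a hedged Ruby illustration of the same sanitization idea (a server-side equivalent, with a hypothetical sanitize_cell helper):

```ruby
# Hedged sketch: neutralize spreadsheet formula injection by stripping
# leading characters that Excel interprets as formula starters, so a
# later export can't smuggle an executable formula into a cell.
def sanitize_cell(value)
  value.to_s.sub(/\A[=+\-@\t\r]+/, "")
end

sanitize_cell(%q{=HYPERLINK("http://evil.example","click")})
# => "HYPERLINK(\"http://evil.example\",\"click\")"
```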
00:28:59.159 let me talk a bit about the reasons why we moved to Flatfile
00:29:05.760 [audience member:] why did you choose this compared to what we had before
00:29:11.640 what issues were we solving sure so one thing I forgot to mention
00:29:26.539 so yeah
00:29:31.679 I'm a senior software engineer that was working on one of the teams the platform team at Lexop when we implemented
00:29:37.679 the Flatfile integration and Michel selected the solution for exactly that question that he just asked
00:29:43.740 me so do you want to answer the question Michel [Michel:] to be very fair I think
00:29:50.220 it's okay if you answer it you did the implementation I just did the research and the assessment so
00:29:56.340 but we had an import process before that we built ourselves that we initially built when the app was
00:30:03.240 catering to customers that were very very small businesses
00:30:10.860 so a lot of it was running entirely in the front end it was not necessarily batching or
00:30:17.720 asynchronously processing the rows [Andy:] Michel is the CTO of Lexop by the way
00:30:22.740 for those who don't know [Michel:] and so at you know a list of 20 or 50 rows you can
00:30:29.279 easily run that synchronously it's not great but at those capacities nobody's waiting 20 minutes for the process and then
00:30:36.539 we started catering to much bigger customers who had lists of like a thousand five thousand ten thousand
00:30:42.240 rows and then the process broke down you can't run that in the front end you can't have a user waiting while five
00:30:49.320 thousand rows are processed and also the memory requirements of the thing
00:30:56.100 we've got a Kubernetes orchestrated architecture which scales but it was
00:31:01.740 just going nuts with memory and performance requirements and
00:31:08.399 because we had a lot of customers like Andy pointed out that had very different export processes with completely
00:31:16.700 obscure and inconsistent naming schemes for the columns so every time they
00:31:23.100 had a new column if you wanted to auto-map we had to make code-based changes
00:31:28.559 to map from weird column names to first name or email et cetera
00:31:33.960 very cumbersome not ideal because you have to make code changes and deployments every time you have a new customer
00:31:39.000 and also it's not easy to have a flexible front end
00:31:45.240 experience for them to be easily able to move a column here and change a field and update this and drag and drop that
00:31:52.260 over there so we initially built it ourselves it was a lot of work
00:31:57.899 memory issues came in as well and the bigger the lists the more issues as we
00:32:03.000 were managing it ourselves and at some point we went to a tool like this and it solved
00:32:09.419 a lot of our issues any other questions
00:32:16.799 [audience question:] yeah maybe one question regarding the client's point of view does this fulfill all their requirements in terms
00:32:23.279 of security this can be sensitive data right I mean
00:32:29.460 the client wants to make sure that the data isn't going in the wrong direction so yes I believe
00:32:35.700 everything is encrypted with SSL so all the data that passes in the import process from the browser into
00:32:42.179 Flatfile is encrypted so it is also whenever you use a tool that's
00:32:48.779 getting some of your sensitive data you're going to do a vendor assessment which we did where you identify what
00:32:54.000 their own data encryption processes are what their own hiring processes are so you make sure they have like an ISO or
00:32:59.820 whatever certification or a SOC 2 certification with things like
00:33:06.240 timely security audits automated code scans to make sure that they have a good
00:33:18.360 security posture we did that and also there's an automatic
00:33:25.019 there's an automatic process to delete any imported data after a certain amount of time
00:33:32.399 whatever timeframe you want right so you don't have you know the vendor holding on to like five years'
00:33:38.580 worth of people's data
00:33:44.659 and it's GDPR-compliant software so they are compliant
00:33:50.039 which was one of our requirements as well
00:33:57.179 any other questions [audience question:] so at the end of the day it's going
00:34:05.039 to work like a normal database right like any data that I upload to Flatfile every record of our data goes into a
00:34:12.179 relational database relational yes so basically we're using Active
00:34:18.060 Record Rails on top of a SQL database [audience member:] and what happens if one of the records let's say from 10
00:34:24.240 000 has a phone number and the other records don't have it like how
00:34:29.339 is that stored it's going to return only the fields that are mapped by the customer those are imported the ones that are
00:34:34.980 not mapped are ignored so if they have a CSV file or spreadsheet that has 50 columns and they only need to map 10 of
00:34:42.119 them or 20 of them the rest are ignored it's not a problem okay