LiveCloud architecture and a look at the future
Hi John,

CanelaDB stores data locally in a single array. This array encapsulates the entire project for a single region. The data is written to disk in clusters: collections of records that share a defined number of characters of their recordIDs.
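
For illustration only, assuming the shared characters are the leading ones and the cluster length is three (both are assumptions; the actual rules are internal to CanelaDB), the grouping could look like this in LiveCode:

   -- hypothetical sketch: derive a cluster from a recordID prefix
   -- assumes a 3-char prefix; the real cluster length is defined by CanelaDB
   function clusterForRecordID pRecordID
      return char 1 to 3 of pRecordID
   end function

   -- "abc001" and "abc002" land in cluster "abc"; "abd001" lands in "abd"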

-LiveCloud structure as seen in LCM
An account contains the following:
 Projects
  Tables
   Keys (columns)
    Data

-LiveCloud structure in terms of VMs
Each region consists of many VMs that hold our data. The following core sections are in walled-off VMs: Developer account management, Metric & Billing data, Users (cdbUsers), App Data, BLOBs. The region can scale both vertically and horizontally.
 Vertical: VMs can always get improved resources (RAM, CPUs, Disk) as needed.
 Horizontal: More VMs can be added to a region as needed.

-Some useful facts
 -No region is aware of the other regions.
 -We have developed software that analyzes the regions in near real-time to determine the general health of all the VMs. This software is responsible for building new VMs and for moving data from one instance to another.
 

When the toolkit is exported, LCM builds the following structure for your project.
CanelaDB folder
  • Config folder with config file
  • Database folder with database data stored locally
  • Libraries folder with CanelaDB libraries

When considering locally stored data, the following array is created:
-Structure of local array
 dataBaseArray
  tableID
   cluster
    list of recordIDs that fit in the cluster
     individual recordID
      app keys
       app data
      cdb keys
       cdbDateCreated
       cdbDateModified
       etc...
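
As a minimal LiveCode sketch (the variable name, tableID, cluster, and recordID values here are illustrative, not CanelaDB's actual internals), addressing one record in that nested array looks like this:

   -- hypothetical sketch of the nested local array shown above
   -- tableID "tbl01", cluster "abc", recordID "abc001" are made-up values
   put "Jane" into tDataBaseArray["tbl01"]["abc"]["abc001"]["firstName"]
   put the seconds into tDataBaseArray["tbl01"]["abc"]["abc001"]["cdbDateModified"]

   -- reading a cdb key back from the same record
   put tDataBaseArray["tbl01"]["abc"]["abc001"]["cdbDateModified"] into tModified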

-Limitations
 -Mobile devices have less RAM available than desktops. Thus, it is essential to sync data using defined recordIDs if your data grows beyond what mobile devices can support (see the sketch after this list).
 -Arrays consume more memory than the actual character count of the data being stored. The LiveCode indexing for each array consumes more RAM but makes lookups very fast. Thus, RAM usage does not map one-to-one to the size of the data stored.
 -The instances that reside on the cloud side are also subject to these considerations, but we have much more extensive systems for storing all of this data. We are continuously developing improved methods that will ultimately improve the database architecture; some incremental improvements are already in place and being tested in isolated regions. Using RAM as a storage medium is expensive for the LiveCloud service, as RAM costs much more than disk storage. We have plans to improve on this further.
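
Here is the sketch referenced above: a hypothetical way to trim a table's local array down to a defined working set of recordIDs on a mobile device. Nothing here is CanelaDB API; it only illustrates the idea.

   -- hypothetical sketch: keep only a defined working set of recordIDs in RAM
   -- pTableArray is the cluster/recordID structure described earlier;
   -- pKeepIDs is a return-delimited list of recordIDs to retain
   function trimToWorkingSet pTableArray, pKeepIDs
      put the keys of pTableArray into tClusters
      repeat for each line tCluster in tClusters
         put the keys of pTableArray[tCluster] into tRecordIDs
         repeat for each line tRecordID in tRecordIDs
            if tRecordID is not among the lines of pKeepIDs then
               delete variable pTableArray[tCluster][tRecordID]
            end if
         end repeat
      end repeat
      return pTableArray
   end function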

-Future considerations
We are planning many additions to the architecture: improved scaling for large data sets, assignable processes, project tracking/analytics, application support for multiple regions, relationships, developer-defined indexing, and sharing projects with other accounts.

Some of these topics fall on the architecture side, while others will have a front-end API to improve access. It is important to note that these are complex systems that must be developed, tested, improved, and tested a lot more before we can make them publicly available, so they take time to prepare. We are not committing to the development of any of these future technologies by discussing them here, and this is not an exhaustive list of what is written on the whiteboards in our studio. I am bringing them up openly to demonstrate that we are looking at our next generation of improvements. All that said, we enjoy watching people dig this technology, and it is exhilarating to see its adoption grow. We will do everything possible to bring some of these to light as soon as possible.

Scaling
We have been testing accessing data from our cloud-side cache system. NurseNotes has been using it on and off for the last 15 months, and improvements to the system have allowed us to rely on it for the last 6 months. Cloud reads are faster and scale to more simultaneous requests for data. This improvement will make it to the LiveCloud regions very soon.
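
As a rough illustration of the read-through idea (a sketch only, not the actual LiveCloud cache), serving from RAM first and falling back to disk could look like this:

   -- hypothetical read-through cache sketch, not the LiveCloud implementation
   local sCache

   function cachedRead pRecordPath
      -- serve from the in-RAM cache when we already have the record
      if sCache[pRecordPath] is not empty then return sCache[pRecordPath]
      -- otherwise read from disk once and remember the result
      put url ("binfile:" & pRecordPath) into sCache[pRecordPath]
      return sCache[pRecordPath]
   end function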

Project tracking and analytics
We have been using this feature to help us track down NurseNotes performance issues, and it allowed us to identify areas where we could do better. It can be used to track how your clients use your software and to time your code, so you can make critical decisions based on real data from your apps. The feature can programmatically place the triggers for you in common areas; if your code is encrypted, LCM will ask you for your passcode so it can crawl through your code. You can optionally place triggers anywhere you want. We have developed a robust data visualization view; you can see a simple example of it in the Account/Usage section in LCM. The data collected from this feature can be varied and quite extensive. We have not yet dedicated the time to flesh out the rest of this feature, so it will be released when it is ready.
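
To make the trigger idea concrete, here is a minimal sketch of a hand-placed timing trigger (the handler and variable names are invented for illustration; the actual feature places and collects these for you):

   -- hypothetical sketch of a hand-placed timing trigger
   -- logTiming and sTimingLog are illustrative names, not the LCM feature
   local sTimingLog

   command logTiming pLabel, pStartMs
      put the milliseconds - pStartMs into sTimingLog[pLabel]["lastRunMs"]
      if sTimingLog[pLabel]["runCount"] is empty then put 0 into sTimingLog[pLabel]["runCount"]
      add 1 to sTimingLog[pLabel]["runCount"]
   end command

   command exampleHandler
      put the milliseconds into tStart
      -- ... the code being measured ...
      logTiming "exampleHandler", tStart
   end command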

Application support for multiple regions
We figured this would come up eventually. We do not need it ourselves at this time, and it has not been prioritized because the majority of development appears to be geographically bound. I think this could be developed quickly, and we have discussed the requirements, but there aren't any plans to start right now.

Developer indexing, assigned processes, and more caching
We have been planning this for a few weeks now. Our goal is to improve scaling, distribute power to those who need it, and generally lower the cost of providing the LiveCloud service. We have already worked out a caching system, as previously discussed for NurseNotes, that brings us closer to moving forward with the other ideas listed here.

Assigned processes would allow a developer to buy one or more agents to process their data. Currently, all transactions are queued to be processed by shared instances; a given instance is responsible for an enormous number of tasks and is capable of understanding and working on all possible transactions. You could fast-track your processing with dedicated agents that have less responsibility than instances have today. The agents would focus solely on a given teamID's projects. Multiple agents could be ready to pounce on traffic as it scales upward, and we could even scale down as traffic subsides.
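
A rough sketch of the routing idea (names invented for illustration; this is not how LiveCloud's dispatcher is written):

   -- hypothetical sketch: route a transaction to a dedicated agent when the
   -- teamID has one, otherwise fall back to today's shared instance queue
   function queueForTeam pTeamID, pDedicatedAgents
      if pDedicatedAgents[pTeamID] is not empty then
         return pDedicatedAgents[pTeamID] -- this team bought a dedicated agent
      end if
      return "sharedInstanceQueue" -- default shared path
   end function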

Taking the load off of RAM will lower costs, and it forces us to develop the supporting technologies to meet performance expectations. The first step will be allowing developers to choose which data should be held in indexed RAM and which does not need to be there.
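
A sketch of what that per-key choice might look like (the keepInRAM flag, table, and key names are all invented for illustration, not a planned API):

   -- hypothetical sketch: per-key hints for what stays in indexed RAM
   put true into tSchema["customers"]["email"]["keepInRAM"] -- hot lookup key
   put false into tSchema["customers"]["notes"]["keepInRAM"] -- cold, disk is fine

   function shouldCacheKey pSchema, pTable, pKey
      return pSchema[pTable][pKey]["keepInRAM"] is true
   end function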

I hope this in-depth look is useful to everyone. If this generates questions, please share them with us here in the forum.