Editor’s note: This interview with Joe Karlsson was recorded for Coding Over Cocktails - a podcast by TORO Cloud.
According to MongoDB Senior Developer Advocate Joe Karlsson, NoSQL is a "massive term" that encompasses "basically anything that isn't a relational database", which includes time series databases, graph databases, key-value stores, and document-based databases.
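As a rough illustration of the document-based category, the Python sketch below holds the same record twice: once as relational-style rows spread over two tables, and once as a single JSON-like document. The user, the field names, and the `rows_to_document` helper are invented for this example and do not come from the interview.

```python
# Relational shape: one logical user spread across two tables, linked by a key.
users_rows = [(1, "Ada Lovelace")]  # columns: (id, name)
phone_rows = [                      # columns: (user_id, kind, number)
    (1, "home", "555-0100"),
    (1, "work", "555-0199"),
]

# Document shape: the same user stored as one JSON-like object.
user_doc = {
    "_id": 1,
    "name": "Ada Lovelace",
    "phones": [
        {"kind": "home", "number": "555-0100"},
        {"kind": "work", "number": "555-0199"},
    ],
}


def rows_to_document(user_row, all_phone_rows):
    """Rebuild the document from the rows: the kind of mapping an ORM usually hides."""
    user_id, name = user_row
    return {
        "_id": user_id,
        "name": name,
        "phones": [
            {"kind": kind, "number": number}
            for (owner_id, kind, number) in all_phone_rows
            if owner_id == user_id
        ],
    }
```

With the document shape, application code can read and write `user_doc` directly, while the row shape needs a translation step such as `rows_to_document` on every round trip.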
MongoDB falls under document-based databases.
"What that means is instead of saving your data in like a relational rows and columns table type format, you're saving it in documents. The key differentiator there is programmers were used to saving and working with data in JSON-like objects or dictionaries. We can save the data the way that we think about it, without having to use an ORM (object-relational mapping) to map that back and forth between data," Karlsson shares during the interview.
Karlsson argues that MongoDB's document-based model can bring many benefits, but users still need to think about entity design.
"I think that’s a common misconception with NoSQL databases and particularly document-based databases," he says.
Karlsson explains that schema design is one of the key parts that users should consider if they want to achieve faster query speeds and improve overall performance.
In a recent blog post, he writes that a well-designed MongoDB schema is "the most critical part of deploying a scalable, fast, and affordable database". Unfortunately, it is also one of the most overlooked facets of MongoDB administration.
"I think it's one of the things that people don't give enough time and energy to when they're developing or like working on a NoSQL database," Karlsson says. "A lot of people who complain about their MongoDB database not scaling well – nine times out of ten – it's a schema design problem and kind of making sure that they in fact try to reconsider it."
He adds, "There's very prescribed and well-researched approaches to SQL schema design. We typically do that with normalization. Most developers normalize to the third [normal] form. What that means is that, like with a relational database, your concern is not how that's going to be used – it's what data you have."
He goes on to say: "The only thing that matters is designing a schema based on the needs of your database. A schema might work for you but at a very similar application, it may be totally different for someone else… You can still map relationships and model things however you want."
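One way to picture designing a schema around how the data is used is to model the same one-to-many relationship in two ways: embedded in the parent document, or stored separately and referenced by key. The following Python sketch uses plain dictionaries as stand-ins for collections. The blog post data and the `read_embedded` and `read_referenced` helpers are hypothetical, and counting dictionary fetches is only a crude proxy for database round trips.

```python
# Option 1: embed. The comments live inside the post document itself.
posts_embedded = {
    "p1": {
        "title": "Schema design",
        "comments": [
            {"author": "ana", "text": "Helpful"},
            {"author": "ben", "text": "Thanks"},
        ],
    }
}

# Option 2: reference. The post stores only the keys of its comments.
posts_referenced = {"p1": {"title": "Schema design", "comment_ids": ["c1", "c2"]}}
comments = {
    "c1": {"author": "ana", "text": "Helpful"},
    "c2": {"author": "ben", "text": "Thanks"},
}


def read_embedded(post_id):
    """Return the post with its comments, plus the number of fetches used."""
    return posts_embedded[post_id], 1


def read_referenced(post_id):
    """Fetch the post, then each referenced comment, counting every fetch."""
    post = posts_referenced[post_id]
    resolved = [comments[comment_id] for comment_id in post["comment_ids"]]
    full_post = {"title": post["title"], "comments": resolved}
    return full_post, 1 + len(resolved)
```

Both functions return the same post, but the embedded version needs one fetch and the referenced version needs three. Which shape suits an application depends on its access pattern: embedding tends to favor reads that always need the comments, while referencing may suit data that is shared, large, or updated independently.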
As of version 4.0, MongoDB supports multi-document ACID transactions.
ACID stands for Atomicity, Consistency, Isolation, and Durability. These four characteristics are needed to formally guarantee the validity of transactions, which are groups of database reads or writes that all need to succeed, or fail altogether.
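The all-or-nothing behavior can be sketched in a few lines of Python. This toy example stages writes on a private copy and publishes them only if every write succeeds. It illustrates atomicity alone, not isolation or durability, and it is not how MongoDB implements transactions. The account names and the `debit`, `credit`, and `run_transaction` helpers are made up for the illustration.

```python
import copy


def debit(account, amount):
    """Build a write that removes `amount`, failing if the balance is too low."""
    def apply(state):
        if state[account] < amount:
            raise ValueError(f"insufficient funds in {account}")
        state[account] -= amount
    return apply


def credit(account, amount):
    """Build a write that adds `amount` to the account."""
    def apply(state):
        state[account] += amount
    return apply


def run_transaction(store, writes):
    """Apply every write or none of them."""
    staged = copy.deepcopy(store)  # work on a private copy of the data
    for write in writes:
        write(staged)              # an exception here aborts before the commit
    store.clear()
    store.update(staged)           # commit: publish all staged writes together


balances = {"alice": 100, "bob": 20}
run_transaction(balances, [debit("alice", 30), credit("bob", 30)])
```

If any write in the list raises, `run_transaction` exits before the commit step, so `balances` keeps its previous contents, including for writes that had already been staged.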
Karlsson stresses that another major misconception about MongoDB is that it doesn’t support ACID transactions.
"I think even six months ago, you could do ACID transactions on sharded data clusters. So, you have data distributed all around the world. You can still run an ACID transaction on that and you can control the write concerns, how many replicated shards it goes to, or you can control all that," he explains.
Finally, Karlsson also addressed the claim that MongoDB doesn’t support relational joins.
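MongoDB's aggregation framework includes a `$lookup` stage, which performs a left outer join between collections. The sketch below writes such a stage as a Python dictionary and emulates its basic behavior on in-memory lists, so the result can be inspected without a running server. The `customers` and `orders` collections, their fields, and the `emulate_lookup` helper are hypothetical, and the emulation covers only simple equality matching.

```python
# A $lookup stage as it could appear in an aggregation pipeline.
# Collection and field names here are hypothetical.
lookup_stage = {
    "$lookup": {
        "from": "orders",               # collection to join with
        "localField": "_id",            # field on the input documents
        "foreignField": "customer_id",  # field on the "from" documents
        "as": "orders",                 # output array field
    }
}


def emulate_lookup(local_docs, foreign_docs, stage):
    """Simplified in-memory stand-in for $lookup: a left outer join on equality."""
    spec = stage["$lookup"]
    joined = []
    for doc in local_docs:
        matches = [
            other
            for other in foreign_docs
            if other.get(spec["foreignField"]) == doc.get(spec["localField"])
        ]
        # Every input document is kept; unmatched ones get an empty array.
        joined.append({**doc, spec["as"]: matches})
    return joined


customers = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Grace"}]
orders = [
    {"_id": 10, "customer_id": 1, "total": 30},
    {"_id": 11, "customer_id": 1, "total": 12},
]
joined = emulate_lookup(customers, orders, lookup_stage)
```

Here the first customer ends up with both orders in its `orders` array, and the second customer is still returned, with an empty array.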
A relational join (or SQL join clause) combines columns from one or more tables in a relational database.
"In our aggregation pipeline, you can do a join – we call it a ‘lookup’ – and you can join data from separate collections [or] databases, no problem. Relationship building is not a problem. When you're designing a schema in MongoDB, there's only two choices you have to make for every piece of data: either embed this directly in the document, or reference it using a foreign key, just like you would with a relational database," Karlsson remarks.
He adds that any relationship you can model with an SQL database, you can model with a MongoDB database.
"You actually have additional flexibility because you can start embedding the data directly in it, which increases performance... In an SQL database, joins are really expensive. If I have data in two separate tables, I do a join on them. It basically pulls all those tables into memory. Then it runs an SQL query on that joined data set in memory. That's expensive time-wise and memory-wise, and it can become a blocking operation at scale. But if you don't have to do that, that's a massive gain."
You can dive deeper into databases with Joe Karlsson in this podcast.
This podcast series tackles issues faced by enterprises as they manage the process of digital transformation, application integration, low-code application development, data management, and business process automation. It’s available for streaming on most major podcast platforms, including Spotify, Apple, Google Podcasts, SoundCloud, and Stitcher.