This article originally appeared in the Firebase Developer Community blog.
We like saying lots of impressive things about Cloud Firestore's performance -- "performance scales with the size of the result set, not the underlying data set", and that "it's virtually impossible to create a slow query." And, for the most part, this is true. You can query a data set with billions upon billions of records in it, and get back results faster than your user can move their thumb away from the screen.
But with that said, we occasionally hear from developers that Cloud Firestore feels slow in certain situations, and it takes longer than expected to get results back from a query. So why is that? Let's take a look at some of the most common reasons that Cloud Firestore might seem slow, and what you can do to fix them.
Probably the most common explanation for a seemingly slow query is that your query is, in fact, running very fast. But after the query is complete, we still need to transfer all of that data to your device, and that's the part that's running slowly.
So, yes, you can go ahead and run a query of all sales people in your organization, and that query will run very fast. But if that result set consists of 2000 employee documents and each document includes 75 KB of data, you have to wait for your device to download 150 MB of data before you can see any results.
The best way to fix this issue is to make sure you're not transferring down more data than you need. One simple option is to add limits to your queries. If you suspect that your user only needs the first handful of results from your employee list, add a limit(25) to the end of your query to download just the first batch of data, and then only download further records if your user requests them. And, hey, it just so happens I have an entire video all about this!
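As a rough sketch with the web SDK (the collection, field, and function names here are invented for illustration):

```
const db = firebase.firestore();

// Download 25 sales employees at a time instead of all 2000 at once.
function loadSalesPage(previousPage) {
  let query = db.collection('employees')
    .where('role', '==', 'sales')
    .orderBy('lastName')
    .limit(25);

  // If the user asks for more, pick up where the previous page ended.
  if (previousPage && !previousPage.empty) {
    query = query.startAfter(previousPage.docs[previousPage.docs.length - 1]);
  }
  return query.get();
}

// First batch now; later batches only if the user scrolls for them.
loadSalesPage().then((firstPage) => firstPage.forEach((doc) => console.log(doc.id)));
```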
If you really think it's necessary to query and retrieve all 2000 sales employees at once, another option is to break those records up into the documents that contain only the data you'll need in the initial query, and then put any extra details into a separate collection or subcollection. Those other documents won't get transferred on that first fetch, but you can request them later as your user needs them.
Having smaller documents is also nice in that, if you have a realtime listener set up on a query and a document is updated, the changed document gets sent over to your device. So by keeping your documents smaller, you'll also have less data transferred every time a change happens in your listeners.
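One hypothetical way to structure that split (the document IDs and fields below are made up): keep the fields your list screen needs on the employee document itself, and park the heavy payload in a subcollection that's only fetched on demand.

```
const db = firebase.firestore();
const employeeRef = db.collection('employees').doc('emp_123');

// Small summary document: this is all the list query ever downloads.
employeeRef.set({ name: 'Ada', role: 'sales', region: 'EMEA' });

// Heavy details live one level down and aren't touched by the list query.
employeeRef.collection('details').doc('profile').set({
  resume: '...',
  performanceReviews: ['...']
});

// Later, only when the user opens this employee's record:
employeeRef.collection('details').doc('profile').get()
  .then((doc) => console.log(doc.data()));
```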
So Cloud Firestore's offline cache is pretty great. With persistence enabled, your application "just works", even if your user goes into a tunnel, or takes a 9-hour plane flight. Documents read while online will be available offline, and writes are queued up locally until the app is back online. Additionally, your client SDK can make use of this offline cache to avoid downloading too much data, and it can make actions like document writes feel faster. However, Cloud Firestore was not designed as an "offline first" database, and as such, it's currently not optimized for handling large amounts of data locally.
So while Cloud Firestore in the cloud indexes every field in every document in every collection, it doesn’t (currently) build any of those indexes for your offline cache. This means that when you query documents in your offline cache, Cloud Firestore needs to unpack every document stored locally for the collection being queried and compare it against your query.
Or to put it another way, queries on the backend scale with the size of your result set, but locally, they kinda scale with the size of the data in the collection you're querying.
Now, how slow local querying ends up being in practice depends on your situation. I mean, we're still talking about local, non-network operations here, so this can be (and often is) faster than making a network call. But if you have a lot of data in one single collection to sort through, or you're just running on a slow device, local operations on a large offline cache can be noticeably slower.
First, follow the best practices mentioned in the previous section: add limits to your queries so you're only retrieving the data that you think your users will need, and consider moving unneeded details into subcollections. Also, if you followed the "several subcollections vs a separate top level collection" discussion at the end of my earlier post, this would be a good argument for the "several subcollections" structure, because the cache only needs to search through the data in these smaller collections.
Second, don't stuff more data in the cache than you need. I've seen some cases where developers will do this intentionally by querying a massive number of documents when their application first starts up, then forcing all future database requests to go through the local cache, usually in a scheme to reduce database costs, or make future calls faster. But in practice, this tends to do more harm than good.
Third, consider reducing the size of your offline cache. The size of your cache is set to 100 MB on mobile devices by default, but in some situations, this might be too much data for your device to handle, particularly if you end up having most of your data in one massive collection. You can change this size by modifying the cacheSizeBytes value in your Firestore settings, and that's something you might want to do for certain clients.
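With the web SDK, that might look like the following (the 20 MB figure is just an example):

```
// Must be called before any other Firestore usage in the app.
firebase.firestore().settings({
  cacheSizeBytes: 20 * 1024 * 1024  // shrink the offline cache to roughly 20 MB
});
```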
Fourth, try disabling persistence entirely and see what happens. I generally don't recommend this approach -- as I mentioned earlier, the offline cache is pretty great. But if a query seems slow and you don't know why, re-running your app with persistence turned off can give you a good idea if your cache is contributing to the problem.
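On the web, persistence is opt-in, so a quick way to experiment is to gate the call that enables it; on Android and iOS, where it's on by default, the equivalent switch lives in the Firestore settings object. A minimal sketch:

```
const ENABLE_PERSISTENCE = false; // flip this to compare with and without the offline cache

if (ENABLE_PERSISTENCE) {
  firebase.firestore().enablePersistence()
    .catch((err) => console.warn('Persistence could not be enabled:', err.code));
}
```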
So zig-zag merge joins, in addition to being my favorite algorithm name ever, are very convenient in that they allow you to coalesce results from different indexes together without having to rely on a composite index. They essentially do this by jumping back and forth between two (or more) indexes sorted by document ID and finding matches between them.
But one quirk of zig-zag merge joins is that you can run into performance issues when both sets of results are quite large but the overlap between them is small. For example, imagine a query where you were looking for expensive restaurants that also offered counter service.
```
restaurants
  .where('price', '==', '$$$$')
  .where('orderAtCounter', '==', 'true')
```
While both of these groups might be fairly large, there's probably very little overlap between them. Our merge join would have to do a lot of searching to give you the results you want.
So if you notice that most of your queries seem fast, but specific queries are slow when you're performing them against multiple fields at once, you might be running into this situation.
If you find that a query across multiple fields seems slow, you can make it performant by manually creating a composite index against the fields in that query. The backend will then use this composite index in all future queries instead of relying on a zig-zag merge join, meaning that, once again, the query will scale with the size of the result set.
While Cloud Firestore has more advanced querying capabilities, better reliability, and scales better than the Firebase Realtime Database, the Realtime Database generally has lower latency if you're in North America. It's usually not by much, and in something like a chat app, I doubt you would notice the difference. But if you have an app that's reliant upon very fast database responses (something like a real-time drawing app, or maybe a multiplayer game), you might notice that the Realtime Database feels… uhh… realtime-ier.
If your project is such that you need the lower latency that the Realtime Database provides (and you're anticipating that most of your customers are in North America), and you don't need some of the features that Cloud Firestore provides, feel free to use the Realtime Database for those parts of your project! Before you do, I would recommend reviewing this earlier blog post, or the official documentation, to make sure you understand the full set of tradeoffs between the two.
Remember that even in the most perfect situation, if your Cloud Firestore instance is hosted in Oklahoma, and your customer is in New Delhi, you're going to have at least 80 milliseconds of latency because of that whole "speed of light" thing. And, realistically, you're probably looking at something more along the lines of a 242 millisecond round trip time for any network call. So, no matter how fast Cloud Firestore is to respond, you still need time for that response to travel between Cloud Firestore and your device.
First, I'd recommend using realtime listeners instead of one-time fetches. This is because using realtime listeners within the client SDKs gives you a lot of really nice latency compensation features. For instance, Cloud Firestore will present your listener with cached data while it's waiting for the network call to return, giving you the ability to show results to your user faster. And database writes are applied to your local cache immediately, which means that you will see these changes reflected nearly instantly while your device is waiting for the server to confirm them.
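A hedged sketch of what that looks like in the web SDK (collection and field names are placeholders):

```
const db = firebase.firestore();

db.collection('employees')
  .where('role', '==', 'sales')
  .onSnapshot({ includeMetadataChanges: true }, (snapshot) => {
    // The first snapshot is often served from the local cache so the UI can
    // render right away; another snapshot follows once the server responds.
    const source = snapshot.metadata.fromCache ? 'local cache' : 'server';
    console.log(`Showing ${snapshot.size} employees from the ${source}`);
  });
```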
Second, try to host your data where the majority of your customers are going to be. You have the option of selecting your Cloud Firestore location when you first initialize your database instance, so take a moment to consider what location makes the most sense for your app, not just from a cost perspective, but a performance perspective as well.
Third, consider implementing a reliable and cheap global communication network based on quantum entanglement, allowing you to circumvent the speed of light. Once you've done that, you probably can retire off of the licensing fees and forget about whatever app you were building in the first place.
So the next time you run into a Cloud Firestore query that seems slow, take a look through this list and see if you might be hitting one of these scenarios. While you're at it, don't forget that the best way to see how well your app is performing is to measure its performance out in the wild in real-life conditions, and Firebase Performance Monitoring is a great way of doing that. Consider adding Performance Monitoring to your app, and setting up a custom trace or two so you can see how your queries perform in the wild.
Arrays haven't always been the best data structure for multi-user environments like Cloud Firestore. As Kato describes in the "Arrays are evil" section of his blog post, bad things can happen if you have multiple clients all trying to update or delete array elements at specific indexes. In the past, Cloud Firestore addressed these issues by limiting what you can do with arrays. That means that until now, you could really only "update" arrays by replacing the entire array (no appending or deleting!), and you couldn't perform meaningful queries on arrays, either.
This was problematic for those of you who wanted to use arrays in simple cases like keeping a list of tags or keywords. Previously, we recommended you try a workaround (some might say, a "hack") of converting your arrays into maps, with each element stored as a map key set to true.
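For illustration, that workaround looked roughly like this (the document and field names are made up):

```
const postRef = firebase.firestore().collection('posts').doc('my_post');

// The old workaround: each would-be array element becomes a map key set to true...
postRef.set({
  tags: {
    hufflepuff: true,
    mammals: true
  }
});

// ...so that it could be matched with an equality filter on the nested key.
firebase.firestore().collection('posts')
  .where('tags.hufflepuff', '==', true)
  .get();
```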
Well, with our latest improvements to arrays, none of this is necessary! For starters, we've added the ability to query for elements within arrays using the new "array-contains" feature. This means you can keep your elements as an array, and easily query for them without having to resort to the map hack.
Even better, you can query for items in arrays that aren't strings, which was a problem with the previous "convert your array into a map" workaround.
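Here's a sketch of the array version with the web SDK, using the same invented post-and-tags example:

```
const postRef = firebase.firestore().collection('posts').doc('my_post');

// Store the tags as a real array...
postRef.set({ tags: ['hufflepuff', 'mammals'] });

// ...and query it directly -- no map hack required.
firebase.firestore().collection('posts')
  .where('tags', 'array-contains', 'mammals')
  .get()
  .then((snapshot) => snapshot.forEach((doc) => console.log(doc.id, doc.data())));
```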
You also have the ability to add or remove elements from an array. But in order to avoid some of the issues that can arise in a multi-user environment, you'll be adding them with more of a set-like functionality. So rather than asking to delete the item at index 3, you would ask to remove, for example, all elements equal to the string "sly" with the arrayRemove operator.
With the arrayUnion operator, you can append an element to an array, but only if it doesn't exist in the array already.
Adding "clever" to our array doesn't do anything, because it already exists.
But adding "fuzzy" adds a new element to the array!
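In the client SDKs, that looks something like this (the users/fred document and its traits field are hypothetical):

```
const FieldValue = firebase.firestore.FieldValue;
const fredRef = firebase.firestore().collection('users').doc('fred');

// Remove every element equal to "sly" -- no index required.
fredRef.update({ traits: FieldValue.arrayRemove('sly') });

// arrayUnion only appends elements that aren't already present.
fredRef.update({ traits: FieldValue.arrayUnion('clever') }); // no-op: "clever" is already there
fredRef.update({ traits: FieldValue.arrayUnion('fuzzy') });  // appends "fuzzy"
```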
These changes come with some improvements to security rules as well. Now you can create security rules that allow queries based on whether or not a certain element exists inside of an array. So doing things like querying for a list of documents, but only allowing that query if the user is listed inside the documents' "viewers" array, is significantly easier than before.
This kind of query request is now possible in Cloud Firestore
All of these features should be available with the latest client SDKs, so make sure you update to the latest versions of your libraries, and start having fun with arrays! As always, if you have questions, you can join the Cloud Firestore Google discussion group, or use the google-cloud-firestore tag on Stack Overflow.
The Cloud Firestore data viewer in the console is a great place to view and update data. You can watch in real time as documents and fields update.
We all know that Cloud Firestore scales to huge amounts of data automatically -- but what about the data viewer? Until now, it was hard to navigate through a big dataset.
To solve the problem, we added a new feature that lets you order and filter right in the data viewer.
We think this will be especially useful in two scenarios:
Sorting by a field. Let's say you have a field last_updated on all of your documents in a collection users, and you want to see the documents that were updated most recently. Just open the menu, choose the field last_updated, select Descending and click apply.
Finding a specific document. Perhaps you have a collection of users which has the fields email and last_updated, and someone tells you they are having a problem with their account. Using the filter menu, input the field email and add a condition (email == "test@gmail.com") to instantly find that user's document.
These are just a few ways that you can use the new menu. We hope it helps you browse large datasets with ease.
Securing your Firebase Realtime Database just got easier with our newest feature: query-based rules. Query-based rules allow you to limit access to a subset of data. Need to restrict a query to return a maximum of 10 records? Want to ensure users are only retrieving the first 20 records instead of the last 20? Want to let a user query for only their documents? Not a problem. Query-based rules have you covered. Query-based rules can even help you simplify your data structure. Read on to learn how!
Security rules come with a set of variables that help you protect your data. For instance, the auth variable tells you if a user is authenticated and who they are, and the now variable lets you check against the current server time.
Now, with the query variable, you can restrict read access based on properties of the query being issued.
```
"messages": {
  ".read": "query.orderByKey && query.limitToFirst <= 100"
}
```
In the example above, a client can read the messages location only if they issue an orderByKey() query that limits the result set to 100 records or fewer. If the client asks for more than 100 messages, the read will fail.
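For example, a client read that satisfies this rule might look like the following (assuming the standard web SDK):

```
const db = firebase.database();

// Allowed: ordered by key and limited to 100 records or fewer.
db.ref('messages')
  .orderByKey()
  .limitToFirst(100)
  .on('value', (snapshot) => console.log(snapshot.numChildren(), 'messages'));

// Denied: no limit at all, so the rule cancels the read.
db.ref('messages').on('value', () => { /* never fires */ });
```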
The query variable contains additional properties for every type of query combination: orderByKey, orderByChild, orderByValue, orderByPriority, startAt, endAt, equalTo, limitToFirst, and limitToLast. Using a combination of these properties, you can restrict read access to whatever criteria you need. Get the full reference and see more examples in our docs.
Another benefit of query-based rules is that they make it easier to manage a shallow data structure.
In the past, you might have indexed your items location by a user's ID.
{ "items": { "user_one": { "item_one": { "text": "Query-based, rules!" } } } }
This structure made it easy to query and restrict item reads on a per-user basis.
{ "rules": { "items": { "$uid": { ".read": "auth.uid == $uid" } } } }
This is great because your users' items are secured, but it requires you to index off of the user's ID, potentially replicating data or complicating your client code.
With query-based rules, you can now get the same security without the nesting!
{ "rules": { "items": { ".read": "auth.uid != null && query.orderByChild == 'uid' && query.equalTo == auth.uid" } } }
The above rule restricts reads on a per-user basis without requiring you to nest items under each user's ID, which means you can write a simple query to retrieve a user's items without sacrificing security.
"uid"
Now the data structure is reduced by one level:
```
{
  "items": {
    "item_one": {
      "text": "Query-based, rules!",
      "uid": "user_one"
    }
  }
}
```
db.ref("items").orderByChild("uid") .equalTo(auth.currentUser.uid) .on("value", cb)
The above query will retrieve all the items belonging to the currently authenticated user. If a user tries to access items that don't belong to them, the read will fail.
Query-based rules are a new feature that can be used in parallel with your existing rules. If you find that they can improve your security or simplify your structures, you can upgrade to them incrementally.
Query-based rules are available for use today. Go to your Firebase console and use the emulator to give them a whirl. Check out our official documentation for more information. If you run into any problems make sure to check out our support channels or hit up the Firebase Slack community.