Dropbox has rolled out a new search engine called Nautilus that aims to not only protect data privacy, but also pave the way for intelligent document ranking and retrieval features.
The company says that personalising search for its 500 million registered users, across hundreds of billions of pieces of content, is a challenge at massive scale, especially as users have different search behaviours and preferences.
These challenges formed the basis for the development of Nautilus, which uses machine intelligence at different stages of the search pipeline. These include content-specific machine learning, such as image understanding systems, and search result ranking tailored to each user's preferences.
According to Dropbox's Diwaker Gupta, content can also change all the time, which affects search results.
"For example, think about a user (or several users) working on a report or a presentation. They will save multiple versions over time, each of which might change the search terms that the document should be retrievable by," Gupta explains.
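To make Gupta's point concrete, the sketch below shows how saving a new version of a document can change the terms it is retrievable by. It is a toy illustration under simplifying assumptions, not Dropbox's implementation: the `VersionedIndex` class and its methods are hypothetical names, and tokenisation is reduced to splitting on whitespace.

```python
from collections import defaultdict


class VersionedIndex:
    """Toy inverted index that tracks only the latest version of each document."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_terms = {}               # document id -> terms in its latest version

    def save_version(self, doc_id, text):
        """Index a new version, removing terms that no longer appear in it."""
        new_terms = set(text.lower().split())
        old_terms = self.doc_terms.get(doc_id, set())
        for term in old_terms - new_terms:
            self.postings[term].discard(doc_id)
        for term in new_terms - old_terms:
            self.postings[term].add(doc_id)
        self.doc_terms[doc_id] = new_terms

    def search(self, term):
        """Return the ids of documents whose latest version contains the term."""
        return set(self.postings.get(term.lower(), set()))


index = VersionedIndex()
index.save_version("report", "draft budget outline")
draft_hits_before = index.search("draft")

index.save_version("report", "final budget figures")
draft_hits_after = index.search("draft")
final_hits_after = index.search("final")
```

After the second save, the document is no longer found by "draft" but is found by "final", while "budget" matches throughout.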
With that in mind, the team spent a lot of time fine-tuning Nautilus, which meant experimenting with different algorithms and techniques. The search engine was built to meet four key objectives:
- Deliver best-in-class performance, scalability, and reliability to deal with the scale of data
- Provide a foundation for implementing intelligent document ranking and retrieval features
- Build a flexible system that would allow Nautilus engineers to easily customise the document indexing and query processing pipelines for running experiments
- And, as with any system that manages users' content, the search system needed to deliver on these objectives quickly, reliably, and with strong safeguards to preserve the privacy of users' data.
"After a period of qualification where Nautilus was running in shadow mode, it is currently the primary search engine at Dropbox. We've already seen significant improvements to the time-to-index new and updated content, and there's much more to come," Gupta says.
The team is working on new features and ways to improve search quality.
"We're exploring new capabilities, such as augmenting the existing posting-list-retrieval-algorithm with distance-based retrieval in an embedding space; unlocking search for image, video, and audio files; improving personalisation using additional user activity signals; and much more."
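For readers unfamiliar with the two retrieval styles Gupta contrasts, here is a minimal sketch of how they might be combined. Everything in it is an assumption made for illustration: the documents, the two-dimensional embedding vectors, and function names such as `hybrid_retrieve` are invented, and Dropbox has not published its approach in this form.

```python
import math
from collections import defaultdict

# Hypothetical corpus: document id -> (text, made-up 2-d embedding vector).
DOCS = {
    "d1": ("quarterly sales report", [0.9, 0.1]),
    "d2": ("holiday photo album", [0.1, 0.9]),
    "d3": ("revenue summary for q3", [0.8, 0.2]),
}


def build_postings(docs):
    """Build posting lists: term -> set of ids of documents containing the term."""
    postings = defaultdict(set)
    for doc_id, (text, _) in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return postings


def posting_list_retrieve(postings, query):
    """Return documents that contain every query term (intersection of posting lists)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())
    return result


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def embedding_retrieve(docs, query_vec, k):
    """Return the k document ids whose embeddings are closest to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(docs[d][1], query_vec), reverse=True)
    return ranked[:k]


def hybrid_retrieve(docs, postings, query, query_vec, k=2):
    """List exact term matches first, then add nearby documents from embedding space."""
    exact = posting_list_retrieve(postings, query)
    nearest = embedding_retrieve(docs, query_vec, k)
    return sorted(exact) + [d for d in nearest if d not in exact]


postings = build_postings(DOCS)
exact_only = posting_list_retrieve(postings, "sales report")
hybrid = hybrid_retrieve(DOCS, postings, "sales report", [1.0, 0.0], k=2)
```

In this toy example, the posting lists alone return only the document containing both "sales" and "report", while the embedding step also surfaces the revenue summary, which shares no words with the query but sits close to it in the made-up vector space.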
Gupta adds that Nautilus is just one example of how Dropbox uses machine learning and data retrieval for the benefit of its users.
The first part of the Nautilus update has been rolled out. The second part is due to be released in October.