You're 21-34 years old, have an Android phone and like football, computers, and video games. You enjoy cycling, have a bachelor's degree, and rent your apartment with your wife.
The above may not be accurate, but it is the type of information Google's ad service collects and stores about you. That's not counting the data Google collects from its other services, such as your name and birthdate, recent searches, the websites you've visited, where you live and work, what time you turned your lights on, and the videos you watch.
Whether you trust Google or not, the data it gathers in the name of advertising can be quite unsettling. While you can use anonymizing services and delete your data regularly, a more convenient solution may be to host your own search engine with tools like Searx, YaCy, and others.
Whether it's web, document, or application search, a self-hosted search engine is typically easy to set up and more private. Today we'll take a look at some of the best options for self-hosted search.
Searx is among the most popular privacy-respecting search engines, largely because it draws on search engines users are already familiar with. As a metasearch engine, it gathers results from Google, Bing, DuckDuckGo, Yandex, Yahoo, and more.
Despite relying on popular search services, Searx doesn't track or profile its users, and it doesn't store their cookies or pass them on to the engines it pulls results from. There are various public instances of Searx, but you can also self-host an instance on a VPS and configure it to route queries through a proxy or Tor for extra privacy. It's also open source, Docker-compatible, and easy to install.
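To illustrate what a metasearch engine does with the lists it gets back, here is a minimal Python sketch that merges ranked results from several engines, removes duplicate URLs, and boosts pages that multiple engines agree on. The engine names and URLs are hypothetical, no network requests are made, and the reciprocal-position scoring is a simple stand-in chosen for clarity, not Searx's actual ranking code.

```python
from collections import defaultdict

# Hypothetical ranked results from three engines (names and URLs are illustrative only).
ENGINE_RESULTS = {
    "engine_a": ["https://example.org/a", "https://example.org/b", "https://example.org/c"],
    "engine_b": ["https://example.org/b", "https://example.org/a", "https://example.org/d"],
    "engine_c": ["https://example.org/b", "https://example.org/e"],
}


def merge_results(results_by_engine: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Deduplicate URLs across engines and rank them by summed reciprocal position.

    A URL returned by several engines, or near the top of one, scores higher.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for position, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / position
    # Sort by score (descending), breaking ties by URL so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for url, score in merge_results(ENGINE_RESULTS):
        print(f"{score:.2f}  {url}")
```

Here `https://example.org/b` comes first because all three engines returned it near the top. In a setup like Searx, the engines receive queries from the server doing this merge rather than from your browser, so they never see your cookies.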
YaCy is a decentralized, Java-based search engine that doesn't share user requests and instead lets users query a crowdsourced, peer-to-peer index for their web results. It avoids both search censorship and advertising, while letting users manually crawl and index the pages and topics they expect to search for often. It can be installed on Windows like any other application, or via its bundled script on Linux.
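YaCy's peer-to-peer index is commonly described as a distributed hash table. The minimal Python sketch below shows the general idea under that assumption: each search term is hashed to decide which peer stores its list of matching URLs, so a one-word query only needs to reach one peer. The peer names, pages, and helper functions are hypothetical, and this is a generic illustration rather than YaCy's actual protocol.

```python
import hashlib

# Hypothetical peer names; a real network has many peers that join and leave.
PEERS = ["peer-0", "peer-1", "peer-2", "peer-3"]


def peer_for_term(term: str, peers: list[str]) -> str:
    """Pick the peer responsible for a term by hashing it.

    Every node computes the same mapping, so any node knows where to send a
    term's URLs when indexing and where to ask for them when searching.
    """
    digest = hashlib.sha256(term.lower().encode("utf-8")).digest()
    return peers[int.from_bytes(digest[:8], "big") % len(peers)]


def build_shards(pages: dict[str, str], peers: list[str]) -> dict[str, dict[str, set[str]]]:
    """Split an inverted index (term -> URLs) across peers by term hash."""
    shards: dict[str, dict[str, set[str]]] = {peer: {} for peer in peers}
    for url, text in pages.items():
        for term in set(text.lower().split()):
            shard = shards[peer_for_term(term, peers)]
            shard.setdefault(term, set()).add(url)
    return shards


def search(term: str, shards: dict[str, dict[str, set[str]]], peers: list[str]) -> set[str]:
    """Answer a one-word query by consulting only the peer that owns the term."""
    return shards[peer_for_term(term, peers)].get(term.lower(), set())


if __name__ == "__main__":
    pages = {
        "https://example.org/bikes": "cycling routes and bike repair",
        "https://example.org/games": "retro video games and cycling",
    }
    shards = build_shards(pages, PEERS)
    print(sorted(search("cycling", shards, PEERS)))
    # ['https://example.org/bikes', 'https://example.org/games']
```

Taking the hash modulo the number of peers keeps the sketch short, but it reshuffles almost every term whenever a peer joins or leaves; real distributed networks typically use schemes such as consistent hashing to limit that churn.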
Your out-of-the-box experience with YaCy may vary, however. Its index is nowhere near as big as that of a centralized service like Google, and some users have reported results appearing in a less useful order.
That said, if you search for very specific, niche topics, YaCy may work out better because you can crawl the relevant sites yourself. YaCy can also be a strong search option for applications, intranets, or shared filesystems.
Elasticsearch is a search engine built on Apache Lucene, designed less as a Google alternative and more for site, application, and document search. Developed by Elastic, it's incredibly fast, easy to use, and can be customized to suit a wide range of use cases.
It can also power a self-hosted web search engine, however, when deployed with Docker and fed by the Apache Nutch crawler. A regular user naturally can't index the entire web efficiently, but they can build a database of their most-used sites for fast, customized results.
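At the heart of Lucene, which Elasticsearch builds on, is an inverted index that maps each term to the documents containing it. The toy in-memory Python version below, assuming your crawler has already fetched page text keyed by URL, shows why lookups over a personal index of favorite sites are fast: a query only touches the posting lists for its own terms. The `InvertedIndex` class is hypothetical and ranks by raw term counts purely for illustration; Elasticsearch scores with BM25 by default and adds analyzers, on-disk segments, and sharding on top.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split on anything that isn't a letter or digit."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Map each term to the documents containing it and how often it occurs."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)

    def add(self, doc_id: str, text: str) -> None:
        """Record per-document term counts for one crawled page."""
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = count

    def search(self, query: str) -> list[str]:
        """Return docs containing every query term, ranked by total term count."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect posting lists so only docs matching all terms survive (AND query).
        matching = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            matching &= set(self.postings.get(term, {}))
        scores = {doc: sum(self.postings[t][doc] for t in terms) for doc in matching}
        # Highest score first; ties broken by doc ID for stable output.
        return sorted(scores, key=lambda doc: (-scores[doc], doc))


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("https://example.org/bikes", "Bike repair guide: fixing a bike chain")
    index.add("https://example.org/routes", "Cycling routes for a road bike")
    print(index.search("bike"))         # ['https://example.org/bikes', 'https://example.org/routes']
    print(index.search("bike repair"))  # ['https://example.org/bikes']
```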
Originally an in-house CNET Networks project, Apache Solr was donated to the Apache Software Foundation in 2006. It has seen significant improvements since then, and now boasts hundreds of features and powerful search capabilities.
The highly scalable and fault-tolerant nature of Solr has led to its use by some of the world's biggest websites, including search engine operator DuckDuckGo, Salesforce, and Best Buy. Like many of the other solutions mentioned here, it can be installed via Docker and customized to suit your needs.
While self-hosted search isn't going to match the features of the search giants any time soon, there is a growing community of enthusiasts and solutions that want to help people de-Google their lives.
YaCy, which takes a distributed approach to search, is particularly interesting. Though it won't be a suitable replacement for centralized search for most people, wider adoption and further development could bring it closer to that level. In the meantime, Searx provides workable, familiar results while addressing some privacy concerns.