Edit: This is a fancy way of saying "self-hosted" (with a deeper dive into the software).
Technical self-reliance: what does that even mean? Simply put, it means relying on yourself, your skills, and existing open-source software instead of managed services or software (wherever that is pragmatic, wherever a home-grown solution is warranted, or, hell, wherever the challenge is fun enough). That said, the irony of this being written and published on Medium is not lost on me.
My aim with this series is to show how to replicate some common and up-and-coming services with open-source alternatives, learn how to use those alternatives, and get the most out of them. The series is the result of two big things: privacy and open-source software. Managed services are inherently murky when it comes to privacy; who is to say that the data they receive can't be used by the company or organization behind them? Furthermore, the power and prevalence of open-source software should not be taken lightly: having tools that rival the functionality offered by big tech companies can only bolster one's skills, abilities, and self-reliance. Lastly, the argument is not that managed services are bad and should never be used. Instead, it is to offer a solution to the small, niche population that is looking for some help (or at least to provide some new ideas for solving a similar problem).
The best way to keep a secret is to not reveal it in the first place. That sounds simple enough to make privacy seem achievable, until one realizes how difficult it is to be truly private. Everything done on the Internet has the potential to be recorded. Even using privacy-respecting technologies is not enough, as one can still be exposed by a de-anonymizing attack.
Privacy, or more aptly the lack thereof, can be seen in something as fundamental as DNS. DNS is responsible for translating domain names into IP addresses, which is necessary for loading Internet resources. By default, traditional DNS queries are transmitted in the clear (unencrypted). Additionally, DNS servers can log queries. These logs can be used for troubleshooting, analysis, or security purposes, but they can also be used to track users' online activity and presence. Oftentimes, DNS resolution is handled by a user's ISP (Internet Service Provider), which means the ISP can gain insight into that user's browsing habits and more. These are just some of the current privacy concerns with DNS; scale that to the rest of the software stack and it becomes a game of privacy whack-a-mole.
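To make "in the clear" a bit more concrete, here is a minimal Python sketch that builds a traditional DNS query for an A record in the RFC 1035 wire format, without sending it anywhere. The helper name `build_dns_query` and the query ID `0x1234` are hypothetical choices for illustration, and the sketch assumes a plain ASCII hostname; in practice a resolver library does this for you. The point is only that the hostname sits inside the payload as readable, length-prefixed labels, so anyone who can observe the traffic (an ISP, for example) can read it.

```python
import struct


def build_dns_query(domain: str, query_id: int = 0x1234) -> bytes:
    """Hypothetical helper: build a minimal, unencrypted DNS query for an A record.

    Follows the RFC 1035 message layout. Assumes an ASCII hostname and does not
    validate label lengths (max 63 bytes each), so it is a sketch, not a resolver.
    """
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # QNAME: each label is prefixed with its length, and the name ends with a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"

    # QTYPE=1 (A record), QCLASS=1 (IN, the Internet class)
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


if __name__ == "__main__":
    packet = build_dns_query("example.com")
    # The raw bytes that would normally go out over UDP port 53
    print(packet)
    # The hostname labels are visible verbatim; nothing in the message hides them
    print(b"example" in packet and b"com" in packet)
```

Printing `packet` shows the labels `example` and `com` right after the 12-byte header. Encrypted transports such as DNS over TLS or DNS over HTTPS wrap this same kind of message so that on-path observers no longer see it directly.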
Cloudflare published a blog post, 'Have your data and hide it too: An introduction to differential privacy', by Pierre Tholoniat. The article covers how to better protect aggregated data, why that matters, and the science behind it. This is a complete gloss over what the article goes through, but one interesting point in it was how security researchers discovered that it was possible to extract information about a model's training set from the model itself. Combine that with ChatGPT using chat history to train and improve its models (which is done by default), and it would be naive to think that there isn't another security flaw within ChatGPT or OpenAI, with dangerous implications, just waiting to be discovered.
(If you're curious how to turn off chat history and model training, click here.)
The takeaway here is not that privacy is impossible and that you should just willingly sell it off without any forethought. Instead, it is to understand that there are real ramifications to how you handle your data. That data can range from something as small as a search engine query or your ChatGPT chat history to more sensitive data such as your name, email, address, and other forms of PII (Personally Identifiable Information).
An argument can be made that using open-source software fundamentally breaks the idea of self-reliance, because it can be regarded as externally sourced software, i.e. not built from scratch. This is a rather naive argument; however, there is one related idea to consider that might have some merit:
This section stems from the trend of companies that once released large projects under open-source licenses beginning to adopt non-open-source licenses. There is too much context and politics involved to do the backstory justice, so simply asking "is it open-source or not?" is enough for this series.
Likewise, since the goal is to stay away from managed services, if an open-source alternative respects privacy and keeps an open-source license throughout its lifecycle, then why should it not be considered?
Ultimately, the best rationale I can muster for this series comes from the 12th-century theologian and author John of Salisbury, who recorded this saying of Bernard of Chartres:
“We are like dwarfs sitting on the shoulders of giants. We see more, and things that are more distant, than they did, not because our sight is superior or because we are taller than they, but because they raise us up, and by their great stature add to ours.”
To wit, innovation builds on past innovation in a cyclical manner, and to not adapt is to risk being left in the past. Progress always ends up moving forward, whether it takes the well-traveled, straightforward path or the most convoluted, erratic, and squirrelly one, especially when the squirrelly path is necessary to hold onto certain convictions. And hey, sometimes the path of the squirrel is fun and interesting and offers lots of deep, fundamental knowledge. Ultimately, sometimes the name of the game is trying a self-hosted solution, and that is the game of this series.