The Memex Building Blocks

Rebeca Sarai · May 18, 2021 · (Updated: May 29, 2021)




I’m building myself a place to store, query, and analyze all my personal data extracted from the multitude of devices available. A series of posts on this blog details the how.

Many names describe this project: timeline, lifelogging, and memex. I’m very fond of the term memex since it isn’t tied to any commercial software or development term currently in use. The memex was a device proposed by Vannevar Bush in 1945 as a way to manage personal information; details can be found here. Inspired by that link and by the work of many other open-source contributors, I’m building myself one.

Let’s start from the top. This project revolves around analyzing and organizing available data, including how to make different formats available and useful. Before engaging in architectural steps and design plans, the wise thing to do is to define an initial scope.

When talking about tracking, one can always go crazy and select more than one can tackle. The multitude of categories available can be overwhelming: Sleep, Well-being, Fitness, Work, Media, Drink, Food, Social, Travel, Screen Time, etc. Collecting and processing all this data at once to make it available to the memex right from the beginning would be extremely counter-productive. Simple is better than complex, so for the first version of the system I chose to cover four data sources:

  • Mood tracking
  • Habit tracking
  • Bash commands
  • Browser history

My mood tracking is a daily log with multiple options; the idea is to have a data source with very few empty days. My habit tracking system is an app that presents me with a daily binary option; the intuition here is to have a source where many days can be empty. Bash commands and browser history were selected for their high volume of daily items, which will allow me to test how the application behaves as it scales. Another interesting point is that the data sources share some fields but do not overlap completely, which makes them a good fit for testing different schemas and no schema at all.

mood

  • datetime
  • mood
  • activities
  • note

bash

  • datetime
  • command
  • folder
  • device

habit

  • name
  • description
  • datetime

browser history

  • datetime
  • link
  • device
  • visit count
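
The four field lists above can be written down as record types. This is only a sketch: the class names and the Python types (for example, treating activities as a list of strings) are assumptions, since only the field names are given here. The small helper makes the partial overlap between sources explicit.

```python
import dataclasses
import datetime as dt
from typing import List


@dataclasses.dataclass
class MoodEntry:
    datetime: dt.datetime
    mood: str              # assumed to be a label such as "good"
    activities: List[str]  # assumed to be a list of activity names
    note: str = ""


@dataclasses.dataclass
class BashCommand:
    datetime: dt.datetime
    command: str
    folder: str
    device: str


@dataclasses.dataclass
class HabitEntry:
    name: str
    description: str
    datetime: dt.datetime


@dataclasses.dataclass
class BrowserVisit:
    datetime: dt.datetime
    link: str
    device: str
    visit_count: int


def shared_fields(*record_types) -> set:
    """Return the field names that every given record type has in common."""
    names = [{f.name for f in dataclasses.fields(t)} for t in record_types]
    return set.intersection(*names)
```

Calling `shared_fields` on all four types returns only `datetime`, while bash and browser history additionally share `device`.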

With the data in hand, the first step is to define a default architecture for the application. The building blocks of the memex are Exporters, Importers, and the Service API.

Exporters

Exporters are the essential part of the system. They are responsible for downloading the data from the appropriate provider and storing it safely on the local machine. The goal behind this project is to own my personal data, removing it from silos that I don’t control, so keeping it local is a functional requirement. Note that even though I foresee the use of a database to power the memex, the data must still be available locally to ensure consistency and ownership, and to allow for future changes to the database model. Exporters are also responsible for processing the data into a friendly format for the scripts that come next.
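
As an illustration of the "friendly format" step, here is a minimal sketch that writes already-parsed records to a local file and reads them back. JSON Lines is an assumption made for the example, not necessarily the format used in the project, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def export_to_jsonl(records: Iterable[Dict[str, Any]], out_file: Path) -> int:
    """Write one JSON object per line to a local file and return the record count."""
    out_file.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_file.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield the records back, one per non-empty line, for the next script."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

A line-oriented text file keeps the local copy readable without any database, which fits the requirement that the data stays usable even if the database model changes.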

Importers

Importers are responsible for feeding the memex with the most recent data. I will discuss implementation strategies in the next posts.
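
To give a rough idea of what "feeding the memex" could look like, here is one possible sketch, not the strategy described in the later posts. It assumes SQLite as the database and a hypothetical `bash` table, and it relies on a uniqueness constraint so that re-importing the same export only adds the rows that are new.

```python
import sqlite3
from typing import Iterable, Tuple

# (datetime, command, folder, device), with datetime as an ISO 8601 string
BashRow = Tuple[str, str, str, str]


def import_bash(conn: sqlite3.Connection, rows: Iterable[BashRow]) -> int:
    """Insert rows that are not in the table yet and return how many were added."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS bash (
            datetime TEXT NOT NULL,
            command  TEXT NOT NULL,
            folder   TEXT NOT NULL,
            device   TEXT NOT NULL,
            UNIQUE (datetime, command, device)
        )
        """
    )
    before = conn.total_changes
    # OR IGNORE skips rows that violate the UNIQUE constraint instead of failing.
    conn.executemany(
        "INSERT OR IGNORE INTO bash (datetime, command, folder, device) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before
```

Because duplicates are ignored, the importer can be run on every new export without tracking which rows it has already seen.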

Service API

The Service API is responsible for all interactions with the memex; its requirements include fetching and querying the data. On top of this API, a small frontend will provide an easy way to interact with the data. I will discuss implementation strategies in the next posts.
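
The kind of query the API needs to answer can be sketched without any web framework. The example below is an assumption-heavy illustration: records are plain dictionaries, and the `source` key and the `query` function are hypothetical names, not part of the project.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def query(
    records: Iterable[Record],
    source: Optional[str] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Record]:
    """Return records matching the source and the [start, end) range, oldest first."""
    matches = []
    for record in records:
        if source is not None and record["source"] != source:
            continue
        if start is not None and record["datetime"] < start:
            continue
        if end is not None and record["datetime"] >= end:
            continue
        matches.append(record)
    return sorted(matches, key=lambda r: r["datetime"])
```

Since `datetime` is the one field every source shares, a time range is a natural filter to support across all of them.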

Exporters Details

Nothing major here: exporters rely on the presence of local files in specific folders defined by a configuration file. The folders are periodically updated by cron jobs that either copy local files to the correct directory[1], download files from a specific drive folder, or fetch new information from an API. The exporters then read the appropriate files and process the items accordingly. My default implementation is simple and can be found here. Note that it requires a config file, which is not detailed inside the project; it is just a Python file with different classes, one for each provider.
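
Since the config file is not shown in the project, here is a hedged sketch of what such a file and the "copy local files" job might look like. The class names, the `export_dir` attribute, the paths, and the timestamped file naming are all assumptions made for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


# Hypothetical config: one class per provider, each naming its local folder.
class BashConfig:
    export_dir = Path("data/bash")


class BrowserConfig:
    export_dir = Path("data/browser")


def snapshot(source: Path, export_dir: Path, now: Optional[datetime] = None) -> Path:
    """Copy `source` into `export_dir` under a timestamped name; return the copy's path."""
    now = now or datetime.now()
    export_dir.mkdir(parents=True, exist_ok=True)
    target = export_dir / f"{now:%Y%m%dT%H%M%S}_{source.name}"
    shutil.copy2(source, target)
    return target


def latest_snapshot(export_dir: Path) -> Optional[Path]:
    """Return the newest snapshot (timestamped names sort chronologically), or None."""
    if not export_dir.exists():
        return None
    files = sorted(p for p in export_dir.iterdir() if p.is_file())
    return files[-1] if files else None
```

A cron job could call `snapshot` for each source file, and an exporter would then read whatever `latest_snapshot` returns for its provider.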



The exporters are inspired by the HPI package, which I have been using for a while now and have nothing to complain about 👍🏾.



Footnotes

1 As is the case with bash commands, which are appended to a single file, and with browser history, which is a SQLite file.

