From b0b14187c193503eac47af54422470992feea199 Mon Sep 17 00:00:00 2001
From: Sebastian Kuzminsky
Date: Tue, 22 Jul 2025 12:45:29 -0600
Subject: [PATCH] edit description of issue 781360ade670846ed0ccdbfd19ffa8fd

---
 781360ade670846ed0ccdbfd19ffa8fd/description | 117 +++++++++++++++++--
 1 file changed, 108 insertions(+), 9 deletions(-)

diff --git a/781360ade670846ed0ccdbfd19ffa8fd/description b/781360ade670846ed0ccdbfd19ffa8fd/description
index d74158f..30b9ed7 100644
--- a/781360ade670846ed0ccdbfd19ffa8fd/description
+++ b/781360ade670846ed0ccdbfd19ffa8fd/description
@@ -1,15 +1,114 @@
-add a layer to the architecture with a DB api between the FS and ent?
+add a database layer to the architecture between the FS and ent?
 
-This issue grew out of the discussion in issue
-08f0d7ee7842c439382816d21ec1dea2.
+This issue is to consider adding a database abstraction between the
+entomologist library and the git-backed filesystem.
 
-Currently the entomologist crate code directly reads and writes files
-and calls `git commit` when it wants. The directory is managed by the
-application, generally as an ephemeral worktree containing a (possibly
-detached) checkout of the `entomologist-data` branch.
+Currently the entomologist library code directly reads and writes files
+and directories, and calls `git commit` when it wants. The directory
+is generally an ephemeral worktree containing a (possibly detached)
+checkout of the `entomologist-data` branch.
 
 We may be able to simplify the ent internals (and the application) by
 adding a "database" API between ent and the filesystem.
 
-There is some preliminary design brainstorming in the issue mentioned
-above.
+(This issue grew out of the discussion in issue
+08f0d7ee7842c439382816d21ec1dea2. I've distilled the discussion in that
+issue here.)
+
+for the filesystem DB, i think it might make sense to have a hashmap that stores everything as a key-value pair, where each key is a file, and each value is the contents of that file.
+
+once we go up the stack, i think it makes sense to have things in concrete structs, since that's easier to reason about.
+
+this also frees up the filesystem DB to get used for other things potentially, not just issues.
+
+So we'd have a system like this:
+```
+git-backed filesystem
+^
+|
+v
+db layer: key/value pair
+^
+|
+v
+entomologist library layer: concrete structs (like `Issue` etc)
+^
+|
+v
+presentation layer: (CLI / TUI / etc.)
+```
+
+
+# Entomologist library API (presented up to the application)
+
+Very similar to current entomologist library API.
+
+* Open the database at this DatabaseSource (filesystem path or git branch,
+  read-only or read-write, returns an opaque "entdb"(?) handle)
+
+* List issues in entdb
+
+* Add/edit issue
+
+* Get/set issue state/tags/assignee/done-time/etc
+
+* Add/edit comment on issue
+
+
+# Database API (presented up to entomologist library)
+
+* Open the database at this DB Source (filesystem path or git branch,
+  read-only or read-write, returns an opaque "db" handle)
+
+* Read a db object into a key/value store.
+
+  - Keys are filenames. Values are the file contents of that file,
+    or a database if the filename refers to a directory.
+
+  - The read is by default *not* recursive for performance reasons;
+    the application may choose to read a "sub-database" referred to
+    by a key in the "parent database" if it wants, when it wants.
+
+  - The application receives a k/v store and is responsible for
+    unpacking/interpreting/parsing that into some app-level struct
+    that is meaningful to the application.
+
+* Write a key-value store to a db.
+
+  - Commits by default (the application supplies the commit message),
+    though maybe we want a way to stage multiple changes and commit
+    at the end?
+
+  - The application transcodes its internal struct into a generic k/v
+    store for the db library.
+
+On write operations, the git commit message should be meaningful to
+the application.  Maybe that can be done generically by the db library,
+or maybe the application needs to supply the commit message.
+
+
+# Design
+
+A filesystem stores two kinds of things: directories and files.
+A directory contains files, and other directories.
+
+Git stores two kinds of things: trees and blobs. Trees contain blobs,
+and other trees.
+
+This DB tracks two kinds of things: databases and key/value objects.
+Databases store key/value objects, and other databases.
+
+Some things we'd want from this DB layer:
+
+* Filesystem objects correspond to structs, like how we have each struct
+  Issue in its own issue directory.
+
+* Structs are nested, like how struct Issue contains struct Comment
+
+* Some fields are simple types (`author` is String), some are
+  less simple (`timestamp` is chrono::DateTime), some are custom
+  (`state` is enum State), and some are complicated (`dependencies`
+  is Option>, `comments` is Vec)
+
+* Filesystem objects are optimized for getting tracked by git - minimize
+  merge conflicts.
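
To make the "Database API" and "Design" sections of the description above a bit more concrete, here is a minimal Rust sketch of the k/v layer, using only the standard library. It is one possible reading of the proposal, not the entomologist implementation: `Value`, `KvStore`, `read_db`, `write_db` and `issue_from_store` are hypothetical names, the `Issue` struct is cut down to two illustrative fields, and the git commit step is left out entirely.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One value in the store: file contents, or a sub-database that has
/// not been read yet (hypothetical type, not part of entomologist).
#[derive(Debug, PartialEq)]
enum Value {
    Blob(Vec<u8>),
    SubDb(PathBuf),
}

/// Hypothetical k/v view of one directory; keys are filenames.
type KvStore = BTreeMap<String, Value>;

/// Read a single directory level. Subdirectories are recorded as
/// `Value::SubDb` and are not descended into (non-recursive read).
fn read_db(dir: &Path) -> io::Result<KvStore> {
    let mut store = KvStore::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let key = entry.file_name().to_string_lossy().into_owned();
        let path = entry.path();
        let value = if path.is_dir() {
            Value::SubDb(path)
        } else {
            Value::Blob(fs::read(&path)?)
        };
        store.insert(key, value);
    }
    Ok(store)
}

/// Write each blob as a file under `dir`; sub-databases become empty
/// directories. Committing to git is deliberately omitted here.
fn write_db(dir: &Path, store: &KvStore) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    for (key, value) in store {
        match value {
            Value::Blob(bytes) => fs::write(dir.join(key), bytes)?,
            Value::SubDb(_) => fs::create_dir_all(dir.join(key))?,
        }
    }
    Ok(())
}

/// Library-level struct unpacked from the generic store. The field
/// set is illustrative only.
#[derive(Debug, PartialEq)]
struct Issue {
    description: String,
    state: Option<String>,
}

/// Interpret a k/v store as an `Issue`; returns `None` if the
/// `description` key is missing or is not valid UTF-8.
fn issue_from_store(store: &KvStore) -> Option<Issue> {
    let text = |key: &str| match store.get(key) {
        Some(Value::Blob(bytes)) => String::from_utf8(bytes.clone()).ok(),
        _ => None,
    };
    Some(Issue {
        description: text("description")?,
        state: text("state").map(|s| s.trim().to_string()),
    })
}
```

In this sketch the db layer knows nothing about issues: it hands back filenames and bytes, and `issue_from_store` is where the library turns them into a concrete struct. A caller that wants comments would call `read_db` again on the path held in a `Value::SubDb`, which is how the "not recursive by default" bullet plays out here.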