2020/02/16

#Emacs

Exporting my Slack data to an Org mode file

Table of Contents

Story
Exporting the data from Slack
Starting to write it
The final logic
The main script
Other comments
Full code

Story

In 2017, I tried to use Slack as some sort of todo list, with channels serving as categories, messages being stuff to process, and myself being the only member. It did not work out, partly because I didn't have internet on my phone; I abandoned the personal Slack workspace shortly after.

My personal notes system has evolved a lot, and is now a central Git repository with a bunch of Org files in it. However, I never actually exported the data on the Slack workspace --- some notes were still only available there.

That's why I'd like to move them to my notes repository --- this only has to be done once.

First I tried to do it manually, checking the timestamp of each message and writing the text into a new Org file, categorized per channel. This is somewhat doable, as I only have about 3 months' worth of intermittent messages (in 20 channels) to move. Still, "checking the timestamp" involves hovering my cursor on the message and hoping it shows me the full timestamp --- I want to keep the seconds, as throwing away archival data feels wrong. Doing this quickly became tedious, so I started trying to work on the exported data directly instead.

Exporting the data from Slack

Workspace → Administration → Workspace settings

Import / Export Data

Export → Choose "Entire workspace history"

The data is in a zip file, structured like this:

slack-export
├── a_channel
│   ├── 2017-02-14.json
│   └── 2017-03-02.json
├── another_channel
│   ├── 2017-02-14.json
│   ├── 2017-02-19.json
│   ├── 2017-02-21.json
│   └── 2017-03-02.json
├── channels.json
├── integration_logs.json
└── users.json

Messages from each channel are in that channel's own folder. channels.json contains metadata for all channels, and users.json lists all users in the workspace. I don't care about integration_logs.json, but it seems to list the installed Slack apps.

Starting to write it

The code I wrote grew organically. Starting from:

(defun my/file->json (file)
  "Return contents of FILE read through `json-read'."
  (save-match-data
    (with-temp-buffer
      (insert-file-contents-literally file)
      (decode-coding-region (point-min) (point-max) 'utf-8)
      (goto-char (point-min))
      (json-read))))

(with-temp-file "slack.org"
  (insert "#+COLUMNS: %ITEM %CREATED %TOPIC %PURPOSE\n\n")
  (let ((channels (append (my/file->json "channels.json") nil)))
    (seq-doseq (channel channels)
      ;; Insert channel information
      (let-alist channel
        (insert "* =" .name "=\n")))))

I can then run eval-buffer to update slack.org for me to explore.

The final logic

The main script

(with-temp-file "slack.org"
  (insert "#+COLUMNS: %ITEM %CREATED %TOPIC %PURPOSE\n\n")
  (let ((channels (append (my/file->json "channels.json") nil)))
    (seq-doseq (channel channels)
      ;; Insert channel information
      (let-alist channel
        (insert "* =" .name "=\n")
        (my/insert-properties
         `((created . ,(my/unix-time-to-iso8601-local .created))
           ;; `let-alist' does not work with `() syntax
           ,@(unless (equal "" .topic.value) (list (cons 'topic .topic.value)))
           ,@(unless (equal "" .purpose.value) (list (cons 'purpose .purpose.value))))))
      ;; Insert events / messages
      (seq-doseq (event (my/get-channel-events channel))
        (my/insert-event event)))))

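The logic is roughly: read channels.json; for each channel, insert a top-level heading with the channel name and a property drawer (creation time, plus topic and purpose when they are non-empty); then insert each of the channel's messages as a second-level heading under it.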

Other comments

(defun my/insert-properties (alist)
  "Insert ALIST as an Org property drawer."
  (insert ":PROPERTIES:\n")
  (map-do
   (lambda (k v)
     (insert ":" (upcase (format "%s" k)) ":  " v "\n"))
   alist)
  (insert ":END:\n\n"))

Omitting items from an alist conditionally is easier than doing the same with a string, hence this little helper.
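To illustrate the point about conditional items, here is a minimal sketch. The `demo/` names and the sample values are made up for this example; the drawer helper mirrors `my/insert-properties` above. An empty topic is simply spliced out of the alist before anything is inserted.

```elisp
(require 'map)

(defun demo/insert-properties (alist)
  "Insert ALIST as an Org property drawer (same shape as `my/insert-properties')."
  (insert ":PROPERTIES:\n")
  (map-do (lambda (k v)
            (insert ":" (upcase (format "%s" k)) ":  " v "\n"))
          alist)
  (insert ":END:\n\n"))

(defun demo/channel-drawer (created topic purpose)
  "Return a drawer string, skipping TOPIC and PURPOSE when they are empty."
  (with-temp-buffer
    (demo/insert-properties
     `((created . ,created)
       ;; `unless' returns nil for an empty string, and splicing nil
       ;; with ,@ adds nothing to the list.
       ,@(unless (equal "" topic) (list (cons 'topic topic)))
       ,@(unless (equal "" purpose) (list (cons 'purpose purpose)))))
    (buffer-string)))

;; Hypothetical sample values: the topic is empty, so no :TOPIC: line
;; ends up in the drawer.
(demo/channel-drawer "2017-02-14T10:00:00+0800" "" "Reading list")
```

Doing the same with string concatenation would mean guarding each line separately and keeping track of the newlines by hand.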

(defun my/insert-text (text)
  "Insert TEXT with necessary newlines added amongst other processing."
  (save-match-data
    (insert
     (with-temp-buffer
       (insert text "\n\n")
       (goto-char (point-min))
       (while (re-search-forward "<\\(.*?\\)>" nil t)
         (let ((matched (match-string 1)))
           (cond ((string-prefix-p "@" matched)
                  (replace-match
                   (format "=@%s=" (alist-get 'name (my/get-user
                                                     (substring matched 1))))
                   t t))
                 ((string-prefix-p "http" matched)
                  (replace-match (format "[[%s]]" matched) t t))
                 ((string-prefix-p "#" matched)
                  (replace-match
                   (format "=#%s=" (alist-get 'name (my/get-channel
                                                     ;; Channel IDs are 9 digits
                                                     ;; + 1 for the #
                                                     (substring matched 1 10))))
                   t t)))))
       (buffer-string)))))

Slack exports user and channel mentions as <@user ID> and <#channel ID>, so to make it more readable I extracted the names from their respective JSON files. Links are exported as <http://example.com>, which doesn't work well in Org, so I also replace that with the Org syntax.

It is easier to work with buffers in Emacs than with strings, which is why I did the processing in another temporary buffer.
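As a stripped-down sketch of that buffer-based approach, the following handles only the link case and leaves mentions alone, so it needs no lookup in users.json or channels.json. `demo/slack-links->org` is a name invented for this example, not part of the script.

```elisp
(defun demo/slack-links->org (text)
  "Return TEXT with Slack-style <http...> links rewritten as Org links."
  (save-match-data
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      ;; Non-greedy match, so each <...> group is handled on its own.
      (while (re-search-forward "<\\(.*?\\)>" nil t)
        (let ((matched (match-string 1)))
          (when (string-prefix-p "http" matched)
            ;; FIXEDCASE and LITERAL are both t so the URL is inserted as-is.
            (replace-match (format "[[%s]]" matched) t t))))
      (buffer-string))))

(demo/slack-links->org "Read <http://example.com> later")
;; => "Read [[http://example.com]] later"
```

The search-and-replace loop works on the temporary buffer in place, which avoids the index bookkeeping that repeated string replacement would need.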

(defun my/file->json (file)
  "Return contents of FILE read through `json-read'."
  (save-match-data
    (with-temp-buffer
      (insert-file-contents-literally file)
      (decode-coding-region (point-min) (point-max) 'utf-8)
      (goto-char (point-min))
      (json-read))))

(defun my/array-files->json (&rest files)
  "Like `my/file->json', except that top-level arrays are merged."
  (cl-reduce
   (lambda (json-a json-b)
     (cl-merge 'list json-a json-b
               (lambda (elem-a elem-b)
                 (< (string-to-number (alist-get 'ts elem-a))
                    (string-to-number (alist-get 'ts elem-b))))))
   (mapcar #'my/file->json files)))

json-read changes match data, so it needs to be wrapped in a save-match-data. This caused me a few minutes of pain as I tried to figure out why my (while (re-search-forward ...) (replace-match ...)) didn't work.
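The effect is easy to reproduce without JSON at all. In this sketch, `demo/noisy-helper` is a made-up stand-in for any function that runs its own regexp match internally; it is not what `json-read` actually does, just the same kind of side effect.

```elisp
(defun demo/noisy-helper ()
  "Stand-in for any function that runs its own regexp match internally."
  (string-match "b+" "bbb")
  nil)

(defun demo/match-after-helper (protect)
  "Match \"foo\" in \"xxfoo\", call the helper, then return `match-beginning'.
When PROTECT is non-nil, the helper call is wrapped in `save-match-data'."
  (string-match "foo" "xxfoo")
  (if protect
      (save-match-data (demo/noisy-helper))
    (demo/noisy-helper))
  (match-beginning 0))

(demo/match-after-helper t)   ; => 2, the match for "foo" survives
(demo/match-after-helper nil) ; => 0, overwritten by the helper's own match
```

In the unprotected case a following `replace-match` would act on the helper's match instead of the intended one, which is the kind of failure described above.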

my/file->json is pretty straightforward: it just runs json-read on a file. my/array-files->json is less so. It merges JSON arrays: as messages of the same channel are stored as multiple arrays in multiple files, getting all messages of a channel requires merging them. We use cl-merge to do the actual merging (the inner lambda is the comparison function that cl-merge requires, here ordering messages by their ts timestamp), and cl-reduce to make the two-input cl-merge work on the whole list of arrays.
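Here is a small sketch of the same cl-reduce plus cl-merge combination on hand-written data instead of files. `demo/merge-by-ts` and the sample messages are invented for the example; the vectors mimic what `json-read` returns for JSON arrays by default.

```elisp
(require 'cl-lib)

(defun demo/merge-by-ts (&rest arrays)
  "Merge ARRAYS, each already sorted by its `ts' field, into one sorted list."
  (cl-reduce
   (lambda (a b)
     ;; `cl-merge' takes two sorted sequences and a "less than" predicate.
     (cl-merge 'list a b
               (lambda (x y)
                 (< (string-to-number (alist-get 'ts x))
                    (string-to-number (alist-get 'ts y))))))
   arrays))

;; Three hypothetical "files", each sorted by `ts' on its own.
(mapcar (lambda (message) (alist-get 'text message))
        (demo/merge-by-ts
         (vector '((ts . "1.0") (text . "a")) '((ts . "4.0") (text . "d")))
         (vector '((ts . "2.0") (text . "b")))
         (vector '((ts . "3.0") (text . "c")))))
;; => ("a" "b" "c" "d")
```

Note that cl-merge only interleaves inputs that are already sorted; it does not sort them. That is enough here as long as each file is in chronological order.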

(defun my/get-channel-events (channel)
  "Get events for CHANNEL.

CHANNEL can be either a string for its name, or an alist, in
which case the `name' property is used."
  (let ((name (cond ((stringp channel)
                     channel)
                    ((json-alist-p channel)
                     (alist-get 'name channel))
                    (t (error "CHANNEL must be a string or a `json-alist-p'")))))
    (apply #'my/array-files->json
           (directory-files name t "json$"))))

This is where my/array-files->json is used: every JSON file in a channel's folder is read and merged into one sequence. I called them "events" here, but I later realized that all of them have the type "message".

Full code

;; -*- lexical-binding: t; -*-

(require 'cl-lib) ; for `cl-reduce' and `cl-merge'
(require 'json)
(require 'map)
(require 'seq)
(require 's)

;; Where I extracted the downloaded archive. For a script that isn't
;; written with reusability in mind, setting this is more convenient
;; than having to `defvar' and pass a path around.
;;
;; By the way, the current directory in Emacs is default-directory,
;; the current frame is (selected-frame), the current window (pane) is
;; (selected-window), and the current buffer is (current-buffer).
;; English and Emacs are both weird.
(setq default-directory "/tmp/slack-export/")

;; Used to break up long lines when trying to insert raw JSON objects
;; to see the data.
(defun my/fill-string (string)
  "Like `fill-paragraph', but on a STRING."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (fill-paragraph)
    (buffer-string)))

(defun my/array-files->json (&rest files)
  "Like `my/file->json', except that top-level arrays are merged."
  (cl-reduce
   (lambda (json-a json-b)
     (cl-merge 'list json-a json-b
               (lambda (elem-a elem-b)
                 (< (string-to-number (alist-get 'ts elem-a))
                    (string-to-number (alist-get 'ts elem-b))))))
   (mapcar #'my/file->json files)))

(defun my/file->json (file)
  "Return contents of FILE read through `json-read'."
  (save-match-data
    (with-temp-buffer
      (insert-file-contents-literally file)
      (decode-coding-region (point-min) (point-max) 'utf-8)
      (goto-char (point-min))
      (json-read))))

(defun my/unix-time-to-iso8601-local (unix-timestamp)
  "Convert UNIX-TIMESTAMP into an ISO 8601 timestamp in local time.

Does not take leap seconds into account."
  (format-time-string "%FT%T%z" (seconds-to-time unix-timestamp)))

(defun my/get-user (user-id)
  "Get user JSON object from USER-ID."
  (seq-find
   (lambda (item)
     (equal (alist-get 'id item) user-id))
   (my/file->json "users.json")))

(defun my/get-channel (channel-id)
  "Get channel JSON object from CHANNEL-ID."
  (seq-find
   (lambda (item)
     (equal (alist-get 'id item) channel-id))
   (my/file->json "channels.json")))

(defun my/get-channel-events (channel)
  "Get events for CHANNEL.

CHANNEL can be either a string for its name, or an alist, in
which case the `name' property is used."
  (let ((name (cond ((stringp channel)
                     channel)
                    ((json-alist-p channel)
                     (alist-get 'name channel))
                    (t (error "CHANNEL must be a string or a `json-alist-p'")))))
    (apply #'my/array-files->json
           (directory-files name t "json$"))))

(defun my/insert-properties (alist)
  "Insert ALIST as an Org property drawer."
  (insert ":PROPERTIES:\n")
  (map-do
   (lambda (k v)
     (insert ":" (upcase (format "%s" k)) ":  " v "\n"))
   alist)
  (insert ":END:\n\n"))

(defun my/insert-text (text)
  "Insert TEXT with necessary newlines added amongst other processing."
  (save-match-data
    (insert
     (with-temp-buffer
       (insert text "\n\n")
       (goto-char (point-min))
       (while (re-search-forward "<\\(.*?\\)>" nil t)
         (let ((matched (match-string 1)))
           (cond ((string-prefix-p "@" matched)
                  (replace-match
                   (format "=@%s=" (alist-get 'name (my/get-user
                                                     (substring matched 1))))
                   t t))
                 ((string-prefix-p "http" matched)
                  (replace-match (format "[[%s]]" matched) t t))
                 ((string-prefix-p "#" matched)
                  (replace-match
                   (format "=#%s=" (alist-get 'name (my/get-channel
                                                     ;; Channel IDs are 9 digits
                                                     ;; + 1 for the #
                                                     (substring matched 1 10))))
                   t t)))))
       (buffer-string)))))

(defun my/insert-event (event)
  "Insert EVENT as Org format, handling some types."
  (let-alist event
    (insert "** " (my/unix-time-to-iso8601-local (string-to-number .ts)) "\n")
    (cond (.files
           (my/insert-properties '((type . "files")))
           (seq-doseq (file .files)
             (insert "=" (map-elt file 'name) "=\n")))
          ((equal "channel_join" .subtype)
           (my/insert-properties '((type . "event-join")))
           (my/insert-text .text))
          ((equal "channel_leave" .subtype)
           (my/insert-properties '((type . "event-leave")))
           (my/insert-text .text))
          ((equal "channel_purpose" .subtype)
           (my/insert-properties '((type . "event-set-purpose")))
           (my/insert-text .text))
          ((equal "channel_topic" .subtype)
           (my/insert-properties '((type . "event-set-topic")))
           (my/insert-text .text))
          ((equal "channel_name" .subtype)
           (my/insert-properties '((type . "event-set-name")))
           (my/insert-text .text))
          (t
           (my/insert-properties '((type . "message")))
           (my/insert-text .text)))))

(with-temp-file "slack.org"
  (insert "#+COLUMNS: %ITEM %CREATED %TOPIC %PURPOSE\n\n")
  (let ((channels (append (my/file->json "channels.json") nil)))
    (seq-doseq (channel channels)
      ;; Insert channel information
      (let-alist channel
        (insert "* =" .name "=\n")
        (my/insert-properties
         `((created . ,(my/unix-time-to-iso8601-local .created))
           ,@(unless (equal "" .topic.value) (list (cons 'topic .topic.value)))
           ,@(unless (equal "" .purpose.value) (list (cons 'purpose .purpose.value))))))
      ;; Insert events / messages
      (seq-doseq (event (my/get-channel-events channel))
        (my/insert-event event)))))

;; Local Variables:
;; mode: lisp-interaction
;; End: