Our goal: ship an ambient Mem presence on the desktop — always-visible, always-one-gesture-away — that makes Mem feel like an assistant looking over our shoulder rather than an app we have to open.
How to read this: top sections are the briefing; lower sections add depth. A working build doesn't need to touch every section — pick the milestones and go.
TL;DR
- A tomato-colored Mem bubble floats bottom-right on the desktop, always on top, always reachable.
- Push-to-talk from anywhere into the bubble sends voice-captured thoughts straight into Mem.
- Clicking the bubble expands it into a persistent chat thread (think: Facebook floating heads, but for Mem).
- Mem can orchestrate the main app — "open my Europe trip plan" brings the main Mem app forward with that note.
- Auto-expand when the response warrants being seen (a looked-up piece of info, not just "got it, saved").
- Mem peeks back from the bubble when it has something to surface — proactive, non-intrusive.
Why this matters
Today, Mem is a place you go. Floating Mem makes it a place you are: ambient, low-friction, persistent. It's the clearest visible product surface in our current concept set and the best demo vehicle for "Your AI Thought Partner is always with you." It also becomes the input channel for everything else we're building — voice conversations launch from it, content opens into it, recall nudges surface through it.
Floating Mem is inspired by existing primitives, not literally built on them. What we've learned:
- People love voice capture — we saw it in Voice Mode.
- People love proactive recall — we saw it in Heads Up.
- People love a conversational agent over their Mem — we saw it in Mem Chat.
Floating Mem rebundles those signals into a new surface that lives on the desktop with the user, always. We'll likely build new plumbing under the hood for this; we're not simply surfacing the existing features through a bubble.
Core workflows (the hero paths)
1. Push-to-talk capture
You're in the middle of writing an email. An idea hits you. You hold a global hotkey, speak for eight seconds, release. A small animation on the bubble confirms it's in Mem. You never left your email.
This is the primary workflow. Voice in → Mem processes → bubble confirms. Zero app switching.
2. Chat thread
You click the bubble. It expands into a compact chat panel anchored to the bottom-right — one ongoing conversation with Mem. You see your recent voice dumps, Mem's responses, a message Mem sent you this morning. You type a follow-up, it answers in place. You close the panel; the bubble remains.
One persistent thread. Not per-note, not per-session. The floating-heads metaphor.
3. Orchestrate the main app
You push-to-talk: "Open my Europe trip plan." Mem interprets this as an intent to pull up a note/view, invokes a tool call, and the main Mem app comes forward with that content loaded. Same gesture, different outcome.
Floating Mem is a voice-driven launcher for your own Mem content. Capture and recall use the same gesture.
4. Intelligent auto-expand
You're on a flight booking page. You push-to-talk: "What's my frequent flyer number?" Mem looks it up. Because the answer is informational (not just an ack), Mem auto-expands the chat panel so the number is visible. No app switch. No hunt.
The bubble decides whether to stay collapsed ("got it, saved") or auto-expand (answer the user needs to see). This is an LLM judgment, not a rule — the agent calls an expand_history tool when its response is worth showing. This is the difference between annoying and magical.
5. Peek (Mem → you)
You're working. A reminder surfaces: Mem peeks a small message above the bubble: "Sarah just asked a question that looks like it needs the SOC 2 report — want me to help?" The peek persists until you click it. Clicking it opens the chat thread with full context.
Peeks are Mem's proactive channel. They persist (unlike OS notifications) until acknowledged. Peeks are the visible manifestation of Recall.
Milestones
Milestones are sequenced so each one can be demoed on its own and each one layers value onto the last. Pick a milestone to land and aim for it.
M1 — Bubble + push-to-talk + chat thread
- Persistent bottom-right bubble on macOS. Always on top. Draggable. Tomato, Mem arrows.
- Hold a global hotkey anywhere → voice captured → transcribed → posted into the chat thread AND saved into Mem via the Mem Agent pipeline.
- Clicking the bubble expands to the chat thread. Thread persists across sessions.
These are the fundamentals. Everything else layers on.
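The M1 bullets above can be sketched in Electron's main process. This is a minimal sketch, not final spec: window options, sizes, and the ⌥+Space hotkey are illustrative, and Electron's modules are passed in as a parameter so the placement helper stays testable on its own.

```typescript
// Sketch of the M1 bubble: always-on-top, frameless, bottom-right. Assumes
// Electron as the shell (per the decisions section); all values illustrative.
type Bounds = { width: number; height: number };

// Pure helper: bottom-right placement with a margin.
function bubblePosition(screen: Bounds, size: number, margin: number) {
  return { x: screen.width - size - margin, y: screen.height - size - margin };
}

function createBubble(electron: any) {
  const { BrowserWindow, screen, globalShortcut } = electron;
  const { width, height } = screen.getPrimaryDisplay().workAreaSize;
  const { x, y } = bubblePosition({ width, height }, 64, 24);

  const bubble = new BrowserWindow({
    x, y, width: 64, height: 64,
    frame: false, transparent: true, resizable: false,
    alwaysOnTop: true, skipTaskbar: true,
  });
  bubble.setAlwaysOnTop(true, "floating"); // stay above normal windows

  // Caveat: globalShortcut fires on key press only; detecting release for a
  // true hold-to-talk needs a native key hook, so press-to-start /
  // press-to-stop is the fallback gesture.
  globalShortcut.register("Alt+Space", () => bubble.webContents.send("ptt-toggle"));
  return bubble;
}
```

The press-only limitation of Electron's global shortcuts is worth flagging early: if hold-to-release is non-negotiable for the demo, budget time for a native macOS event tap.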
M2 — Auto-expand
- Mem Agent gets a new tool: expand_history().
- When the agent's response is user-facing information (a looked-up number, a retrieved fact, a suggested next step), it calls the tool and the chat panel opens automatically so the user sees the answer.
- Judgment is LLM-driven, not rule-based — we are agent-building experts; we should trust the model.
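For concreteness, here is what expand_history could look like, assuming an OpenAI-style function-calling schema; the Mem Agent's real tool interface may differ. The point is that the description carries the judgment: the model decides when a response is worth showing, and the client just dispatches.

```typescript
// Hypothetical tool schema (OpenAI-style function calling is an assumption).
const expandHistoryTool = {
  type: "function",
  function: {
    name: "expand_history",
    description:
      "Open the floating chat panel so the user sees this response. " +
      "Call this when the reply contains information the user needs to " +
      "read (a looked-up number, a retrieved fact, a suggested next step). " +
      "Do not call it for simple acknowledgements like 'got it, saved'.",
    parameters: { type: "object", properties: {}, required: [] },
  },
};

// Dispatcher sketch: route the agent's tool call to a UI callback.
function handleToolCall(name: string, expandPanel: () => void): boolean {
  if (name === expandHistoryTool.function.name) {
    expandPanel(); // renderer shows the chat panel
    return true;
  }
  return false;
}
```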
M3 — Orchestrate the main app
- Mem Agent gets a new tool: open_in_mem(target) — brings the main Mem app to the foreground with the requested note / view loaded.
- "Open my Europe trip plan" → main app jumps forward. "Show me my last notes on Acme Co" → same.
- This is where Floating Mem becomes a voice-driven launcher for your own Mem.
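One plausible wiring for open_in_mem, assuming the main Mem app registers a mem:// deep-link scheme it can resolve to a note or view — if it doesn't, IPC or direct macOS app activation would be the alternative. The scheme and parameter names here are illustrative.

```typescript
// Sketch: route open_in_mem(target) through a hypothetical deep link.
function memDeepLink(target: string): string {
  // Encode the requested note/view into a URL the main app can resolve.
  return `mem://open?target=${encodeURIComponent(target)}`;
}

// Electron's shell module would be passed in; on macOS, handing a URL to its
// registered handler also brings that app to the foreground.
function openInMem(shell: { openExternal: (url: string) => void }, target: string): void {
  shell.openExternal(memDeepLink(target));
}
```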
M4 — Heads Up extension (scoped)
- Floating Mem becomes aware of limited screen context (active app, maybe active URL or page title).
- When something in the user's Mem is relevant to the current context, it peeks or floats the relevant asset.
- Keep the scope tight: pick one or two concrete trigger types and ship them end-to-end rather than building a generalized attention system.
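One way to keep M4 that tight: hard-code a trigger or two as plain predicates over the limited screen context, rather than anything generalized. The context shape, app names, and URL patterns below are all illustrative assumptions.

```typescript
// Sketch of a scoped M4 trigger: a booking flow in the browser.
type ScreenContext = { app: string; url?: string };
type Trigger = { name: string; matches: (ctx: ScreenContext) => boolean };

const bookingTrigger: Trigger = {
  name: "booking-flow",
  matches: (ctx) =>
    (ctx.app === "Safari" || ctx.app === "Google Chrome") &&
    /airline|flight|booking/i.test(ctx.url ?? ""),
};

// First matching trigger wins; null means stay quiet.
function firstMatch(triggers: Trigger[], ctx: ScreenContext): string | null {
  const hit = triggers.find((t) => t.matches(ctx));
  return hit ? hit.name : null;
}
```

Getting the frontmost app and URL on macOS needs its own plumbing (e.g. an osascript call or a native helper); that part is assumed, not shown.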
M5 — Peek (Mem → user via Mem Agent)
- Mem Agent gets a new tool: send_floating_message(content) — the agent can send a message that surfaces as a peek on the user's bubble (same path Mem Agent uses to send to Slack today, but routed to Floating Mem).
- Peek UI: small bubble above the icon, message content visible, persists until clicked.
- This unlocks Recall as a first-class channel.
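The peek delivery path above can be sketched as a small queue: send_floating_message(content) pushes a peek, and it stays pending until the user clicks (acknowledges) it, unlike an OS notification that expires on its own. All names here are illustrative, not the agent's real interface.

```typescript
// Sketch of peek persistence: pending until explicitly acknowledged.
type Peek = { id: number; content: string; acknowledged: boolean };

class PeekQueue {
  private peeks: Peek[] = [];
  private nextId = 1;

  // Called when the agent invokes send_floating_message(content).
  push(content: string): Peek {
    const peek = { id: this.nextId++, content, acknowledged: false };
    this.peeks.push(peek);
    return peek;
  }

  // Everything still on screen: peeks persist until acknowledged.
  pending(): Peek[] {
    return this.peeks.filter((p) => !p.acknowledged);
  }

  // Clicking a peek acknowledges it and opens the chat thread with context.
  acknowledge(id: number): void {
    const found = this.peeks.find((p) => p.id === id);
    if (found) found.acknowledged = true;
  }
}
```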
Stretch (pick if we're ahead)
- Screenshot ingestion — system screenshot → routed to Floating Mem → optional voice context → saves to Mem. Possibly with multi-shot + annotate flow. Strong demo if we can land it.
- Multi-monitor handling — which screen, follow cursor.
- Dodge-active-window — bubble gracefully moves out of the way of active work.
- Positioning persistence — remember where the user last put it.
Example vignettes
Use these to pressure-test whether a change feels right. If a demo doesn't naturally cover several, scope is probably off.
Capture
- The post-meeting download. You end a video call. Before the next block starts, you press and hold: "Action items from that call — send Sarah the SOC 2 report, update the pricing slide, and follow up with Ben on integration scoping." The bubble pulses. The items land in Mem. Bonus magic: Mem notices you had a meeting note open, and the agent updates that note directly with the action items rather than creating a new one.
- The mid-work thought. You're writing a doc. An idea hits. Hold, speak for five seconds, release. Bubble pulses. Keep writing.
- The remind-me. You're in email. "Remind me to follow up with Acme Co tomorrow morning." Bubble pulses. You get a peek tomorrow morning at 9am.
- The quick save. You're reading a blog post. "Save: we should do something like this for our onboarding flow." Bubble pulses. Idea filed.
Recall
- The looked-up number. On a flight-booking page: "What's my frequent flyer number?" Panel auto-expands with the number, formatted for copy-paste.
- The pulled-up plan. "Open my Europe trip plan." Main Mem app snaps to front with the plan loaded.
- The what-did-I-say. Drafting an email: "What was the pricing I quoted Acme Co last month?" Panel auto-expands with the number and a reference to the source note.
- The summary. "Summarize everything I've captured about onboarding this week." Panel expands with a short rollup.
Mem → you
- The morning peek. You open your laptop at 8am. Within a minute, a peek: "Your invoice to Acme Co is 3 days overdue."
- The contextual prompt. You're drafting a reply to Sarah. A peek appears: "She asked about SOC 2 last week — want me to pull the report into this draft?"
- The heads-up in a booking flow. You open a travel site. Your airline number floats softly in the corner, clickable to copy.
Decisions we've already made (so we don't relitigate them)
- Shell: Electron. We already use it for our desktop app; no reason to reinvent.
- Mac only. Windows is out of scope.
- Push-to-talk gesture: global hotkey (e.g., hold ⌥+Space). Not click-and-hold.
- Auto-expand: LLM-judged via a tool call. Not rule-based.
- Thread persistence: one forever-thread. Not per-topic, not per-session.
- Integration layer: Mem Agent. We're not reusing Voice Mode's pipeline — Floating Mem goes through the agent, same way Mem Agent runs today in Slack. The agent is the brain; Floating Mem is a new surface it can receive from and send to.
What "done" looks like for the week
Minimum demo (M1):
- Bubble visible on desktop, always on top.
- Push-to-talk works from anywhere; content flows through Mem Agent and lands in the user's Mem.
- Clicking the bubble opens the chat thread with history.
Strong demo (M1–M3):
- All of the above, plus responses auto-expand when warranted, and voice requests can open specific content in the main Mem app.
Stronger demo (M1–M5):
- The above, plus Heads Up working end-to-end for at least one trigger, and Mem Agent can send peeks via Floating Mem.
Wow demo:
- The above, plus screenshot ingestion working smoothly as an additional capture path.
Things Floating Mem is not (for this hack week)
- Not a full-blown voice conversation surface. Push-to-talk is one-shot — speak, release, Mem answers. Sustained real-time voice calls are the Huddle team's territory. Floating Mem might one day be the launching point for a huddle, but Floating Mem itself is not the huddle.
- Not a replacement for the main Mem app. The bubble is the on-ramp and ambient channel; deep browsing, editing, and organizing still happen in the main app.
- Not an always-listening assistant (this week). Nothing is captured until the user holds the hotkey. But: there's a related idea we've been calling the buffer — the notion that ambient context about what's been happening on your screen (visually, maybe audio-wise) could be useful as context input to Floating Mem. We're not building that this week, but it's directionally relevant. If anyone wants to experiment with minimal screen-context awareness as an M4 Heads Up trigger, that's a reasonable path.
- Not where we'll build out everything Heads Up does today — but directionally Floating Mem is plausibly the form factor for Heads Up going forward. For this week we're not trying to port all current Heads Up capabilities over; we're showing what Heads Up could become when it lives in an ambient, agent-driven surface.