Google Drive Sync

Phase 4 | Push webhooks + polling fallback | drive-sync service

Overview

xbrain continuously synchronizes Google Drive folders to your team's memory. When a file is added or updated in a mapped folder, xbrain automatically processes and indexes it, tagging it with team_scope, project_scope, and truth_level=WORKING.

The drive-sync service handles the sync loop. It runs in its own container and communicates with memory-api to upsert content. Each mapped folder gets a unique mapping_id (UUID) that ties the Drive credentials to the correct team and project scope.

How It Works

Mode 1 — Push Webhooks (Primary)

When a Drive file changes, Google sends a push notification to xbrain within seconds. This is the default for mapped folders and requires no polling overhead.

Drive file updated
       │
       ▼
Google → POST /v1/drive/webhook (xbrain)
       │
       ▼
drive-sync processes the changed file
       │
       ▼
memory-api upsert (truth_level=WORKING)
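
The first step of the flow above can be sketched in Python as follows. Google delivers push notifications as a POST whose metadata sits in `X-Goog-*` headers, and it sends an initial `sync` notification when a channel is created. The `DriveNotification` class and both function names are hypothetical; how drive-sync actually structures this step is not documented here.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DriveNotification:
    """Hypothetical container for the headers of one Drive push notification."""
    channel_id: str
    resource_id: str
    resource_state: str
    message_number: int


def parse_notification(headers: Mapping[str, str]) -> DriveNotification:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return DriveNotification(
        channel_id=h["x-goog-channel-id"],
        resource_id=h["x-goog-resource-id"],
        resource_state=h["x-goog-resource-state"],
        message_number=int(h.get("x-goog-message-number", "0")),
    )


def should_process(notification: DriveNotification) -> bool:
    # "sync" only confirms that the channel was created; it carries no change.
    return notification.resource_state != "sync"
```

A handler built this way would acknowledge the `sync` message with a 2xx response and hand every other notification on to the file-processing step.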

Mode 2 — Polling Fallback (5-minute intervals)

If webhooks are unavailable (e.g. the VM is temporarily unreachable by Google), drive-sync falls back to polling Drive every 5 minutes for changes. The sync is incremental: it uses Drive's pageToken mechanism to track its position, so only files modified since the last poll are re-processed.
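
A minimal sketch of one incremental poll, assuming the page shape of the Drive `changes.list` response (`changes`, `nextPageToken`, `newStartPageToken`). The `drain_changes` function and its `list_changes` callable are hypothetical stand-ins for whatever client drive-sync uses, which keeps the example runnable without credentials.

```python
from typing import Callable, Dict, List, Tuple


def drain_changes(
    list_changes: Callable[[str], Dict],
    page_token: str,
) -> Tuple[List[Dict], str]:
    """Collect all pending changes and return them with the token to store.

    `list_changes(token)` is assumed to return one page shaped like a
    Drive changes.list response.
    """
    changes: List[Dict] = []
    token = page_token
    while True:
        page = list_changes(token)
        changes.extend(page.get("changes", []))
        if "nextPageToken" in page:
            # More pages remain in this poll; keep following the chain.
            token = page["nextPageToken"]
            continue
        # Last page: newStartPageToken is where the next poll should resume.
        return changes, page.get("newStartPageToken", token)
```

Persisting the returned token after each poll is what makes the next 5-minute cycle pick up only newer changes.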

Setting Up Drive Sync

Step 1 — OAuth authorization (admin only)

An admin must authorize xbrain to access the team's Google Drive. This redirects to Google OAuth and stores the credentials encrypted in PostgreSQL using Fernet encryption.

# Admin initiates OAuth flow
GET https://api.grooveos.app/v1/admin/drive/auth
# → Redirects to Google OAuth consent screen
# → After consent, credentials stored with Fernet encryption in PostgreSQL

Step 2 — Map a folder

Once authorized, map a specific Drive folder to a team and project scope. The folder_id is the last path segment of the folder's Drive URL, i.e. the part after /folders/ (ignore any query string such as ?usp=sharing).
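
For scripted setups, the folder ID can be pulled out of a pasted Drive URL. The helper below is an illustrative sketch; `folder_id_from_url` is a hypothetical name and not part of xbrain.

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Return the path segment that follows /folders/ in a Drive folder URL."""
    # urlparse().path excludes the query string, so ?usp=sharing is dropped.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "folders" not in segments:
        raise ValueError(f"not a Drive folder URL: {url!r}")
    idx = segments.index("folders")
    if idx + 1 >= len(segments):
        raise ValueError(f"no folder ID after /folders/ in {url!r}")
    return segments[idx + 1]
```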

curl -X POST https://api.grooveos.app/v1/admin/drive/mappings \
  -H "Authorization: Bearer $ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "team_scope": "excalibur",
    "project_scope": "fundraising",
    "folder_id": "1ABC...xyz",
    "folder_name": "Fundraising Docs"
  }'
# Returns: {"mapping_id": "uuid-...", "status": "active"}

Step 3 — Webhook auto-registered

After mapping, drive-sync automatically registers a Drive push notification channel pointing to POST /v1/drive/webhook. Webhook channels expire after 7 days by default — drive-sync renews them automatically before expiry using a background scheduler.

Info

The webhook channel renewal is handled automatically by the drive-sync scheduler. No manual intervention is required after initial mapping.
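
The renewal decision can be sketched as a simple time comparison. The Drive API reports a channel's `expiration` as a Unix timestamp in milliseconds. The 24-hour safety margin and the function names are assumptions for illustration, not the scheduler's actual settings.

```python
from datetime import datetime, timedelta, timezone

# Assumed safety margin; the real scheduler's margin is not documented here.
RENEWAL_MARGIN = timedelta(hours=24)


def channel_expiry(expiration_ms: str) -> datetime:
    """Convert the channel's millisecond Unix timestamp to an aware datetime."""
    return datetime.fromtimestamp(int(expiration_ms) / 1000, tz=timezone.utc)


def needs_renewal(
    expiration_ms: str,
    now: datetime,
    margin: timedelta = RENEWAL_MARGIN,
) -> bool:
    # Renew once the remaining lifetime drops to the margin or below.
    return channel_expiry(expiration_ms) - now <= margin
```

A background job could run this check for every active mapping and register a fresh channel whenever it returns True.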

Multi-Folder Mapping

Multiple Drive folders can be mapped to the same team, each with a distinct project_scope. This lets you scope Drive content to the relevant project without creating separate team accounts.

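# Map fundraising folder
curl -X POST https://api.grooveos.app/v1/admin/drive/mappings \
  -H "Authorization: Bearer $ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{"team_scope": "excalibur", "project_scope": "fundraising", "folder_id": "1ABC..."}'

# Map engineering folder (same team, different project scope)
curl -X POST https://api.grooveos.app/v1/admin/drive/mappings \
  -H "Authorization: Bearer $ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{"team_scope": "excalibur", "project_scope": "engineering", "folder_id": "2DEF..."}'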

Files in the fundraising folder get project_scope='fundraising', and engineering files get project_scope='engineering'. They are indexed and searchable independently within xbrain memory.
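
To illustrate how a file's parent folder could determine its tags, here is a small sketch. The `FolderMapping` class, the `tags_for_file` function, and the shape of the returned dictionary are hypothetical. Only the field names and the WORKING and drive-sync values come from this page.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class FolderMapping:
    """Hypothetical in-memory view of one row returned by the mappings API."""
    mapping_id: str
    team_scope: str
    project_scope: str
    folder_id: str


def tags_for_file(
    parent_folder_id: str,
    mappings: Iterable[FolderMapping],
) -> Optional[Dict[str, str]]:
    """Return the tags for a file in the given folder, or None if unmapped."""
    for m in mappings:
        if m.folder_id == parent_folder_id:
            return {
                "team_scope": m.team_scope,
                "project_scope": m.project_scope,
                "truth_level": "WORKING",
                "source": "drive-sync",
            }
    return None
```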

List Current Mappings

curl https://api.grooveos.app/v1/admin/drive/mappings \
  -H "Authorization: Bearer $ADMIN_JWT"
# Returns a list of all active folder mappings for the team

Sync Status

Check what has been synced by querying memory-api with the drive-sync source filter:

curl "https://api.grooveos.app/v1/memory/search?q=drive+sync&source=drive-sync" \
  -H "Authorization: Bearer $JWT" \
  -H "X-Team-Scope: excalibur"

Security

Drive credentials are stored encrypted using Fernet encryption. Never commit OAuth tokens, service account keys, or the FERNET_KEY value to the repository. These must be injected via environment variables or a secrets manager.
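
One way to follow this guidance is to read the key from the environment at startup and fail fast if it is missing or malformed. The sketch below uses only the standard library and relies on the fact that a Fernet key is 32 bytes encoded as URL-safe base64. The function name `load_fernet_key` is hypothetical.

```python
import base64
import os
from typing import Mapping


def load_fernet_key(env: Mapping[str, str] = os.environ) -> bytes:
    """Read FERNET_KEY from the environment and check its basic shape."""
    raw = env.get("FERNET_KEY")
    if not raw:
        raise RuntimeError("FERNET_KEY is not set")
    try:
        decoded = base64.urlsafe_b64decode(raw)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise ValueError("FERNET_KEY is not valid URL-safe base64") from exc
    if len(decoded) != 32:
        raise ValueError("FERNET_KEY must decode to exactly 32 bytes")
    # Returned as bytes, ready to pass to a Fernet implementation.
    return raw.encode("ascii")
```

The error messages deliberately never include the key itself, so a misconfiguration cannot leak the secret into logs.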

Supported File Types

drive-sync extracts text from supported formats and passes it to memory-api. Unsupported formats are logged and skipped — they do not cause sync failures.

| File Type | Processing | Notes |
|---|---|---|
| Google Docs | Text extraction | Exported as plain text via Drive export API |
| Google Sheets | Text extraction | First sheet only; cell values joined as text |
| PDF | Text extraction | Via PyMuPDF; scanned PDFs without OCR may be empty |
| Markdown (.md) | Direct | No conversion needed; indexed as-is |
| Text (.txt) | Direct | No conversion needed; indexed as-is |
| Images (.png, .jpg, etc.) | Skipped | Not yet supported; logged but not indexed |
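
The table above can be read as a dispatch on MIME type. The sketch below shows one possible form. The strategy labels and the function name are hypothetical, while the two `application/vnd.google-apps.*` values are the MIME types Drive reports for native Docs and Sheets.

```python
GOOGLE_DOC = "application/vnd.google-apps.document"
GOOGLE_SHEET = "application/vnd.google-apps.spreadsheet"


def processing_strategy(mime_type: str, name: str = "") -> str:
    """Map a Drive file to a hypothetical processing label."""
    if mime_type == GOOGLE_DOC:
        return "export_text"
    if mime_type == GOOGLE_SHEET:
        return "export_first_sheet"
    if mime_type == "application/pdf":
        return "extract_pdf_text"
    # Markdown uploads may carry a generic MIME type, so also check the name.
    if mime_type in ("text/markdown", "text/plain") or name.lower().endswith((".md", ".txt")):
        return "direct"
    # Images and anything else: logged and skipped, never a sync failure.
    return "skip"
```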

Architecture Notes

The OAuth state parameter used during the Drive authorization flow is set to the mapping_id UUID — not the team_scope. This ensures the callback correctly resolves which mapping to associate credentials with, even when multiple mappings are in progress simultaneously.
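
A sketch of carrying the mapping_id through the OAuth round trip. The authorization endpoint is Google's standard OAuth 2.0 endpoint. The `drive.readonly` scope, the redirect URI, and both function names are assumptions for illustration.

```python
import uuid
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
# Assumed scope; the scope xbrain actually requests is not documented here.
DRIVE_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def build_auth_url(mapping_id: uuid.UUID, client_id: str, redirect_uri: str) -> str:
    """Build the consent URL with the mapping_id as the state parameter."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": DRIVE_SCOPE,
        "access_type": "offline",  # ask for a refresh token
        "state": str(mapping_id),
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


def mapping_id_from_state(state: str) -> uuid.UUID:
    """Parse the state echoed back by Google; raises ValueError if malformed."""
    return uuid.UUID(state)
```

Because each in-flight authorization carries its own UUID, two admins authorizing different mappings at the same time cannot have their callbacks confused.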

The push webhook endpoint — POST /v1/drive/webhook — is public (no auth header required) because Google Drive sends notifications without Bearer tokens. Request authenticity is verified using the channel ID and resource ID returned during webhook registration.
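
The authenticity check described above could look roughly like this. Google sends the channel and resource identifiers in the `X-Goog-Channel-ID` and `X-Goog-Resource-ID` headers. The `is_authentic` function and the `channels` lookup table (channel ID to resource ID, as saved at registration) are hypothetical.

```python
import hmac
from typing import Mapping


def is_authentic(headers: Mapping[str, str], channels: Mapping[str, str]) -> bool:
    """Check a notification against the IDs stored when the channel was created."""
    h = {k.lower(): v for k, v in headers.items()}
    channel_id = h.get("x-goog-channel-id", "")
    resource_id = h.get("x-goog-resource-id", "")
    expected = channels.get(channel_id)
    if expected is None:
        # Unknown channel: never registered by us, or already expired.
        return False
    # Constant-time comparison avoids leaking how much of the ID matched.
    return hmac.compare_digest(expected.encode(), resource_id.encode())
```

Requests that fail this check can be rejected before any file processing starts.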