Encoding You

Who we are is no longer just physical. From our clicks and scrolls to what we ignore, a new kind of identity is being formed: digital, dynamic, and deeply encoded in data. This post explores how AI is learning to capture, represent, and adapt to that identity.

Vector Representations of Human Identity

At the core of digital identity research lies the concept of embeddings: high-dimensional vector representations that capture semantic meaning. Initially popularized in natural language processing, these techniques have found powerful applications in modeling human behavior.
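The core intuition is that similar people (or interests) land near each other in the vector space, and "near" is usually measured with cosine similarity. A minimal sketch, using toy hand-written vectors rather than learned ones (real systems use hundreds of learned dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: how aligned two embedding vectors are (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings for three users. The values are illustrative
# only; in practice these would be learned from behavioral data.
alice = np.array([0.9, 0.1, 0.4, 0.0])
bob   = np.array([0.8, 0.2, 0.5, 0.1])
carol = np.array([0.0, 0.9, 0.1, 0.8])

# Alice and Bob point in roughly the same direction; Carol does not,
# so the geometry itself encodes "who is like whom".
print(cosine_similarity(alice, bob), cosine_similarity(alice, carol))
```

Everything downstream in this post, from knowledge graphs to behavioral traces, is ultimately a different recipe for producing vectors on which this kind of comparison is meaningful.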

Personal Knowledge Graphs

Research by Microsoft’s Project Alexandria and similar initiatives has explored representing individual users as knowledge graphs, where nodes represent entities of interest and edges capture the relationships between them. These graphs can be embedded into vector spaces, creating a computational representation of personal knowledge.

Researchers from Stanford’s HCI group demonstrated that these personal knowledge graphs can be incrementally constructed from digital traces and evolve to reflect changing interests and expertise, providing a dynamic representation of intellectual identity.
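To make "incrementally constructed from digital traces" concrete, here is a deliberately simplified sketch (not any published system's design; the class name, decay constant, and scoring rule are all illustrative): topics seen together in a session reinforce an edge, and all edges decay over time so the graph drifts with the user's interests.

```python
from collections import defaultdict

class PersonalKnowledgeGraph:
    """Toy incremental knowledge graph: nodes are topics the user touched,
    weighted edges record co-occurrence within a browsing session."""

    def __init__(self, decay: float = 0.9):
        self.decay = decay                   # older interests fade each session
        self.edges = defaultdict(float)      # (topic_a, topic_b) -> weight

    def observe_session(self, topics: list[str]) -> None:
        # Decay all existing edges, then reinforce the pairs seen together.
        for key in self.edges:
            self.edges[key] *= self.decay
        for i, a in enumerate(topics):
            for b in topics[i + 1:]:
                self.edges[tuple(sorted((a, b)))] += 1.0

    def interest_strength(self, topic: str) -> float:
        # A node's total edge weight serves as a crude interest score.
        return sum(w for (a, b), w in self.edges.items() if topic in (a, b))

pkg = PersonalKnowledgeGraph()
pkg.observe_session(["rust", "wasm", "compilers"])
pkg.observe_session(["rust", "wasm"])
pkg.observe_session(["gardening", "compost"])
```

After these three sessions the graph already reflects both the dominant interest (repeatedly reinforced) and the newer one, and the decay term is what lets the representation "evolve to reflect changing interests" rather than accumulate forever.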

Behavioral Embeddings

Beyond knowledge, our behavioral patterns form a crucial component of identity. Work from CMU and Google has shown that interaction patterns—how we navigate interfaces, read content, and respond to information—can be encoded into what they term “behavioral embeddings.”

A seminal 2022 paper by Chen et al. showed that these behavioral embeddings could predict future user actions with remarkable accuracy, suggesting they capture something fundamental about individual decision-making patterns.
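The papers above use learned sequence models, but the underlying idea can be sketched with something much simpler (the action vocabulary and sessions below are invented for illustration): represent a session by its normalized transition counts, so that *how* someone moves through an interface becomes a comparable vector.

```python
import numpy as np
from itertools import product

ACTIONS = ["scroll", "click", "dwell", "back"]
BIGRAMS = list(product(ACTIONS, repeat=2))   # 16 possible action transitions

def behavioral_embedding(session: list[str]) -> np.ndarray:
    """Toy behavioral embedding: unit-normalized counts of action
    transitions. Real systems learn these with sequence models, but the
    intuition is the same: interaction style is a signature."""
    vec = np.zeros(len(BIGRAMS))
    for pair in zip(session, session[1:]):
        vec[BIGRAMS.index(pair)] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

skimmer  = behavioral_embedding(["scroll", "scroll", "click", "back", "scroll"])
reader   = behavioral_embedding(["click", "dwell", "dwell", "scroll", "dwell"])
skimmer2 = behavioral_embedding(["scroll", "click", "back", "scroll", "scroll"])

# Two skimming sessions produce nearby vectors; the careful reader does not.
print(skimmer @ skimmer2, skimmer @ reader)
```

Even this crude encoding separates a skimmer from a careful reader; learned embeddings extend the same idea to far richer signals (dwell time, cursor paths, revisit patterns).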

Self-Supervised Learning of User Representations

Traditional personalization systems relied heavily on explicit feedback—ratings, likes, clicks—to learn user preferences. Recent research has shifted toward self-supervised approaches that require minimal explicit input.

Contrastive Learning for User Modeling

Research from Amazon’s personalization team demonstrated the power of contrastive learning for building user representations. Their system learns by contrasting observed user behaviors against alternative behaviors, creating embeddings that capture subtle preferences without requiring ratings.

The “You Are What You Click” paper (Zhang et al., 2023) showed that these contrastive embeddings outperformed traditional collaborative filtering approaches by a significant margin, especially for users with limited explicit feedback.

Temporal Dynamics in Identity Representation

Our identities aren’t static—they evolve over time. Recent work from MIT’s Media Lab explored temporal dynamics in user embeddings, creating models that gracefully adapt to changing preferences while maintaining continuity.

Their “Memory-Augmented User Representations” incorporated both short-term and long-term memory components, allowing systems to distinguish between transient interests and fundamental aspects of identity.
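The short-term/long-term split can be sketched with two exponential moving averages running at different timescales (a simplification for illustration, not the paper's actual architecture; the decay rates and blend weights are arbitrary):

```python
import numpy as np

class MemoryAugmentedUser:
    """Toy two-timescale user representation: a fast EMA tracks transient
    context, a slow EMA keeps the durable core of the profile."""

    def __init__(self, dim: int, fast: float = 0.5, slow: float = 0.02):
        self.short_term = np.zeros(dim)
        self.long_term = np.zeros(dim)
        self.fast, self.slow = fast, slow

    def update(self, event: np.ndarray) -> None:
        self.short_term += self.fast * (event - self.short_term)
        self.long_term += self.slow * (event - self.long_term)

    def profile(self) -> np.ndarray:
        # Recent context layered on top of durable preferences.
        return 0.3 * self.short_term + 0.7 * self.long_term

user = MemoryAugmentedUser(dim=2)
jazz, news = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for _ in range(200):          # years of jazz listening
    user.update(jazz)
for _ in range(3):            # a brief news binge
    user.update(news)
```

After this history, the short-term component has swung toward news while the long-term component still says "jazz listener", which is exactly the distinction between transient interests and fundamental identity that the paper targets.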

On-Device Personalization

Privacy concerns have driven research toward on-device personalization, where identity models reside entirely on users’ personal devices rather than in central servers.

Federated Personalization

Google’s research on federated learning has shown promising results in building personalized models without transmitting raw user data. Their system updates a local model on the user’s device, sharing only model improvements rather than behavioral data.

Apple’s “Private Federated Learning” took this further by adding differential privacy guarantees, ensuring that individual behaviors cannot be reverse-engineered from model updates.
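The core mechanics of both systems can be sketched in a few lines (a schematic of federated averaging with clipping and Gaussian noise; the learning rate, clip norm, and noise scale here are arbitrary and carry no formal privacy guarantee):

```python
import numpy as np

def local_update(global_model: np.ndarray, user_data: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """One on-device step: a model delta computed from local data.
    Only this delta leaves the device, never the raw interactions."""
    return lr * (user_data.mean(axis=0) - global_model)

def dp_federated_round(global_model: np.ndarray, deltas: list,
                       clip: float = 1.0, noise_std: float = 0.05,
                       rng=np.random.default_rng(0)) -> np.ndarray:
    """Server side: clip each update, average, and add Gaussian noise so
    no single user's contribution is identifiable from the aggregate."""
    clipped = [d / max(1.0, np.linalg.norm(d) / clip) for d in deltas]
    avg = np.mean(clipped, axis=0)
    return global_model + avg + rng.normal(0, noise_std, global_model.shape)

# Ten simulated users whose private data all centers near 0.5.
model = np.zeros(4)
users = [np.random.default_rng(i).normal(loc=0.5, size=(20, 4)) for i in range(10)]
for _ in range(50):
    deltas = [local_update(model, data) for data in users]
    model = dp_federated_round(model, deltas)
```

The global model converges toward the population's shared structure even though the server only ever sees clipped, noised aggregates of model deltas.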

Self-Rewarding Learning Systems

Perhaps most relevant to our topic, recent work has explored self-rewarding learning systems that can improve without explicit feedback. Microsoft Research’s “Intrinsic Reward Functions for Online Learning” demonstrated a personalization system that generates its own reward signals based on prediction accuracy.

Similarly, DeepMind’s work on “Unsupervised Predictive Memory” showed how systems can learn to predict user behavior sequences, using prediction accuracy as an intrinsic reward signal. This removes the need for ratings or explicit feedback loops.
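The idea of "prediction accuracy as reward" can be illustrated with a deliberately tiny predictor (a transition-count model of my own construction, not either paper's method; the action names and habit loop are invented): each step, the system predicts the user's next action, scores itself, and learns from the observation, with no rating or feedback ever supplied.

```python
from collections import defaultdict
import random

class SelfRewardingPredictor:
    """Toy self-rewarding learner: predicts the next action from transition
    counts and treats its own prediction accuracy as the reward signal."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.rewards = []                      # intrinsic reward history

    def step(self, prev_action: str, next_action: str) -> None:
        options = self.counts[prev_action]
        predicted = max(options, key=options.get) if options else None
        self.rewards.append(1.0 if predicted == next_action else 0.0)
        options[next_action] += 1              # learn from the observation

# A user with a strong morning habit, followed 90% of the time.
learner = SelfRewardingPredictor()
random.seed(7)
habit = {"wake": "coffee", "coffee": "email", "email": "wake"}
stream = ["wake"]
for _ in range(300):
    nxt = habit[stream[-1]] if random.random() < 0.9 else random.choice(list(habit))
    stream.append(nxt)
for prev, nxt in zip(stream, stream[1:]):
    learner.step(prev, nxt)

late = sum(learner.rewards[-30:]) / 30        # accuracy over recent steps
```

The reward curve rises as the model internalizes the habit, and that rising signal is itself the supervision: the system knows it is modeling the user better without anyone telling it so.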

Ethical Considerations and Research Challenges

The ability to encode human identity into vector spaces raises important ethical questions. Research from the Oxford Internet Institute has highlighted concerns about algorithmic reductionism—the risk of reducing the complexity of human identity to limited computational representations.

Similarly, work from Princeton’s Center for Information Technology Policy has explored the potential for these systems to reinforce existing patterns rather than supporting personal growth and exploration.

Recent research has begun addressing these concerns through:

  1. Explainable identity models that allow users to understand and modify their representations
  2. Multi-faceted embeddings that capture different aspects of identity rather than collapsing everything into a single vector
  3. User-controllable personalization where identity models remain fully transparent and editable

Future Research Directions

The most promising direction in this field involves what researchers term “co-evolved identity models”—systems that not only adapt to user behavior but actively participate in shaping digital identity through recommendation and interaction.

Work from UC Berkeley’s Center for Human-Compatible AI explores how recommendation systems and identity models co-evolve, potentially creating feedback loops that either diversify or narrow user experiences.

Research from the Alan Turing Institute suggests that identity should be viewed not as a fixed representation but as an ongoing dialogue between user and system—a perspective that aligns well with contemporary philosophical views on identity formation.

The encoding of human identity into computational systems represents one of the most profound intersections of technology and humanity. As research continues to advance in this domain, we move closer to systems that can truly understand and adapt to the complexities of human behavior.

The future likely holds more sophisticated, multi-modal representations that capture not just what we do online, but how we do it: our rhythms, patterns, and unique digital body language. These advances promise personalized experiences that feel less like interactions with algorithms and more like extensions of ourselves.

The most successful approaches will likely balance powerful representation learning with user agency and transparency, ensuring that these digital reflections of ourselves remain under our control while still providing the benefits of personalization.
