NOW LET US – AI RAG SaaS Studio TP.HCM

A Three-Phase Foundation Model for Tax-Aware Personalized Portfolio Management


Researchers have proposed a three-phase deep reinforcement learning system that targets three limitations of prior financial RL work: ticker lock-in, monolithic objectives, and static user models. The model supports tax-aware, personalized portfolio management by combining a time-series foundation model with adaptation to each user's real trading behavior.


Abstract: We present a three-phase deep reinforcement learning system for personalized portfolio management that addresses three limitations shared by all prior financial RL work: 1) ticker lock-in, 2) monolithic objectives, and 3) static user models.

Phase 1 pretrains a ticker-identity-free cross-asset encoder via self-supervised learning on a multi-asset corpus, augmented by a frozen parallel branch using Chronos, a T5-based time-series foundation model, fused via a learned gating mechanism. To our knowledge, this is the first application of a time-series foundation model to portfolio management RL. The encoder generalizes to any publicly traded asset via a 50-dimensional observable metadata vector that requires no retraining for new tickers.

Phase 2 fine-tunes a Mixture of Experts (MoE) portfolio actor-critic with PPO under an objective-conditioned reward that simultaneously serves six distinct investment goals sampled per episode: short-term alpha, short-term gain, long-term gain, capital preservation, tax-loss harvesting, and long-term-gains-only. The MoE architecture assigns each objective to a specialized expert head (momentum, growth, defensive, tax-aware), and a learned intent router blends experts based on the active objective and current market regime, which eliminates cross-objective gradient conflict.

Phase 3 adds a lightweight personalization layer that is further adapted at inference time to each individual via a 76-parameter LoRA module fine-tuned on real brokerage transaction history, inferring investment objectives from revealed trading behavior rather than questionnaires. A natural language intent parser converts free-form goals directly into structured investment objective parameters.
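To make the Phase 2 routing idea more concrete, here is a minimal sketch of how an intent router could blend expert heads. It is not the authors' implementation: the abstract does not specify the router's form, so the linear-plus-softmax router, the one-hot objective encoding, the two regime features, and every function name below are assumptions chosen for illustration. Weights are random rather than learned.

```python
import math
import random

# Objective and expert names follow the abstract; the identifiers are ours.
OBJECTIVES = [
    "short_term_alpha", "short_term_gain", "long_term_gain",
    "capital_preservation", "tax_loss_harvesting", "long_term_gains_only",
]
EXPERTS = ["momentum", "growth", "defensive", "tax_aware"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(objective):
    """Encode the active objective as a one-hot vector (an assumption)."""
    return [1.0 if name == objective else 0.0 for name in OBJECTIVES]


def route(router_weights, objective, regime_features):
    """Hypothetical linear router: one blending weight per expert."""
    inputs = one_hot(objective) + list(regime_features)
    logits = [sum(w * x for w, x in zip(row, inputs)) for row in router_weights]
    return softmax(logits)


def blend(expert_allocations, gate):
    """Convex combination of the experts' portfolio weight vectors."""
    n_assets = len(expert_allocations[0])
    return [
        sum(g * alloc[i] for g, alloc in zip(gate, expert_allocations))
        for i in range(n_assets)
    ]


rng = random.Random(0)
n_regime = 2  # illustrative: e.g. a volatility score and a trend score
# Random stand-in for learned router parameters, one row per expert.
router_weights = [
    [rng.uniform(-1, 1) for _ in range(len(OBJECTIVES) + n_regime)]
    for _ in EXPERTS
]
# Each expert proposes weights over three assets; every row sums to 1.
expert_allocations = [
    [0.7, 0.2, 0.1],  # momentum
    [0.5, 0.4, 0.1],  # growth
    [0.1, 0.2, 0.7],  # defensive
    [0.3, 0.3, 0.4],  # tax_aware
]

gate = route(router_weights, "tax_loss_harvesting", [0.8, -0.3])
portfolio = blend(expert_allocations, gate)
print(dict(zip(EXPERTS, [round(g, 3) for g in gate])))
print([round(w, 3) for w in portfolio])
```

Because the gate comes from a softmax, the blended allocation stays a valid set of portfolio weights whenever each expert's proposal is valid. In a trained system the router parameters would be learned jointly with the expert heads, which this sketch does not attempt.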



Source: arXiv cs.AI Recent

