1. Introduction
Data has become a central strategic asset in modern e-commerce systems. In addition to structured information such as customer profiles, products, orders, payments, and inventory, e-commerce platforms must process a considerable amount of semi-structured and unstructured data. Such data includes product images, customer reviews, browsing histories, clickstream events, system logs, social media interactions, delivery information, and customer service conversations. A seemingly simple online purchase may involve a sequence of complex operations, including user authentication, product availability verification, price and promotion calculation, inventory reservation, payment authorization, order creation, loyalty point calculation, and delivery scheduling. These operations often need to be completed within seconds while maintaining acceptable levels of consistency and reliability. During major promotional campaigns, the number of concurrent users and transactions may increase dramatically, creating significant pressure on database infrastructure.
Traditional relational database management systems remain essential for workloads that require Atomicity, Consistency, Isolation, and Durability, commonly referred to as ACID properties. Typical examples include payments, order processing, accounting, and inventory management. However, using a single relational database for all data types and workloads may create limitations in scalability, flexibility, and system performance. Modern e-commerce architectures therefore increasingly adopt polyglot persistence, in which different database technologies are selected for different business and technical requirements [1, 2].
This development does not imply that relational databases are being completely replaced by NoSQL technologies. Instead, contemporary database architectures combine relational, non-relational, analytical, streaming, and vector-based data management technologies. The objective is to assign each workload to the most appropriate data platform while maintaining consistent governance and integration.
This paper has two main objectives. First, it analyzes major modern database trends and their relevance to e-commerce. Second, it proposes an integrated database architecture that combines transactional consistency, distributed processing, real-time analytics, and AI-enabled functions.
2. Data requirements of modern e-commerce systems
The database workloads of an e-commerce platform can generally be divided into four major groups.
The first group is online transaction processing (OLTP), which includes order creation, payment processing, refund management, invoice generation, inventory updates, and customer account management. These operations require strong integrity constraints, concurrency control, transaction isolation, and reliable recovery mechanisms.
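The transactional requirement can be illustrated with a minimal sketch in which order creation and the stock decrement either both succeed or both roll back. SQLite from the Python standard library is used only for convenience; the table names, columns, and the `place_order` helper are illustrative assumptions and not part of any system described in this paper.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the CHECK constraint forbids negative stock.
    conn.executescript(
        """
        CREATE TABLE inventory (
            sku TEXT PRIMARY KEY,
            quantity INTEGER NOT NULL CHECK (quantity >= 0)
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            sku TEXT NOT NULL,
            quantity INTEGER NOT NULL CHECK (quantity > 0)
        );
        """
    )


def place_order(conn: sqlite3.Connection, sku: str, quantity: int) -> bool:
    """Create an order and decrement stock in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (sku, quantity) VALUES (?, ?)", (sku, quantity)
            )
            cur = conn.execute(
                "UPDATE inventory SET quantity = quantity - ? WHERE sku = ?",
                (quantity, sku),
            )
            if cur.rowcount != 1:
                raise LookupError(f"unknown sku: {sku}")
    except (sqlite3.IntegrityError, LookupError):
        return False
    return True


conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    conn.execute("INSERT INTO inventory VALUES ('SKU-1', 5)")

first = place_order(conn, "SKU-1", 3)   # enough stock is available
second = place_order(conn, "SKU-1", 3)  # would drive stock below zero
```

In this sketch the second call violates the stock constraint, so the order row inserted just before it is rolled back together with the failed update.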
The second group is low-latency operational data access. Examples include retrieving product information, updating shopping carts, maintaining user sessions, accessing promotion rules, and displaying personalized content. In many e-commerce applications, read operations significantly outnumber write operations, especially during high-traffic events.
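Read-heavy access of this kind is often served through a cache placed in front of the primary store. The following is a minimal in-process sketch of the cache-aside pattern with a time-to-live; the `CacheAside` class and the `load_product` loader are hypothetical names, and a production system would normally use a dedicated key-value store instead of a Python dictionary.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Serve reads from memory and fall back to a loader on a miss or expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read the authoritative source and store the result.
        self.misses += 1
        value = loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Called after a write so that the next read reloads fresh data.
        self._entries.pop(key, None)


database_reads = []


def load_product(sku: str) -> dict:
    # Stand-in for a query against the product database.
    database_reads.append(sku)
    return {"sku": sku, "name": "Example product"}


cache = CacheAside(ttl_seconds=30)
cache.get("SKU-1", load_product)
cache.get("SKU-1", load_product)  # served from the cache, no second database read
```

The time-to-live bounds how stale a cached entry can become, which is the trade-off accepted in exchange for fewer reads against the primary store.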
The third group is analytical data processing. E-commerce companies need to analyze sales performance, customer behavior, product demand, marketing effectiveness, conversion rates, customer churn, fraud patterns, and supply chain efficiency. These applications normally require the integration of historical and real-time data from multiple operational systems.
The fourth group consists of artificial intelligence applications, such as recommendation systems, semantic product search, conversational shopping assistants, demand forecasting, sentiment analysis, automated content generation, and fraud detection. These systems require more than traditional key-based or condition-based queries. They may also need similarity search, graph analysis, feature storage, and access to unstructured data.
Because these workloads differ considerably in their data models, consistency requirements, access patterns, throughput, and latency expectations, no single database technology is optimal for all e-commerce functions [1; 3, p. 10-11]. Database selection should therefore consider workload characteristics, query patterns, scalability requirements, consistency levels, operational complexity, cost, and security.
3. Modern database trends
3.1. Cloud-native and serverless databases
Cloud-native databases are designed to take advantage of the elasticity, automation, and geographical distribution offered by cloud computing. Under the Database as a Service model, cloud providers handle many operational tasks, including database provisioning, patching, backup, monitoring, replication, and failover.
This approach allows e-commerce companies to focus more on application development and less on infrastructure administration. It is particularly beneficial for small and medium-sized enterprises that may not have a large database administration team.
Serverless databases extend this model by automatically allocating and releasing computing resources according to actual workload demand. Instead of maintaining fixed infrastructure capacity, an organization may pay according to the number of queries, transactions, or computing resources consumed.
This elasticity suits e-commerce because workloads vary significantly with time of day, season, marketing campaigns, and special sales events. However, serverless databases may introduce challenges related to connection limits, cold-start latency, unpredictable costs, and vendor lock-in. Serverless adoption should therefore be based on workload measurements rather than on architectural trends alone.
3.2. Distributed SQL
Distributed SQL combines the relational data model, SQL query language, and ACID transactions with horizontal scalability across multiple computing nodes. It is designed to provide the advantages of relational databases while addressing some of their traditional scalability limitations. Distributed SQL is particularly relevant to e-commerce applications that require both high transaction volumes and strong consistency, such as order processing, inventory management, and payment-related services. Unlike manually partitioned relational databases, many Distributed SQL platforms provide automated sharding, replication, load balancing, and failure recovery.
However, distributed transactions may increase communication overhead and transaction latency because coordination is required among multiple nodes. Appropriate partitioning strategies are therefore essential. Data may be partitioned according to geographic region, customer identifier, merchant identifier, or business domain to reduce cross-partition transactions. Distributed SQL should not be considered a universal solution. For small systems, a conventional managed relational database may remain simpler and more cost-effective. Distributed SQL becomes more advantageous when transaction volume, geographical distribution, or availability requirements exceed the capacity of a single database instance.
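The effect of the partitioning key can be sketched as follows. The example assumes simple hash partitioning and a hypothetical `partition_for` function; actual Distributed SQL platforms use their own placement schemes (for instance range-based sharding), so this is only an illustration of why co-locating related rows avoids cross-partition transactions.

```python
import hashlib
from typing import Iterable, Set


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key to a partition number with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def partitions_touched(keys: Iterable[str], partitions: int) -> Set[int]:
    """Return the set of partitions a transaction over these rows would involve."""
    return {partition_for(key, partitions) for key in keys}


# An order header and its line items all carry the customer identifier
# as their partition key, so they land on the same partition.
rows_keyed_by_customer = ["customer-42", "customer-42", "customer-42"]
same_partition = partitions_touched(rows_keyed_by_customer, partitions=8)
```

Because every row of the order shares one key, the transaction stays on a single partition and needs no cross-node coordination. Keying the same rows by unrelated identifiers would generally spread them over several partitions.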
3.3. NoSQL and specialized data stores
NoSQL databases remain important for applications that require flexible schemas, high write throughput, horizontal scalability, or specialized query capabilities [2; 3, p. 10-11]. Document databases are suitable for product catalogs because different product categories may have different attributes. For example, mobile phones may include storage capacity and screen size, whereas clothing products may include material, color, and size. Representing these variations in a rigid relational schema may require numerous optional columns or complex table structures. Key-value databases are commonly used for shopping carts, session management, distributed caching, access tokens, and temporary personalization data. Their simple access model enables high performance and low latency. Wide-column databases may be used to store large volumes of clickstream data, browsing histories, or system events. Graph databases are appropriate for analyzing complex relationships among customers, products, transactions, devices, addresses, and payment methods. Such relationship analysis can support recommendation systems and fraud detection.
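The product catalog case can be sketched with plain Python dictionaries standing in for documents. The field names and the `find_by_attributes` helper are assumptions made for illustration and do not correspond to the query API of any particular document database.

```python
from typing import Any, Dict, List

# Each document carries only the attributes relevant to its category.
catalog: List[Dict[str, Any]] = [
    {
        "sku": "PH-1",
        "category": "mobile_phone",
        "name": "Phone A",
        "attributes": {"storage_gb": 128, "screen_inches": 6.1},
    },
    {
        "sku": "CL-1",
        "category": "clothing",
        "name": "Rain jacket",
        "attributes": {"material": "polyester", "color": "blue", "size": "M"},
    },
]


def find_by_attributes(docs: List[Dict[str, Any]], category: str, **wanted: Any) -> List[str]:
    """Return the SKUs in a category whose attributes match all requested values."""
    return [
        doc["sku"]
        for doc in docs
        if doc["category"] == category
        and all(doc["attributes"].get(name) == value for name, value in wanted.items())
    ]


blue_clothing = find_by_attributes(catalog, "clothing", color="blue")
```

No schema change is needed when a new category introduces new attributes, whereas a relational design would require additional columns or tables.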
The use of different databases for different workloads is referred to as polyglot persistence [2]. Although this approach can improve performance and scalability, it also increases operational complexity. Each additional database technology introduces new requirements for administration, monitoring, backup, access control, and staff expertise. Clear service boundaries and data ownership rules are therefore necessary.
3.4. Multi-model databases
Multi-model databases support more than one data representation or query model within a single database platform. Depending on the system, these models may include relational tables, JSON documents, graphs, spatial data, time-series data, and vectors.
Multi-model databases can reduce infrastructure complexity because organizations do not need to deploy a separate database for every data representation. This approach may be particularly suitable for small and medium-sized e-commerce companies.
Nevertheless, the presence of multiple features does not necessarily mean that every feature has the same performance or maturity as a specialized database. For example, a relational database with basic graph or vector support may not provide the same functionality as a dedicated graph or vector database under large-scale workloads. Database selection should therefore be supported by practical benchmarking. Organizations should evaluate representative queries, dataset sizes, transaction patterns, indexing requirements, failure behavior, and operational costs.
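A benchmarking harness for such an evaluation can be very small. The sketch below times a callable repeatedly and reports latency percentiles using the nearest-rank method; the `benchmark` function and the placeholder query are hypothetical, and a real evaluation would run representative queries against each candidate database under realistic data volumes.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile, with p in the range (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def benchmark(query: Callable[[], object], runs: int = 200) -> Dict[str, float]:
    """Run the query several times and summarize latency in milliseconds."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "max_ms": max(latencies_ms),
    }


# Placeholder workload; replace with a call into the database under test.
summary = benchmark(lambda: sum(range(1000)), runs=50)
```

Reporting tail percentiles rather than only the mean matters because a small share of slow queries can dominate user-perceived latency during peak traffic.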
3.5. Real-time stream processing and event-driven architecture
Traditional batch processing is insufficient for many modern e-commerce requirements. Inventory synchronization, dynamic pricing, fraud detection, customer behavior analysis, and personalized recommendation may require data to be processed within seconds or milliseconds.
An event-driven architecture allows services to publish business events such as “Order Created,” “Payment Confirmed,” “Inventory Reserved,” or “Product Viewed.” Other services may subscribe to these events and process them independently. This design reduces direct dependencies between services and improves scalability.
Change Data Capture, or CDC, detects database changes from transaction logs and transfers them to downstream systems. CDC enables operational data to be synchronized with search engines, analytical platforms, caches, and machine learning systems without continuously querying the source database.
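The idea behind log-based CDC can be sketched with an in-memory change log. The record layout (`lsn`, `op`, `key`, `row`) and the `apply_changes` function are assumptions made for illustration; real CDC tools read the database's own transaction log and define their own event formats.

```python
from typing import Any, Dict, List


def apply_changes(log: List[Dict[str, Any]], index: Dict[str, Any], last_lsn: int) -> int:
    """Apply every change newer than last_lsn to a derived index.

    Returns the new offset so that the next run can resume from it.
    """
    for change in log:
        if change["lsn"] <= last_lsn:
            continue  # already applied in an earlier run
        if change["op"] == "delete":
            index.pop(change["key"], None)
        else:  # "upsert"
            index[change["key"]] = change["row"]
        last_lsn = change["lsn"]
    return last_lsn


change_log = [
    {"lsn": 1, "op": "upsert", "key": "P1", "row": {"name": "Rain jacket"}},
    {"lsn": 2, "op": "upsert", "key": "P2", "row": {"name": "Umbrella"}},
    {"lsn": 3, "op": "delete", "key": "P1", "row": None},
]

search_index: Dict[str, Any] = {}
offset = apply_changes(change_log[:2], search_index, last_lsn=0)  # first two changes
offset = apply_changes(change_log, search_index, last_lsn=offset)  # only lsn 3 is new
```

Storing the offset lets the consumer resume after a restart without re-reading the source tables, which is the property that distinguishes CDC from periodic polling.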
Event-driven systems commonly adopt eventual consistency, under which replicas or downstream systems may temporarily hold different versions of the data but converge once updates have propagated [4, p. 40-44]. Eventual consistency is acceptable for product search indexes, analytical dashboards, and recommendations. However, it may not be acceptable for payment authorization or inventory reservation.
To ensure reliability, event-driven architectures should implement patterns such as transactional outbox, idempotent consumers, retry mechanisms, dead-letter queues, and event versioning. The Saga pattern may be used to coordinate distributed business processes through a sequence of local transactions and compensating actions.
The CAP theorem also remains relevant when designing distributed databases. When a network partition occurs, a distributed system cannot guarantee both strong consistency and availability and must favor one of them [5, p. 23-29]. Consistency decisions should therefore be made per service according to business risk rather than applied uniformly across all services.
3.6. Data lakehouse architecture
Traditional data warehouses provide structured schemas, optimized queries, and business intelligence capabilities. Data lakes provide low-cost storage for raw, semi-structured, and unstructured data. However, conventional data lakes may suffer from weak governance, inconsistent schemas, and poor data quality.
A data lakehouse combines the flexibility and scalability of a data lake with transactional management, schema enforcement, metadata management, and query optimization. This architecture can store structured transaction data, clickstream events, product images, customer reviews, logistics information, and customer service data in a unified analytical environment.
In e-commerce, a data lakehouse may support sales analysis, customer segmentation, demand forecasting, marketing attribution, recommendation model training, and supply chain optimization. Stream processing systems can continuously transfer operational events into the lakehouse, allowing analytical systems to access near-real-time data [6, 7].
However, a lakehouse should not directly replace operational databases. Transaction processing systems require low latency, strict integrity constraints, and predictable concurrency control. Analytical platforms are optimized for large scans, aggregations, and historical processing. Clear separation between operational and analytical workloads remains important.
3.7. Vector databases and AI-powered commerce
Vector databases are one of the most significant recent developments in data management. They store high-dimensional numerical representations, known as embeddings, generated from text, images, audio, products, or user behavior.
Traditional search systems mainly rely on keyword matching. Vector search identifies items according to semantic or contextual similarity. For example, a customer may search for “a lightweight jacket suitable for rainy but warm weather.” A vector-based search system can retrieve products with relevant functional characteristics even when the exact words in the query do not appear in the product description.
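Similarity-based retrieval can be illustrated with cosine similarity over toy vectors. The three-dimensional embeddings below are invented by hand purely for illustration (roughly rain protection, warmth, and lightness); real embeddings come from a trained model and have hundreds of dimensions, and vector databases use approximate indexes instead of the exhaustive scan shown here.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], items: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k item names most similar to the query vector."""
    scored = sorted(((cosine(query, vec), name) for name, vec in items.items()), reverse=True)
    return [name for _, name in scored[:k]]


# Hand-made toy embeddings: (rain protection, warmth, lightness).
product_vectors = {
    "light rain shell": [0.9, 0.1, 0.9],
    "down parka": [0.3, 1.0, 0.1],
    "cotton t-shirt": [0.0, 0.2, 0.9],
}

# Assumed embedding of the query about a lightweight jacket for rainy, warm weather.
query_vector = [1.0, 0.2, 0.8]
results = top_k(query_vector, product_vectors, k=2)
```

The rain shell ranks first because its vector points in nearly the same direction as the query vector, even though no keyword comparison takes place.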
Vector databases can support several e-commerce functions:
- Semantic product search;
- Image-based product search;
- Similar-product recommendation;
- Personalized recommendation;
- Customer service knowledge retrieval;
- Retrieval-Augmented Generation for conversational shopping assistants.
However, vector similarity should not be the only mechanism used to rank or select products. Product price, availability, delivery region, merchant policy, access permissions, and business rules must still be verified through structured operational data.
Consequently, many practical e-commerce systems adopt hybrid search. Hybrid search combines vector similarity, keyword matching, metadata filtering, and business ranking rules. A vector database should therefore be regarded as a complementary component rather than the authoritative source for orders, inventory, or payments.
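A hybrid ranking step can be sketched as a filter followed by a weighted score. The candidate fields, the `alpha` weight, and the `vector_score` values (assumed to come from a vector index) are illustrative assumptions; production systems typically use more elaborate keyword scoring and rank fusion.

```python
from typing import Any, Dict, List


def keyword_score(query: str, text: str) -> float:
    """Share of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(
    query: str, candidates: List[Dict[str, Any]], region: str, alpha: float = 0.7
) -> List[str]:
    """Filter on structured metadata first, then blend vector and keyword scores."""
    eligible = [c for c in candidates if c["in_stock"] and region in c["regions"]]
    scored = [
        (alpha * c["vector_score"] + (1 - alpha) * keyword_score(query, c["title"]), c["sku"])
        for c in eligible
    ]
    return [sku for _, sku in sorted(scored, reverse=True)]


candidates = [
    {"sku": "A", "title": "lightweight rain jacket", "vector_score": 0.91,
     "in_stock": True, "regions": {"north", "south"}},
    {"sku": "B", "title": "waterproof shell", "vector_score": 0.95,
     "in_stock": False, "regions": {"north"}},
    {"sku": "C", "title": "rain poncho", "vector_score": 0.70,
     "in_stock": True, "regions": {"north"}},
]

ranking = hybrid_search("lightweight rain jacket", candidates, region="north")
```

Product B has the highest vector score but is excluded because it is out of stock, which shows why similarity alone cannot decide what is presented to the customer.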
3.8. Artificial intelligence-assisted database administration
Artificial intelligence and machine learning are increasingly incorporated into database administration. Modern systems may automatically recommend indexes, detect inefficient queries, identify workload anomalies, predict resource consumption, and suggest configuration adjustments.
These capabilities can reduce administrative workload and improve resource utilization. Nevertheless, fully autonomous database administration may also create operational risks. An automatically created index may improve one query while increasing storage or write overhead. A configuration change may have unintended effects on another service.
AI-assisted administration should therefore include monitoring, explainability, approval mechanisms, and rollback procedures. Human database administrators and system architects remain responsible for business-critical decisions.
3.9. Security, privacy, and data governance
E-commerce systems process sensitive information, including personal data, addresses, authentication credentials, transaction histories, and payment-related information. Security must therefore be implemented throughout the data lifecycle. Important database security measures include encryption in transit and at rest, centralized key management, role-based or attribute-based access control, multi-factor authentication, secret management, database activity monitoring, audit logging, data masking, and backup protection. A zero-trust architecture assumes that no user, device, or network location should be trusted by default. Every access request must be authenticated, authorized, and evaluated according to contextual information [13].
For payment data, organizations should minimize the storage of cardholder information and use tokenization or payment service providers where possible. Systems that store, process, or transmit payment card information should follow applicable Payment Card Industry Data Security Standard requirements [14]. Data governance is equally important. Organizations need to define data ownership, data quality rules, retention periods, consent requirements, and deletion procedures. Metadata catalogs and data lineage tools can help identify where data originates, how it is transformed, and which systems use it.
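The tokenization and masking ideas mentioned above can be sketched as follows. The `TokenVault` class and `mask_pan` function are hypothetical, the in-memory dictionary stands in for a hardened and separately secured vault service, and the sketch by itself does not make a system compliant with any standard.

```python
import secrets
from typing import Dict


class TokenVault:
    """Replace a card number with a random token; only the vault can map it back."""

    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        # The token is random, so it reveals nothing about the card number.
        token = "tok_" + secrets.token_urlsafe(16)
        self._by_token[token] = pan
        return token

    def detokenize(self, token: str) -> str:
        return self._by_token[token]


def mask_pan(pan: str) -> str:
    """Show only the last four digits, for display or logging."""
    return "*" * (len(pan) - 4) + pan[-4:]


vault = TokenVault()
card_number = "4111111111111111"  # widely used test number, not a real card
token = vault.tokenize(card_number)
masked = mask_pan(card_number)
```

Order and customer services would store only the token, so a breach of their databases does not expose cardholder data.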
4. Proposed database integration model for e-commerce systems
Based on the trends discussed above, this paper proposes a six-layer database integration architecture for modern e-commerce systems.
4.1. Interaction layer
The interaction layer includes websites, mobile applications, social commerce platforms, point-of-sale systems, marketplace interfaces, and partner applications. User and system requests are received through an API gateway.
The API gateway performs functions such as authentication, request routing, rate limiting, protocol conversion, access logging, and protection against excessive or malicious traffic.
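Rate limiting at the gateway is commonly implemented with a token bucket. The sketch below is a minimal single-process version; the `TokenBucket` class and its parameters are assumptions for illustration, and an actual gateway would keep one bucket per client and share state across gateway instances.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` and a sustained rate of `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_second
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._updated) * self._rate)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_second=5, capacity=10)
decisions = [bucket.allow() for _ in range(12)]  # a burst of twelve requests
```

Requests beyond the burst capacity are rejected until enough time has passed for the bucket to refill.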
4.2. Business service layer
The business service layer divides the e-commerce system into business domains such as:
- Customer management;
- Product catalog;
- Shopping cart;
- Pricing and promotion;
- Order management;
- Payment;
- Inventory;
- Delivery;
- Customer service.
Each service should own its business logic and data. A service should not directly modify the internal database tables of another service. Communication should occur through defined APIs or business events.
This principle reduces coupling and allows services to be developed, deployed, and scaled independently. However, organizations should avoid unnecessary microservice decomposition. In smaller systems, a modular monolithic architecture may provide adequate separation with lower operational complexity.
4.3. Operational data layer
The operational data layer applies polyglot persistence according to workload characteristics.
A relational or Distributed SQL database may manage orders, payments, invoices, and inventory because these domains require strong consistency and transactional control.
A document database may store product catalogs and flexible product attributes. A key-value or in-memory database may support session management, shopping carts, distributed locks, and frequently accessed data.
A full-text search engine may provide product search, filtering, and ranking. A graph database may analyze relationships among users, products, devices, addresses, and transactions for recommendation or fraud detection.
Each type of business data must have an authoritative source. Search indexes, caches, analytical tables, and vector representations should be treated as derived copies that can be rebuilt from the authoritative data source.
4.4. Event and integration layer
The event and integration layer connects operational services without requiring direct database access. Business events are published to a message broker or stream-processing platform.
CDC mechanisms may capture changes in operational databases and distribute them to search indexes, data lakehouse systems, feature stores, monitoring systems, and machine learning pipelines.
Distributed workflows, such as order creation, payment confirmation, inventory reservation, and delivery scheduling, may be coordinated using the Saga pattern. If one step fails, compensating transactions may cancel or reverse previously completed operations.
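An orchestrated Saga can be sketched as a list of steps, each paired with a compensating action. The `run_saga` function and the in-memory `state` dictionary are illustrative assumptions; in a real system each step would be a local transaction in a separate service, and the orchestrator would persist its progress.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    log: List[str] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


state = {"orders": [], "reserved": 0}


def authorize_payment() -> None:
    raise RuntimeError("card declined")  # simulated failure of the third step


steps: List[Step] = [
    ("create_order", lambda: state["orders"].append("order-1"), lambda: state["orders"].remove("order-1")),
    ("reserve_inventory", lambda: state.update(reserved=1), lambda: state.update(reserved=0)),
    ("authorize_payment", authorize_payment, lambda: None),
]

ok, saga_log = run_saga(steps)
```

After the payment step fails, the reservation is released and the order is cancelled, in the reverse of the order in which they were performed.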
This layer should include event schema management, event versioning, duplicate detection, retry policies, and dead-letter processing. Observability is essential because failures may occur across several asynchronous services.
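Duplicate detection, retries, and dead-letter handling can be sketched in one consumer loop. The `process_events` function and the event fields are hypothetical; back-off delays between attempts and durable storage of seen identifiers are omitted to keep the sketch short.

```python
from typing import Any, Callable, Dict, List, Tuple


def process_events(
    events: List[Dict[str, Any]],
    handler: Callable[[Dict[str, Any]], None],
    max_attempts: int = 3,
) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Process each event once, retrying failures and dead-lettering the rest."""
    seen_ids = set()
    processed: List[str] = []
    dead_letters: List[Dict[str, Any]] = []
    for event in events:
        if event["id"] in seen_ids:
            continue  # duplicate delivery, already handled
        seen_ids.add(event["id"])
        for attempt in range(1, max_attempts + 1):
            try:
                handler(event)
            except Exception as exc:
                if attempt == max_attempts:
                    # Give up and park the event for later inspection.
                    dead_letters.append({"event": event, "error": str(exc)})
            else:
                processed.append(event["id"])
                break
    return processed, dead_letters


attempts: Dict[str, int] = {}


def handler(event: Dict[str, Any]) -> None:
    attempts[event["id"]] = attempts.get(event["id"], 0) + 1
    if event["type"] == "Malformed":
        raise ValueError("cannot parse payload")  # permanent failure
    if event["id"] == "e2" and attempts["e2"] == 1:
        raise TimeoutError("transient failure")  # succeeds on retry


incoming = [
    {"id": "e1", "type": "OrderCreated"},
    {"id": "e1", "type": "OrderCreated"},  # redelivered duplicate
    {"id": "e2", "type": "PaymentConfirmed"},
    {"id": "e3", "type": "Malformed"},
]

processed, dead_letters = process_events(incoming, handler)
```

Parking a permanently failing event in a dead-letter list keeps it from blocking the events behind it while preserving it for diagnosis.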
4.5. Analytics and AI data layer
The analytics and AI data layer stores both historical and streaming data. A data lakehouse serves as the primary analytical platform, supporting business intelligence, data science, forecasting, customer segmentation, and model training.
A vector database stores product, document, image, and customer behavior embeddings. The vector database may be connected to recommendation engines, semantic search systems, and conversational assistants.
A feature store may manage reusable machine learning features such as customer purchase frequency, average order value, product popularity, and fraud indicators. Feature consistency between training and production environments is important for model reliability.
Before an AI application presents information to a customer, dynamic information such as product price, stock availability, promotion eligibility, and delivery status should be verified against the operational data source.
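This verification step can be sketched as a filter applied to AI-generated suggestions. The `verify_suggestions` function, the suggestion fields, and the dictionary used as the operational source are assumptions for illustration; in the proposed architecture the lookup would call the pricing and inventory services.

```python
from typing import Any, Callable, Dict, List, Optional


def verify_suggestions(
    suggestions: List[Dict[str, Any]],
    lookup: Callable[[str], Optional[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Drop unavailable products and replace cached prices with current ones."""
    verified = []
    for suggestion in suggestions:
        current = lookup(suggestion["sku"])
        if current is None or current["stock"] <= 0:
            continue  # unknown or out of stock: do not show it
        verified.append(
            {"sku": suggestion["sku"], "price_cents": current["price_cents"], "stock": current["stock"]}
        )
    return verified


# Stand-in for the authoritative operational data source.
operational = {
    "A": {"price_cents": 4900, "stock": 3},
    "B": {"price_cents": 9900, "stock": 0},
}

# Suggestions as an assistant might produce them from a possibly stale index.
suggested = [
    {"sku": "A", "price_cents": 5900},
    {"sku": "B", "price_cents": 9900},
    {"sku": "Z", "price_cents": 1000},
]

shown = verify_suggestions(suggested, operational.get)
```

Only product A is shown, and at its current price, because the stale price from the derived index is replaced by the authoritative value.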
4.6. Governance and security layer
The governance and security layer applies across the entire architecture. It includes identity and access management, encryption, data cataloging, metadata management, data lineage, quality monitoring, retention policies, privacy controls, and regulatory compliance.
Important operational indicators include:
- Transaction response time;
- Query latency;
- Database throughput;
- Cache hit rate;
- Replication delay;
- CDC latency;
- Event processing delay;
- Data freshness;
- Error and retry rates;
- Cost per transaction;
- Recovery Point Objective;
- Recovery Time Objective.
Continuous monitoring of these indicators allows organizations to identify performance problems and evaluate whether architectural complexity is generating measurable business value.
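Several of these indicators reduce to simple ratios that can be checked against agreed limits. The function names, metric names, and threshold values below are hypothetical and chosen only to illustrate the check; appropriate limits depend on the business requirements of each service.

```python
from typing import Dict, List


def cache_hit_rate(hits: int, misses: int) -> float:
    total = hits + misses
    return hits / total if total else 0.0


def error_rate(errors: int, requests: int) -> float:
    return errors / requests if requests else 0.0


def violations(
    metrics: Dict[str, float], maximums: Dict[str, float], minimums: Dict[str, float]
) -> List[str]:
    """Return the names of metrics above their maximum or below their minimum."""
    too_high = [name for name, limit in maximums.items() if metrics[name] > limit]
    too_low = [name for name, limit in minimums.items() if metrics[name] < limit]
    return sorted(too_high + too_low)


metrics = {
    "cache_hit_rate": cache_hit_rate(hits=900, misses=100),
    "error_rate": error_rate(errors=5, requests=1000),
    "cdc_latency_s": 12.0,
}

# Assumed service-level limits for this example.
alerts = violations(
    metrics,
    maximums={"error_rate": 0.01, "cdc_latency_s": 5.0},
    minimums={"cache_hit_rate": 0.95},
)
```

In this example the cache hit rate and the CDC latency breach their limits while the error rate stays within bounds.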
5. Discussion
The main advantage of the proposed architecture is that database technologies are selected according to workload requirements. Transactional systems maintain strong consistency, document databases support flexible product data, caches reduce latency, event platforms enable asynchronous integration, lakehouses support historical analysis, and vector databases enable semantic search and AI applications.
Nevertheless, architectural flexibility introduces complexity. Data may be duplicated across operational databases, caches, search indexes, lakehouses, and vector stores. If synchronization mechanisms are poorly designed, users may receive outdated product prices, unavailable products, or inconsistent order information.
Organizations should therefore apply several principles.
First, every data domain should have a clearly defined authoritative source. Second, derived data should be reproducible. Third, consistency requirements should be specified according to business risk. Strong consistency may be mandatory for payments and inventory, while eventual consistency may be acceptable for recommendations and analytical reports.
Fourth, new database technologies should only be introduced when supported by measurable workload requirements. Polyglot persistence is not automatically superior to a simpler architecture. Each additional database requires staff expertise, monitoring, backup procedures, security configuration, and integration maintenance.
Fifth, migration should be incremental. An e-commerce company may initially use a managed relational database, a cache, and a search service. CDC, event streaming, lakehouse, graph, or vector technologies may then be introduced when transaction volume, analytical demand, or AI requirements justify them.
Finally, performance evaluation should include not only query speed but also availability, recovery, data freshness, operational cost, security, and developer productivity.
6. Conclusion
Database technology is evolving from centralized relational database systems toward distributed, cloud-native, multi-model, real-time, and AI-integrated data ecosystems. Distributed SQL provides scalable transactional consistency. NoSQL databases support flexible and high-volume workloads. Event-driven architecture and CDC enable near-real-time integration. Data lakehouse platforms unify analytical data, while vector databases provide a foundation for semantic search, recommendation, and conversational commerce.
For e-commerce systems, the appropriate solution is not to select a single database technology for all workloads. Instead, organizations should design a workload-oriented and domain-oriented data architecture. The proposed model integrates business services, polyglot persistence, event streaming, analytical storage, vector search, and unified governance.
The proposed architecture can improve scalability, system availability, analytical responsiveness, and customer experience. However, its benefits depend on effective data ownership, consistency management, security, observability, and cost control.
Future research should implement the proposed model using a realistic e-commerce dataset and evaluate it according to transaction throughput, query latency, synchronization delay, data consistency, search quality, recommendation accuracy, infrastructure cost, and failure recovery performance.

