Ensuring Accuracy in Data Records Reconciliation between CGSN and IN for 2G/3G and 4G Networks
Introduction
In the telecom industry, data usage records are logged by various network nodes, such as SGSN/GGSN in 2G/3G networks and SGW/PGW in 4G networks (referred to collectively as the CGSN in this post). These records, known as data session EDRs (Event Detail Records), capture critical information about data sessions, including the volume of data used, session duration, and charging details. Meanwhile, the Intelligent Network (IN) records the billing details associated with these data sessions. Reconciling these two sources is essential for accurate billing, revenue assurance, and network management. In this blog post, we will explore why reconciling these data records matters, the challenges involved, and how Big Data tools like Apache Spark can streamline the process.
Why Data Records Reconciliation is Important
- Accuracy in Data Billing: Each data session, whether in a 2G/3G or 4G network, must be accurately billed to the customer. Discrepancies between the volume of data recorded by the CGSN and the charges recorded by the IN can lead to billing errors, causing revenue loss and customer dissatisfaction.
- Revenue Assurance: Ensuring that all data usage is correctly captured and billed is crucial for preventing revenue leakage. Reconciliation helps identify missing, duplicated, or incorrect records, allowing operators to correct discrepancies proactively.
- Network Performance Monitoring: Reconciliation can also provide insights into network performance by comparing the expected usage (as recorded by SGSN/GGSN or SGW/PGW) with the actual charges. This helps operators in network planning and optimization.
How to Achieve Data Records Reconciliation
- Matching Using MSISDN, IMSI, and Timestamp:
  - MSISDN (the subscriber's phone number) and IMSI (the subscriber identity stored on the SIM) are unique identifiers that link data sessions across network and billing systems.
  - The timestamp captures the start and end times of a session. Matching records on MSISDN, IMSI, and timestamp together helps link data usage records from CGSN and IN accurately.
- Using a Unique Correlation ID:
  - Some systems generate a unique correlation ID for each data session, linking records between the network and billing nodes. This ID makes reconciliation straightforward by directly associating each data session with its corresponding billing record.
  - However, in many deployments this unique ID is not available, which complicates the reconciliation process.
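The two matching strategies above can be sketched in plain Python. This is a minimal illustration rather than a production matcher: the field names (`correlation_id`, `msisdn`, `imsi`, `start_ts`) and the sample records are assumptions made for this example, and the fallback path requires the timestamps to match exactly.

```python
from collections import defaultdict

def match_key(record):
    """Use the correlation ID when present, otherwise the composite key."""
    if record.get("correlation_id"):
        return ("cid", record["correlation_id"])
    return ("composite", record["msisdn"], record["imsi"], record["start_ts"])

def reconcile(network_records, billing_records):
    """Pair each network EDR with an IN record that shares the same key."""
    billing_by_key = defaultdict(list)
    for rec in billing_records:
        billing_by_key[match_key(rec)].append(rec)

    matched, unmatched_network = [], []
    for rec in network_records:
        candidates = billing_by_key.get(match_key(rec))
        if candidates:
            matched.append((rec, candidates.pop(0)))  # use each IN record once
        else:
            unmatched_network.append(rec)

    unmatched_billing = [r for recs in billing_by_key.values() for r in recs]
    return matched, unmatched_network, unmatched_billing

# Hypothetical sample data: one session with a correlation ID, two without.
network = [
    {"correlation_id": "c-1", "msisdn": "15550001", "imsi": "001010000000001", "start_ts": 1700000000},
    {"correlation_id": None, "msisdn": "15550002", "imsi": "001010000000002", "start_ts": 1700000100},
    {"correlation_id": None, "msisdn": "15550003", "imsi": "001010000000003", "start_ts": 1700000200},
]
billing = [
    {"correlation_id": "c-1", "msisdn": "15550001", "imsi": "001010000000001", "start_ts": 1700000003},
    {"correlation_id": None, "msisdn": "15550002", "imsi": "001010000000002", "start_ts": 1700000100},
    {"correlation_id": None, "msisdn": "15550003", "imsi": "001010000000003", "start_ts": 1700000204},
]

matched, only_network, only_billing = reconcile(network, billing)
print(len(matched), len(only_network), len(only_billing))  # 2 1 1
```

In this sample the first pair matches through the correlation ID even though the timestamps differ by three seconds, while the third pair stays unmatched because, without an ID, a four-second clock difference breaks the exact composite key.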
Challenges in Data Records Reconciliation
- Absence of a Unique Correlation ID:
  - When there is no unique ID linking records between CGSN and IN, operators must rely on MSISDN, IMSI, and timestamp for matching. This approach is prone to errors, especially when dealing with sessions that start and stop frequently or overlap.
- Time Synchronization Issues:
  - Even a minor time discrepancy between SGSN/GGSN (or SGW/PGW) and IN can lead to unmatched records. These discrepancies can arise due to differences in system clocks, network delays, or processing times.
  - To address this, operators often use a time window to match records, where sessions are considered correlated if their timestamps fall within a predefined range, such as ±10 seconds.
- Handling High Volumes of Intermediate Records:
  - Data sessions often generate multiple intermediate records, especially during long or fragmented sessions. These records need to be consolidated into a single session record before reconciliation.
  - For 4G networks, SGW/PGW may generate separate records for different parts of a session, further complicating the consolidation process.
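The time-window approach described above can be sketched as follows. This is a simplified, greedy illustration in plain Python, assuming hypothetical field names (`msisdn`, `imsi`, `start_ts` in epoch seconds) and the ±10 second tolerance used as an example in the text. A real pipeline would likely need more careful tie-breaking when a subscriber has several sessions inside the same window.

```python
from collections import defaultdict

TOLERANCE_SECONDS = 10  # example window from the text; tune per deployment

def match_with_tolerance(network_records, billing_records, tolerance=TOLERANCE_SECONDS):
    """Pair records of the same subscriber whose start times differ by <= tolerance."""
    billing_by_sub = defaultdict(list)
    for rec in billing_records:
        billing_by_sub[(rec["msisdn"], rec["imsi"])].append(rec)

    matched, unmatched_network = [], []
    for rec in sorted(network_records, key=lambda r: r["start_ts"]):
        candidates = billing_by_sub.get((rec["msisdn"], rec["imsi"]), [])
        # Choose the closest unused IN record that falls inside the window.
        best = min(
            (c for c in candidates if abs(c["start_ts"] - rec["start_ts"]) <= tolerance),
            key=lambda c: abs(c["start_ts"] - rec["start_ts"]),
            default=None,
        )
        if best is None:
            unmatched_network.append(rec)
        else:
            candidates.remove(best)  # each IN record is matched at most once
            matched.append((rec, best))

    unmatched_billing = [r for recs in billing_by_sub.values() for r in recs]
    return matched, unmatched_network, unmatched_billing

# Hypothetical sample data: subscriber A has two sessions, subscriber B has one.
sub_a = {"msisdn": "15550001", "imsi": "001010000000001"}
sub_b = {"msisdn": "15550002", "imsi": "001010000000002"}
net_edrs = [
    {**sub_a, "start_ts": 1000},
    {**sub_a, "start_ts": 1015},
    {**sub_b, "start_ts": 2000},
]
in_edrs = [
    {**sub_a, "start_ts": 1004},
    {**sub_a, "start_ts": 1012},
    {**sub_b, "start_ts": 2030},
]

pairs, net_only, in_only = match_with_tolerance(net_edrs, in_edrs)
print(len(pairs), len(net_only), len(in_only))  # 2 1 1
```

Both of subscriber A's sessions are paired despite clock offsets of a few seconds, while subscriber B's records are 30 seconds apart and are reported as unmatched on both sides.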
Leveraging Big Data Tools for Efficient Reconciliation
- Using Apache Spark:
  - Apache Spark’s distributed processing capabilities are well suited to handling large volumes of data records from both CGSN and IN. It allows for efficient matching of records based on multiple keys like MSISDN, IMSI, and timestamp.
  - Spark’s in-memory processing reduces latency, enabling near real-time reconciliation, which is critical for maintaining billing accuracy and revenue assurance.
- Consolidating Intermediate Records:
  - Spark can aggregate multiple intermediate records into a single session based on MSISDN and IMSI, while applying business rules to filter out duplicates and handle overlaps.
  - For example, all records with the same MSISDN and IMSI within a session can be grouped together, and their data volume and duration summed to create a consolidated record.
- Handling Time Differences:
  - Spark supports time-based grouping and aggregation through window functions, and its joins can use conditions that compare timestamp ranges. A tolerance window can be defined so that records with slight timestamp differences still match, accounting for system clock discrepancies between CGSN and IN.
  - This helps in accurately correlating records, even when exact timestamps do not match.
- Scaling with Data Growth:
  - As data usage continues to grow, the volume of EDRs from SGSN/GGSN and SGW/PGW grows with it. Spark’s ability to scale horizontally by adding more nodes to the cluster allows reconciliation jobs to keep pace with the growing data volumes without compromising performance.
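To make the consolidation rule above concrete, here is a small sketch of the logic in plain Python rather than Spark. In Spark the same rule would typically be expressed as a `groupBy` on the session keys followed by aggregate functions. The field names, the `session_id` used to delimit a session, and the rule that a repeated `seq_no` marks a duplicate are all assumptions made for this illustration.

```python
def consolidate(partial_records):
    """Collapse intermediate records into one record per (msisdn, imsi, session_id)."""
    seen = set()
    sessions = {}
    for rec in partial_records:
        key = (rec["msisdn"], rec["imsi"], rec["session_id"])
        dedup_key = key + (rec["seq_no"],)
        if dedup_key in seen:  # assumed business rule: same sequence number = duplicate
            continue
        seen.add(dedup_key)

        agg = sessions.setdefault(
            key, {"bytes": 0, "duration_s": 0, "start_ts": rec["start_ts"], "parts": 0}
        )
        agg["bytes"] += rec["bytes"]            # sum data volume
        agg["duration_s"] += rec["duration_s"]  # sum duration
        agg["start_ts"] = min(agg["start_ts"], rec["start_ts"])  # earliest start
        agg["parts"] += 1
    return sessions

# Hypothetical intermediate records: one session split into three parts
# (with one part delivered twice) plus a second, single-record session.
base = {"msisdn": "15550001", "imsi": "001010000000001", "session_id": "s1"}
partials = [
    {**base, "seq_no": 1, "start_ts": 1000, "duration_s": 300, "bytes": 1000},
    {**base, "seq_no": 2, "start_ts": 1300, "duration_s": 300, "bytes": 2500},
    {**base, "seq_no": 2, "start_ts": 1300, "duration_s": 300, "bytes": 2500},  # duplicate
    {**base, "seq_no": 3, "start_ts": 1600, "duration_s": 120, "bytes": 500},
    {"msisdn": "15550002", "imsi": "001010000000002", "session_id": "s9",
     "seq_no": 1, "start_ts": 5000, "duration_s": 60, "bytes": 700},
]

sessions = consolidate(partials)
print(sessions[("15550001", "001010000000001", "s1")])
# {'bytes': 4000, 'duration_s': 720, 'start_ts': 1000, 'parts': 3}
```

The consolidated records can then be matched against IN records, so that one network-side session is compared with one billing-side charge instead of several fragments.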
Conclusion
Reconciliation of data records between SGSN/GGSN (or SGW/PGW) and IN is crucial for accurate billing, revenue assurance, and network management. Despite challenges such as the absence of a unique correlation ID, time synchronization issues, and high volumes of intermediate records, big data tools like Apache Spark provide a robust solution. Spark’s distributed processing, in-memory computation, and aggregation capabilities enable efficient and scalable reconciliation, helping to ensure data integrity and billing accuracy.
In the next blog post, we will provide a step-by-step guide on implementing a Spark-based data records reconciliation pipeline, complete with code examples and best practices. Stay tuned.