Avoid duplicated data in exports

Accurate and complete data is essential in analytics. Sometimes, you might see duplicate sessions in your reports. This is normal: it is a side effect of the processing that ensures late-arriving data is still tracked properly.

Piwik PRO automatically reduces duplicates, but knowing why they happen and how to handle them can help you keep your data reliable.

Why does duplicate data happen?

When visitors use your website, their actions, like page views, are tracked and sent to Piwik PRO. At first, these events are temporarily stored while we determine session conditions. A session ends when certain conditions are met, such as reaching a set duration, event limits, or 30 minutes of inactivity. Once processed, the data is stored in the analytics database.
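To make the inactivity rule concrete, here is a minimal sketch that groups event timestamps into sessions using only the 30-minute inactivity condition. It is an illustration, not Piwik PRO's implementation: the function name `split_into_sessions` is hypothetical, and the duration and event limits are left out because their values are not given here.

```python
from datetime import datetime, timedelta

# The only session condition with a stated value: 30 minutes of inactivity.
INACTIVITY_TIMEOUT = timedelta(minutes=30)


def split_into_sessions(event_times):
    """Group event timestamps into sessions separated by 30+ minutes of inactivity."""
    sessions = []
    for moment in sorted(event_times):
        # Continue the current session if the gap since its last event is short enough.
        if sessions and moment - sessions[-1][-1] < INACTIVITY_TIMEOUT:
            sessions[-1].append(moment)
        else:
            # First event, or the gap reached the timeout: start a new session.
            sessions.append([moment])
    return sessions
```

An event that arrives after its session has already been closed and stored no longer fits this simple forward pass, which is why late events require reprocessing.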

Sometimes, events arrive late due to:

  • Operating system restrictions: Some operating systems delay tracking requests.
  • Poor internet connection: A weak or unstable connection may cause retries or delays.
  • JavaScript errors: Errors in the tracking code can disrupt event tracking.
  • Browser extensions and privacy settings: Some extensions block or delay event tracking.
  • System errors: Rare technical issues on our platform may also cause delays.

When delayed events belong to sessions that have already been processed and stored, Piwik PRO reprocesses them, enriching the original session with missing data. While this ensures data accuracy, it may also create temporary duplicate sessions.

How deduplication works

Managing billions of events while keeping sessions accurate is complex. We can’t just delete duplicates instantly, so here’s how we manage them:

  1. Duplicate detection: We detect and mark duplicate sessions.
  2. Temporary storage: Duplicates are briefly stored alongside your existing data.
  3. Regular cleanup: Older duplicates are removed in batches, about once per hour.

Because of this process, duplicates may appear in your data for a short time before being removed.
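The mark-then-clean pattern described above can be pictured with a toy model. This is only an illustration of the general idea and not Piwik PRO's actual implementation: the `StoredSession` type, its fields, and the assumption that the most recently stored copy of a session is the one to keep are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoredSession:
    session_id: str          # hypothetical identifier shared by all copies of a session
    event_count: int
    is_duplicate: bool = False


def mark_duplicates(rows):
    """Step 1: flag every copy of a session except the most recently stored one."""
    latest_index = {}
    for index, row in enumerate(rows):
        latest_index[row.session_id] = index  # later copies overwrite earlier ones
    for index, row in enumerate(rows):
        row.is_duplicate = latest_index[row.session_id] != index


def cleanup(rows):
    """Step 3: a batch job later removes everything that was flagged."""
    return [row for row in rows if not row.is_duplicate]
```

Between the two calls, flagged rows still sit next to the regular data (step 2), which is the window in which a report or export can pick them up.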

How duplicates impact reports and data exports

Because deduplication happens in batches, duplicates may temporarily affect:

  • UI reports: Duplicates may appear temporarily, but unique metrics (sessions, visitors, unique page views) are unaffected.
  • Scheduled reports: Reports sent between 4 AM and 8 AM UTC might include duplicates if generated before deduplication is finished.
  • Exporter services: Exported data might contain duplicates if retrieved too early.
  • API data retrieval: Data pulled before deduplication is completed may include duplicates.
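If an export or API pull was taken before cleanup finished, duplicates can also be collapsed on your side. The following is a minimal sketch, assuming each exported row carries a session identifier and an event count. The field names `session_id` and `events` are hypothetical and should be replaced with the columns in your own export. Keeping the copy with the most events is likewise an assumption, based on reprocessing enriching a session with missing data.

```python
def drop_duplicate_sessions(rows, key="session_id", size_field="events"):
    """Keep one row per session: the copy that contains the most events.

    `rows` is a list of dicts, for example parsed from an exported CSV or
    JSON file. Both field names are placeholders, not real export columns.
    """
    best = {}
    for row in rows:
        current = best.get(row[key])
        # Prefer the richer copy; on a tie, keep the one seen first.
        if current is None or row[size_field] > current[size_field]:
            best[row[key]] = row
    return list(best.values())
```

Sessions are returned in the order their identifiers first appear, so the function can be applied to an export without reshuffling it.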

How to minimize duplicates

Duplicate sessions are rare and usually make up only a small part of your data. However, if you notice them often, try these solutions:

  • Delay scheduled exports to give Piwik PRO time to remove duplicates before exporting your data. Contact your account manager for details.
  • Choose the best timing for exports:
    • Most data: Export after 1 PM in the website’s time zone to allow deduplication and other processes to finish.
    • Google Ads integrations: Export after 9 PM UTC (paid plans) or 10 PM UTC (free plans) for the most accurate data.
    • Early morning reports: If you see duplicates, they may disappear once the deduplication is complete.
  • Align your scheduled exports and API calls with deduplication timelines, as delayed events and reprocessed sessions can affect your data.
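The timing guidance above can be turned into a simple guard in your own export scripts. This is a sketch under stated assumptions: the function names are hypothetical, the cutoffs are taken directly from the list above, and the check only compares the current clock time against them without contacting Piwik PRO.

```python
from datetime import datetime, time, timedelta, timezone

GENERAL_CUTOFF_LOCAL = time(13, 0)  # 1 PM in the website's time zone
GOOGLE_ADS_CUTOFF_UTC = {"paid": time(21, 0), "free": time(22, 0)}  # 9 PM / 10 PM UTC


def general_export_ready(now, website_tz):
    """True once the clock in the website's time zone has reached 1 PM.

    `now` must be a timezone-aware datetime; `website_tz` is any tzinfo
    object, such as zoneinfo.ZoneInfo("Europe/Warsaw").
    """
    return now.astimezone(website_tz).time() >= GENERAL_CUTOFF_LOCAL


def google_ads_export_ready(now, plan):
    """True once the UTC clock has reached the cutoff for the plan ("paid" or "free")."""
    return now.astimezone(timezone.utc).time() >= GOOGLE_ADS_CUTOFF_UTC[plan]
```

For example, with `now = datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc)` and a website at a fixed offset of `timezone(timedelta(hours=1))`, `general_export_ready` returns `True` because the local time is 13:30, while `google_ads_export_ready(now, "paid")` returns `False`.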