AWS Developers | Handle Late or Duplicated Data and Archive Events for On-Demand Replay | 5/5 @awsdevelopers | Uploaded 5 months ago
Find out how you can use Apache Flink to tackle late or duplicated data and improve data quality with exactly-once processing. We'll also dive into archiving raw events with Amazon Data Firehose for on-demand replay or reprocessing.

In this series, Anand Shah (Data Analytics and Streaming Specialist at AWS) will help you build a modern data streaming architecture for a real-time gaming leaderboard. This architecture includes data ingestion, real-time enrichment with database change data capture (CDC), data processing, and computing, storing, and visualizing the results. You will also learn advanced streaming analytics techniques, such as the control channel method for A/B testing, updating features and parameters with zero downtime, and handling late-arriving data. Anand will also walk you through data de-duplication and show how you can store historical data for on-demand replay. 🎉

🌟 Get started with Amazon Managed Service for Apache Flink today to build and run fully managed Apache Flink applications on AWS!

🔗 GitHub repository: github.com/build-on-aws/real-time-gaming-leaderboard-apache-flink

Resources used in this video:
🔗 Intro to Amazon Data Firehose: docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html
🔗 Data de-duplication with Apache Flink: nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/table/sql/queries/deduplication
🔗 Apache Flink late data handling (watermarks and allowed lateness): nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/operators/windows/#allowed-lateness
🔗 Apache Flink Filesystem source: nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem

Continue your learning:
🔗 Automate deployment and version updates for Amazon Kinesis Data Analytics applications with AWS CodePipeline: aws.amazon.com/blogs/big-data/automate-deployment-and-version-updates-for-amazon-kinesis-data-analytics-applications-with-aws-codepipeline
🔗 SQL-based streaming analytics with Apache Flink: github.com/aws-samples/sql-based-streaming-analytics
🔗 Amazon Managed Service for Apache Flink Workshop: https://catalog.workshops.aws/managed-flink/en-US
🔗 Application scaling in Managed Service for Apache Flink: docs.aws.amazon.com/managed-flink/latest/java/how-scaling.html
🔗 Logging and monitoring in Amazon Managed Service for Apache Flink: docs.aws.amazon.com/managed-flink/latest/java/monitoring-overview.html
🔗 Audit AWS service events with Amazon EventBridge and Amazon Kinesis Data Firehose: aws.amazon.com/blogs/big-data/audit-aws-service-events-with-amazon-eventbridge-and-amazon-kinesis-data-firehose

Follow AWS Developers:
👾 Twitch: twitch.tv/aws
🐦 Twitter: twitter.com/awsdevelopers
💻 LinkedIn: linkedin.com/showcase/aws

Follow Anand Shah: 
🐦 Twitter: twitter.com/anandshah110
💻 LinkedIn: linkedin.com/in/anandshah110

00:00 Intro
00:21 Impact of late data arrival
01:23 How to handle late data arrival
01:52 Impact of duplicate messages
02:52 How to de-duplicate data
03:30 Demo: CDK source code walkthrough and deploy
05:00 Demo: Handling late arrival of data
05:26 Demo: Challenge 5.1 - De-duplicate data
06:04 Demo: Setup Amazon Data Firehose for data archival
10:32 Demo: On-demand replay of archived data
11:29 Demo: Challenge 5.2 - Replay data
11:53 Conclusion

#LateDataArrival #ExactlyOnce #ArchivalAndReplay #ManagedServiceForApacheFlink