Pushing the boundaries – High Performance Data Processing
Every day, infohubble pushes millions of web pages through its data processing pipelines for deep analysis, content extraction and data aggregation. All of it runs on a large share of the services and APIs provided by Amazon Web Services.
After a quick overview of the actual need and the business context, we will delve into the technical details of the various stacks deployed on our grid and how we are constantly trying to push the limits of the system. We will also cover topics like automation, deployment, storage, high-performance logging, monitoring and metering, and how these can be achieved on AWS. Lessons learned and tips & tricks will be discussed so everybody can learn from our mistakes.