Daily Process Automation And On-time Updates By Web Scraping Services
ABOUT THE CLIENT
The client is an Internet Payment Service Provider (IPSP) that enables online companies to accept global payments without having to obtain and manage their own merchant accounts. It offers worldwide acceptance, multiple currencies, state-of-the-art merchant tools, subscription and per-unit billing, world-class customer service, and a full suite of marketing and revenue features.
Preventing customers from selling illegal products/services through its payment gateway
The client provided payment gateway services to its customers' e-commerce portals and subscription-based websites. Government regulations required online payment services to be checked for, and barred from, facilitating illegal activities. The core objective was to ensure that customers did not use the gateway to sell illegal products or services. To gather this information, the client relied on a query-based search engine. This approach was labor-intensive and often incomplete, because it depended on people composing the queries and then interpreting and analyzing the results. The alternative, a blind crawl of the whole web, was both inefficient and unaffordable.
Manual process is time-consuming and labor-intensive, yet still provides incomplete information
The high volume and velocity of incoming data made manual processing difficult, which resulted in incomplete information
Manual capture and consolidation of targeted information was labor-intensive, time-consuming and expensive
Non-availability of timely reports on prohibited content delayed further actions
Absence of an analytical interface for making decisions on targeted information
Holistic development of a scalable web scraper with advanced monitoring functionalities
Designed and developed a fully scalable, distributed web scraper that crawls the client's customers' websites to capture and consolidate prohibited content.
Used a distributed MongoDB cluster to collect and analyze the data generated by the web scraper.
Developed a web-based monitoring tool to manage and monitor web scraper performance. The tool can also add and remove scraper nodes at runtime.
Created a highly interactive, lightweight UI using the industry-standard jQuery and JSON combination, with a dashboard presenting the most crucial analytics.
Built a high-performance, and correspondingly complex, middle tier on Spring, Spring MVC and Hibernate. To achieve higher performance, data rows were horizontally partitioned using Hibernate Shards.
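The horizontal partitioning mentioned in the last point can be illustrated with a minimal sketch. It is not the project's implementation and does not use the Hibernate Shards API; it only shows the general idea of routing each row to one of several shards by hashing a key. The class name ShardRouter, the shard names, and the choice of a website domain as the partition key are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical illustration of horizontal partitioning: each row is routed
 * to exactly one shard based on a hash of its partition key.
 */
public class ShardRouter {
    private final List<String> shardNames;

    public ShardRouter(List<String> shardNames) {
        if (shardNames == null || shardNames.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        // Defensive copy so later changes to the caller's list do not affect routing.
        this.shardNames = Collections.unmodifiableList(new ArrayList<String>(shardNames));
    }

    /** Maps a partition key to a shard index in the range [0, shard count). */
    public int shardIndexFor(String key) {
        // Clear the sign bit instead of calling Math.abs, which stays
        // negative for Integer.MIN_VALUE.
        int nonNegativeHash = key.hashCode() & 0x7fffffff;
        return nonNegativeHash % shardNames.size();
    }

    /** Returns the name of the shard that stores rows for the given key. */
    public String shardFor(String key) {
        return shardNames.get(shardIndexFor(key));
    }

    public static void main(String[] args) {
        // Assumed shard names and partition key, for demonstration only.
        ShardRouter router = new ShardRouter(Arrays.asList("shard_0", "shard_1", "shard_2"));
        String domain = "shop.example.com";
        System.out.println(domain + " -> " + router.shardFor(domain));
    }
}
```

Because the mapping depends only on the key and the number of shards, all rows for one key land on the same shard. The trade-off of this simple modulo scheme is that changing the number of shards remaps most keys, so a production setup would need a resharding strategy.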
TOOLS & TECHNOLOGIES USED
Java 1.6 with the Apache HttpClient library
Spring MVC (Web Application)
Hibernate (Web Application)
MySQL (Web Application)
Value addition to the client, its customers and end users
Reduced Operating Cost: Operating costs dropped by 60% due to complete automation of the process.
Improved Quality and Performance: The fully automated system crawls more than 10 million web pages daily and provides precise and accurate analysis results.
Improved Reporting Time: Advanced processing significantly improved reporting time, making reports available within 12 hours instead of the previous 3 to 4 days.
Sustainable Competitive Advantage: Quality enhancement, timely reporting and advanced web-based tools helped the client provide state-of-the-art solutions and services and stay ahead of its peers.
Since 2003, A3logics has built a reputation for quality service by delivering time-critical solutions and continuously evolving through innovative delivery methods. We have a dedicated team of high-quality professionals who constantly work with diverse industry players in sectors such as biometrics, education, banking, media monitoring, retail, shipping and logistics.