Archives for 2020


Introducing Baskerville (waf!)


The more outré and grotesque an incident is the more carefully it deserves to be examined.
― Arthur Conan Doyle, The Hound of the Baskervilles


Chapter 1 – Baskerville

Baskerville is a machine operating on the Deflect network that protects sites from hounding, malicious bots. It’s also an open source project that, in time, will be able to reduce bad behaviour on your networks too. Baskerville responds to web traffic, analyzing requests in real-time, and challenging those acting suspiciously. A few months ago, Baskerville passed an important milestone – making its own decisions on traffic deemed anomalous. The quality of these decisions (recall) is high and Baskerville has already successfully mitigated many sophisticated real-life attacks.

We’ve trained Baskerville to recognize what legitimate traffic on our network looks like, and how to distinguish it from malicious requests attempting to disrupt our clients’ websites. Baskerville has turned out to be very handy for mitigating DDoS attacks, and for correctly classifying other types of malicious behaviour.

Baskerville is an important contribution to the world of online security – where solid web defences are usually the domain of proprietary software companies or complicated manual rule-sets. The ever-changing nature and patterns of attacks make their mitigation a continuous process of adaptation. This is why we’ve trained a machine to recognize and respond to anomalous traffic. Our plans for Baskerville’s future will enable plug-and-play installation in most web environments and privacy-respecting exchange of threat intelligence data between your server and the Baskerville clearinghouse.

Chapter 2 – Background 

Web attacks are a threat to democratic voices on the Internet. Botnets deploy an arsenal of methods, including brute force password login, vulnerability scanning, and DDoS attacks, to overwhelm a platform’s hosting resources and defences, or to wreak financial damage on the website’s owners. Attacks become a form of punishment, intimidation, and most importantly, censorship, whether through direct denial of access to an Internet resource or by instilling fear among the publishers. Much of the development to date in anomaly detection and mitigation of malicious network traffic has been closed source and proprietary. These siloed approaches are limiting when dealing with constantly changing variables. They are also quite expensive to set up, with a company’s costs often offset by the sale or trade of threat intelligence gathered on the client’s network, something Deflect does not do or encourage.

Since 2010, the Deflect project has protected hundreds of civil society and independent media websites from web attacks, processing over a billion monthly website requests from humans and bots. We are now bringing internally developed mitigation tooling to a wider audience, improving network defences for freedom of expression and association on the internet.

Baskerville was developed over three years by eQualitie’s dedicated team of machine learning experts, who set themselves several challenges. To be an effective answer to the ever-growing need for humans to perform constant network monitoring, and to the never-ending need to create rules banning newly discovered malicious network behaviour, Baskerville had to:

  • Be fast enough to make it count
  • Be able to adapt to changing traffic patterns
  • Provide actionable intelligence (a prediction and a score for every IP)
  • Provide reliable predictions (probation period & feedback)

Baskerville works by analyzing HTTP traffic bound for your website, monitoring the proportion of legitimate vs anomalous traffic. On the Deflect network, it will trigger a Turing challenge to an IP address behaving suspiciously, thereafter confirming whether a real person or a bot is sending us requests.

Chapter 3 – Baskerville Learns

To detect new, evolving threats, Baskerville uses the unsupervised anomaly detection algorithm Isolation Forest. The majority of anomaly detection algorithms construct a profile of normal instances, then classify instances that do not conform to the normal profile as anomalies. The main problem with this approach is that the model is optimized to detect normal instances, not anomalies, causing either too many false alarms or too few detected anomalies. In contrast, Isolation Forest explicitly isolates anomalies rather than profiling normal instances. This method is based on a simple assumption: ‘Anomalies are few, and they are different’. In addition, the Isolation Forest algorithm does not require the training set to contain only normal instances. Moreover, the algorithm performs even better if the training set contains some anomalies, or attack incidents in our case. This enables us to re-train the model regularly on all the recent traffic, without any labelling procedure, in order to adapt to changing patterns.
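As a rough illustration of the approach (not Baskerville’s actual Spark/Scala pipeline), here is a minimal sketch using scikit-learn’s IsolationForest; the three toy features are hypothetical stand-ins for Baskerville’s real request-set features:

```python
# Minimal sketch of unsupervised anomaly detection with Isolation Forest.
# scikit-learn stands in for the production pipeline; the toy features
# below are illustrative, not Baskerville's real feature set.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)

# Toy request-set features: [requests/minute, avg path depth, error rate]
normal_traffic = rng.normal([20, 3, 0.02], [5, 1, 0.01], size=(500, 3))
bot_traffic = rng.normal([300, 1, 0.40], [50, 0.5, 0.10], size=(15, 3))
X = np.vstack([normal_traffic, bot_traffic])

# No labels are needed, and the training set may contain some anomalies.
model = IsolationForest(n_estimators=100, contamination="auto", random_state=42)
model.fit(X)

scores = model.decision_function(X)  # lower score means more anomalous
predictions = model.predict(X)       # -1 = anomaly, +1 = normal
print(f"flagged {(predictions == -1).sum()} of {len(X)} request sets")
```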

Labelling

Although we don’t need labels to train a model, we still need a labelled dataset of historical attacks for parameter tuning. Traditionally, labelling is a challenging procedure, since it requires a lot of manual work. Every new attack must be reported and investigated, and every IP must be labelled as either malicious or benign.

Our production environment reports several incidents a week, so we designed an automated labelling procedure using a machine learning model trained on the same features we use for the Isolation Forest anomaly detection model.

We reasoned that if an attack incident has a clearly visible traffic spike, we can assume that the vast majority of the IPs active during this period are malicious, and we can train a classifier like Random Forest specifically for this incident. The only user input would be the precise time period of the incident and a time period of ordinary traffic for that host. Such a classifier would not be perfect, but it would be good enough to separate some regular IPs from the majority of malicious IPs during the time of the incident. In addition, we assume that attacker IPs are most likely not active immediately before the attack, so we do not label an IP as malicious if it was seen in the regular traffic period.

This labelling procedure is not perfect, but it allows us to label new incidents with very little time or human interaction.
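A compressed sketch of this labelling idea, using pandas and scikit-learn with hypothetical column names (the production procedure differs in detail):

```python
# Sketch of the automated labelling procedure: IPs seen during the attack
# window are provisionally malicious unless they also appear in the
# ordinary-traffic window; a per-incident Random Forest then refines the
# labels. Column names ("ip", "ts") and feature_cols are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def label_incident(df, attack_window, normal_window, feature_cols):
    attack = df[df.ts.between(*attack_window)].copy()
    normal = df[df.ts.between(*normal_window)].copy()

    # Assumption from the text: IPs active in regular traffic are not malicious.
    benign_ips = set(normal.ip)
    attack["label"] = (~attack.ip.isin(benign_ips)).astype(int)
    normal["label"] = 0

    train = pd.concat([attack, normal], ignore_index=True)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(train[feature_cols], train.label)

    # The per-incident classifier separates regular-looking IPs from the
    # malicious majority observed during the spike.
    attack["label"] = clf.predict(attack[feature_cols])
    return attack[["ip", "label"]].drop_duplicates()
```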

An example of the labelling procedure output

Performance Metrics

We use the Precision-Recall AUC metric for model performance evaluation. The main reason for using the Precision-Recall metric is that it is more sensitive to improvements for the positive class than the ROC (receiver operating characteristic) curve. We are less concerned about the false positive rate since, in the event that we falsely predict that an IP is doing something malicious, we won’t ban it, but only notify the rule-based attack mitigation system to challenge that specific IP. The IP will only be banned if the challenge fails.
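For instance, with scikit-learn the metric can be computed as follows (the labels and scores here are synthetic):

```python
# Sketch: computing Precision-Recall AUC for a set of per-IP scores.
# y_true and y_score are synthetic; in production they would come from
# labelled incidents and the model's anomaly scores.
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 0])  # 1 = malicious IP
y_score = np.array([0.1, 0.4, 0.2, 0.8, 0.7, 0.3, 0.9, 0.5, 0.6, 0.2])

precision, recall, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)
print(f"PR AUC: {pr_auc:.3f}")
```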

The performance of two different models on two different attacks

Categorical Features

After two months of validating our approach in the production environment, we started to realize that the model was not sophisticated enough to distinguish anomalies specific only to particular clients.

The main reason for this is that the originally published Isolation Forest algorithm supports only numerical features and cannot work with so-called categorical string values, such as hostname. First, we decided to train a separate model per target host and create an ensemble of models for the final prediction. This approach overcomplicated the whole process and did not scale well. Additionally, we had to take care of adjusting the weights in the model ensemble. In fact, it jeopardized the original idea of sharing knowledge through a single model for all clients. Then we tried the classical way of dealing with this problem: one-hot encoding. However, the deployed solution did not work well, since the model became overfitted to the new hostname feature and performance decreased.

In the next iteration, we found another way of encoding categorical features, based on a peer-reviewed paper published in 2018. The main idea was not to use one-hot encoding, but rather to modify the tree-building algorithm itself. We could not find an implementation of the idea, and had to modify the source code of the IForest library in Scala. We introduced a new string feature, ‘hostname’, and this time the model showed a notable performance improvement in production. Moreover, our final implementation was generic and allowed us to experiment with other categorical features like country, user agent, operating system, etc.
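To convey the idea, here is a toy Python sketch of a categorical node split; the real change was made inside the Scala IForest library’s tree-building code, so this is only an illustration:

```python
# Toy illustration of handling a categorical feature inside a tree node:
# instead of one-hot encoding, the node sends a random non-empty proper
# subset of the observed categories down the left branch. The production
# change lived in the Scala IForest library; this is not that code.
import random

def categorical_split(values, rng=random):
    categories = sorted(set(values))
    if len(categories) < 2:
        return None  # a constant feature cannot be split
    k = rng.randint(1, len(categories) - 1)
    return set(rng.sample(categories, k))

hostnames = ["site-a.org", "site-b.org", "site-b.org", "site-c.org"]
left_branch = categorical_split(hostnames)
routing = ["left" if h in left_branch else "right" for h in hostnames]
print(left_branch, routing)
```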


Stratified Sampling

Baskerville uses a single machine learning model trained on the data received from hundreds of clients. This allows us to share knowledge and benefit from a model trained on a global dataset of recorded incidents. However, when we first deployed Baskerville, we realized that the model was biased towards high-traffic clients.

We had to find a balance in the amount of data we feed to the training pipeline from each client. On the one hand, we wanted to equalize the number of records from each client, but on the other hand, high traffic clients provided much more valuable incident information. We decided to use stratified sampling of training datasets with a single parameter: the maximum number of samples per host.
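In pandas terms, the sampling step looks roughly like this (the production pipeline does the equivalent in Spark, and the “host” column name is an assumption):

```python
# Sketch of stratified sampling with a cap on samples per host, so that
# high-traffic clients cannot dominate the training set. The production
# pipeline does the equivalent in Spark; "host" is an assumed column name.
import pandas as pd

def stratified_sample(df, max_samples_per_host, seed=0):
    return (
        df.groupby("host", group_keys=False)
          .apply(lambda g: g.sample(min(len(g), max_samples_per_host),
                                    random_state=seed))
    )

# Usage: train_df = stratified_sample(traffic_df, max_samples_per_host=10_000)
```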

Storage

Baskerville uses Postgres to store the processed results. The request-sets table holds the results of the real-time weblogs pre-processed by our analytics engine, which receives an estimated ~30 GB of input per week. So, within a year, we’d have a ~1.5 TB table. Even though this is within Postgres limits, running queries on it would not be very efficient. That’s where the data partitioning feature of Postgres came in. We used it to split the request-sets table into smaller tables, each holding one week’s data. This allowed for better data management and faster query execution.
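A hedged sketch of what weekly range partitioning looks like, with an illustrative schema rather than Baskerville’s actual one:

```python
# Sketch of weekly range partitioning in Postgres, driven from Python.
# The schema, table names, and connection string are all illustrative.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS request_sets (
    target_host text,
    created_at  timestamptz NOT NULL,
    features    jsonb,
    prediction  integer
) PARTITION BY RANGE (created_at);

CREATE TABLE IF NOT EXISTS request_sets_2020_w01
    PARTITION OF request_sets
    FOR VALUES FROM ('2019-12-30') TO ('2020-01-06');
"""

with psycopg2.connect("dbname=baskerville") as conn:  # assumed DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
```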

However, even with data partitioning, we needed to be able to scale the database out. Since we already had the Timescale extension for the Prometheus database, we decided to use it for Baskerville too. We followed Timescale’s tutorial for data migration within the same database, which means we created a temp table, moved the data from every partition into the temp table, ran the command to create a hypertable on the temp table, deleted the initial request-sets table and its partitions, and, finally, renamed the temp table to ‘request sets’. The process was not very straightforward, unfortunately, and we did run into some problems. But in the end, we were able to scale the database, and we are currently operating using Timescale in production.
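The migration steps described above look roughly like this in SQL, run from Python here; table and column names are assumptions, and Timescale’s tutorial remains the authoritative reference:

```python
# Sketch of the partitioned-table-to-hypertable migration described above.
# Table/column names are illustrative; see Timescale's same-database
# migration tutorial for the authoritative procedure.
import psycopg2

STEPS = [
    # 1. Create a temp table with the same shape as request_sets.
    "CREATE TABLE request_sets_tmp (LIKE request_sets INCLUDING DEFAULTS);",
    # 2. Move the data from every partition into the temp table.
    "INSERT INTO request_sets_tmp SELECT * FROM request_sets;",
    # 3. Turn the temp table into a hypertable, migrating the existing rows.
    "SELECT create_hypertable('request_sets_tmp', 'created_at', migrate_data => TRUE);",
    # 4. Drop the original table together with its partitions.
    "DROP TABLE request_sets CASCADE;",
    # 5. Rename the temp table back to request_sets.
    "ALTER TABLE request_sets_tmp RENAME TO request_sets;",
]

with psycopg2.connect("dbname=baskerville") as conn:  # assumed DSN
    with conn.cursor() as cur:
        for statement in STEPS:
            cur.execute(statement)
```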

We also explored other options, like TileDb, Apache Hive, and Apache HBase, but for the time being, Timescale is enough for our needs. We will surely revisit this in the future, though.

Architecture

The initial design of Baskerville assumed it would run under Deflect as an analytics engine, to aid the existing rule-based attack detection and mitigation mechanism. However, the needs changed as it became necessary to open Baskerville’s predictions up to other users and make our insights available to them.

In order to allow other users to take advantage of our model, we had to redesign the pipelines to be more modular. We also needed to take into account the kind of data to be exchanged; more specifically, we wanted to avoid any exchange involving sensitive data, like IPs. The idea was that the preprocessing would happen on the client’s end, and only the resulting feature vectors would be sent, via Kafka, to the Prediction centre. The Prediction centre continuously listens for incoming feature vectors, and once a request arrives, it uses the pre-trained model to predict and send the results back to the user. This whole process happens without the exchange of any kind of sensitive information, as only the feature vectors go back and forth.
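A minimal sketch of that exchange with kafka-python; the topic names, message fields, and the model stub are all assumptions:

```python
# Sketch of the privacy-preserving exchange: only feature vectors travel
# over Kafka; raw logs and IPs never leave the client. Topic names and
# message fields are illustrative, and the model below is a stub.
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# --- client side: publish a locally pre-processed feature vector ---
producer.send("features", {
    "client_id": "client-42",
    "request_set_id": "a1b2c3",          # opaque id, matched back client-side
    "features": [0.4, 12.0, 0.01, 3.5],  # no IPs or other sensitive data
})
producer.flush()

# --- prediction centre: consume vectors, reply with prediction and score ---
def model_predict(features):
    # Stub standing in for the pre-trained Isolation Forest model.
    score = sum(features)
    return (1 if score > 10 else 0), score

consumer = KafkaConsumer(
    "features",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
for message in consumer:
    vector = message.value
    prediction, score = model_predict(vector["features"])
    producer.send(f"predictions.{vector['client_id']}", {
        "request_set_id": vector["request_set_id"],
        "prediction": prediction,
        "score": score,
    })
```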

On the client side, we had to implement a caching mechanism with a TTL, so that the request sets wait for their matching predictions. If the Prediction centre takes more than 10 minutes, the request sets expire. Ten minutes is, of course, not an acceptable response time, just a safeguard so that we do not keep request sets forever, which could result in an out-of-memory error; the TTL is configurable. We used Redis for this mechanism, as it has the TTL feature built in and there is a spark-redis connector we could easily use, but we’re still tuning the performance and thinking about alternatives. We also needed a separate Spark application to match predictions to request sets once the response from the Prediction centre is received. This application listens to the client-specific Kafka topic, and once a prediction arrives, it looks into Redis, fetches the matching request set, and saves everything to the database.
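A sketch of the caching side with redis-py (which stands in here for the spark-redis connector; key names are illustrative):

```python
# Sketch of the client-side TTL cache: request sets wait in Redis until
# their prediction arrives, or silently expire. redis-py stands in for
# the spark-redis connector; key naming is illustrative.
import json
import redis

r = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 600  # configurable safeguard against unbounded memory use

def cache_request_set(request_set_id, request_set):
    r.setex(f"request_set:{request_set_id}", TTL_SECONDS, json.dumps(request_set))

def on_prediction(request_set_id, prediction, score):
    raw = r.get(f"request_set:{request_set_id}")
    if raw is None:
        return None  # the prediction arrived after the TTL; drop it
    request_set = json.loads(raw)
    request_set.update(prediction=prediction, score=score)
    return request_set  # the matching Spark job would persist this row
```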

To sum up, in the new architecture the preprocessing happens on the client’s side; the feature vectors are sent via Kafka to the Prediction centre (no sensitive data exchange); a prediction and a score for each request set are sent as a reply to each feature vector (via Kafka); and on the client side, another Spark job consumes the prediction message, matches it with the respective request set, and saves it to the database.


We, the Internet (Nous l’Internet)

Quebecers and francophone Canadians will soon take part in the largest global citizens’ dialogue ever organized on the future of the Internet.

In October 2020, thousands of people reflecting the diversity of their respective countries will gather in 70 countries to share their vision of the future of the Internet, making this the largest citizen consultation ever organized.

In Quebec, this virtual dialogue will be held on October 23 and 24, and the public is invited to register now to take part. Initiated by the French organization Missions Publiques, the event is part of a worldwide effort to give citizens a voice on a key subject: the future of the digital world.

During the global Citizens’ Dialogue on the future of the Internet, 100 participants from each country are invited to learn, discuss and decide what, in their view, will make the Internet a better tool for the years to come. In Quebec, this forum is organized by eQualitie in collaboration with the Quebec chapter of the Internet Society and the SecDev Foundation.

Discussions shaped by COVID-19

At a time of global pandemic, the Internet is becoming the backbone of our social interactions. Consequently, the subject of COVID-19 will find its way into the themes already planned: digital identity, cybersecurity, information and disinformation in the digital age, and artificial intelligence. This worldwide deliberation will produce informed citizen recommendations to be submitted to decision-makers at the local, regional and international levels. The results of the deliberations will be submitted to the Quebec Internet Governance Forum, as well as to its Canadian and international counterparts.

For Dmitri Vitaliev, director of eQualitie, “the challenges of technological development demand that trust be rebuilt between citizens and decision-makers. That is why modes of governance must become more inclusive in order to adapt to the challenges ahead.”

“Nous l’Internet – We, the Internet” is coordinated by a coalition of global partners including the European Commission, UNESCO, the Internet Society, the Wikimedia Foundation, the World Wide Web Foundation, and the Swiss and German governments, among others.


For information:
Michel Lambert

About:
Nous l’Internet (Québec): Registration
We The Internet (Global)


eQ offers Deflect website security services for free in response to COVID-19

In response to, and in solidarity with, the numerous efforts that have sprung up to help with communications, coordination and outreach during the COVID-19 epidemic, eQualitie is offering Deflect website security and content delivery services for free until the end of 2020 to organizations and individuals working to help others during this difficult time. This includes:

  • Availability: as demand for your content grows, our worldwide infrastructure will ensure that your website remains accessible and fast
  • Security: protecting your website from malicious bots and hackers
  • Hosting: for existing or new WordPress sites
  • Analytics: view real-time statistics in the Deflect dashboard

Deflect is always offered free of charge to not-for-profit entities that meet our eligibility requirements. This offer extends our free services to any business or individual that is responding to societal needs during the pandemic, including media organizations, government, online retail and hospitality services, etc. We will review all applications to make sure they align with Deflect’s Terms of Use.

It takes 15 minutes to set up and we’ll have you protected on the same day. Our support team can help you in English, French, Chinese, Spanish and Russian. If you have any questions please contact us.


Web Security Fellowship – project review

Launched in early 2019, the Web Security Fellowship was a pilot project for eQualitie to introduce more IT professionals to the ranks of active civil society organizations. Eight fellows were selected through a public application process for a six-month placement within host organizations, comprising human rights and independent media groups in Russia. The fellowship began with a three-month intensive program to improve the fellows’ technical security insight and practical skills. Thereafter, together with their host organization, each fellow came up with a project or a series of tasks to improve the security of the host organization’s Web platform, mobile application or technical process. Herein we present the fellows, their projects and outcomes.

Webinar program

Presented by industry experts from the Runet, 10 online lectures were held over a three-month period. The course material included:

  • Organizational audits: technology assessment, risks & vulnerabilities, operation security
  • Implementing a “security policy” within a civil society organization
  • Strengthening web servers
  • Cyber law, digital violence and censorship
  • Latest developments in Internet censorship and its circumvention
  • The theoretical and practical aspects of platform penetration testing
  • Web site security: hosting and DNS, performance analysis and load testing, DDoS mitigation
  • OWASP TOP 10 methodology
  • Defensive programming
  • Secure-by-design principles and system architectures


Meet the fellows and their projects

Aleksandr, Moscow

“I have been working as a system administrator at the Memorial society’s Moscow office since 2009. My job is to administer Windows Active Directory and Linux-based PCs. I can design networks, configure network hardware on RouterOS and PfSense, and build HTML/CSS pages. I can provide video streaming, audio processing and editing, and technical support at public events (sound and video directing). At a basic level, I can administer *nix-based Web servers. I don’t know how to write code (apart from simple Python/bash scripts) or how to administer Windows-based Web servers.

In my spare time I manage several holiday schools (notably the Puschino Winter School and the Molecular and Theoretical Biology School), learn to play guitar, play video games, watch series and, last but not least, raise my daughter.”

Organization: International Memorial

Project: Improve the security of the web hosting server, making further recommendations to in-house developers

Tasks: Audit the security of base.memo.ru: develop a threat model, interview web developers, perform black box and white box penetration tests, analyze the configuration of the host site and audit the site code. Based on the results of the audit, reconfigure the host server software to improve its security and draft recommendations for the developer of the website application.

Project details: Pentest for critical vulnerabilities using Burp/OWASP ZAP and sqlmap; configure monitoring services (Zabbix or equivalent); harden the OS and its access interfaces (SSH, HTTPS).

Artemy, Moscow

“For more than a decade I have been developing applications and services. I prefer Kotlin, Java and Python; I use Sketch to draw designs and I love Material design. In my spare time I dig into engines, when my Subaru won’t accelerate or to switch my gearbox to sport mode.”

Organisation: Not disclosed

Project: Make application connection circuits harder to block

Tasks: Code a prototype library to detect the blocking and identify the blocked IPs.

Project details: To block Telegram, Roskomnadzor has been blocking the IPs that the application listens on when it tries to connect, while the application masks its presence using providers’ VPN networks. The objective of this project is to detect blocking that uses this method; the task is likely to stay relevant, judging by news reports that Roskomnadzor plans to block applications using DPI, an effort expected to stretch over five years. In the prototype implementation, the same device distributes and receives IPs, probes them, and detects the blocking; this simplifies the prototype’s logic. The production version of the project is planned to distribute IPs from a back end, using several circuits and other technologies to circumvent the blocking.

Github: github.com/art2limit/Offenbarung


Nikita, Moscow

“I work in the field of information security for business software, providing support for the SIEM system. In the course of my work I have scanned through innumerable logs and learned to extract as much from them as possible; I have worked with a wide range of IT systems (FW, DLP, antiviruses, hypervisors, DNS, DHCP, proxies and others) and usually know what’s happening inside and where to dig to come up with something relevant. I am good at finding and solving problems, and I love to optimize the trivialities using Python. I am interested in many security trends, but even more I love good music and movies, from Carbon Based Lifeforms to Pearl Jam to Angerfist, and from Clerks to The Good, the Bad and the Ugly to Sharknado.”

Organization: Mass Media Defence Centre

Project: An integrated Web platform

Objective: To free the organization from a multitude of ageing, difficult-to-maintain tools by creating a universal, easily maintained platform that can host all of its current information resources, can be extended to include more in the future, and enhances availability. To secure the organization’s access to the external network and ensure the stability of that access.

Tasks:

  • Design and build a Web platform based on state-of-the-art tools, capable of hosting any of the organization’s sites; develop and implement enhanced availability (to withstand attacks and blocking).
  • Sort out the technicalities of network communications inside the organization.


Oksana, Saint Petersburg

“I have done some website design in WordPress and application coding in clean-code-javascript using several frameworks. More on my projects is available here: http://o.web-corner.net/ I also once worked on an application for an NPO (together with the backend developer) that used a database containing sensitive information; the backend was in Java, the frontend in VueJS.”

Organization: Nochlezhka

Project: Secure web hosting

Objective: To enhance the security of the organization’s Web resources

Tasks:

  • Restrict free access to the organization’s main website for volunteers and other non-staff members;
  • Assess how much critical information about the organization’s resources is visible from the outside;
  • Develop a login and password storage system for staff members;
  • Configure backups of the main website;
  • Configure server applications to block brute-force attacks;
  • Configure monitoring;
  • Make corrections to the Social Worker’s Multifunctional Cabinet application;
  • Update PHP.


Anton, Yaroslavl

“I am a Linux and Windows system administrator; I work with ERP and CRM systems (Microsoft SharePoint, Dynamics). I have mostly dealt with business information systems deployed in on-demand and hybrid environments. I have extensive experience in fine-tuning Linux VPSes for various Web tasks with basic security (fail2ban, access management via SSH and so on), as well as maintaining Apache/MySQL/PHP and nginx/MySQL/PHP stacks. I am an activist and a coordinator of the Golos movement in Yaroslavl.”

Organization: Not disclosed

Project: Fix the major holes in the organization’s security, deploy data storage engines and policies, and set up backups.

Tasks:

  • Make access to the organization’s resources VPN-only, first of all access to the site’s administration panel;
  • Set up a single place to store data, with access control and encryption. The solution must support the storage of diverse data formats and media files, and the data must be stored in a neutral jurisdiction;
  • Set up a workflow and an engine to back up the Web application’s database regularly;
  • Check the application for critical vulnerabilities from the OWASP Top 10 list and eliminate or accept the risks;
  • Solve the problems with physical networking hardware: upgrade the router, move it to the rented facility’s boundary, and configure the local network, the firewall, and shared access to resources. Provide guest Wi-Fi access to the Web.


Konstantin, Orenburg

White-hat hacker, software engineer, full-stack developer.

“I am skilled in Python, JavaScript, C++, video streaming, heavily loaded systems, Django, Linux, telecommunications, React Native, and Smart TV. I founded a technology company in the field of software design for IPTV/OTT operators and an OTT service provider (the company is Microimpuls; I am its CTO). My background is in mathematics and programming.”

Organization: Horizontal Russia 7х7 interregional webzine

Project: Pentest a new Web site engine

Objective: Expose and fix security problems in a new custom-built Website engine; reduce the risk of intrusion.

Tasks:

  • Scrutinize the architecture of the new Website and its subsystems;
  • Scrutinize the source code of the engine and its modules;
  • Expose actual and potential vulnerabilities of the Website; perform a penetration test;
  • Come up with practical options to fix the vulnerabilities, and fix them;
  • Come up with options to reduce the risk of intrusion, and deploy them;
  • Harden the server hosting the Website;
  • Set up security monitoring for the Website and configure intrusion detection tools;
  • Develop coding rules and a regular audit process to maintain the Website’s security.


K., Moscow

“For a decade I have been professionally involved in the development and production of Web projects: layout, design, and coding (nowadays it’s called full-stack). I am skilled in layout with JS, jQuery, HTML, and CSS. I code mostly in Drupal and PHP. Currently I am studying Python. I prefer to work on projects I find interesting, in the fields of science, art (music, theatre, painting, photography) and advocacy.”

Organization: OVD-Info

Project: A preparation step to develop technical specifications for a detention monitoring database

Objective: Develop a UX matrix and user stories. Scrutinize, analyze and select the software tools for developing the database.

Tasks: Develop a script and an interface for every role in the headquarters. Create a UX matrix. Choose the engine for developing the database, as well as actively updated and supported technologies and systems to power the project. Communicate with every user group of the detention database (lawyers, monitors, analysts) to better understand the problems with the current interfaces, and take stock of them when developing the technical specifications.


Boris, Moscow

“From 2006 to 2008 I worked as a Web layout designer and Web developer at a large company in the field of distance education in Russia. For more than a decade I have made my living from the repair, assembly and setup of computers and peripherals, as well as setting up and tuning computer networks.

Naturally, I am skilled in installing all sorts of software. I have experience teaching computer science at school. I have been an IT volunteer for OVD-Info many times. For a year I did technical maintenance at a human rights organization.”

Organization: Memorial Human Rights Center

Project: Design a plan to migrate Memorial HRC to a cloud

Objective: Develop a project for Memorial HRC’s move to a cloud service: give the rationale and describe the transition phases. Design a plan, set up the cloud infrastructure, test it, migrate the data and launch.

Tasks:

  • Explain the rationale for moving to the cloud to HRC staff; present the transition phases;
  • Choose a cloud provider;
  • Set up security policies for cloud participation;
  • Provide a single entry point to the applications from any environment;
  • Protect identities in local and cloud environments;
  • Provide integrated management of the cloud and its security;
  • Configure cloud services for backups and disaster recovery of the local environment;
  • Set up a platform for consistent data;
  • Deliver the benefits of a common database in both the local environment and the cloud;
  • Save costs by moving local data to the cloud;
  • Apply services for consistent data storage, analysis and visualization;
  • Run state-of-the-art applications in the local environment and in the cloud;
  • Fine-tune the intranet and purchase routers to ready the Internet connection for seamless use of the cloud services.