From edda054f782a25857e4938fa091fe94fa6006331 Mon Sep 17 00:00:00 2001 From: Andrey Meshkov Date: Mon, 14 Oct 2019 14:44:32 +0300 Subject: [PATCH 1/4] =?UTF-8?q?docs:=20=E2=9C=8F=EF=B8=8F=20added=20hosts?= =?UTF-8?q?=20blocklists=20syntax=20description?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- Home.md | 1 + Hosts-Blocklists.md | 132 ++++++++++++++++++++++++++++++++++++++++++++ OpenRC.md | 2 +- __Sidebar.md | 1 + 4 files changed, 135 insertions(+), 1 deletion(-) create mode 100644 Hosts-Blocklists.md diff --git a/Home.md b/Home.md index 9899c56..e67b747 100644 --- a/Home.md +++ b/Home.md @@ -31,3 +31,4 @@ The wiki was just recently created, so there isn't much content (yet). * [How to install and run AdGuard Home on Raspberry Pi](Raspberry-Pi) * [How to install and run AdGuard Home on a Virtual Private Server](VPS) * [OpenRC service-script](OpenRC) +* [How to write hosts blocklists](Hosts-Blocklists) diff --git a/Hosts-Blocklists.md b/Hosts-Blocklists.md new file mode 100644 index 0000000..65ee5be --- /dev/null +++ b/Hosts-Blocklists.md @@ -0,0 +1,132 @@ +# AdGuard Home - How to write hosts blocklists + +There are two different approaches to writing hosts blocklists: + +- [/etc/hosts syntax](#etc-hosts) - the old, tried and true approach is to use the same syntax as operating systems use for the "hosts" files. +- [Adblock-style syntax](#adblock-style) - a modern approach to writing filtering rules based on using a subset of the Adblock-style syntax. This way blocklists will be compatible with browser ad blockers. + +If you are creating a blocklist for AdGuard Home, we recommend using the [Adblock-style syntax](#adblock-style). It has several important advantages over the old-style syntax: + +- **Blocklists size.** Using pattern-matching allows you to have a single rule instead of hundreds of `/etc/hosts` entries. 
+- **Compatibility.** Your blocklist will be compatible with browser ad blockers, and it will be easier to share rules with a browser filter list. +- **Extensibility.** For the last decade, Adblock-style syntax has greatly evolved, and I don't see why we can't extend it even more, and provide additional features for network-wide blockers. + +## Rules examples + +- `||example.org^` - block access to the `example.org` domain and all its subdomains +- `@@||example.org^` - unblock access to the example.org domain and all it's subdomains +- `0.0.0.0 example.org` - (attention, old-style /etc/hosts syntax) block `example.org` domain (but NOT it's subdomains) +- `! Here goes a comment` - just a comment +- `# Also a comment` - just a comment +- `/REGEX/` - block access to the domains matching the specified regular expression + +## /etc/hosts syntax + +For each host a single line should be present with the following information: + +``` +IP_address canonical_hostname [aliases...] +``` + +Fields of the entry are separated by any number of blanks and/or tab characters. + +Text from a `#` character until the end of the line is a comment and is ignored. + +Example: + +``` +# This is a comment +``` + +Hostnames may contain only alphanumeric characters, minus signs (`-`), and periods (`.`). They must begin with an alphabetic character and end with an alphanumeric character. Optional aliases provide for name changes, alternate spellings, shorter hostnames, or generic hostnames (for example, `localhost`). + +Examples: + +``` +127.0.0.1 example.org foo +127.0.0.1 example.com +``` + +> Please note, that the `IP_address` value is ignored by most of the DNS filtering software. + +## Adblock-style syntax + +This is a subset of the [traditional Adblock-style](https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters) syntax which is used by browser ad blockers. 
+ +``` + rule = ["@@"] pattern [ "$" modifiers ] +modifiers = [modifier0, modifier1[, ...[, modifierN]]] +``` + +- `pattern` — the hostname mask. Every hostname is matched against this mask. The pattern can also contain special characters, which are discussed below. +- `@@` — a marker that is used in "exception" rules. Start your rule with this marker if you want to turn off filtering for the matching hostnames. +- `modifiers` — parameters that clarify the rule. They may limit the scope of the rule or even completely change the way it works. + +### Special characters + +- `*` — wildcard character. It is used to represent "any set of characters". This can also be an empty string or a string of any length. +- `||` — matching the beginning of a hostname (and any subdomain). For instance, `||example.org` matches `example.org` and `test.example.org`, but not `testexample.org`. +- `^` — separator character mark. Unlike browser ad blocking, there's nothing to "separate" in a hostname, so the only purpose of this character is to mark the end of the hostname. +- `|` — a pointer to the beginning or the end of the hostname. The value depends on the character placement in the mask. For example, a rule ample.org| corresponds to `example.org` , but not to `example.org.com`. `|example` corresponds to `example.org`, but not to `test.example`. + +### Regular expressions support + +If you want even more flexibility in making rules, you can use [Regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions) instead of the default simplified matching syntax. + +If you want to use a regular expression, the pattern has to look like this: + +``` +pattern = "/" regexp "/" +``` + +**Examples:** + +- `/example.*/` will block hosts which names match the `example.*` regex. +- `@@/example.*/$important` will unblock hosts which names match the `example.*` regex. Note that this rule also has the `$important` modifier. 
+ +### Rule modifiers + +You can change the behavior of a rule by using additional modifiers. Modifiers must be located at the end of the rule after the `$` character, and be separated by commas. + +Example: + +``` +||example.org^$important +``` + +- `||example.org^` - is a matching pattern +- `$` - is a delimiter, it signals that now modifiers start +- `important` - is a modifier + +> **IMPORTANT:** If a rule contains a modifier not listed in this document, the whole rule **must be ignored**. This way we will avoid false-positives when people are trying to use unmodified browser ad blockers' filter lists like EasyList or EasyPrivacy. + +#### `important` + +The `$important` modifier applied to a rule increases its priority over any other rule without \$important modifier. Even over basic exception rules. + +**Example 1:** + +``` +||example.org^$important +@@||example.org^ +``` + +`||example.org^$important` will block all requests despite the exception rule. + +**Example 2:** + +``` +||example.org^$important +@@||example.org^$important +``` + +Now the exception rule also has the `$important` modifier so it will prevail. + +#### `badfilter` + +The rules with the `$badfilter` modifier disable other basic rules to which they refer. It means that the text of the disabled rule should match the text of the `$badfilter` rule (without the `badfilter` modifier). + +**Examples:** + +- `||example.com$badfilter` disables `||example.com` +- `@@||example.org^$badfilter` disables ``@@||example.org^` diff --git a/OpenRC.md b/OpenRC.md index b93ce5f..c932b8e 100644 --- a/OpenRC.md +++ b/OpenRC.md @@ -1,4 +1,4 @@ -# OpenRC service-script +# AdGuard Home - OpenRC service-script A service-script for OpenRC-based systems, for example if you run AdGuard Home in Alpine (without using Docker). 
diff --git a/__Sidebar.md b/__Sidebar.md index 5b42834..56649c1 100644 --- a/__Sidebar.md +++ b/__Sidebar.md @@ -8,3 +8,4 @@ * [How to install and run AdGuard Home on Raspberry Pi](Raspberry-Pi) * [How to install and run AdGuard Home on a Virtual Private Server](VPS) * [OpenRC service-script](OpenRC) +* [How to write hosts blocklists](Hosts-Blocklists) \ No newline at end of file From 1eceafcd4205440de4be43c8783b825b49b1f3ed Mon Sep 17 00:00:00 2001 From: Andrey Meshkov Date: Mon, 14 Oct 2019 17:01:56 +0300 Subject: [PATCH 2/4] =?UTF-8?q?docs:=20=E2=9C=8F=EF=B8=8F=20correct=20mist?= =?UTF-8?q?akes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- Hosts-Blocklists.md | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/Hosts-Blocklists.md b/Hosts-Blocklists.md index 65ee5be..bae07f1 100644 --- a/Hosts-Blocklists.md +++ b/Hosts-Blocklists.md @@ -9,7 +9,7 @@ If you are creating a blocklist for AdGuard Home, we recommend using the [Adbloc - **Blocklists size.** Using pattern-matching allows you to have a single rule instead of hundreds of `/etc/hosts` entries. - **Compatibility.** Your blocklist will be compatible with browser ad blockers, and it will be easier to share rules with a browser filter list. -- **Extensibility.** For the last decade, Adblock-style syntax has greatly evolved, and I don't see why we can't extend it even more, and provide additional features for network-wide blockers. +- **Extensibility.** For the last decade, Adblock-style syntax has greatly evolved, and we don't see why we can't extend it even more, and provide additional features for network-wide blockers. ## Rules examples @@ -58,7 +58,7 @@ This is a subset of the [traditional Adblock-style](https://kb.adguard.com/en/ge modifiers = [modifier0, modifier1[, ...[, modifierN]]] ``` -- `pattern` — the hostname mask. Every hostname is matched against this mask. 
The pattern can also contain special characters, which are discussed below. +- `pattern` — the hostname mask. Every hostname is matched against this mask. The pattern can also contain special characters, which are described below. - `@@` — a marker that is used in "exception" rules. Start your rule with this marker if you want to turn off filtering for the matching hostnames. - `modifiers` — parameters that clarify the rule. They may limit the scope of the rule or even completely change the way it works. @@ -67,7 +67,7 @@ modifiers = [modifier0, modifier1[, ...[, modifierN]]] - `*` — wildcard character. It is used to represent "any set of characters". This can also be an empty string or a string of any length. - `||` — matching the beginning of a hostname (and any subdomain). For instance, `||example.org` matches `example.org` and `test.example.org`, but not `testexample.org`. - `^` — separator character mark. Unlike browser ad blocking, there's nothing to "separate" in a hostname, so the only purpose of this character is to mark the end of the hostname. -- `|` — a pointer to the beginning or the end of the hostname. The value depends on the character placement in the mask. For example, a rule ample.org| corresponds to `example.org` , but not to `example.org.com`. `|example` corresponds to `example.org`, but not to `test.example`. +- `|` — a pointer to the beginning or the end of the hostname. The value depends on the character placement in the mask. For example, a rule `ample.org|` corresponds to `example.org` , but not to `example.org.com`. `|example` corresponds to `example.org`, but not to `test.example`. ### Regular expressions support @@ -79,10 +79,18 @@ If you want to use a regular expression, the pattern has to look like this: pattern = "/" regexp "/" ``` +### Comments + +Any line that starts with an exclamation mark is a comment and it will be ignored by the filtering engine. Comments are usually placed above rules and used to describe what a rule does. 
+ +``` +! This is a comment +``` + **Examples:** -- `/example.*/` will block hosts which names match the `example.*` regex. -- `@@/example.*/$important` will unblock hosts which names match the `example.*` regex. Note that this rule also has the `$important` modifier. +- `/example.*/` will block hosts matching the `example.*` regex. +- `@@/example.*/$important` will unblock hosts matching the `example.*` regex. Note that this rule also has the `$important` modifier. ### Rule modifiers @@ -129,4 +137,4 @@ The rules with the `$badfilter` modifier disable other basic rules to which they **Examples:** - `||example.com$badfilter` disables `||example.com` -- `@@||example.org^$badfilter` disables ``@@||example.org^` +- `@@||example.org^$badfilter` disables `@@||example.org^` From e58343c2d6eb118f9d26055c75637d8df090fe6d Mon Sep 17 00:00:00 2001 From: Andrey Meshkov Date: Tue, 15 Oct 2019 12:06:11 +0300 Subject: [PATCH 3/4] =?UTF-8?q?fix:=20=F0=9F=90=9B=20correct=20mistakes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- Hosts-Blocklists.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/Hosts-Blocklists.md b/Hosts-Blocklists.md index bae07f1..2a5e286 100644 --- a/Hosts-Blocklists.md +++ b/Hosts-Blocklists.md @@ -9,13 +9,13 @@ If you are creating a blocklist for AdGuard Home, we recommend using the [Adbloc - **Blocklists size.** Using pattern-matching allows you to have a single rule instead of hundreds of `/etc/hosts` entries. - **Compatibility.** Your blocklist will be compatible with browser ad blockers, and it will be easier to share rules with a browser filter list. -- **Extensibility.** For the last decade, Adblock-style syntax has greatly evolved, and we don't see why we can't extend it even more, and provide additional features for network-wide blockers. 
+- **Extensibility.** For the last decade the Adblock-style syntax has greatly evolved, and we don't see why we can't extend it even more, and provide additional features for network-wide blockers. ## Rules examples - `||example.org^` - block access to the `example.org` domain and all its subdomains -- `@@||example.org^` - unblock access to the example.org domain and all it's subdomains -- `0.0.0.0 example.org` - (attention, old-style /etc/hosts syntax) block `example.org` domain (but NOT it's subdomains) +- `@@||example.org^` - unblock access to the `example.org` domain and all its subdomains +- `0.0.0.0 example.org` - (attention, old-style /etc/hosts syntax) block `example.org` domain (but NOT its subdomains) - `! Here goes a comment` - just a comment - `# Also a comment` - just a comment - `/REGEX/` - block access to the domains matching the specified regular expression @@ -30,7 +30,7 @@ IP_address canonical_hostname [aliases...] Fields of the entry are separated by any number of blanks and/or tab characters. -Text from a `#` character until the end of the line is a comment and is ignored. +Text from the `#` character until the end of the line is a comment and is ignored. Example: From 78c0ec45425d6cfa5978ed1a6af32dd03c2d6653 Mon Sep 17 00:00:00 2001 From: Andrey Meshkov Date: Tue, 15 Oct 2019 14:57:30 +0300 Subject: [PATCH 4/4] =?UTF-8?q?docs:=20=E2=9C=8F=EF=B8=8F=20correct=20mist?= =?UTF-8?q?akes=202?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- Hosts-Blocklists.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/Hosts-Blocklists.md b/Hosts-Blocklists.md index 2a5e286..8ca47e2 100644 --- a/Hosts-Blocklists.md +++ b/Hosts-Blocklists.md @@ -59,7 +59,7 @@ modifiers = [modifier0, modifier1[, ...[, modifierN]]] ``` - `pattern` — the hostname mask. Every hostname is matched against this mask. The pattern can also contain special characters, which are described below. 
-- `@@` — a marker that is used in "exception" rules. Start your rule with this marker if you want to turn off filtering for the matching hostnames. +- `@@` — a marker that is used in the "exception" rules. Start your rule with this marker if you want to turn off filtering for the matching hostnames. - `modifiers` — parameters that clarify the rule. They may limit the scope of the rule or even completely change the way it works. ### Special characters @@ -67,7 +67,7 @@ modifiers = [modifier0, modifier1[, ...[, modifierN]]] - `*` — wildcard character. It is used to represent "any set of characters". This can also be an empty string or a string of any length. - `||` — matching the beginning of a hostname (and any subdomain). For instance, `||example.org` matches `example.org` and `test.example.org`, but not `testexample.org`. - `^` — separator character mark. Unlike browser ad blocking, there's nothing to "separate" in a hostname, so the only purpose of this character is to mark the end of the hostname. -- `|` — a pointer to the beginning or the end of the hostname. The value depends on the character placement in the mask. For example, a rule `ample.org|` corresponds to `example.org` , but not to `example.org.com`. `|example` corresponds to `example.org`, but not to `test.example`. +- `|` — a pointer to the beginning or the end of the hostname. The value depends on the character placement in the mask. For example, the rule `ample.org|` corresponds to `example.org`, but not to `example.org.com`. `|example` corresponds to `example.org`, but not to `test.example`. ### Regular expressions support @@ -94,7 +94,7 @@ Any line that starts with an exclamation mark is a comment and it will be ignore ### Rule modifiers -You can change the behavior of a rule by using additional modifiers. Modifiers must be located at the end of the rule after the `$` character, and be separated by commas. +You can change the behavior of a rule by using additional modifiers. 
Modifiers must be located at the end of the rule after the `$` character and be separated by commas. Example: @@ -102,9 +102,9 @@ Example: ||example.org^$important ``` -- `||example.org^` - is a matching pattern -- `$` - is a delimiter, it signals that now modifiers start -- `important` - is a modifier +- `||example.org^` - a matching pattern +- `$` - a delimiter that signals where the modifiers start +- `important` - a modifier > **IMPORTANT:** If a rule contains a modifier not listed in this document, the whole rule **must be ignored**. This way we will avoid false-positives when people are trying to use unmodified browser ad blockers' filter lists like EasyList or EasyPrivacy.
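
Reviewer note, not part of the patch series: the grammar `rule = ["@@"] pattern [ "$" modifiers ]` and the "unknown modifier means ignore the whole rule" requirement can be illustrated with a small parser. This is only a sketch under stated assumptions: it is written in Python for illustration and is not AdGuard Home's implementation; the names `parse_rule`, `Rule`, and `KNOWN_MODIFIERS` are hypothetical; it treats `important` and `badfilter` as the only known modifiers because those are the only ones this document describes; and it does not handle `/etc/hosts`-style lines.

```python
from typing import NamedTuple, Optional

# Assumption: only the modifiers described in the document are "known".
KNOWN_MODIFIERS = frozenset({"important", "badfilter"})


class Rule(NamedTuple):
    exception: bool       # True when the rule starts with "@@"
    pattern: str          # hostname mask, or "/regexp/" for regex rules
    modifiers: frozenset  # e.g. frozenset({"important"})


def parse_rule(line: str) -> Optional[Rule]:
    """Return a Rule, or None for blanks, comments, and rules to be ignored."""
    text = line.strip()
    if not text or text.startswith(("!", "#")):
        return None  # blank line or comment

    exception = text.startswith("@@")
    if exception:
        text = text[2:]

    if text.startswith("/"):
        # Regex rule: modifiers may only follow the closing slash, so a "$"
        # inside the regular expression is not mistaken for the delimiter.
        end = text.rfind("/")
        if end == 0:
            return None  # no closing slash
        pattern, rest = text[: end + 1], text[end + 1:]
        if rest and not rest.startswith("$"):
            return None  # unexpected text after the regex
        mods = rest[1:]
    else:
        pattern, _, mods = text.partition("$")

    if not pattern:
        return None

    modifiers = frozenset(m.strip() for m in mods.split(",")) if mods else frozenset()
    if not modifiers <= KNOWN_MODIFIERS:
        return None  # unknown modifier: the whole rule is ignored
    return Rule(exception, pattern, modifiers)
```

With this sketch, a line such as `||example.org^$third-party` taken from a browser filter list is dropped rather than applied as a plain `||example.org^` block, which is the false-positive case the IMPORTANT note is meant to prevent.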