
Create CLAUDE.md for RSS-Bridge docs

Akamaru
2025-11-09 20:55:17 +01:00
parent d8751fb514
commit 3d0d069309
6 changed files with 970 additions and 0 deletions


@@ -0,0 +1,36 @@
# How to create a completely new bridge
New code files MUST have `declare(strict_types=1);` at the top of file:
```php
<?php
declare(strict_types=1);
```
Create the new bridge in e.g. `bridges/BearBlogBridge.php`:
```php
<?php
declare(strict_types=1);
class BearBlogBridge extends BridgeAbstract
{
const NAME = 'BearBlog (bearblog.dev)';
public function collectData()
{
$dom = getSimpleHTMLDOM('https://herman.bearblog.dev/blog/');
foreach ($dom->find('.blog-posts li') as $li) {
$a = $li->find('a', 0);
$this->items[] = [
'title' => $a->plaintext,
'uri' => 'https://herman.bearblog.dev' . $a->href,
];
}
}
}
```
Learn more in [bridge api](https://rss-bridge.github.io/rss-bridge/Bridge_API/index.html).


@@ -0,0 +1,567 @@
`BridgeAbstract` is a base class for standard bridges.
It implements the most common functions to simplify the process of adding new bridges.
***
# Creating a new bridge
You need four basic steps in order to create a new bridge:
[**Step 1**](#step-1---create-a-new-file) - Create a new file
[**Step 2**](#step-2---add-a-class-extending-bridgeabstract) - Add a class, extending `BridgeAbstract`
[**Step 3**](#step-3---add-general-constants-to-the-class) - Add general constants to the class
[**Step 4**](#step-4---implement-a-function-to-collect-feed-data) - Implement a function to collect feed data
These steps are described in more detail below.
At the end of this document you'll find a complete [template](#template) based on these instructions.
The pictures below show an example based on these instructions:
<details><summary>Show pictures</summary><div>
![example card](../images/screenshot_bridgeabstract_example_card.png)
***
![example atom](../images/screenshot_bridgeabstract_example_atom.png)
</div></details><br>
Make sure to read these instructions carefully.
Please don't hesitate to open an
[Issue](https://github.com/RSS-Bridge/rss-bridge/issues)
if you have further questions (or suggestions).
Once your bridge is finished, please open a [Pull Request](https://github.com/RSS-Bridge/rss-bridge/pulls)
in order to get your bridge merged into RSS-Bridge.
***
## Step 1 - Create a new file
Please read [these instructions](./01_How_to_create_a_new_bridge.md) on how to create a new file for RSS-Bridge.
## Step 2 - Add a class, extending `BridgeAbstract`
Your bridge needs to be a class, which extends `BridgeAbstract`.
The class name must **exactly** match the name of the file, without the file extension.
For example: `MyBridge.php` => `MyBridge`
<details><summary>Show example</summary><div>
```PHP
<?PHP
class MyBridge extends BridgeAbstract
{
}
```
</div></details>
## Step 3 - Add general constants to the class
In order to present your bridge on the front page, RSS-Bridge requires a few constants:
```PHP
const NAME // Name of the Bridge (default: "Unnamed Bridge")
const URI // URI to the target website of the bridge (default: empty)
const DESCRIPTION // A brief description of the Bridge (default: "No description provided")
const MAINTAINER // Name of the maintainer, i.e. your name on GitHub (default: "No maintainer")
const PARAMETERS // (optional) Definition of additional parameters (default: empty)
const CACHE_TIMEOUT // (optional) Defines the maximum duration for the cache in seconds (default: 3600)
```
<details><summary>Show example</summary><div>
```PHP
<?php
class MyBridge extends BridgeAbstract
{
const NAME = 'My Bridge';
const URI = 'https://rss-bridge.github.io/rss-bridge/Bridge_API/BridgeAbstract.html';
const DESCRIPTION = 'Returns "Hello World!"';
const MAINTAINER = 'ghost';
}
```
</div></details><br>
**Notice**: `const PARAMETERS` can be used to request information from the user.
Refer to [these instructions](#parameters) for more information.
## Step 4 - Implement a function to collect feed data
In order for RSS-Bridge to collect data, you must implement the **public** function `collectData`.
This function takes no arguments and returns nothing.
It generates a list of feed elements, which must be placed into the variable `$this->items`.
<details><summary>Show example</summary><div>
```PHP
<?php
class MyBridge extends BridgeAbstract
{
const NAME = 'My Bridge';
const URI = 'https://rss-bridge.github.io/rss-bridge/Bridge_API/BridgeAbstract.html';
const DESCRIPTION = 'Returns "Hello World!"';
const MAINTAINER = 'ghost';
public function collectData()
{
$item = [];
$item['title'] = 'Hello World!';
$this->items[] = $item;
}
}
```
</div></details><br>
For more details on the `collectData` function refer to [these instructions](#collectdata).
***
# Template
Use this template to create your own bridge.
Please remove any unnecessary comments and parameters.
```php
<?php
class MyBridge extends BridgeAbstract
{
const NAME = 'Unnamed bridge';
const URI = '';
const DESCRIPTION = 'No description provided';
const MAINTAINER = 'No maintainer';
const PARAMETERS = []; // Can be omitted!
const CACHE_TIMEOUT = 3600; // Can be omitted!
public function collectData()
{
$item = []; // Create an empty item
$item['title'] = 'Hello World!';
$this->items[] = $item; // Add item to the list
}
}
```
# PARAMETERS
You can specify additional parameters in order to customize the bridge (e.g. to specify how many items to return).
This document explains how to specify those parameters and which options are available to you.
For information on how to read parameter values during execution, please refer to the [getInput](../06_Helper_functions/index.md#getinput) function.
***
## Adding parameters to a bridge
Parameters are specified as part of the bridge class.
An empty list of parameters is defined as `const PARAMETERS = [];`
<details><summary>Show example</summary><div>
```PHP
<?php
class MyBridge extends BridgeAbstract {
/* ... */
const PARAMETERS = []; // Empty list of parameters (can be omitted)
/* ... */
}
```
</div></details><br>
Parameters are organized in two levels:
[**Level 1**](#level-1---context) - Context
[**Level 2**](#level-2---parameter) - Parameter
## Level 1 - Context
A context is defined as an associative array of parameters.
The name of a context is displayed by RSS-Bridge.
<details><summary>Show example</summary><div>
```PHP
const PARAMETERS = [
'My Context 1' => [],
'My Context 2' => [],
];
```
**Output**
![bridge context named](../images/bridge_context_named.png)
</div></details><br>
_Notice_: The name of a context can be left empty if only one context is needed!
<details><summary>Show example</summary><div>
```PHP
const PARAMETERS = [
[]
];
```
</div></details><br>
You can also define a set of parameters that will be applied to every possible context of your bridge.
To do this, specify a context named `global`.
<details><summary>Show example</summary><div>
```PHP
const PARAMETERS = [
'global' => [] // Applies to all contexts!
];
```
</div></details>
## Level 2 - Parameter
Parameters are placed inside a context.
They are defined as an associative array of parameter specifications.
Each parameter is defined by its internal input name, using a definition of the form `'n' => [];`,
where `n` is the name with which the bridge can access the parameter during execution.
<details><summary>Show example</summary><div>
```PHP
const PARAMETERS = [
'My Context' => [
'n' => [],
]
];
```
</div></details><br>
The parameter specification consists of various fields, listed in the table below.
<details><summary>Show example</summary><div>
```PHP
const PARAMETERS = [
'My Context' => [
'n' => [
'name' => 'Limit',
'type' => 'number',
'required' => false,
'title' => 'Maximum number of items to return',
'defaultValue' => 10,
]
]
];
```
**Output**
![context parameter](../images/context_parameter_example.png)
</div></details>
***
Parameter Name | Required | Type | Supported values | Description
---------------|----------|------|------------------| -----------
`name` | **yes** | Text | | Input name as displayed to the user
`type` | no | Text | `text`, `number`, `list`, `checkbox` | Type of the input (default: `text`)
`required` | no | Boolean | `true`, `false` | Specifies if the parameter is required or not (default: `false`). Not supported for lists and checkboxes.
[`values`](#list-values) | no | associative array | | name/value pairs used by the HTML option tag, required for type '`list`'
`title` | no | Text | | Used as tool-tip when mouse-hovering over the input box
`pattern` | no | Text | | Defines a pattern for an element of type `text`. The pattern should be mentioned in the `title` attribute!
`exampleValue` | no | Text | | Defines an example value displayed for elements of type `text` and `number` when no data has been entered yet
[`defaultValue`](#defaultvalue) | no | | | Defines the default value if left blank by the user
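
As a rough illustration of how a `number` parameter such as the `Limit` example above might be used during execution, the sketch below trims a list of items to a user-supplied limit. The helper `applyLimit` and its fallback value are hypothetical and not part of RSS-Bridge; inside a bridge, `$limit` would be the value read with `$this->getInput('n')`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of RSS-Bridge): trims a list of items
 * to a user-supplied limit. In a bridge, $limit would be the value
 * obtained from $this->getInput('n').
 */
function applyLimit(array $items, $limit, int $fallback = 10): array
{
    // The input may be empty or non-numeric, so fall back to a default.
    $limit = is_numeric($limit) ? (int) $limit : $fallback;

    // Never pass a negative length to array_slice().
    return array_slice($items, 0, max(0, $limit));
}

$items = [['title' => 'A'], ['title' => 'B'], ['title' => 'C']];
$limited = applyLimit($items, '2'); // keeps the first two items
```

Casting defensively keeps the sketch independent of whether the input arrives as a string or as a number.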
#### List values
List values are defined in an associative array where keys are the string displayed in the combo list of the **RSS-Bridge** web interface, and values are the content of the \<option\> HTML tag value attribute.
```PHP
...
'type' => 'list',
'values' => [
'Item A' => 'itemA',
'Item B' => 'itemB'
]
...
```
If a more complex organization is required to display the values, the above key/value can be used to set a title as a key and another array as a value:
```PHP
...
'type' => 'list',
'values' => [
'Item A' => 'itemA',
'List 1' => [
'Item C' => 'itemC',
'Item D' => 'itemD'
],
'List 2' => [
'Item E' => 'itemE',
'Item F' => 'itemF'
],
'Item B' => 'itemB'
]
...
```
#### defaultValue
This attribute defines the default value for your parameter. Its behavior depends on the `type`:
- `text`: Allows any text
- `number`: Allows any number
- `list`: Must match either name or value of one element
- `checkbox`: Must be "checked" to activate the checkbox
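The rules above can be sketched with one parameter per type. The context, parameter names and values below are made up for illustration, and `listContains` is a hypothetical helper (not an RSS-Bridge function) that checks whether a `list` default matches the name or value of an entry, including entries inside nested groups.

```php
<?php
declare(strict_types=1);

// Hypothetical context with one parameter per supported type.
$parameters = [
    'My Context' => [
        'q' => ['name' => 'Search term', 'type' => 'text', 'defaultValue' => 'rss'],
        'limit' => ['name' => 'Limit', 'type' => 'number', 'defaultValue' => 10],
        'sort' => [
            'name' => 'Sort order',
            'type' => 'list',
            'values' => [
                'Newest first' => 'new',
                'Ranking' => [
                    'Top rated' => 'top',
                    'Most viewed' => 'views',
                ],
            ],
            'defaultValue' => 'top',
        ],
        'images' => ['name' => 'Include images', 'type' => 'checkbox', 'defaultValue' => 'checked'],
    ],
];

/**
 * Hypothetical helper: reports whether $default matches the name or the
 * value of any list entry, descending into nested groups.
 */
function listContains(array $values, string $default): bool
{
    foreach ($values as $name => $value) {
        if (is_array($value)) {
            if (listContains($value, $default)) {
                return true;
            }
        } elseif ((string) $name === $default || (string) $value === $default) {
            return true;
        }
    }
    return false;
}

$sort = $parameters['My Context']['sort'];
$isValidDefault = listContains($sort['values'], (string) $sort['defaultValue']);
```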
***
# queriedContext
The queried context is defined via `PARAMETERS` and can be accessed via `$this->queriedContext`.
It provides a way to identify which context the bridge is called with.
Example:
```PHP
const PARAMETERS = [
'By user name' => [
'u' => ['name' => 'Username']
],
'By user ID' => [
'id' => ['name' => 'User ID']
]
];
```
In this example `$this->queriedContext` is either **By user name** or **By user ID**.
The queried context might not be set at all, so the safest way to handle it is a `switch` statement with a `default` branch:
```PHP
switch($this->queriedContext){
case 'By user name':
break;
case 'By user ID':
break;
default: // Return default value
}
```
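A slightly fuller sketch of the same idea, written as a standalone function so it can run outside RSS-Bridge. The function name `buildUserUrl`, the `example.org` URLs and the `$input` array are assumptions for illustration; in a bridge the context would come from `$this->queriedContext` and the values from the bridge's input parameters.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: picks a target URL depending on the queried context.
 * $input stands in for the parameter values of the active context.
 */
function buildUserUrl(?string $queriedContext, array $input): string
{
    $base = 'https://example.org'; // placeholder site

    switch ($queriedContext) {
        case 'By user name':
            return $base . '/users/' . rawurlencode((string) $input['u']);
        case 'By user ID':
            return $base . '/id/' . (int) $input['id'];
        default:
            // No context was queried: fall back to the site root.
            return $base;
    }
}

$url = buildUserUrl('By user name', ['u' => 'jane doe']);
```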
# collectData
The `collectData` function is responsible for collecting data and adding items to generate feeds from.
If you are unsure how to solve a specific problem, please don't hesitate to open an [Issue](https://github.com/RSS-Bridge/rss-bridge/issues) on GitHub.
Existing bridges are also a good way to learn how to implement your own bridge.
## Implementing the `collectData` function
Implementation for the `collectData` function is specific to each bridge.
However, there are certain recurring elements, described below. RSS-Bridge also provides functions to simplify the process of collecting and parsing HTML data (see "Helper Functions" on the sidebar).
Elements collected by this function must be stored in `$this->items`.
The `items` variable is an array of item elements, each of which is an associative array that may contain arbitrary keys.
RSS-Bridge specifies common keys which are used to generate most common feed formats.
<details><summary>Show example</summary><div>
```PHP
$item = [];
$item['title'] = 'Hello World!';
$this->items[] = $item;
```
</div></details><br>
Additional keys may be added for custom APIs (ignored by RSS-Bridge).
## Item parameters
The item array should provide as much information as possible for RSS-Bridge to generate feature rich feeds.
Below is the list of keys supported by RSS-Bridge.
```PHP
$item['uri'] // URI to reach the subject ("https://...")
$item['title'] // Title of the item
$item['timestamp'] // Timestamp of the item in numeric or text format (compatible with strtotime())
$item['author'] // Name of the author for this item
$item['content'] // Content in HTML format
$item['enclosures'] // Array of URIs to attachments (pictures, files, etc.)
$item['categories'] // Array of categories / tags / topics
$item['uid'] // A unique ID to identify the current item
```
All formats support these parameters. The formats `Plaintext` and `JSON` also support custom parameters.
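For reference, here is a sketch of an item that fills in every key listed above. All values are invented placeholders, and a plain `$items` array stands in for `$this->items` so the snippet runs on its own.

```php
<?php
declare(strict_types=1);

// Placeholder values; in a bridge these would come from the scraped page.
$item = [
    'uri' => 'https://example.org/posts/42',
    'title' => 'Hello World!',
    // strtotime() turns a textual date into a Unix timestamp.
    'timestamp' => strtotime('2024-05-01 12:00:00 UTC'),
    'author' => 'Jane Doe',
    'content' => '<p>Hello <b>World</b>!</p>',
    'enclosures' => ['https://example.org/media/42.png'],
    'categories' => ['news', 'example'],
    // The item URI is reused as the unique ID here.
    'uid' => 'https://example.org/posts/42',
];

$items = [];      // stands in for $this->items
$items[] = $item; // add the item to the list
```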
# getDescription
The `getDescription` function returns the description for a bridge.
**Notice:** By default **RSS-Bridge** returns the contents of `const DESCRIPTION`,
so you only have to implement this function if you require different behavior!
```PHP
public function getDescription()
{
return self::DESCRIPTION;
}
```
# getMaintainer
The `getMaintainer` function returns the name of the maintainer for a bridge.
**Notice:** By default **RSS-Bridge** returns `const MAINTAINER`,
so you only have to implement this function if you require different behavior!
```PHP
public function getMaintainer()
{
return self::MAINTAINER;
}
```
# getName
The `getName` function returns the name of a bridge.
**Notice:** By default **RSS-Bridge** returns `const NAME`,
so you only have to implement this function if you require different behavior!
```PHP
public function getName()
{
return self::NAME;
}
```
# getURI
The `getURI` function returns the base URI for a bridge.
**Notice:** By default **RSS-Bridge** returns `const URI`,
so you only have to implement this function if you require different behavior!
```PHP
public function getURI()
{
return self::URI;
}
```
# getIcon
The `getIcon` function returns the URI for an icon, used as favicon in feeds.
If no icon is specified by the bridge,
RSS-Bridge will use a default location: `static::URI . '/favicon.ico'` (e.g. "https://github.com/favicon.ico"), which may or may not exist.
```PHP
public function getIcon()
{
return static::URI . '/favicon.ico';
}
```
# detectParameters
The `detectParameters` function takes a URL and attempts to extract a valid set of parameters for the current bridge.
If the passed URL is valid for this bridge, the function should return an array of parameter -> value pairs that can be used by this bridge, including context if available, or an empty array if the bridge requires no parameters. If the URL is not relevant for this bridge, the function should return `null`.
**Notice:** Implementing this function is optional. By default, **RSS-Bridge** tries to match the supplied URL to the `URI` constant defined in the bridge, which may be enough for bridges without any parameters defined.
```PHP
public function detectParameters($url)
{
$regex = '/^(https?:\/\/)?(www\.)?(.+?)(\/)?$/';
if (empty(static::PARAMETERS)
&& preg_match($regex, $url, $urlMatches) > 0
&& preg_match($regex, static::URI, $bridgeUriMatches) > 0
&& $urlMatches[3] === $bridgeUriMatches[3]
) {
return [];
} else {
return null;
}
}
```
**Notice:** This function is also used by the [findFeed](../04_For_Developers/04_Actions.md#findfeed) action.
This action allows a user to get a list of all feeds corresponding to a URL.
You can implement automated tests for the `detectParameters` function by adding the `TEST_DETECT_PARAMETERS` constant to your bridge class.
`TEST_DETECT_PARAMETERS` is an array whose keys are the URLs passed to the `detectParameters` function and whose values are the arrays of parameters that `detectParameters` is expected to return for them.
```PHP
const TEST_DETECT_PARAMETERS = [
'https://www.instagram.com/metaverse' => ['context' => 'Username', 'u' => 'metaverse'],
'https://instagram.com/metaverse' => ['context' => 'Username', 'u' => 'metaverse'],
'http://www.instagram.com/metaverse' => ['context' => 'Username', 'u' => 'metaverse'],
];
```
**Notice:** Adding this constant is optional. If the constant is not present, no automated test will be executed.
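To show what a bridge-specific implementation might look like, the sketch below is a standalone function that would satisfy the three test cases above. The function name `detectUsernameParameters` and the regular expression are assumptions made for this example, not the code of any existing bridge; in a real bridge the body would live in `detectParameters($url)`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a bridge's detectParameters() method.
 * Returns the parameters for a matching profile URL, or null otherwise.
 */
function detectUsernameParameters(string $url): ?array
{
    // Optional scheme, optional "www.", then exactly one path segment.
    $regex = '/^(https?:\/\/)?(www\.)?instagram\.com\/(?<user>[A-Za-z0-9_.]+)\/?$/';

    if (preg_match($regex, $url, $matches) === 1) {
        return ['context' => 'Username', 'u' => $matches['user']];
    }

    return null; // URL is not relevant for this bridge
}

$detected = detectUsernameParameters('https://www.instagram.com/metaverse');
```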
***
# Helper Methods
`BridgeAbstract` implements helper methods to make it easier for bridge maintainers to create bridges.
Use these methods whenever possible instead of writing your own.
## saveCacheValue
Within the context of the current bridge, stores a value by key in the cache.
The value can later be retrieved with [loadCacheValue](#loadcachevalue).
```php
protected function saveCacheValue($key, $value, $ttl = null)
```
Example:
```php
public function collectData()
{
$this->saveCacheValue('my_key', 'my_value', 3600); // 1h
}
```
## loadCacheValue
Within the context of the current bridge, loads a value by key from the cache.
Returns `$default` (`null` unless you pass another value) if the key doesn't exist or the value has expired.
```php
protected function loadCacheValue($key, $default = null)
```
Example:
```php
public function collectData()
{
$value = $this->loadCacheValue('my_key');
if (! $value) {
$this->saveCacheValue('my_key', 'foobar');
}
}
```
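The two helpers are typically combined into a "load, otherwise compute and save" pattern. The sketch below mimics them with an in-memory stand-in class so it can run outside RSS-Bridge; `CacheSketch`, `getToken` and the token value are hypothetical, and the stand-in is not the real implementation. Comparing against `null` (rather than using `!`) avoids treating cached falsy values such as `0` or `''` as missing.

```php
<?php
declare(strict_types=1);

/**
 * In-memory stand-in that mirrors the signatures of the two helpers.
 * This is NOT the RSS-Bridge implementation, only a runnable sketch.
 */
class CacheSketch
{
    private array $store = [];
    public int $fetchCount = 0;

    protected function saveCacheValue($key, $value, $ttl = null)
    {
        $expires = $ttl === null ? null : time() + $ttl;
        $this->store[$key] = [$value, $expires];
    }

    protected function loadCacheValue($key, $default = null)
    {
        if (!isset($this->store[$key])) {
            return $default;
        }
        [$value, $expires] = $this->store[$key];
        if ($expires !== null && $expires <= time()) {
            return $default; // entry has expired
        }
        return $value;
    }

    public function getToken(): string
    {
        $token = $this->loadCacheValue('token');
        if ($token === null) {
            // Placeholder for an expensive request (e.g. a login call).
            $this->fetchCount++;
            $token = 'token-' . $this->fetchCount;
            $this->saveCacheValue('token', $token, 3600); // keep for 1h
        }
        return $token;
    }
}

$bridge = new CacheSketch();
$first = $bridge->getToken();  // computed and stored
$second = $bridge->getToken(); // served from the cache
```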


@@ -0,0 +1,55 @@
**Usage example**: _You have discovered a site that provides feeds which are hidden and inaccessible by normal means. You want your bridge to directly read the feeds and provide them via **RSS-Bridge**_
Find a [template](#template) at the end of this file.
**Notice:** For a standard feed only `collectData` needs to be implemented. `collectData` should call `$this->collectExpandableDatas('your URI here');` to automatically load feed items and header data (this subsequently calls `parseItem` for each item in the feed). You can limit the number of items to fetch by passing an additional argument: `$this->collectExpandableDatas('your URI here', 10)` (limited to 10 items).
## The `parseItem` method
This method receives one item from the current feed and should return one **RSS-Bridge** item.
The default function does all the work to get the item data from the feed, whether it is RSS 1.0,
RSS 2.0 or Atom 1.0.
**Notice:** The following code sample is just an example. Implementation depends on your requirements!
```PHP
protected function parseItem(array $item)
{
$item['content'] = str_replace('rssbridge','RSS-Bridge',$item['content']);
return $item;
}
```
### Feed parsing
How RSS-Bridge maps XML feed elements to item keys:
Feed format | uri | title | timestamp | author | content
---------|-----|-------|-----------|--------|--------
`atom` | id | title | updated | author | content
`rss 0.91` | link | title | | | description
`rss 1.0` | link | title | dc:date | dc:creator | description
`rss 2.0` | link, guid | title | pubDate, dc:date | author, dc:creator | description
# Template
This is the template for a new bridge:
```PHP
<?php
class MySiteBridge extends FeedExpander
{
const MAINTAINER = 'No maintainer';
const NAME = 'Unnamed bridge';
const URI = '';
const DESCRIPTION = 'No description provided';
const PARAMETERS = [];
const CACHE_TIMEOUT = 3600;
public function collectData()
{
$this->collectExpandableDatas('your feed URI');
}
}
```


@@ -0,0 +1,83 @@
`WebDriverAbstract` extends [`BridgeAbstract`](./02_BridgeAbstract.md) and adds functionality for generating feeds
from active websites that use XMLHttpRequest (XHR) to load content and / or JavaScript to
modify content.
It highly depends on the php-webdriver library which offers Selenium WebDriver bindings for PHP.
- https://github.com/php-webdriver/php-webdriver (Project Repository)
- https://php-webdriver.github.io/php-webdriver/latest/ (API)
Please note that this class is intended as a solution for websites _that cannot be covered
by the other classes_. The WebDriver starts a browser and is therefore very resource-intensive.
# Configuration
You need a running WebDriver to use bridges that depend on `WebDriverAbstract`.
The easiest way is to start the Selenium server from the project of the same name:
```
docker run -d -p 4444:4444 --shm-size="2g" docker.io/selenium/standalone-chrome:latest
```
- https://github.com/SeleniumHQ/docker-selenium
With these parameters only one browser window can be started at a time.
On a multi-user site, Selenium Grid should be used
and the number of sessions should be adjusted to the number of processor cores.
Finally, the `config.ini.php` file must be adjusted so that the WebDriver
can find the Selenium server:
```
[webdriver]
selenium_server_url = "http://localhost:4444"
```
# Development
While you are programming a new bridge, it is easier to start a local WebDriver because then you can see what is happening and where the errors are. I've also had good experience recording the process with a screen video to find any timing problems.
```
chromedriver --port=4444
```
- https://chromedriver.chromium.org/
If you start rss-bridge from a container, then Chrome driver is only accessible
if you call it with the `--allowed-ips` option so that it binds to all network interfaces.
```
chromedriver --port=4444 --allowed-ips=192.168.1.42
```
The **most important rule** is that after an event such as loading the web page
or pressing a button, you often have to explicitly wait for the desired elements to appear.
A simple example is the bridge `ScalableCapitalBlogBridge.php`.
A more complex and relatively complete example is the bridge `GULPProjekteBridge.php`.
# Template
Use this template to create your own bridge.
```PHP
<?php
class MyBridge extends WebDriverAbstract
{
const NAME = 'My Bridge';
const URI = 'https://www.example.org';
const DESCRIPTION = 'Further description';
const MAINTAINER = 'your name';
public function collectData()
{
parent::collectData();
try {
// TODO
} finally {
$this->cleanUp();
}
}
}
```


@@ -0,0 +1,160 @@
`XPathAbstract` extends [`BridgeAbstract`](./02_BridgeAbstract.md) and adds functionality for generating feeds based on _XPath expressions_. It makes creating new bridges easy, and if you're familiar with XPath expressions this class is probably the right place to start.
At the end of this document you'll find a complete [template](#template) based on these instructions.
***
# Required constants
To create a new Bridge based on `XPathAbstract` your inheriting class should specify a set of constants describing the feed and the XPath expressions.
It is advised to override constants inherited from [`BridgeAbstract`](./02_BridgeAbstract.md#step-3---add-general-constants-to-the-class) as well.
## Class constant `FEED_SOURCE_URL`
Source Web page URL (should provide either HTML or XML content). You can specify any website URL which serves data suited for display in RSS feeds.
## Class constant `XPATH_EXPRESSION_FEED_TITLE`
XPath expression for extracting the feed title from the source page. If this is left blank or does not provide any data `BridgeAbstract::getName()` is used instead as the feed's title.
## Class constant `XPATH_EXPRESSION_FEED_ICON`
XPath expression for extracting the feed favicon URL from the source page. If this is left blank or does not provide any data `BridgeAbstract::getIcon()` is used instead as the feed's favicon URL.
## Class constant `XPATH_EXPRESSION_ITEM`
XPath expression for extracting the feed items from the source page. Enter an XPath expression matching a list of DOM nodes, each node containing exactly one feed article item (usually a surrounding `<div>` or `<span>` tag). These nodes will be the context nodes for all of the following expressions. This expression usually starts with a single forward slash.
## Class constant `XPATH_EXPRESSION_ITEM_TITLE`
XPath expression for extracting an item title from the item context. This expression should match a node contained within each article item node containing the article headline. It should start with a dot followed by two forward slashes, referring to any descendant nodes of the article item node.
## Class constant `XPATH_EXPRESSION_ITEM_CONTENT`
XPath expression for extracting an item's content from the item context. This expression should match a node contained within each article item node containing the article content or description. It should start with a dot followed by two forward slashes, referring to any descendant nodes of the article item node.
## Class constant `XPATH_EXPRESSION_ITEM_URI`
XPath expression for extracting an item link from the item context. This expression should match a node's attribute containing the article URL (usually the href attribute of an `<a>` tag). It should start with a dot followed by two forward slashes, referring to any descendant nodes of the article item node. Attributes can be selected by prepending an `@` character to the attribute's name.
## Class constant `XPATH_EXPRESSION_ITEM_AUTHOR`
XPath expression for extracting an item author from the item context. This expression should match a node contained within each article item node containing the article author's name. It should start with a dot followed by two forward slashes, referring to any descendant nodes of the article item node.
## Class constant `XPATH_EXPRESSION_ITEM_TIMESTAMP`
XPath expression for extracting an item timestamp from the item context. This expression should match a node or node's attribute containing the article timestamp or date (parsable by PHP's `strtotime` function). It should start with a dot followed by two forward slashes, referring to any descendant nodes of the article item node. Attributes can be selected by prepending an `@` character to the attribute's name.
## Class constant `XPATH_EXPRESSION_ITEM_ENCLOSURES`
XPath expression for extracting item enclosures (media content like images or movies) from the item context. This expression should match a node's attribute containing an article image URL (usually the src attribute of an `<img>` tag or a style attribute). It should start with a dot followed by two forward slashes, referring to any descendant nodes of the article item node. Attributes can be selected by prepending an `@` character to the attribute's name.
## Class constant `XPATH_EXPRESSION_ITEM_CATEGORIES`
XPath expression for extracting an item category from the item context. This expression should match a node or node's attribute contained within each article item node containing the article category. This could be inside `<div>` or `<span>` tags or sometimes be hidden in a data attribute. It should start with a dot followed by two forward slashes, referring to any descendant nodes of the article item node. Attributes can be selected by prepending an `@` character to the attribute's name.
## Class constant `SETTING_FIX_ENCODING`
Turns on automatic fixing of encoding errors. Set this to true for fixing feed encoding by invoking PHP's `utf8_decode` function on all extracted texts. Try this in case you see "broken" or "weird" characters in your feed where you'd normally expect umlauts or any other non-ascii characters.
# Optional methods
`XPathAbstract` offers a set of methods which can be overridden by derived classes for fine tuning and customization. This is optional. The methods provided for overriding can be grouped into three categories.
## Methods for providing XPath expressions
Usually XPath expressions are defined in the class constants described above. By default, each of the following base methods just returns the value of its corresponding class constant. However, deriving classes can override them if XPath expressions need to be formed dynamically or based on conditions. If any of these methods is overridden, its return value is used instead of the corresponding constant.
### Method `getSourceUrl()`
Should return the source Web page URL used as a base for applying the XPath expressions.
### Method `getExpressionTitle()`
Should return the XPath expression for extracting the feed title from the source page.
### Method `getExpressionIcon()`
Should return the XPath expression for extracting the feed favicon from the source page.
### Method `getExpressionItem()`
Should return the XPath expression for extracting the feed items from the source page.
### Method `getExpressionItemTitle()`
Should return the XPath expression for extracting an item title from the item context.
### Method `getExpressionItemContent()`
Should return the XPath expression for extracting an item's content from the item context.
### Method `getSettingUseRawItemContent()`
Should return the 'Use raw item content' setting value (bool true or false).
### Method `getExpressionItemUri()`
Should return the XPath expression for extracting an item link from the item context.
### Method `getExpressionItemAuthor()`
Should return the XPath expression for extracting an item author from the item context.
### Method `getExpressionItemTimestamp()`
Should return the XPath expression for extracting an item timestamp from the item context.
### Method `getExpressionItemEnclosures()`
Should return the XPath expression for extracting item enclosures (media content like images or movies) from the item context.
### Method `getExpressionItemCategories()`
Should return the XPath expression for extracting an item category from the item context.
### Method `getSettingFixEncoding()`
Should return the Fix encoding setting value (bool true or false).
## Methods for providing feed data
These methods are invoked to provide the HTML source as a base for applying the XPath expressions, as well as feed metadata such as the title and icon.
### Method `provideWebsiteContent()`
This method should return the HTML source as a base for the XPath expressions. Usually it merely returns the HTML content of the URL specified in the constant `FEED_SOURCE_URL`, retrieved by curl. Some sites, however, require user authentication mechanisms or the use of special cookies and/or headers, where direct retrieval using standard curl would not suffice. In that case this method should be overridden and take care of the page retrieval.
### Method `provideFeedTitle()`
This method should provide the feed title. Usually the XPath expression defined in `XPATH_EXPRESSION_FEED_TITLE` is used for extracting the title directly from the page source.
### Method `provideFeedIcon()`
This method should provide the URL of the feed's favicon. Usually the XPath expression defined in `XPATH_EXPRESSION_FEED_ICON` is used for extracting the icon URL directly from the page source.
### Method `provideFeedItems()`
This method should provide the feed items. Usually the XPath expression defined in `XPATH_EXPRESSION_ITEM` is used for extracting the items from the page source. All other XPath expressions are applied on a per-item basis, item by item, and only on the item's contents.
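To illustrate how the item expression and the per-item expressions interact, the following sketch evaluates them with PHP's built-in `DOMDocument` and `DOMXPath` classes rather than through `XPathAbstract` itself. The HTML snippet, class names and expressions are made up for the example.

```php
<?php
declare(strict_types=1);

// Made-up page source with two article nodes.
$html = <<<'HTML'
<html><body>
<div class="post"><a class="more-btn" href="/a">First</a><time datetime="2024-01-02">Jan 2</time></div>
<div class="post"><a class="more-btn" href="/b">Second</a><time datetime="2024-01-03">Jan 3</time></div>
</body></html>
HTML;

libxml_use_internal_errors(true); // silence warnings about HTML5 tags
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$items = [];
// The item expression selects one context node per article.
foreach ($xpath->query('//div[@class="post"]') as $node) {
    // Expressions starting with ".//" are evaluated relative to $node.
    $items[] = [
        'title' => trim($xpath->evaluate('string(.//a)', $node)),
        'uri' => $xpath->evaluate('string(.//a/@href)', $node),
        'timestamp' => strtotime($xpath->evaluate('string(.//time/@datetime)', $node)),
    ];
}
```

Because each relative expression is evaluated against its own context node, the second item never picks up the link or date of the first.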
## Methods for formatting and filtering feed item attributes
The following methods are invoked after extraction of the feed items from the source. Each of them expects one parameter, the value of the corresponding field, which can then be processed and transformed by the method. You can override these methods in order to format or filter parts of the feed output.
### Method `formatItemTitle()`
Accepts the item's title as a parameter, processes it, and returns it. Should return a string.
### Method `formatItemContent()`
Accepts the item's content as a parameter, processes it, and returns it. Should return a string.
### Method `formatItemUri()`
Accepts the item's link URL as a parameter, processes it, and returns it. Should return a string.
### Method `formatItemAuthor()`
Accepts the item's author as a parameter, processes it, and returns it. Should return a string.
### Method `formatItemTimestamp()`
Accepts the item's creation timestamp as a parameter, processes it, and returns it. Should return a Unix timestamp as an integer.
### Method `cleanMediaUrl()`
Method invoked for cleaning feed icon, item image and media attachment (like .mp3, .webp) URLs. Extracts the media URL from the passed parameter, stripping any additional content. Furthermore, makes sure that relative media URLs are transformed into absolute ones.
### Method `fixEncoding()`
Only invoked when class constant `SETTING_FIX_ENCODING` is set to true. It then passes all extracted string values through PHP's `utf8_decode` function.
### Method `generateItemId()`
This method plays an important role in generating feed item IDs for all extracted items. Every feed item needs a unique identifier (Uid), so that your feed reader updates the original item instead of adding a duplicate when an item's content is updated on the source site. Usually the item's link URL is a good candidate for the Uid.
***
# Template
Use this template to create your own bridge. Please remove any unnecessary comments and parameters.
```PHP
<?php
class TestBridge extends XPathAbstract {
    const NAME = 'Test';
    const URI = 'https://www.unbemerkt.eu/de/blog/';
    const DESCRIPTION = 'Test';
    const MAINTAINER = 'your name';
    const CACHE_TIMEOUT = 3600;
    const FEED_SOURCE_URL = 'https://www.unbemerkt.eu/de/blog/';
    const XPATH_EXPRESSION_ITEM = '/html[1]/body[1]/section[1]/section[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[*]/article[1]';
    const XPATH_EXPRESSION_ITEM_TITLE = './/a[@target="_self"]';
    const XPATH_EXPRESSION_ITEM_CONTENT = './/div[@class="post-content"]';
    const XPATH_EXPRESSION_ITEM_URI = './/a[@class="more-btn"]/@href';
    const XPATH_EXPRESSION_ITEM_AUTHOR = '/html[1]/body[1]/section[1]/div[2]/div[1]/div[1]/h1[1]';
    const XPATH_EXPRESSION_ITEM_TIMESTAMP = './/time/@datetime';
    const XPATH_EXPRESSION_ITEM_ENCLOSURES = './/img/@data-src';
    const SETTING_FIX_ENCODING = false;
}
```

CLAUDE.md Normal file

@@ -0,0 +1,69 @@
# Bridge Documentation
This directory contains custom RSS-Bridge implementations. For detailed documentation on creating and working with bridges, refer to the comprehensive guides in the `Bridge_Docs/` folder.
## Creating a New Bridge File
**File:** [Bridge_Docs/01_How_to_create_a_new_bridge.md](Bridge_Docs/01_How_to_create_a_new_bridge.md)
Basic guide for creating a new bridge file with proper structure. Covers the mandatory `declare(strict_types=1);` declaration and provides a minimal working example using `getSimpleHTMLDOM()` to scrape content.
## BridgeAbstract - Standard Bridge Base Class
**File:** [Bridge_Docs/02_BridgeAbstract.md](Bridge_Docs/02_BridgeAbstract.md)
Complete reference for `BridgeAbstract`, the base class for standard bridges. This is your primary resource for understanding:
- Required constants (NAME, URI, DESCRIPTION, MAINTAINER)
- How to define PARAMETERS for user input (text, number, list, checkbox)
- The `collectData()` method and item structure (title, uri, content, timestamp, author, etc.)
- Context handling with `$this->queriedContext`
- Helper methods like `saveCacheValue()` and `loadCacheValue()`
- Optional methods to override (getName, getURI, getIcon, detectParameters)
**Use this when:** Creating any standard bridge that scrapes HTML content directly.
## FeedExpander - RSS/Atom Feed Extension
**File:** [Bridge_Docs/03_FeedExpander.md](Bridge_Docs/03_FeedExpander.md)
Specialized class for bridges that consume existing RSS/Atom feeds and enhance or modify them. Extends `BridgeAbstract` with feed parsing capabilities.
Key features:
- Automatically parses RSS 1.0, RSS 2.0, and Atom 1.0 feeds
- Override `parseItem()` to modify individual feed items
- Call `$this->collectExpandableDatas('feed_url')` in `collectData()`
- Can limit number of items fetched
**Use this when:** Working with sites that already provide feeds but need enhancement (e.g., content modification, filtering, or expanding truncated content).
## WebDriverAbstract - JavaScript/XHR-Heavy Sites
**File:** [Bridge_Docs/04_WebDriverAbstract.md](Bridge_Docs/04_WebDriverAbstract.md)
For websites that heavily rely on JavaScript or XMLHttpRequest (XHR) to load content. Uses Selenium WebDriver with a real browser instance.
Important notes:
- Requires a running Selenium server (Docker image or local ChromeDriver)
- Very resource-intensive - only use when other methods fail
- Must explicitly wait for elements to appear after page loads or interactions
- Always call `$this->cleanUp()` in a finally block
**Use this when:** The target website loads content dynamically via JavaScript and cannot be scraped with standard HTML parsing methods.
## XPathAbstract - XPath-Based Bridges
**File:** [Bridge_Docs/05_XPathAbstract.md](Bridge_Docs/05_XPathAbstract.md)
Simplified bridge creation using XPath expressions. Ideal if you're familiar with XPath syntax.
Define these constants with XPath expressions:
- `FEED_SOURCE_URL` - The source webpage
- `XPATH_EXPRESSION_ITEM` - Selects each article/item container
- `XPATH_EXPRESSION_ITEM_TITLE` - Extracts title from item context
- `XPATH_EXPRESSION_ITEM_CONTENT` - Extracts content from item context
- `XPATH_EXPRESSION_ITEM_URI` - Extracts link URL
- Additional expressions for author, timestamp, enclosures, categories
Optional methods to override for formatting: `formatItemTitle()`, `formatItemContent()`, `formatItemUri()`, etc.
**Use this when:** You can identify the content you need using XPath expressions, making bridge creation declarative and concise.