15 Commits

Author SHA1 Message Date
Josh S
a23c357f2f Add bulk and single bookmark metadata refresh (#999)
* Add url create/edit query parameter to clear cache

* Add refresh bookmark metadata button in create/edit bookmark page

* Fix refresh bookmark metadata when editing existing bookmark

* Add bulk refresh metadata functionality

* Fix test cases for bulk view dropdown selection list

* Allow bulk metadata refresh when background tasks are disabled

* Move load preview image call on refresh metadata

* Update bookmark modified time on metadata refresh

* Rename function to align with convention

* Add tests for refresh task

* Add tests for bookmarks service refresh metadata

* Add tests for bookmarks api disable cache on check

* Remove bulk refresh metadata when background tasks disabled

* Refactor refresh metadata task

* Remove unnecessary call

* Fix testing mock name

* Abstract clearing metadata cache

* Add test to check if load page is called twice when cache disabled

* Remove refresh button for new bookmarks

* Remove strict disable cache is true check

* Refactor refresh metadata form logic into its own function

* move button and highlight changes

* polish and update tests

---------

Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@gmail.com>
2025-03-22 11:34:10 +01:00
Sascha Ißbrücker
fe7ddbe645 Allow bookmarks to have empty title and description (#843)
* add migration for merging fields

* remove usage of website title and description

* keep empty website title and description in API for compatibility

* restore scraping in API and add option for disabling it

* document API scraping behavior

* remove deprecated fields from API docs

* improve form layout

* cleanup migration

* cleanup website loader

* update tests
2024-09-22 07:52:00 +02:00
Viacheslav Slinko
87cd4061cb Add support for bookmark thumbnails (#721)
* Preview Image

* fix tests

* add test

* download preview image

* relative path

* gst

* details view

* fix tests

* Improve preview image styles

* Remove preview image URL from model

* Revert form changes

* update tests

* make it work in uwsgi

---------

Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@gmail.com>
2024-05-07 18:58:52 +02:00
Sascha Ißbrücker
98b9a9c1a0 Add black code formatter 2024-01-27 11:29:16 +01:00
Jonathan Sundqvist
150dfecc6f Support Open Graph description (#602)
* Support pytest for running tests

* Support extracting description from meta og:description property

* Revert changes to TOC

* Add test

---------

Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@gmail.com>
2024-01-27 10:28:46 +01:00
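
The Open Graph fallback named in this commit can be illustrated with a minimal sketch. It assumes the standard `<meta name="description">` tag is preferred and `og:description` is used only when the former is missing. The class and function names are hypothetical, and the standard library's `html.parser` is used here purely for illustration, not because the project is known to use it.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaDescriptionParser(HTMLParser):
    """Collects the first standard and the first Open Graph description."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None
        self.og_description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content")
        if content is None:
            return
        # <meta name="description" content="...">
        if (attr.get("name") or "").lower() == "description":
            if self.description is None:
                self.description = content.strip()
        # <meta property="og:description" content="...">
        elif attr.get("property") == "og:description":
            if self.og_description is None:
                self.og_description = content.strip()


def extract_description(html: str) -> Optional[str]:
    """Return the meta description, falling back to og:description."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    # Assumed precedence: standard description first, Open Graph second.
    return parser.description or parser.og_description
```

With this precedence, a page that only declares `og:description` still yields a description, while pages that declare both keep the standard one.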
Sascha Ißbrücker
4220ea0b4c Fix website loader content encoding detection (#482) 2023-05-30 22:04:54 +02:00
Sascha Ißbrücker
30da1880a5 Cache website metadata to avoid duplicate scraping (#401)
* Cache website metadata to avoid duplicate scraping

* fix test setup
2023-01-20 22:28:44 +01:00
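
One common way to avoid scraping the same URL twice is to memoize the loader. The sketch below assumes `functools.lru_cache` with a small, arbitrarily chosen size; `WebsiteMetadata`, `load_website_metadata`, and the stubbed scrape result are hypothetical stand-ins, not the code from this commit.

```python
from functools import lru_cache
from typing import NamedTuple, Optional


class WebsiteMetadata(NamedTuple):
    # Immutable, so a cached instance cannot be modified by one caller
    # and then observed in its modified state by another.
    url: str
    title: Optional[str]
    description: Optional[str]


@lru_cache(maxsize=10)  # assumed size; small because entries are short-lived
def load_website_metadata(url: str) -> WebsiteMetadata:
    # Stand-in for the real network request and HTML parsing.
    return WebsiteMetadata(url=url, title=f"Title of {url}", description=None)
```

Repeated calls with the same URL return the cached value without scraping again. Calling `load_website_metadata.cache_clear()` discards all cached entries, which is one way a later "refresh metadata" action could force a fresh scrape.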
Sascha Ißbrücker
43d52642a6 Fix website loader test 2023-01-14 12:26:04 +01:00
Sascha Ißbrücker
4f9170c48d Improve website loader logging 2023-01-14 11:24:09 +01:00
Luca
c2d8cde86b Trim website metadata title and description (#383)
* feat: trim fetched metadata placeholders

* feat: implement trimming serverside

* Add website loader tests

* Address review comments

Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@gmail.com>
2023-01-12 21:06:36 +01:00
Sascha Ißbrücker
2fd7704816 Limit document size for website scraper (#354)
Limits the size of scraped HTML documents to prevent out of memory errors. The scraper will stop reading from the response when it encounters the closing head tag, or if the read content's size exceeds a max limit.

Fixes #345
2022-10-07 21:18:18 +02:00
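
The stopping rule described in this commit message (stop at the closing head tag, or once a size limit is exceeded) can be sketched as follows. The function name, the limit value, and the use of a plain iterable of byte chunks in place of a streamed HTTP response are assumptions made for illustration. The match is also case-sensitive here, which a real scraper may want to relax.

```python
from typing import Iterable

MAX_CONTENT_LIMIT = 5_000_000  # assumed limit in bytes, not taken from the commit
END_OF_HEAD = b"</head>"


def read_head(chunks: Iterable[bytes], max_size: int = MAX_CONTENT_LIMIT) -> bytes:
    """Accumulate chunks until </head> is seen or max_size is exceeded."""
    content = bytearray()
    for chunk in chunks:
        # Re-scan only the tail, so a tag split across two chunks is still found.
        search_from = max(0, len(content) - len(END_OF_HEAD) + 1)
        content.extend(chunk)
        if content.find(END_OF_HEAD, search_from) != -1:
            break  # metadata lives in <head>; the body is not needed
        if len(content) > max_size:
            break  # give up on oversized documents
    return bytes(content)
```

Because the loop stops pulling from the iterable as soon as either condition holds, the rest of a large document is never read into memory.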
Dustin Blackman
b53bd9f112 Bump waybackpy to 3.0.6 (#281)
* fix wayback

* fix tests

* Reuse user agent from website loader

Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@gmail.com>
2022-07-03 06:26:16 +02:00
Sascha Ißbrücker
e08bf9fd03 Fake request headers to reduce bot detection (#263)
Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@gmail.com>
2022-05-21 13:25:32 +02:00
Taku Izumi
937858cf58 Fix website scraper decoding content incorrectly (#126)
* Avoid stall on web scraping

This patch fixes a stall during web scraping.
I encountered a stall (scraping never ends) when adding
a bookmark for some sites.
Passing a timeout parameter to the requests.get()
function avoids this case.

Signed-off-by: Taku Izumi <admin@orz-style.com>

* Avoid character corruption of scraping some Japanese sites

This patch fixes character corruption when scraping some Japanese
sites. To avoid the corruption, the load_page function now uses
r.content instead of r.text.

The cause appears to be an encoding problem. r.text decodes the
body using the encoding that requests guesses, so if the site's
actual charset differs from that guess, characters are corrupted.
r.content returns the raw bytes, which avoids the encoding
problem.

Signed-off-by: Taku Izumi <admin@orz-style.com>

* use charset_normalizer to determine response encoding

Co-authored-by: Taku Izumi <admin@orz-style.com>
Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@googlemail.com>
2021-08-25 10:16:23 +02:00
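
The character corruption described in this commit can be reproduced without any network access. The sketch below is an illustration only: it encodes a Japanese title as Shift_JIS, decodes it with a wrong default, and then decodes the raw bytes with an explicitly chosen charset. The commit itself uses charset_normalizer to detect the encoding; this example swaps in a simpler, hypothetical `decode_body` helper that takes the charset as an argument, so that it depends only on the standard library.

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw response bytes, trying the declared charset first."""
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name, or bytes that do not fit: try the next one.
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


title = "日本語のタイトル"
raw = title.encode("shift_jis")       # bytes as a Japanese site might send them
garbled = raw.decode("iso-8859-1")    # decoding with a wrong default never fails, it just corrupts
restored = decode_body(raw, "shift_jis")
```

Here `garbled` differs from `title`, while `restored` equals it. Working from the raw bytes keeps the choice of encoding in the caller's hands.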
Sascha Ißbrücker
e07da529f1 Preview website title + description in bookmark form
Fix unnecessary selects when rendering bookmarks
2019-07-02 01:28:02 +02:00