Samuel Clay
7165aa1bf6
Black formatting and isort
2024-04-24 09:50:42 -04:00
Samuel Clay
d1dafe7606
Black formatting.
2024-04-24 09:43:56 -04:00
Samuel Clay
6095bd709d
Fixing relative image urls to be absolute urls. Looks like a BeautifulSoup4 upgrade didn't catch the new attrs syntax. Also fixing bookmarklet loading wrong JS/CSS in development.
2022-11-24 12:53:40 -05:00
Samuel Clay
1e2ecb19d9
Handling unparseable error in text importer.
2022-07-01 12:21:40 -04:00
Samuel Clay
205d35b932
Removing unused readability unparseable error.
2021-10-25 19:46:15 -04:00
Samuel Clay
fcd8652715
Rewriting content before save.
2021-09-30 12:25:28 -04:00
Samuel Clay
5c6535fd2c
Newsletters have shorter full text, not longer.
2021-09-30 12:19:36 -04:00
Samuel Clay
3804fbba5d
Correcting mongodb on local installs since there is no auth.
2021-08-04 16:26:41 -04:00
Samuel Clay
6555f8ddcc
Vendorizing readability due to v0.8.1.1 not being released yet.
2021-08-03 18:11:17 -04:00
Samuel Clay
8348da9c59
Updating to latest readability-lxml.
2021-08-03 16:05:28 -04:00
Samuel Clay
540f7d3c45
Smart bytes import
2021-05-12 21:20:05 -04:00
Samuel Clay
c6ce8cc36a
Everything is smart bytes when it comes to original pages.
2021-05-12 21:19:09 -04:00
Samuel Clay
24d6c548b8
Fixing broken text importer, using haproxy to proxy text.
2021-05-05 15:18:09 -04:00
Jonathan Math
e068c9ff4e
change nb.local.com to localhost
2021-04-20 08:04:08 -05:00
Samuel Clay
11bf7cef8f
Correctly rewriting and removing image urls.
2021-04-14 13:27:31 -04:00
Samuel Clay
c16a4e4a4b
Skip images with no src
2021-03-25 18:52:32 -04:00
Samuel Clay
33c4c5ed4f
Removing tracing handler for sentry.
2021-03-17 10:42:58 -04:00
Samuel Clay
4cb24cf7f2
Since NewsBlur proxies all http images over https, the url can change, so acknowledge urls that are https on the original text but http on the feed
2021-02-26 14:06:42 -05:00
Samuel Clay
88d051abbf
Adding first image in original text when the story image no longer appears in the text. This should show more photos when reading by text.
2021-02-26 12:10:30 -05:00
Samuel Clay
8371c635f7
Merge branch 'master' into django2.0
...
* master: (27 commits)
Removing log override
Moving logging over to the newsblur log.
Fixing search indexer background task for new celery.
Attempting to add gunicorn errors to console/log.
Better handling of missing subs.
Handling missing user sub on feed delete.
Correct encoding for strings on systems that don't have utf-8 as default encoding.
Writing in the real urllib3 dependency for requests.
Upgrading requests due to urllib3 incompatibility.
Login required should use the next parameter.
Upgrading django oauth toolkit for django 1.11.
Handling newsletters with multiple recipients.
Extracting image urls sometimes fails.
Handling ajax errors in json views.
Adding timeouts to most outbound requests.
Sentry SDK 0.19.4.
Removing imperfect proxy warning for every story.
Found four more GET/POST crosses.
Feed unread count may need a POST.
Namespacing settings.
...
2020-12-08 09:09:25 -05:00
Samuel Clay
1a5d440582
Adding timeouts to most outbound requests.
2020-12-06 11:37:01 -05:00
Samuel Clay
b89e7dc429
Merge branch 'django1.11' into django2.0
...
* django1.11: (152 commits)
request.raw_post_data -> request.body (django 1.6)
Upgrading pgbouncer to 1.15.0.
Finishing off Postgresql 13 upgrade.
Upgrading to Postgresql 13.
Ubuntu 20.04
Fixing supervisor path issues
Upgrading setuptools
Fixing flask
Handling over capacity for twitter.
Max length for image_urls.
Properly filtering newsletter feeds.
Fixing issue with text importer on feed-less urls.
Removing dependency, fixing encoding issue for pages.
Fixing DB Monitor.
Updating User Agent for all fetchers.
Ignoring VSCode.
Fixing DB Monitor.
Updating User Agent for all fetchers.
Ignoring VSCode.
Fixing Statistics by fixing how timezones are handled.
...
2020-12-03 14:04:26 -05:00
Samuel Clay
4b8589a259
Fixing issue with text importer on feed-less urls.
2020-11-30 18:27:49 -05:00
Samuel Clay
fa43eed1b8
Updating User Agent for all fetchers.
2020-11-30 15:48:59 -05:00
Samuel Clay
42d4a4211f
Fixing encoding issues.
2020-06-30 17:22:47 -04:00
Samuel Clay
b05dcb3664
Fixing encoding errors
2020-06-30 15:29:28 -04:00
jmath1
6021afaec3
2to3 apps/rss_feeds
2020-06-15 02:54:37 -04:00
Samuel Clay
5c44c34da5
Overriding story url.
2020-04-30 14:53:42 -04:00
Samuel Clay
057d19acf1
Removing Daily Skip from text importer.
2020-04-30 14:36:32 -04:00
Samuel Clay
e5a8ef5978
Fixing KeyError
2019-08-21 18:33:57 -07:00
Samuel Clay
9566a26795
Rewriting relative image urls in original text with absolute urls.
2019-08-21 18:23:02 -07:00
Samuel Clay
2e6ad3afda
Adding new node app: original_text. To replace Mercury Reader. Thanks for all the text.
2019-04-13 15:29:14 -04:00
Samuel Clay
e74dde6e30
Fixing logging for mercury parser error.
2018-08-14 15:56:04 -04:00
Samuel Clay
f04e1a5279
Handling mercury text parsing error.
2018-08-09 09:47:10 -04:00
Samuel Clay
3b3ea98afd
Handling no host error in text importer.
2018-07-16 10:56:33 -04:00
Samuel Clay
94114595a6
Sanity check on extracting image urls in Text view.
2018-01-18 08:06:32 -08:00
Samuel Clay
8421f667d7
Fixing broken image handling from Mercury Reader that was causing image urls with a srcset to be concat'd together. This one's for @yesthatjwz.
2018-01-17 16:51:06 -08:00
Samuel Clay
b7574a1ff7
No longer finding the largest image in a story if the text view already successfully found one. Also using Mercury's builtin image finder.
2017-11-02 22:09:37 -07:00
Samuel Clay
f242a49d24
Handling mercury errors.
2017-10-30 11:47:18 -07:00
Samuel Clay
2827b896b5
Handling issue when story has no original content.
2017-10-24 15:33:27 -07:00
Samuel Clay
ec7e032c28
Switching to Mercury text parser, which is an upgraded Readability. Using old readability as backup.
2017-10-24 15:28:36 -07:00
Samuel Clay
c476d89e1f
Removing breaking text importer UTF-8 encoding.
2017-10-15 17:15:56 -07:00
Samuel Clay
ef51152bcd
Updating readability class names to look for.
2017-09-29 10:50:13 -07:00
Samuel Clay
82cdae1e4d
Extracting images from original text's noscript.
2017-03-23 16:28:47 -07:00
Samuel Clay
2c195cde2a
Fetching the original text now extracts the image url for others.
2017-03-23 16:06:06 -07:00
Samuel Clay
aee018f39c
Upgrading Readability and forcing images to remain. This should add a bunch of images back to the Text view.
2017-01-25 17:35:48 -08:00
Samuel Clay
c4830e3e95
Handling unicode encode errors in page/text handling. Also adding upgrade command for fabric when pip is non-trivial.
2016-12-05 22:09:05 -08:00
Samuel Clay
3ed96e338c
Fixing page and text importer to correctly handle non-breaking spaces.
2016-12-05 17:40:39 -08:00
Samuel Clay
e43733ce30
Handling lxml parser errors for original text.
2016-06-28 16:11:46 -07:00
Samuel Clay
53e4998146
Merge pull request #835 from sv0/text_importer
...
Text importer
2015-11-30 16:03:50 -08:00