78 commits

Author SHA1 Message Date
Samuel Clay
7165aa1bf6 Black formatting and isort 2024-04-24 09:50:42 -04:00
Samuel Clay
d1dafe7606 Black formatting. 2024-04-24 09:43:56 -04:00
Samuel Clay
6095bd709d Fixing relative image urls to be absolute urls. Looks like a BeautifulSoup4 upgrade didn't catch the new attrs syntax. Also fixing bookmarklet loading wrong JS/CSS in development. 2022-11-24 12:53:40 -05:00
Samuel Clay
1e2ecb19d9 Handling unparseable error in text importer. 2022-07-01 12:21:40 -04:00
Samuel Clay
205d35b932 Removing unused readability unparseable error. 2021-10-25 19:46:15 -04:00
Samuel Clay
fcd8652715 Rewriting content before save. 2021-09-30 12:25:28 -04:00
Samuel Clay
5c6535fd2c Newsletters have shorter full text, not longer. 2021-09-30 12:19:36 -04:00
Samuel Clay
3804fbba5d Correcting mongodb on local installs since there is no auth. 2021-08-04 16:26:41 -04:00
Samuel Clay
6555f8ddcc Vendorizing readability due to v0.8.1.1 not being released yet. 2021-08-03 18:11:17 -04:00
Samuel Clay
8348da9c59 Updating to latest readability-lxml. 2021-08-03 16:05:28 -04:00
Samuel Clay
540f7d3c45 Smart bytes import 2021-05-12 21:20:05 -04:00
Samuel Clay
c6ce8cc36a Everything is smart bytes when it comes to original pages. 2021-05-12 21:19:09 -04:00
Samuel Clay
24d6c548b8 Fixing broken text importer, using haproxy to proxy text. 2021-05-05 15:18:09 -04:00
Jonathan Math
e068c9ff4e change nb.local.com to localhost 2021-04-20 08:04:08 -05:00
Samuel Clay
11bf7cef8f Correctly rewriting and removing image urls. 2021-04-14 13:27:31 -04:00
Samuel Clay
c16a4e4a4b Skip images with no src 2021-03-25 18:52:32 -04:00
Samuel Clay
33c4c5ed4f Removing tracing handler for sentry. 2021-03-17 10:42:58 -04:00
Samuel Clay
4cb24cf7f2 Since NewsBlur proxies all http images over https, the url can change, so acknowledge urls that are https on the original text but http on the feed 2021-02-26 14:06:42 -05:00
Samuel Clay
88d051abbf Adding first image in original text when the story image no longer appears in the text. This should show more photos when reading by text. 2021-02-26 12:10:30 -05:00
Samuel Clay
8371c635f7 Merge branch 'master' into django2.0
* master: (27 commits)
  Removing log override
  Moving logging over to the newsblur log.
  Fixing search indexer background task for new celery.
  Attempting to add gunicorn errors to console/log.
  Better handling of missing subs.
  Handling missing user sub on feed delete.
  Correct encoding for strings on systems that don't have utf-8 as default encoding.
  Writing in the real urllib3 dependency for requests.
  Upgrading requests due to urllib3 incompatibility.
  Login required should use the next parameter.
  Upgrading django oauth toolkit for django 1.11.
  Handling newsletters with multiple recipients.
  Extracting image urls sometimes fails.
  Handling ajax errors in json views.
  Adding timeouts to most outbound requests.
  Sentry SDK 0.19.4.
  Removing imperfect proxy warning for every story.
  Found four more GET/POST crosses.
  Feed unread count may need a POST.
  Namespacing settings.
  ...
2020-12-08 09:09:25 -05:00
Samuel Clay
1a5d440582 Adding timeouts to most outbound requests. 2020-12-06 11:37:01 -05:00
Samuel Clay
b89e7dc429 Merge branch 'django1.11' into django2.0
* django1.11: (152 commits)
  request.raw_post_data -> request.body (django 1.6)
  Upgrading pgbouncer to 1.15.0.
  Finishing off Postgresql 13 upgrade.
  Upgrading to Postgresql 13.
  Ubuntu 20.04
  Fixing supervisor path issues
  Upgrading setuptools
  Fixing flask
  Handling over capacity for twitter.
  Max length for image_urls.
  Properly filtering newsletter feeds.
  Fixing issue with text importer on feed-less urls.
  Removing dependency, fixing encoding issue for pages.
  Fixing DB Monitor.
  Updating User Agent for all fetchers.
  Ignoring VSCode.
  Fixing DB Monitor.
  Updating User Agent for all fetchers.
  Ignoring VSCode.
  Fixing Statistics by fixing how timezones are handled.
  ...
2020-12-03 14:04:26 -05:00
Samuel Clay
4b8589a259 Fixing issue with text importer on feed-less urls. 2020-11-30 18:27:49 -05:00
Samuel Clay
fa43eed1b8 Updating User Agent for all fetchers. 2020-11-30 15:48:59 -05:00
Samuel Clay
42d4a4211f Fixing encoding issues. 2020-06-30 17:22:47 -04:00
Samuel Clay
b05dcb3664 Fixing encoding errors 2020-06-30 15:29:28 -04:00
jmath1
6021afaec3 2to3 apps/rss_feeds 2020-06-15 02:54:37 -04:00
Samuel Clay
5c44c34da5 Overriding story url. 2020-04-30 14:53:42 -04:00
Samuel Clay
057d19acf1 Removing Daily Skip from text importer. 2020-04-30 14:36:32 -04:00
Samuel Clay
e5a8ef5978 Fixing KeyError 2019-08-21 18:33:57 -07:00
Samuel Clay
9566a26795 Rewriting relative image urls in original text with absolute urls. 2019-08-21 18:23:02 -07:00
Samuel Clay
2e6ad3afda Adding new node app: original_text. To replace Mercury Reader. Thanks for all the text. 2019-04-13 15:29:14 -04:00
Samuel Clay
e74dde6e30 Fixing logging for mercury parser error. 2018-08-14 15:56:04 -04:00
Samuel Clay
f04e1a5279 Handling mercury text parsing error. 2018-08-09 09:47:10 -04:00
Samuel Clay
3b3ea98afd Handling no host error in text importer. 2018-07-16 10:56:33 -04:00
Samuel Clay
94114595a6 Sanity check on extracting image urls in Text view. 2018-01-18 08:06:32 -08:00
Samuel Clay
8421f667d7 Fixing broken image handling from Mercury Reader that was causing image urls with a srcset to be concat'd together. This one's for @yesthatjwz. 2018-01-17 16:51:06 -08:00
Samuel Clay
b7574a1ff7 No longer finding the largest image in a story if the text view already successfully found one. Also using Mercury's builtin image finder. 2017-11-02 22:09:37 -07:00
Samuel Clay
f242a49d24 Handling mercury errors. 2017-10-30 11:47:18 -07:00
Samuel Clay
2827b896b5 Handling issue when story has no original content. 2017-10-24 15:33:27 -07:00
Samuel Clay
ec7e032c28 Switching to Mercury text parser, which is an upgraded Readability. Using old readability as backup. 2017-10-24 15:28:36 -07:00
Samuel Clay
c476d89e1f Removing breaking text importer UTF-8 encoding. 2017-10-15 17:15:56 -07:00
Samuel Clay
ef51152bcd Updating readability class names to look for. 2017-09-29 10:50:13 -07:00
Samuel Clay
82cdae1e4d Extracting images from original text's noscript. 2017-03-23 16:28:47 -07:00
Samuel Clay
2c195cde2a Fetching the original text now extracts the image url for others. 2017-03-23 16:06:06 -07:00
Samuel Clay
aee018f39c Upgrading Readability and forcing images to remain. This should add a bunch of images back to the Text view. 2017-01-25 17:35:48 -08:00
Samuel Clay
c4830e3e95 Handling unicode encode errors in page/text handling. Also adding upgrade command for fabric when pip is non-trivial. 2016-12-05 22:09:05 -08:00
Samuel Clay
3ed96e338c Fixing page and text importer to correctly handle non-breaking spaces. 2016-12-05 17:40:39 -08:00
Samuel Clay
e43733ce30 Handling lxml parser errors for original text. 2016-06-28 16:11:46 -07:00
Samuel Clay
53e4998146 Merge pull request #835 from sv0/text_importer
Text importer
2015-11-30 16:03:50 -08:00