linux/arch/s390/kernel/time.c
Linus Torvalds fcc79e1714 Networking changes for 6.13.
The most significant set of changes is the per netns RTNL. The new
 behavior is disabled by default, regression risk should be contained.
 
 Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
 default value from PTP_1588_CLOCK_KVM, as the first is intended to be
 a more reliable replacement for the latter.
 
 Core
 ----
 
  - Started a very large, in-progress, effort to make the RTNL lock
    scope per network-namespace, thus reducing the lock contention
    significantly in the containerized use-case, comprising:
    - RCU-ified some relevant slices of the FIB control path
    - introduce basic per netns locking helpers
    - namespacified the IPv4 address hash table
    - remove rtnl_register{,_module}() in favour of rtnl_register_many()
    - refactor rtnl_{new,del,set}link() moving as much validation as
      possible out of RTNL lock
    - convert all phonet doit() and dumpit() handlers to RCU
    - convert IPv4 addresses manipulation to per-netns RTNL
    - convert virtual interface creation to per-netns RTNL
    the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
    knob, disabled by default ad interim.
 
  - Introduce NAPI suspension, to efficiently switching between busy
    polling (NAPI processing suspended) and normal processing.
 
  - Migrate the IPv4 routing input, output and control path from direct
    ToS usage to DSCP macros. This is a work in progress to make ECN
    handling consistent and reliable.
 
  - Add drop reasons support to the IPv4 rotue input path, allowing
    better introspection in case of packets drop.
 
  - Make FIB seqnum lockless, dropping RTNL protection for read
    access.
 
  - Make inet{,v6} addresses hashing less predicable.
 
  - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
    and timestamps
 
 Things we sprinkled into general kernel code
 --------------------------------------------
 
  - Add small file operations for debugfs, to reduce the struct ops size.
 
  - Refactoring and optimization for the implementation of page_frag API,
    This is a preparatory work to consolidate the page_frag
    implementation.
 
 Netfilter
 ---------
 
  - Optimize set element transactions to reduce memory consumption
 
  - Extended netlink error reporting for attribute parser failure.
 
  - Make legacy xtables configs user selectable, giving users
    the option to configure iptables without enabling any other config.
 
  - Address a lot of false-positive RCU issues, pointed by recent
    CI improvements.
 
 BPF
 ---
 
  - Put xsk sockets on a struct diet and add various cleanups. Overall,
    this helps to bump performance by 12% for some workloads.
 
  - Extend BPF selftests to increase coverage of XDP features in
    combination with BPF cpumap.
 
  - Optimize and homogenize bpf_csum_diff helper for all archs and also
    add a batch of new BPF selftests for it.
 
  - Extend netkit with an option to delegate skb->{mark,priority}
    scrubbing to its BPF program.
 
  - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
    programs.
 
 Protocols
 ---------
 
  - Introduces 4-tuple hash for connected udp sockets, speeding-up
    significantly connected sockets lookup.
 
  - Add a fastpath for some TCP timers that usually expires after close,
    the socket lock contention.
 
  - Add inbound and outbound xfrm state caches to speed up state lookups.
 
  - Avoid sending MPTCP advertisements on stale subflows, reducing
    risks on loosing them.
 
  - Make neighbours table flushing more scalable, maintaining per device
    neigh lists.
 
 Driver API
 ----------
 
  - Introduce a unified interface to configure transmission H/W shaping,
    and expose it to user-space via generic-netlink.
 
  - Add support for per-NAPI config via netlink. This makes napi
    configuration persistent across queues removal and re-creation.
    Requires driver updates, currently supported drivers are:
    nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
 
  - Add ethtool support for writing SFP / PHY firmware blocks.
 
  - Track RSS context allocation from ethtool core.
 
  - Implement support for mirroring to DSA CPU port, via TC mirror
    offload.
 
  - Consolidate FDB updates notification, to avoid duplicates on
    device-specific entries.
 
  - Expose DPLL clock quality level to the user-space.
 
  - Support master-slave PHY config via device tree.
 
 Tests and tooling
 -----------------
 
  - forwarding: introduce deferred commands, to simplify
    the cleanup phase
 
 Drivers
 -------
 
  - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
    Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
    IRQs and queues to NAPI IDs, allowing busy polling and better
    introspection.
 
  - Ethernet high-speed NICs:
    - nVidia/Mellanox:
      - mlx5:
        - a large refactor to implement support for cross E-Switch
          scheduling
        - refactor H/W conter management to let it scale better
        - H/W GRO cleanups
    - Intel (100G, ice)::
      - adds support for ethtool reset
      - implement support for per TX queue H/W shaping
    - AMD/Solarflare:
      - implement per device queue stats support
    - Broadcom (bnxt):
      - improve wildcard l4proto on IPv4/IPv6 ntuple rules
    - Marvell Octeon:
      - Adds representor support for each Resource Virtualization Unit
        (RVU) device.
    - Hisilicon:
      - adds support for the BMC Gigabit Ethernet
    - IBM (EMAC):
      - driver cleanup and modernization
    - Cisco (VIC):
      - raise the queues number limit to 256
 
  - Ethernet virtual:
    - Google vNIC:
      - implements page pool support
    - macsec:
      - inherit lower device's features and TSO limits when offloading
    - virtio_net:
      - enable premapped mode by default
      - support for XDP socket(AF_XDP) zerocopy TX
    - wireguard:
      - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
        packets.
 
  - Ethernet NICs embedded and virtual:
    - Broadcom ASP:
      - enable software timestamping
    - Freescale:
      - add enetc4 PF driver
    - MediaTek: Airoha SoC:
      - implement BQL support
    - RealTek r8169:
      - enable TSO by default on r8168/r8125
      - implement extended ethtool stats
    - Renesas AVB:
      - enable TX checksum offload
    - Synopsys (stmmac):
      - support header splitting for vlan tagged packets
      - move common code for DWMAC4 and DWXGMAC into a separate FPE
        module.
      - Add the dwmac driver support for T-HEAD TH1520 SoC
    - Synopsys (xpcs):
      - driver refactor and cleanup
    - TI:
      - icssg_prueth: add VLAN offload support
    - Xilinx emaclite:
      - adds clock support
 
  - Ethernet switches:
    - Microchip:
      - implement support for the lan969x Ethernet switch family
      - add LAN9646 switch support to KSZ DSA driver
 
  - Ethernet PHYs:
    - Marvel: 88q2x: enable auto negotiation
    - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
 
  - PTP:
    - Add support for the Amazon virtual clock device
    - Add PtP driver for s390 clocks
 
  - WiFi:
    - mac80211
      - EHT 1024 aggregation size for transmissions
      - new operation to indicate that a new interface is to be added
      - support radio separation of multi-band devices
      - move wireless extension spy implementation to libiw
    - Broadcom:
      - brcmfmac: optional LPO clock support
    - Microchip:
      - add support for Atmel WILC3000
    - Qualcomm (ath12k):
      - firmware coredump collection support
      - add debugfs support for a multitude of statistics
    - Qualcomm (ath5k):
      -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
    - Realtek:
      - rtw88: 8821au and 8812au USB adapters support
      - rtw89: add thermal protection
      - rtw89: fine tune BT-coexsitence to improve user experience
      - rtw89: firmware secure boot for WiFi 6 chip
 
  - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
 9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
 wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
 vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
 jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
 b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
 kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
 BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
 lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
 Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
 eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
 THcvVM+bupEZ
 =GzPr
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
 "The most significant set of changes is the per netns RTNL. The new
  behavior is disabled by default, regression risk should be contained.

  Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
  default value from PTP_1588_CLOCK_KVM, as the first is intended to be
  a more reliable replacement for the latter.

  Core:

   - Started a very large, in-progress, effort to make the RTNL lock
     scope per network-namespace, thus reducing the lock contention
     significantly in the containerized use-case, comprising:
       - RCU-ified some relevant slices of the FIB control path
       - introduce basic per netns locking helpers
       - namespacified the IPv4 address hash table
       - remove rtnl_register{,_module}() in favour of
         rtnl_register_many()
       - refactor rtnl_{new,del,set}link() moving as much validation as
         possible out of RTNL lock
       - convert all phonet doit() and dumpit() handlers to RCU
       - convert IPv4 addresses manipulation to per-netns RTNL
       - convert virtual interface creation to per-netns RTNL
     the per-netns lock infrastructure is guarded by the
     CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.

   - Introduce NAPI suspension, to efficiently switching between busy
     polling (NAPI processing suspended) and normal processing.

   - Migrate the IPv4 routing input, output and control path from direct
     ToS usage to DSCP macros. This is a work in progress to make ECN
     handling consistent and reliable.

   - Add drop reasons support to the IPv4 rotue input path, allowing
     better introspection in case of packets drop.

   - Make FIB seqnum lockless, dropping RTNL protection for read access.

   - Make inet{,v6} addresses hashing less predicable.

   - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
     and timestamps

  Things we sprinkled into general kernel code:

   - Add small file operations for debugfs, to reduce the struct ops
     size.

   - Refactoring and optimization for the implementation of page_frag
     API, This is a preparatory work to consolidate the page_frag
     implementation.

  Netfilter:

   - Optimize set element transactions to reduce memory consumption

   - Extended netlink error reporting for attribute parser failure.

   - Make legacy xtables configs user selectable, giving users the
     option to configure iptables without enabling any other config.

   - Address a lot of false-positive RCU issues, pointed by recent CI
     improvements.

  BPF:

   - Put xsk sockets on a struct diet and add various cleanups. Overall,
     this helps to bump performance by 12% for some workloads.

   - Extend BPF selftests to increase coverage of XDP features in
     combination with BPF cpumap.

   - Optimize and homogenize bpf_csum_diff helper for all archs and also
     add a batch of new BPF selftests for it.

   - Extend netkit with an option to delegate skb->{mark,priority}
     scrubbing to its BPF program.

   - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
     programs.

  Protocols:

   - Introduces 4-tuple hash for connected udp sockets, speeding-up
     significantly connected sockets lookup.

   - Add a fastpath for some TCP timers that usually expires after
     close, the socket lock contention.

   - Add inbound and outbound xfrm state caches to speed up state
     lookups.

   - Avoid sending MPTCP advertisements on stale subflows, reducing
     risks on loosing them.

   - Make neighbours table flushing more scalable, maintaining per
     device neigh lists.

  Driver API:

   - Introduce a unified interface to configure transmission H/W
     shaping, and expose it to user-space via generic-netlink.

   - Add support for per-NAPI config via netlink. This makes napi
     configuration persistent across queues removal and re-creation.
     Requires driver updates, currently supported drivers are:
     nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.

   - Add ethtool support for writing SFP / PHY firmware blocks.

   - Track RSS context allocation from ethtool core.

   - Implement support for mirroring to DSA CPU port, via TC mirror
     offload.

   - Consolidate FDB updates notification, to avoid duplicates on
     device-specific entries.

   - Expose DPLL clock quality level to the user-space.

   - Support master-slave PHY config via device tree.

  Tests and tooling:

   - forwarding: introduce deferred commands, to simplify the cleanup
     phase

  Drivers:

   - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
     Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
     IRQs and queues to NAPI IDs, allowing busy polling and better
     introspection.

   - Ethernet high-speed NICs:
      - nVidia/Mellanox:
         - mlx5:
           - a large refactor to implement support for cross E-Switch
             scheduling
           - refactor H/W conter management to let it scale better
           - H/W GRO cleanups
      - Intel (100G, ice)::
         - add support for ethtool reset
         - implement support for per TX queue H/W shaping
      - AMD/Solarflare:
         - implement per device queue stats support
      - Broadcom (bnxt):
         - improve wildcard l4proto on IPv4/IPv6 ntuple rules
      - Marvell Octeon:
         - Add representor support for each Resource Virtualization Unit
           (RVU) device.
      - Hisilicon:
         - add support for the BMC Gigabit Ethernet
      - IBM (EMAC):
         - driver cleanup and modernization
      - Cisco (VIC):
         - raise the queues number limit to 256

   - Ethernet virtual:
      - Google vNIC:
         - implement page pool support
      - macsec:
         - inherit lower device's features and TSO limits when
           offloading
      - virtio_net:
         - enable premapped mode by default
         - support for XDP socket(AF_XDP) zerocopy TX
      - wireguard:
         - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
           packets.

   - Ethernet NICs embedded and virtual:
      - Broadcom ASP:
         - enable software timestamping
      - Freescale:
         - add enetc4 PF driver
      - MediaTek: Airoha SoC:
         - implement BQL support
      - RealTek r8169:
         - enable TSO by default on r8168/r8125
         - implement extended ethtool stats
      - Renesas AVB:
         - enable TX checksum offload
      - Synopsys (stmmac):
         - support header splitting for vlan tagged packets
         - move common code for DWMAC4 and DWXGMAC into a separate FPE
           module.
         - add dwmac driver support for T-HEAD TH1520 SoC
      - Synopsys (xpcs):
         - driver refactor and cleanup
      - TI:
         - icssg_prueth: add VLAN offload support
      - Xilinx emaclite:
         - add clock support

   - Ethernet switches:
      - Microchip:
         - implement support for the lan969x Ethernet switch family
         - add LAN9646 switch support to KSZ DSA driver

   - Ethernet PHYs:
      - Marvel: 88q2x: enable auto negotiation
      - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2

   - PTP:
      - Add support for the Amazon virtual clock device
      - Add PtP driver for s390 clocks

   - WiFi:
      - mac80211
         - EHT 1024 aggregation size for transmissions
         - new operation to indicate that a new interface is to be added
         - support radio separation of multi-band devices
         - move wireless extension spy implementation to libiw
      - Broadcom:
         - brcmfmac: optional LPO clock support
      - Microchip:
         - add support for Atmel WILC3000
      - Qualcomm (ath12k):
         - firmware coredump collection support
         - add debugfs support for a multitude of statistics
      - Qualcomm (ath5k):
         -  Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
      - Realtek:
         - rtw88: 8821au and 8812au USB adapters support
         - rtw89: add thermal protection
         - rtw89: fine tune BT-coexsitence to improve user experience
         - rtw89: firmware secure boot for WiFi 6 chip

   - Bluetooth
      - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
        0x13d3:0x3623
      - add Realtek RTL8852BE support for id Foxconn 0xe123
      - add MediaTek MT7920 support for wireless module ids
      - btintel_pcie: add handshake between driver and firmware
      - btintel_pcie: add recovery mechanism
      - btnxpuart: add GPIO support to power save feature"

* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
  mm: page_frag: fix a compile error when kernel is not compiled
  Documentation: tipc: fix formatting issue in tipc.rst
  selftests: nic_performance: Add selftest for performance of NIC driver
  selftests: nic_link_layer: Add selftest case for speed and duplex states
  selftests: nic_link_layer: Add link layer selftest for NIC driver
  bnxt_en: Add FW trace coredump segments to the coredump
  bnxt_en: Add a new ethtool -W dump flag
  bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
  bnxt_en: Add functions to copy host context memory
  bnxt_en: Do not free FW log context memory
  bnxt_en: Manage the FW trace context memory
  bnxt_en: Allocate backing store memory for FW trace logs
  bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
  bnxt_en: Refactor bnxt_free_ctx_mem()
  bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
  bnxt_en: Update firmware interface spec to 1.10.3.85
  selftests/bpf: Add some tests with sockmap SK_PASS
  bpf: fix recursive lock when verdict program return SK_PASS
  wireguard: device: support big tcp GSO
  wireguard: selftests: load nf_conntrack if not present
  ...
2024-11-21 08:28:08 -08:00

950 lines
23 KiB
C

// SPDX-License-Identifier: GPL-2.0
/*
* Time of day based timer functions.
*
* S390 version
* Copyright IBM Corp. 1999, 2008
* Author(s): Hartmut Penner (hp@de.ibm.com),
* Martin Schwidefsky (schwidefsky@de.ibm.com),
* Denis Joseph Barrow (djbarrow@de.ibm.com,barrow_dj@yahoo.com)
*
* Derived from "arch/i386/kernel/time.c"
* Copyright (C) 1991, 1992, 1995 Linus Torvalds
*/
#define KMSG_COMPONENT "time"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
#include <linux/kernel_stat.h>
#include <linux/errno.h>
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/sched/clock.h>
#include <linux/kernel.h>
#include <linux/param.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/cpu.h>
#include <linux/stop_machine.h>
#include <linux/time.h>
#include <linux/device.h>
#include <linux/delay.h>
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/types.h>
#include <linux/profile.h>
#include <linux/timex.h>
#include <linux/notifier.h>
#include <linux/clockchips.h>
#include <linux/gfp.h>
#include <linux/kprobes.h>
#include <linux/uaccess.h>
#include <vdso/vsyscall.h>
#include <vdso/clocksource.h>
#include <vdso/helpers.h>
#include <asm/facility.h>
#include <asm/delay.h>
#include <asm/div64.h>
#include <asm/vdso.h>
#include <asm/irq.h>
#include <asm/irq_regs.h>
#include <asm/vtimer.h>
#include <asm/stp.h>
#include <asm/cio.h>
#include "entry.h"
union tod_clock tod_clock_base __section(".data");
EXPORT_SYMBOL_GPL(tod_clock_base);
u64 clock_comparator_max = -1ULL;
EXPORT_SYMBOL_GPL(clock_comparator_max);
static DEFINE_PER_CPU(struct clock_event_device, comparators);
ATOMIC_NOTIFIER_HEAD(s390_epoch_delta_notifier);
EXPORT_SYMBOL(s390_epoch_delta_notifier);
unsigned char ptff_function_mask[16];
static unsigned long lpar_offset;
static unsigned long initial_leap_seconds;
static unsigned long tod_steering_end;
static long tod_steering_delta;
/*
* Get time offsets with PTFF
*/
void __init time_early_init(void)
{
struct ptff_qto qto;
struct ptff_qui qui;
int cs;
/* Initialize TOD steering parameters */
tod_steering_end = tod_clock_base.tod;
for (cs = 0; cs < CS_BASES; cs++)
vdso_data[cs].arch_data.tod_steering_end = tod_steering_end;
if (!test_facility(28))
return;
ptff(&ptff_function_mask, sizeof(ptff_function_mask), PTFF_QAF);
/* get LPAR offset */
if (ptff_query(PTFF_QTO) && ptff(&qto, sizeof(qto), PTFF_QTO) == 0)
lpar_offset = qto.tod_epoch_difference;
/* get initial leap seconds */
if (ptff_query(PTFF_QUI) && ptff(&qui, sizeof(qui), PTFF_QUI) == 0)
initial_leap_seconds = (unsigned long)
((long) qui.old_leap * 4096000000L);
}
unsigned long long noinstr sched_clock_noinstr(void)
{
return tod_to_ns(__get_tod_clock_monotonic());
}
/*
* Scheduler clock - returns current time in nanosec units.
*/
unsigned long long notrace sched_clock(void)
{
return tod_to_ns(get_tod_clock_monotonic());
}
NOKPROBE_SYMBOL(sched_clock);
static void ext_to_timespec64(union tod_clock *clk, struct timespec64 *xt)
{
unsigned long rem, sec, nsec;
sec = clk->us;
rem = do_div(sec, 1000000);
nsec = ((clk->sus + (rem << 12)) * 125) >> 9;
xt->tv_sec = sec;
xt->tv_nsec = nsec;
}
void clock_comparator_work(void)
{
struct clock_event_device *cd;
get_lowcore()->clock_comparator = clock_comparator_max;
cd = this_cpu_ptr(&comparators);
cd->event_handler(cd);
}
static int s390_next_event(unsigned long delta,
struct clock_event_device *evt)
{
get_lowcore()->clock_comparator = get_tod_clock() + delta;
set_clock_comparator(get_lowcore()->clock_comparator);
return 0;
}
/*
* Set up lowcore and control register of the current cpu to
* enable TOD clock and clock comparator interrupts.
*/
void init_cpu_timer(void)
{
struct clock_event_device *cd;
int cpu;
get_lowcore()->clock_comparator = clock_comparator_max;
set_clock_comparator(get_lowcore()->clock_comparator);
cpu = smp_processor_id();
cd = &per_cpu(comparators, cpu);
cd->name = "comparator";
cd->features = CLOCK_EVT_FEAT_ONESHOT;
cd->mult = 16777;
cd->shift = 12;
cd->min_delta_ns = 1;
cd->min_delta_ticks = 1;
cd->max_delta_ns = LONG_MAX;
cd->max_delta_ticks = ULONG_MAX;
cd->rating = 400;
cd->cpumask = cpumask_of(cpu);
cd->set_next_event = s390_next_event;
clockevents_register_device(cd);
/* Enable clock comparator timer interrupt. */
local_ctl_set_bit(0, CR0_CLOCK_COMPARATOR_SUBMASK_BIT);
/* Always allow the timing alert external interrupt. */
local_ctl_set_bit(0, CR0_ETR_SUBMASK_BIT);
}
static void clock_comparator_interrupt(struct ext_code ext_code,
unsigned int param32,
unsigned long param64)
{
inc_irq_stat(IRQEXT_CLK);
if (get_lowcore()->clock_comparator == clock_comparator_max)
set_clock_comparator(get_lowcore()->clock_comparator);
}
static void stp_timing_alert(struct stp_irq_parm *);
static void timing_alert_interrupt(struct ext_code ext_code,
unsigned int param32, unsigned long param64)
{
inc_irq_stat(IRQEXT_TLA);
if (param32 & 0x00038000)
stp_timing_alert((struct stp_irq_parm *) &param32);
}
static void stp_reset(void);
void read_persistent_clock64(struct timespec64 *ts)
{
union tod_clock clk;
u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
store_tod_clock_ext(&clk);
clk.eitod -= delta;
ext_to_timespec64(&clk, ts);
}
void __init read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
struct timespec64 *boot_offset)
{
struct timespec64 boot_time;
union tod_clock clk;
u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
clk = tod_clock_base;
clk.eitod -= delta;
ext_to_timespec64(&clk, &boot_time);
read_persistent_clock64(wall_time);
*boot_offset = timespec64_sub(*wall_time, boot_time);
}
static u64 read_tod_clock(struct clocksource *cs)
{
unsigned long now, adj;
preempt_disable(); /* protect from changes to steering parameters */
now = get_tod_clock();
adj = tod_steering_end - now;
if (unlikely((s64) adj > 0))
/*
* manually steer by 1 cycle every 2^16 cycles. This
* corresponds to shifting the tod delta by 15. 1s is
* therefore steered in ~9h. The adjust will decrease
* over time, until it finally reaches 0.
*/
now += (tod_steering_delta < 0) ? (adj >> 15) : -(adj >> 15);
preempt_enable();
return now;
}
static struct clocksource clocksource_tod = {
.name = "tod",
.rating = 400,
.read = read_tod_clock,
.mask = CLOCKSOURCE_MASK(64),
.mult = 4096000,
.shift = 24,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
.vdso_clock_mode = VDSO_CLOCKMODE_TOD,
.id = CSID_S390_TOD,
};
struct clocksource * __init clocksource_default_clock(void)
{
return &clocksource_tod;
}
/*
* Initialize the TOD clock and the CPU timer of
* the boot cpu.
*/
void __init time_init(void)
{
/* Reset time synchronization interfaces. */
stp_reset();
/* request the clock comparator external interrupt */
if (register_external_irq(EXT_IRQ_CLK_COMP, clock_comparator_interrupt))
panic("Couldn't request external interrupt 0x1004");
/* request the timing alert external interrupt */
if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt))
panic("Couldn't request external interrupt 0x1406");
if (__clocksource_register(&clocksource_tod) != 0)
panic("Could not register TOD clock source");
/* Enable TOD clock interrupts on the boot cpu. */
init_cpu_timer();
/* Enable cpu timer interrupts on the boot cpu. */
vtime_init();
}
static DEFINE_PER_CPU(atomic_t, clock_sync_word);
static DEFINE_MUTEX(stp_mutex);
static unsigned long clock_sync_flags;
#define CLOCK_SYNC_HAS_STP 0
#define CLOCK_SYNC_STP 1
#define CLOCK_SYNC_STPINFO_VALID 2
/*
* The get_clock function for the physical clock. It will get the current
* TOD clock, subtract the LPAR offset and write the result to *clock.
* The function returns 0 if the clock is in sync with the external time
* source. If the clock mode is local it will return -EOPNOTSUPP and
* -EAGAIN if the clock is not in sync with the external reference.
*/
int get_phys_clock(unsigned long *clock)
{
atomic_t *sw_ptr;
unsigned int sw0, sw1;
sw_ptr = &get_cpu_var(clock_sync_word);
sw0 = atomic_read(sw_ptr);
*clock = get_tod_clock() - lpar_offset;
sw1 = atomic_read(sw_ptr);
put_cpu_var(clock_sync_word);
if (sw0 == sw1 && (sw0 & 0x80000000U))
/* Success: time is in sync. */
return 0;
if (!test_bit(CLOCK_SYNC_HAS_STP, &clock_sync_flags))
return -EOPNOTSUPP;
if (!test_bit(CLOCK_SYNC_STP, &clock_sync_flags))
return -EACCES;
return -EAGAIN;
}
EXPORT_SYMBOL(get_phys_clock);
/*
* Make get_phys_clock() return -EAGAIN.
*/
static void disable_sync_clock(void *dummy)
{
atomic_t *sw_ptr = this_cpu_ptr(&clock_sync_word);
/*
* Clear the in-sync bit 2^31. All get_phys_clock calls will
* fail until the sync bit is turned back on. In addition
* increase the "sequence" counter to avoid the race of an
* stp event and the complete recovery against get_phys_clock.
*/
atomic_andnot(0x80000000, sw_ptr);
atomic_inc(sw_ptr);
}
/*
* Make get_phys_clock() return 0 again.
* Needs to be called from a context disabled for preemption.
*/
static void enable_sync_clock(void)
{
atomic_t *sw_ptr = this_cpu_ptr(&clock_sync_word);
atomic_or(0x80000000, sw_ptr);
}
/*
* Function to check if the clock is in sync.
*/
static inline int check_sync_clock(void)
{
atomic_t *sw_ptr;
int rc;
sw_ptr = &get_cpu_var(clock_sync_word);
rc = (atomic_read(sw_ptr) & 0x80000000U) != 0;
put_cpu_var(clock_sync_word);
return rc;
}
/*
* Apply clock delta to the global data structures.
* This is called once on the CPU that performed the clock sync.
*/
static void clock_sync_global(long delta)
{
unsigned long now, adj;
struct ptff_qto qto;
int cs;
/* Fixup the monotonic sched clock. */
tod_clock_base.eitod += delta;
/* Adjust TOD steering parameters. */
now = get_tod_clock();
adj = tod_steering_end - now;
if (unlikely((s64) adj >= 0))
/* Calculate how much of the old adjustment is left. */
tod_steering_delta = (tod_steering_delta < 0) ?
-(adj >> 15) : (adj >> 15);
tod_steering_delta += delta;
if ((abs(tod_steering_delta) >> 48) != 0)
panic("TOD clock sync offset %li is too large to drift\n",
tod_steering_delta);
tod_steering_end = now + (abs(tod_steering_delta) << 15);
for (cs = 0; cs < CS_BASES; cs++) {
vdso_data[cs].arch_data.tod_steering_end = tod_steering_end;
vdso_data[cs].arch_data.tod_steering_delta = tod_steering_delta;
}
/* Update LPAR offset. */
if (ptff_query(PTFF_QTO) && ptff(&qto, sizeof(qto), PTFF_QTO) == 0)
lpar_offset = qto.tod_epoch_difference;
/* Call the TOD clock change notifier. */
atomic_notifier_call_chain(&s390_epoch_delta_notifier, 0, &delta);
}
/*
* Apply clock delta to the per-CPU data structures of this CPU.
* This is called for each online CPU after the call to clock_sync_global.
*/
static void clock_sync_local(long delta)
{
/* Add the delta to the clock comparator. */
if (get_lowcore()->clock_comparator != clock_comparator_max) {
get_lowcore()->clock_comparator += delta;
set_clock_comparator(get_lowcore()->clock_comparator);
}
/* Adjust the last_update_clock time-stamp. */
get_lowcore()->last_update_clock += delta;
}
/* Single threaded workqueue used for stp sync events */
static struct workqueue_struct *time_sync_wq;
static void __init time_init_wq(void)
{
if (time_sync_wq)
return;
time_sync_wq = create_singlethread_workqueue("timesync");
}
struct clock_sync_data {
atomic_t cpus;
int in_sync;
long clock_delta;
};
/*
* Server Time Protocol (STP) code.
*/
static bool stp_online;
static struct stp_sstpi stp_info;
static void *stp_page;
static void stp_work_fn(struct work_struct *work);
static DECLARE_WORK(stp_work, stp_work_fn);
static struct timer_list stp_timer;
static int __init early_parse_stp(char *p)
{
return kstrtobool(p, &stp_online);
}
early_param("stp", early_parse_stp);
/*
* Reset STP attachment.
*/
static void __init stp_reset(void)
{
int rc;
stp_page = (void *) get_zeroed_page(GFP_ATOMIC);
rc = chsc_sstpc(stp_page, STP_OP_CTRL, 0x0000, NULL);
if (rc == 0)
set_bit(CLOCK_SYNC_HAS_STP, &clock_sync_flags);
else if (stp_online) {
pr_warn("The real or virtual hardware system does not provide an STP interface\n");
free_page((unsigned long) stp_page);
stp_page = NULL;
stp_online = false;
}
}
bool stp_enabled(void)
{
return test_bit(CLOCK_SYNC_HAS_STP, &clock_sync_flags) && stp_online;
}
EXPORT_SYMBOL(stp_enabled);
static void stp_timeout(struct timer_list *unused)
{
queue_work(time_sync_wq, &stp_work);
}
static int __init stp_init(void)
{
if (!test_bit(CLOCK_SYNC_HAS_STP, &clock_sync_flags))
return 0;
timer_setup(&stp_timer, stp_timeout, 0);
time_init_wq();
if (!stp_online)
return 0;
queue_work(time_sync_wq, &stp_work);
return 0;
}
arch_initcall(stp_init);
/*
* STP timing alert. There are three causes:
* 1) timing status change
* 2) link availability change
* 3) time control parameter change
* In all three cases we are only interested in the clock source state.
* If a STP clock source is now available use it.
*/
static void stp_timing_alert(struct stp_irq_parm *intparm)
{
if (intparm->tsc || intparm->lac || intparm->tcpc)
queue_work(time_sync_wq, &stp_work);
}
/*
* STP sync check machine check. This is called when the timing state
* changes from the synchronized state to the unsynchronized state.
* After a STP sync check the clock is not in sync. The machine check
* is broadcasted to all cpus at the same time.
*/
int stp_sync_check(void)
{
disable_sync_clock(NULL);
return 1;
}
/*
* STP island condition machine check. This is called when an attached
* server attempts to communicate over an STP link and the servers
* have matching CTN ids and have a valid stratum-1 configuration
* but the configurations do not match.
*/
int stp_island_check(void)
{
disable_sync_clock(NULL);
return 1;
}
void stp_queue_work(void)
{
queue_work(time_sync_wq, &stp_work);
}
static int __store_stpinfo(void)
{
int rc = chsc_sstpi(stp_page, &stp_info, sizeof(struct stp_sstpi));
if (rc)
clear_bit(CLOCK_SYNC_STPINFO_VALID, &clock_sync_flags);
else
set_bit(CLOCK_SYNC_STPINFO_VALID, &clock_sync_flags);
return rc;
}
static int stpinfo_valid(void)
{
return stp_online && test_bit(CLOCK_SYNC_STPINFO_VALID, &clock_sync_flags);
}
static int stp_sync_clock(void *data)
{
struct clock_sync_data *sync = data;
long clock_delta, flags;
static int first;
int rc;
enable_sync_clock();
if (xchg(&first, 1) == 0) {
/* Wait until all other cpus entered the sync function. */
while (atomic_read(&sync->cpus) != 0)
cpu_relax();
rc = 0;
if (stp_info.todoff || stp_info.tmd != 2) {
flags = vdso_update_begin();
rc = chsc_sstpc(stp_page, STP_OP_SYNC, 0,
&clock_delta);
if (rc == 0) {
sync->clock_delta = clock_delta;
clock_sync_global(clock_delta);
rc = __store_stpinfo();
if (rc == 0 && stp_info.tmd != 2)
rc = -EAGAIN;
}
vdso_update_end(flags);
}
sync->in_sync = rc ? -EAGAIN : 1;
xchg(&first, 0);
} else {
/* Slave */
atomic_dec(&sync->cpus);
/* Wait for in_sync to be set. */
while (READ_ONCE(sync->in_sync) == 0)
__udelay(1);
}
if (sync->in_sync != 1)
/* Didn't work. Clear per-cpu in sync bit again. */
disable_sync_clock(NULL);
/* Apply clock delta to per-CPU fields of this CPU. */
clock_sync_local(sync->clock_delta);
return 0;
}
static int stp_clear_leap(void)
{
struct __kernel_timex txc;
int ret;
memset(&txc, 0, sizeof(txc));
ret = do_adjtimex(&txc);
if (ret < 0)
return ret;
txc.modes = ADJ_STATUS;
txc.status &= ~(STA_INS|STA_DEL);
return do_adjtimex(&txc);
}
static void stp_check_leap(void)
{
struct stp_stzi stzi;
struct stp_lsoib *lsoib = &stzi.lsoib;
struct __kernel_timex txc;
int64_t timediff;
int leapdiff, ret;
if (!stp_info.lu || !check_sync_clock()) {
/*
* Either a scheduled leap second was removed by the operator,
* or STP is out of sync. In both cases, clear the leap second
* kernel flags.
*/
if (stp_clear_leap() < 0)
pr_err("failed to clear leap second flags\n");
return;
}
if (chsc_stzi(stp_page, &stzi, sizeof(stzi))) {
pr_err("stzi failed\n");
return;
}
timediff = tod_to_ns(lsoib->nlsout - get_tod_clock()) / NSEC_PER_SEC;
leapdiff = lsoib->nlso - lsoib->also;
if (leapdiff != 1 && leapdiff != -1) {
pr_err("Cannot schedule %d leap seconds\n", leapdiff);
return;
}
if (timediff < 0) {
if (stp_clear_leap() < 0)
pr_err("failed to clear leap second flags\n");
} else if (timediff < 7200) {
memset(&txc, 0, sizeof(txc));
ret = do_adjtimex(&txc);
if (ret < 0)
return;
txc.modes = ADJ_STATUS;
if (leapdiff > 0)
txc.status |= STA_INS;
else
txc.status |= STA_DEL;
ret = do_adjtimex(&txc);
if (ret < 0)
pr_err("failed to set leap second flags\n");
/* arm Timer to clear leap second flags */
mod_timer(&stp_timer, jiffies + msecs_to_jiffies(14400 * MSEC_PER_SEC));
} else {
/* The day the leap second is scheduled for hasn't been reached. Retry
* in one hour.
*/
mod_timer(&stp_timer, jiffies + msecs_to_jiffies(3600 * MSEC_PER_SEC));
}
}
/*
* STP work. Check for the STP state and take over the clock
* synchronization if the STP clock source is usable.
*/
static void stp_work_fn(struct work_struct *work)
{
struct clock_sync_data stp_sync;
int rc;
/* prevent multiple execution. */
mutex_lock(&stp_mutex);
if (!stp_online) {
chsc_sstpc(stp_page, STP_OP_CTRL, 0x0000, NULL);
del_timer_sync(&stp_timer);
goto out_unlock;
}
rc = chsc_sstpc(stp_page, STP_OP_CTRL, 0xf0e0, NULL);
if (rc)
goto out_unlock;
rc = __store_stpinfo();
if (rc || stp_info.c == 0)
goto out_unlock;
/* Skip synchronization if the clock is already in sync. */
if (!check_sync_clock()) {
memset(&stp_sync, 0, sizeof(stp_sync));
cpus_read_lock();
atomic_set(&stp_sync.cpus, num_online_cpus() - 1);
stop_machine_cpuslocked(stp_sync_clock, &stp_sync, cpu_online_mask);
cpus_read_unlock();
}
if (!check_sync_clock())
/*
* There is a usable clock but the synchronization failed.
* Retry after a second.
*/
mod_timer(&stp_timer, jiffies + msecs_to_jiffies(MSEC_PER_SEC));
else if (stp_info.lu)
stp_check_leap();
out_unlock:
mutex_unlock(&stp_mutex);
}
/*
* STP subsys sysfs interface functions
*/
static const struct bus_type stp_subsys = {
.name = "stp",
.dev_name = "stp",
};
static ssize_t ctn_id_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid())
ret = sysfs_emit(buf, "%016lx\n",
*(unsigned long *)stp_info.ctnid);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(ctn_id);
static ssize_t ctn_type_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid())
ret = sysfs_emit(buf, "%i\n", stp_info.ctn);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(ctn_type);
static ssize_t dst_offset_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid() && (stp_info.vbits & 0x2000))
ret = sysfs_emit(buf, "%i\n", (int)(s16)stp_info.dsto);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(dst_offset);
static ssize_t leap_seconds_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid() && (stp_info.vbits & 0x8000))
ret = sysfs_emit(buf, "%i\n", (int)(s16)stp_info.leaps);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(leap_seconds);
static ssize_t leap_seconds_scheduled_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct stp_stzi stzi;
ssize_t ret;
mutex_lock(&stp_mutex);
if (!stpinfo_valid() || !(stp_info.vbits & 0x8000) || !stp_info.lu) {
mutex_unlock(&stp_mutex);
return -ENODATA;
}
ret = chsc_stzi(stp_page, &stzi, sizeof(stzi));
mutex_unlock(&stp_mutex);
if (ret < 0)
return ret;
if (!stzi.lsoib.p)
return sysfs_emit(buf, "0,0\n");
return sysfs_emit(buf, "%lu,%d\n",
tod_to_ns(stzi.lsoib.nlsout - TOD_UNIX_EPOCH) / NSEC_PER_SEC,
stzi.lsoib.nlso - stzi.lsoib.also);
}
static DEVICE_ATTR_RO(leap_seconds_scheduled);
static ssize_t stratum_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid())
ret = sysfs_emit(buf, "%i\n", (int)(s16)stp_info.stratum);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(stratum);
static ssize_t time_offset_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid() && (stp_info.vbits & 0x0800))
ret = sysfs_emit(buf, "%i\n", (int)stp_info.tto);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(time_offset);
static ssize_t time_zone_offset_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid() && (stp_info.vbits & 0x4000))
ret = sysfs_emit(buf, "%i\n", (int)(s16)stp_info.tzo);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(time_zone_offset);
static ssize_t timing_mode_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid())
ret = sysfs_emit(buf, "%i\n", stp_info.tmd);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(timing_mode);
static ssize_t timing_state_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
ssize_t ret = -ENODATA;
mutex_lock(&stp_mutex);
if (stpinfo_valid())
ret = sysfs_emit(buf, "%i\n", stp_info.tst);
mutex_unlock(&stp_mutex);
return ret;
}
static DEVICE_ATTR_RO(timing_state);
static ssize_t online_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
return sysfs_emit(buf, "%i\n", stp_online);
}
static ssize_t online_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
unsigned int value;
value = simple_strtoul(buf, NULL, 0);
if (value != 0 && value != 1)
return -EINVAL;
if (!test_bit(CLOCK_SYNC_HAS_STP, &clock_sync_flags))
return -EOPNOTSUPP;
mutex_lock(&stp_mutex);
stp_online = value;
if (stp_online)
set_bit(CLOCK_SYNC_STP, &clock_sync_flags);
else
clear_bit(CLOCK_SYNC_STP, &clock_sync_flags);
queue_work(time_sync_wq, &stp_work);
mutex_unlock(&stp_mutex);
return count;
}
/*
* Can't use DEVICE_ATTR because the attribute should be named
* stp/online but dev_attr_online already exists in this file ..
*/
static DEVICE_ATTR_RW(online);
static struct attribute *stp_dev_attrs[] = {
&dev_attr_ctn_id.attr,
&dev_attr_ctn_type.attr,
&dev_attr_dst_offset.attr,
&dev_attr_leap_seconds.attr,
&dev_attr_online.attr,
&dev_attr_leap_seconds_scheduled.attr,
&dev_attr_stratum.attr,
&dev_attr_time_offset.attr,
&dev_attr_time_zone_offset.attr,
&dev_attr_timing_mode.attr,
&dev_attr_timing_state.attr,
NULL
};
ATTRIBUTE_GROUPS(stp_dev);
static int __init stp_init_sysfs(void)
{
return subsys_system_register(&stp_subsys, stp_dev_groups);
}
device_initcall(stp_init_sysfs);