mirror of
https://github.com/wheremyfoodat/Panda3DS.git
synced 2025-07-15 03:37:09 +12:00
Update home menu branch (#759)
* Fix typo (#680)
  Co-authored-by: Noumi <139501014+noumidev@users.noreply.github.com>
* More PTM stuff
  Co-Authored-By: Noumi <139501014+noumidev@users.noreply.github.com>
* Make system language configurable
* Fix building crypto++ for x64 target on Apple silicon macOS
* Attempt to switch to M1 runners again
* Prevent selecting Vulkan renderer in Qt frontend and present a message
* Libretro: Add system language option
* Only enable audio by default on libretro for now
* CMake: Bump version
* Store configuration file in AppData root if not in working directory (#693)
  * Store configuration file in AppData root if not in working directory
    This fixes macOS app bundles, as the emulator cannot write the config file into the app bundle.
  * Remove duplicate fs calls
  * I'm an idiot sandwich
  ---------
  Co-authored-by: wheremyfoodat <44909372+wheremyfoodat@users.noreply.github.com>
* GL: Add usingGLES to driverInfo struct (#694)
* Wayland fixes part 1
* Support GLES on desktop
* Qt: Fix Wayland support
  Qt will only create a Wayland surface when show() is called on the main window and on the ScreenWidget. Thus, call the function before creating the GL context. Doesn't cause regressions on XWayland, untested on other platforms.
  Fixes #586
* No need to call screen->show() twice
* Fix disabling Wayland & building on some distros (#700)
* GLES: Properly stub out logic ops
* Fix git versioning
* Android_Build: Implement ccache (#703)
  * Android_Build: Implement ccache
  * Update Android_Build.yml
  * Update Android_Build.yml
  ---------
  Co-authored-by: wheremyfoodat <44909372+wheremyfoodat@users.noreply.github.com>
* Removed dead Citra link in readme (#706)
* CRO: Lighter icache flushes
* Implement Luma icache SVCs
* Add missing SVC logs
* GPU: Add sw texture copies
* Use vk::detail::DynamicLoader instead of vk::DynamicLoader (#710)
  * Use vk::detail::DynamicLoader instead of vk::DynamicLoader
  * Update renderer_vk.cpp
* Vk: Fix typo
* Vk: Lock CI runners to SDK version 1.3.301 temporarily
* Vk: Fixing CI pt 2
* Vulkan: Fixing CI pt 3
* Vk: Fix typo
* Temporarily give 80MB to all processes (#715)
* Try to cross-compile Libretro core for arm64 (#717)
  * Try to cross-compile Libretro core for arm64
  * Bonk
  * Update Hydra_Build.yml
* [WIP] Libretro: Add audio support (#714)
  * Libretro: Add audio support
  * Adding audio interface part 1
  * Audio device pt 2
  * More audio device
  * More audio device
  * Morea uudi odevice
  * More audio device
  * More audio device
  * More audio device
  ---------
  Co-authored-by: wheremyfoodat <44909372+wheremyfoodat@users.noreply.github.com>
* Libretro audio device: Fix frame count
* Mark audio devices as final
* Add toggle for libretro audio device (#719)
* Very important work (#720)
  * Very important work
  * Most important fix
* Add more HLE service calls for eshop (#721)
* CI: Fix Vulkan SDK action (#723)
* GPU registers: Fix writes to some registers ignoring the mask (#725)
  Co-authored-by: henry <23128103+atem2069@users.noreply.github.com>
* OLED theme
* OLED theme config fix (#736)
  Co-authored-by: smiRaphi <neogt404@gmail.com>
* Adding Swedish translation
* Fix Metal renderer compilation on iOS
* [Core] Improve iOS compilation workflow
* [Qt] Hook Swedish to UI
* AppDataDocumentProvider: Typo (#740)
* More iOS work
* More iOS progress
* More iOS work
* AppDataDocumentProvider: Add missing ``COLUMN_FLAGS`` in the default document projection (#741)
  Fixes being unable to copy files from the device to the app's internal storage
* More iOS work
* ios: Simplify MTKView interface (still doesn't work though)
* ios: Pass CAMetalLayer instead of void* to Obj-C++ bridging header
* Fix bridging cast
* FINALLY IOS GRAPHICS
* ios: Remove printf spam
* Metal: Reimplement some texture formats on iOS
* metal: implement texture decoder
* metal: check for format support
* metal: implement texture swizzling
* metal: remove unused texture functions
* Shadergen types: Add Metal & MSL
* Format
* Undo submodule changes
* Readme: Add Chonkystation 3
* Metal: Use std::unique_ptr for texture decode
* AppDataDocumentProvider: Allow to remove documents (#744)
  * AppDataDocumentProvider: Allow to remove documents
  * Typo
* Metal renderer fixes for iOS
* iOS driver: Add doc comments
* iOS: Add frontend & frontend build files (#746)
  * iOS: Add SwiftUI part to repo
  * Add iOS build script
  * Update SDL2 submodule
  * Fix iOS build script
  * CI: Update xcode tools for iOS
  * Update iOS_Build.yml
  * Update iOS build
  * Lower XCode version
  * A
  * Update project.pbxproj
  * Update iOS_Build.yml
  * Update iOS_Build.yml
  * Update build.sh
  * iOS: Fail on build error
* iOS: Add file picker (#747)
  * iOS: Add file picker
  * Fix lock placement
* Qt: Add runpog icon (#752)
* Update discord-rpc submodule (#753)
* Remove cryptoppwin submodule (#754)
* Add optional texture hashing
* Fix build on new Vk SDK (#757)
  Co-authored-by: Nadia Holmquist Pedersen <893884+nadiaholmquist@users.noreply.github.com>
* CI: Use new Vulkan SDK

---------

Co-authored-by: Noumi <139501014+noumidev@users.noreply.github.com>
Co-authored-by: Thomas <thomas@thomasw.dev>
Co-authored-by: Thomas <twvd@users.noreply.github.com>
Co-authored-by: Daniel López Guimaraes <danielectra@outlook.com>
Co-authored-by: Jonian Guveli <jonian@hardpixel.eu>
Co-authored-by: Ishan09811 <156402647+Ishan09811@users.noreply.github.com>
Co-authored-by: Auxy6858 <71662994+Auxy6858@users.noreply.github.com>
Co-authored-by: Paris Oplopoios <parisoplop@gmail.com>
Co-authored-by: henry <23128103+atem2069@users.noreply.github.com>
Co-authored-by: smiRaphi <neogt404@gmail.com>
Co-authored-by: smiRaphi <87574679+smiRaphi@users.noreply.github.com>
Co-authored-by: Daniel Nylander <po@danielnylander.se>
Co-authored-by: Samuliak <samuliak77@gmail.com>
Co-authored-by: Albert <45282415+ggrtk@users.noreply.github.com>
Co-authored-by: Nadia Holmquist Pedersen <893884+nadiaholmquist@users.noreply.github.com>
This commit is contained in:
parent
d5506f311f
commit
082b6216b3
302 changed files with 55525 additions and 747 deletions
427
third_party/cryptoppwin/include/cryptopp/arm_simd.h
vendored
Normal file
@@ -0,0 +1,427 @@
// arm_simd.h - written and placed in public domain by Jeffrey Walton

/// \file arm_simd.h
/// \brief Support functions for ARM and vector operations

#ifndef CRYPTOPP_ARM_SIMD_H
#define CRYPTOPP_ARM_SIMD_H

#include "config.h"

#if (CRYPTOPP_ARM_NEON_HEADER)
# include <stdint.h>
# include <arm_neon.h>
#endif

#if (CRYPTOPP_ARM_ACLE_HEADER)
# include <stdint.h>
# include <arm_acle.h>
#endif

#if (CRYPTOPP_ARM_CRC32_AVAILABLE) || defined(CRYPTOPP_DOXYGEN_PROCESSING)
/// \name CRC32 checksum
//@{

/// \brief CRC32 checksum
/// \param crc the starting crc value
/// \param val the value to checksum
/// \return CRC32 value
/// \since Crypto++ 8.6
inline uint32_t CRC32B (uint32_t crc, uint8_t val)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return __crc32b(crc, val);
#else
    __asm__ ("crc32b %w0, %w0, %w1 \n\t"
            :"+r" (crc) : "r" (val) );
    return crc;
#endif
}

/// \brief CRC32 checksum
/// \param crc the starting crc value
/// \param val the value to checksum
/// \return CRC32 value
/// \since Crypto++ 8.6
inline uint32_t CRC32W (uint32_t crc, uint32_t val)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return __crc32w(crc, val);
#else
    __asm__ ("crc32w %w0, %w0, %w1 \n\t"
            :"+r" (crc) : "r" (val) );
    return crc;
#endif
}

/// \brief CRC32 checksum
/// \param crc the starting crc value
/// \param vals the values to checksum
/// \return CRC32 value
/// \since Crypto++ 8.6
inline uint32_t CRC32Wx4 (uint32_t crc, const uint32_t vals[4])
{
#if defined(CRYPTOPP_MSC_VERSION)
    return __crc32w(__crc32w(__crc32w(__crc32w(
        crc, vals[0]), vals[1]), vals[2]), vals[3]);
#else
    __asm__ ("crc32w %w0, %w0, %w1 \n\t"
             "crc32w %w0, %w0, %w2 \n\t"
             "crc32w %w0, %w0, %w3 \n\t"
             "crc32w %w0, %w0, %w4 \n\t"
            :"+r" (crc) : "r" (vals[0]), "r" (vals[1]),
                          "r" (vals[2]), "r" (vals[3]));
    return crc;
#endif
}

//@}
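As a cross-check on the semantics wrapped above: the AArch64 `crc32b`/`crc32w` instructions compute the standard reflected CRC-32 (polynomial 0x04C11DB7, reflected constant 0xEDB88320). A minimal portable C model of that behavior, with hypothetical `soft_*`/`crc32_bits` names of our own (not part of Crypto++ or ACLE):

```c
#include <stdint.h>

/* Bitwise software model of the AArch64 CRC32B/CRC32W instructions:
   XOR the input into the running CRC, then clock the reflected
   polynomial 0xEDB88320 once per input bit. Illustrative sketch only. */
static uint32_t crc32_bits(uint32_t crc, uint32_t val, int nbits)
{
    crc ^= val;
    for (int i = 0; i < nbits; i++)
        crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(0u - (crc & 1u)));
    return crc;
}

/* One byte at a time, as CRC32B consumes its operand. */
static uint32_t soft_crc32b(uint32_t crc, uint8_t val)
{
    return crc32_bits(crc, val, 8);
}

/* One 32-bit word at a time, as CRC32W consumes its operand. */
static uint32_t soft_crc32w(uint32_t crc, uint32_t val)
{
    return crc32_bits(crc, val, 32);
}
```

Because the CRC is linear over GF(2), one word-sized step equals four byte-sized steps over the word's bytes, low byte first; that is also why `CRC32Wx4()` above can simply chain four `crc32w` instructions.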
/// \name CRC32-C checksum
//@{

/// \brief CRC32-C checksum
/// \param crc the starting crc value
/// \param val the value to checksum
/// \return CRC32-C value
/// \since Crypto++ 8.6
inline uint32_t CRC32CB (uint32_t crc, uint8_t val)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return __crc32cb(crc, val);
#else
    __asm__ ("crc32cb %w0, %w0, %w1 \n\t"
            :"+r" (crc) : "r" (val) );
    return crc;
#endif
}

/// \brief CRC32-C checksum
/// \param crc the starting crc value
/// \param val the value to checksum
/// \return CRC32-C value
/// \since Crypto++ 8.6
inline uint32_t CRC32CW (uint32_t crc, uint32_t val)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return __crc32cw(crc, val);
#else
    __asm__ ("crc32cw %w0, %w0, %w1 \n\t"
            :"+r" (crc) : "r" (val) );
    return crc;
#endif
}

/// \brief CRC32-C checksum
/// \param crc the starting crc value
/// \param vals the values to checksum
/// \return CRC32-C value
/// \since Crypto++ 8.6
inline uint32_t CRC32CWx4 (uint32_t crc, const uint32_t vals[4])
{
#if defined(CRYPTOPP_MSC_VERSION)
    return __crc32cw(__crc32cw(__crc32cw(__crc32cw(
        crc, vals[0]), vals[1]), vals[2]), vals[3]);
#else
    __asm__ ("crc32cw %w0, %w0, %w1 \n\t"
             "crc32cw %w0, %w0, %w2 \n\t"
             "crc32cw %w0, %w0, %w3 \n\t"
             "crc32cw %w0, %w0, %w4 \n\t"
            :"+r" (crc) : "r" (vals[0]), "r" (vals[1]),
                          "r" (vals[2]), "r" (vals[3]));
    return crc;
#endif
}
//@}
#endif  // CRYPTOPP_ARM_CRC32_AVAILABLE

#if (CRYPTOPP_ARM_PMULL_AVAILABLE) || defined(CRYPTOPP_DOXYGEN_PROCESSING)
/// \name Polynomial multiplication
//@{

/// \brief Polynomial multiplication
/// \param a the first value
/// \param b the second value
/// \return vector product
/// \details PMULL_00() performs polynomial multiplication and presents
///  the result like Intel's <tt>c = _mm_clmulepi64_si128(a, b, 0x00)</tt>.
///  The <tt>0x00</tt> indicates the low 64-bits of <tt>a</tt> and <tt>b</tt>
///  are multiplied.
/// \note An Intel XMM register is composed of 128-bits. The leftmost bit
///  is MSB and numbered 127, while the rightmost bit is LSB and
///  numbered 0.
/// \since Crypto++ 8.0
inline uint64x2_t PMULL_00(const uint64x2_t a, const uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    const __n64 x = { vgetq_lane_u64(a, 0) };
    const __n64 y = { vgetq_lane_u64(b, 0) };
    return vmull_p64(x, y);
#elif defined(__GNUC__)
    uint64x2_t r;
    __asm__ ("pmull %0.1q, %1.1d, %2.1d \n\t"
            :"=w" (r) : "w" (a), "w" (b) );
    return r;
#else
    return (uint64x2_t)(vmull_p64(
        vgetq_lane_u64(vreinterpretq_u64_u8(a),0),
        vgetq_lane_u64(vreinterpretq_u64_u8(b),0)));
#endif
}

/// \brief Polynomial multiplication
/// \param a the first value
/// \param b the second value
/// \return vector product
/// \details PMULL_01() performs polynomial multiplication and presents
///  the result like Intel's <tt>c = _mm_clmulepi64_si128(a, b, 0x01)</tt>.
///  The <tt>0x01</tt> indicates the low 64-bits of <tt>a</tt> and high
///  64-bits of <tt>b</tt> are multiplied.
/// \note An Intel XMM register is composed of 128-bits. The leftmost bit
///  is MSB and numbered 127, while the rightmost bit is LSB and
///  numbered 0.
/// \since Crypto++ 8.0
inline uint64x2_t PMULL_01(const uint64x2_t a, const uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    const __n64 x = { vgetq_lane_u64(a, 0) };
    const __n64 y = { vgetq_lane_u64(b, 1) };
    return vmull_p64(x, y);
#elif defined(__GNUC__)
    uint64x2_t r;
    __asm__ ("pmull %0.1q, %1.1d, %2.1d \n\t"
            :"=w" (r) : "w" (a), "w" (vget_high_u64(b)) );
    return r;
#else
    return (uint64x2_t)(vmull_p64(
        vgetq_lane_u64(vreinterpretq_u64_u8(a),0),
        vgetq_lane_u64(vreinterpretq_u64_u8(b),1)));
#endif
}

/// \brief Polynomial multiplication
/// \param a the first value
/// \param b the second value
/// \return vector product
/// \details PMULL_10() performs polynomial multiplication and presents
///  the result like Intel's <tt>c = _mm_clmulepi64_si128(a, b, 0x10)</tt>.
///  The <tt>0x10</tt> indicates the high 64-bits of <tt>a</tt> and low
///  64-bits of <tt>b</tt> are multiplied.
/// \note An Intel XMM register is composed of 128-bits. The leftmost bit
///  is MSB and numbered 127, while the rightmost bit is LSB and
///  numbered 0.
/// \since Crypto++ 8.0
inline uint64x2_t PMULL_10(const uint64x2_t a, const uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    const __n64 x = { vgetq_lane_u64(a, 1) };
    const __n64 y = { vgetq_lane_u64(b, 0) };
    return vmull_p64(x, y);
#elif defined(__GNUC__)
    uint64x2_t r;
    __asm__ ("pmull %0.1q, %1.1d, %2.1d \n\t"
            :"=w" (r) : "w" (vget_high_u64(a)), "w" (b) );
    return r;
#else
    return (uint64x2_t)(vmull_p64(
        vgetq_lane_u64(vreinterpretq_u64_u8(a),1),
        vgetq_lane_u64(vreinterpretq_u64_u8(b),0)));
#endif
}

/// \brief Polynomial multiplication
/// \param a the first value
/// \param b the second value
/// \return vector product
/// \details PMULL_11() performs polynomial multiplication and presents
///  the result like Intel's <tt>c = _mm_clmulepi64_si128(a, b, 0x11)</tt>.
///  The <tt>0x11</tt> indicates the high 64-bits of <tt>a</tt> and <tt>b</tt>
///  are multiplied.
/// \note An Intel XMM register is composed of 128-bits. The leftmost bit
///  is MSB and numbered 127, while the rightmost bit is LSB and
///  numbered 0.
/// \since Crypto++ 8.0
inline uint64x2_t PMULL_11(const uint64x2_t a, const uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    const __n64 x = { vgetq_lane_u64(a, 1) };
    const __n64 y = { vgetq_lane_u64(b, 1) };
    return vmull_p64(x, y);
#elif defined(__GNUC__)
    uint64x2_t r;
    __asm__ ("pmull2 %0.1q, %1.2d, %2.2d \n\t"
            :"=w" (r) : "w" (a), "w" (b) );
    return r;
#else
    return (uint64x2_t)(vmull_p64(
        vgetq_lane_u64(vreinterpretq_u64_u8(a),1),
        vgetq_lane_u64(vreinterpretq_u64_u8(b),1)));
#endif
}

/// \brief Polynomial multiplication
/// \param a the first value
/// \param b the second value
/// \return vector product
/// \details PMULL() performs vmull_p64(). PMULL is provided as
///  GCC inline assembly due to Clang and lack of support for the intrinsic.
/// \since Crypto++ 8.0
inline uint64x2_t PMULL(const uint64x2_t a, const uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    const __n64 x = { vgetq_lane_u64(a, 0) };
    const __n64 y = { vgetq_lane_u64(b, 0) };
    return vmull_p64(x, y);
#elif defined(__GNUC__)
    uint64x2_t r;
    __asm__ ("pmull %0.1q, %1.1d, %2.1d \n\t"
            :"=w" (r) : "w" (a), "w" (b) );
    return r;
#else
    return (uint64x2_t)(vmull_p64(
        vgetq_lane_u64(vreinterpretq_u64_u8(a),0),
        vgetq_lane_u64(vreinterpretq_u64_u8(b),0)));
#endif
}

/// \brief Polynomial multiplication
/// \param a the first value
/// \param b the second value
/// \return vector product
/// \details PMULL_HIGH() performs vmull_high_p64(). PMULL_HIGH is provided as
///  GCC inline assembly due to Clang and lack of support for the intrinsic.
/// \since Crypto++ 8.0
inline uint64x2_t PMULL_HIGH(const uint64x2_t a, const uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    const __n64 x = { vgetq_lane_u64(a, 1) };
    const __n64 y = { vgetq_lane_u64(b, 1) };
    return vmull_p64(x, y);
#elif defined(__GNUC__)
    uint64x2_t r;
    __asm__ ("pmull2 %0.1q, %1.2d, %2.2d \n\t"
            :"=w" (r) : "w" (a), "w" (b) );
    return r;
#else
    return (uint64x2_t)(vmull_p64(
        vgetq_lane_u64(vreinterpretq_u64_u8(a),1),
        vgetq_lane_u64(vreinterpretq_u64_u8(b),1)));
#endif
}

/// \brief Vector extraction
/// \tparam C the byte count
/// \param a the first value
/// \param b the second value
/// \return vector
/// \details VEXT_U8() extracts the first <tt>C</tt> bytes of vector
///  <tt>a</tt> and the remaining bytes in <tt>b</tt>. VEXT_U8 is provided
///  as GCC inline assembly due to Clang and lack of support for the intrinsic.
/// \since Crypto++ 8.0
template <unsigned int C>
inline uint64x2_t VEXT_U8(uint64x2_t a, uint64x2_t b)
{
    // https://github.com/weidai11/cryptopp/issues/366
#if defined(CRYPTOPP_MSC_VERSION)
    return vreinterpretq_u64_u8(vextq_u8(
        vreinterpretq_u8_u64(a), vreinterpretq_u8_u64(b), C));
#else
    uint64x2_t r;
    __asm__ ("ext %0.16b, %1.16b, %2.16b, %3 \n\t"
            :"=w" (r) : "w" (a), "w" (b), "I" (C) );
    return r;
#endif
}

//@}
#endif  // CRYPTOPP_ARM_PMULL_AVAILABLE
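The PMULL family above only selects which 64-bit lanes feed a carry-less (polynomial) multiply; the product itself is the same 64x64 -> 128-bit operation that `vmull_p64` performs. An illustrative portable C model, using our own `soft_clmul64`/`clmul128` names (the hi/lo pair stands in for the `uint64x2_t` result):

```c
#include <stdint.h>

/* 128-bit result of a carry-less multiply, low and high halves. */
typedef struct { uint64_t lo, hi; } clmul128;

/* Software model of a 64x64 -> 128-bit carry-less (GF(2)[x]) multiply:
   for each set bit i of b, XOR (a << i) into the 128-bit accumulator.
   Illustrative sketch of what PMULL/vmull_p64 computes on one lane pair. */
static clmul128 soft_clmul64(uint64_t a, uint64_t b)
{
    clmul128 r = { 0, 0 };
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i)                      /* avoid undefined a >> 64 */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}
```

For example, (x + 1) * (x + 1) = x^2 + 1 in GF(2)[x], so multiplying 3 by 3 carry-lessly yields 5 rather than 9; that absence of carries is what makes the operation useful for GCM and CRC folding.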

#if CRYPTOPP_ARM_SHA3_AVAILABLE || defined(CRYPTOPP_DOXYGEN_PROCESSING)
/// \name ARMv8.2 operations
//@{

/// \brief Three-way XOR
/// \param a the first value
/// \param b the second value
/// \param c the third value
/// \return three-way exclusive OR of the values
/// \details VEOR3() performs veor3q_u64(). VEOR3 is provided as GCC inline assembly due
///  to Clang and lack of support for the intrinsic.
/// \details VEOR3 requires ARMv8.2.
/// \since Crypto++ 8.6
inline uint64x2_t VEOR3(uint64x2_t a, uint64x2_t b, uint64x2_t c)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return veor3q_u64(a, b, c);
#else
    uint64x2_t r;
    __asm__ ("eor3 %0.16b, %1.16b, %2.16b, %3.16b \n\t"
            :"=w" (r) : "w" (a), "w" (b), "w" (c));
    return r;
#endif
}

/// \brief XOR and rotate
/// \param a the first value
/// \param b the second value
/// \param c the third value
/// \return two-way exclusive OR of the values, then rotated by c
/// \details VXARQ() performs vxarq_u64(). VXARQ is provided as GCC inline assembly due
///  to Clang and lack of support for the intrinsic.
/// \details VXARQ requires ARMv8.2.
/// \since Crypto++ 8.6
inline uint64x2_t VXAR(uint64x2_t a, uint64x2_t b, const int c)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return vxarq_u64(a, b, c);
#else
    uint64x2_t r;
    __asm__ ("xar %0.2d, %1.2d, %2.2d, %3 \n\t"
            :"=w" (r) : "w" (a), "w" (b), "I" (c));
    return r;
#endif
}

/// \brief XOR and rotate
/// \tparam C the rotate amount
/// \param a the first value
/// \param b the second value
/// \return two-way exclusive OR of the values, then rotated by C
/// \details VXARQ() performs vxarq_u64(). VXARQ is provided as GCC inline assembly due
///  to Clang and lack of support for the intrinsic.
/// \details VXARQ requires ARMv8.2.
/// \since Crypto++ 8.6
template <unsigned int C>
inline uint64x2_t VXAR(uint64x2_t a, uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return vxarq_u64(a, b, C);
#else
    uint64x2_t r;
    __asm__ ("xar %0.2d, %1.2d, %2.2d, %3 \n\t"
            :"=w" (r) : "w" (a), "w" (b), "I" (C));
    return r;
#endif
}

/// \brief XOR and rotate
/// \param a the first value
/// \param b the second value
/// \return two-way exclusive OR of the values, then rotated 1-bit
/// \details VRAX1() performs vrax1q_u64(). VRAX1 is provided as GCC inline assembly due
///  to Clang and lack of support for the intrinsic.
/// \details VRAX1 requires ARMv8.2.
/// \since Crypto++ 8.6
inline uint64x2_t VRAX1(uint64x2_t a, uint64x2_t b)
{
#if defined(CRYPTOPP_MSC_VERSION)
    return vrax1q_u64(a, b);
#else
    uint64x2_t r;
    __asm__ ("rax1 %0.2d, %1.2d, %2.2d \n\t"
            :"=w" (r) : "w" (a), "w" (b));
    return r;
#endif
}
//@}
#endif  // CRYPTOPP_ARM_SHA3_AVAILABLE
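For reference, the ARMv8.2 SHA-3 instructions wrapped above have simple per-lane scalar semantics: EOR3 is a three-way XOR, XAR XORs then rotates right by an immediate, and RAX1 XORs with the second operand rotated left by one. A per-lane model in portable C (the `soft_*`/`ror64` names are ours, for illustration only):

```c
#include <stdint.h>

/* Rotate a 64-bit value right by n bits (n masked to 0..63). */
static uint64_t ror64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* EOR3: three-way exclusive OR, one lane. */
static uint64_t soft_eor3(uint64_t a, uint64_t b, uint64_t c)
{
    return a ^ b ^ c;
}

/* XAR: XOR then rotate right by the immediate c, one lane. */
static uint64_t soft_xar(uint64_t a, uint64_t b, unsigned c)
{
    return ror64(a ^ b, c);
}

/* RAX1: XOR with b rotated left by 1 (== rotated right by 63), one lane. */
static uint64_t soft_rax1(uint64_t a, uint64_t b)
{
    return a ^ ror64(b, 63);
}
```

These fused forms map directly onto the theta and rho-pi steps of Keccak, which is why a Keccak round needs noticeably fewer instructions on ARMv8.2 than with plain EOR and shift pairs.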

#endif  // CRYPTOPP_ARM_SIMD_H